TWO MODIFIED CONJUGATE GRADIENT METHODS FOR SOLVING UNCONSTRAINED OPTIMIZATION AND APPLICATION

Abstract. Conjugate gradient methods are a popular class of iterative methods for solving linear systems of equations and nonlinear optimization problems, as they do not require the storage of any matrices. In order to obtain a theoretically effective and numerically efficient method, two modified conjugate gradient methods (called the MCB1 and MCB2 methods) are proposed, in which the coefficient $\beta_k$ is inspired by the structure of the conjugate gradient parameters of some existing conjugate gradient methods. Under the strong Wolfe line search, the sufficient descent property and global convergence of the MCB1 method are proved. Moreover, the MCB2 method generates a descent direction independently of any line search and enjoys good convergence properties when the strong Wolfe line search is employed. Preliminary numerical results show that the MCB1 and MCB2 methods are effective and robust in minimizing some unconstrained optimization problems, and each of these modifications outperforms four well-known conjugate gradient methods. Furthermore, the proposed algorithms are extended to the problem of estimating the mode function.


Introduction
The optimization model is a requisite mathematical problem since it is connected to different fields such as economics, engineering and physics. Today there are many optimization algorithms, such as Newton, quasi-Newton and bundle algorithms. Note that these algorithms fail to solve large-scale optimization problems as they need to store and calculate relevant matrices. In contrast, the conjugate gradient (CG) algorithm is successful due to its simplicity of iteration and low memory requirements. In this paper, the nonlinear CG method is studied for the following unconstrained optimization problem:
$$\min_{x \in \mathbb{R}^n} f(x), \tag{1.1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a smooth and nonlinear function. The CG method generates a sequence $\{x_k\}_{k \ge 0}$ such that:
$$x_{k+1} = x_k + \alpha_k d_k, \tag{1.2}$$
where $x_k$ is the current iteration point and $d_k \in \mathbb{R}^n$ is the search direction defined by the following formula:
$$d_{k+1} = -g_{k+1} + \beta_k d_k, \qquad d_0 = -g_0, \tag{1.3}$$
where $g_{k+1}$ denotes the gradient of $f$ at $x_{k+1}$ and the parameter $\beta_k$ is known as the conjugate gradient coefficient. The step length $\alpha_k$ is very important for the global convergence of conjugate gradient methods. One often requires the line search to satisfy the Wolfe conditions (WLS):
$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^{T} d_k \tag{1.4}$$
and
$$g_{k+1}^{T} d_k \ge \sigma g_k^{T} d_k. \tag{1.5}$$
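As a quick illustration (this helper is not part of the paper), the line-search tests can be checked numerically; here `f` and `grad` evaluate the objective and its gradient, the defaults follow the parameter choices $\delta = 10^{-3}$, $\sigma = 10^{-1}$ used in the experiments below, and the strong variant (1.6) is introduced next:

```python
import numpy as np

def wolfe_conditions(f, grad, x, d, alpha, delta=1e-3, sigma=1e-1):
    """Check the Wolfe conditions (1.4)-(1.5) and the strong Wolfe
    curvature condition (1.6, introduced below) at step length alpha."""
    gTd = grad(x) @ d                                          # g_k^T d_k
    g_new_Td = grad(x + alpha * d) @ d                         # g_{k+1}^T d_k
    armijo = f(x + alpha * d) <= f(x) + delta * alpha * gTd    # (1.4)
    curvature = g_new_Td >= sigma * gTd                        # (1.5)
    strong = abs(g_new_Td) <= sigma * abs(gTd)                 # (1.6)
    return armijo and curvature, armijo and strong
```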
Also, the strong Wolfe (SWLS) conditions consist of (1.4) and
$$\left| g_{k+1}^{T} d_k \right| \le \sigma \left| g_k^{T} d_k \right|, \tag{1.6}$$
where $0 < \delta < \sigma < 1$. For the scalar $\beta_k$ many formulas have been proposed. Some of the classical choices are the Fletcher-Reeves (FR) method [12], the Dai-Yuan (DY) method [5], the Conjugate Descent (CD) method proposed by Fletcher [11], the Polak-Ribière-Polyak (PRP) method [20,21], the Hestenes-Stiefel (HS) method [13] and the Liu-Storey (LS) method [17], whose formulas for $\beta_k$ are given, respectively, by:
$$\beta_k^{FR} = \frac{\|g_{k+1}\|^2}{\|g_k\|^2}, \quad \beta_k^{DY} = \frac{\|g_{k+1}\|^2}{d_k^{T} y_k}, \quad \beta_k^{CD} = -\frac{\|g_{k+1}\|^2}{g_k^{T} d_k}, \quad \beta_k^{PRP} = \frac{g_{k+1}^{T} y_k}{\|g_k\|^2}, \quad \beta_k^{HS} = \frac{g_{k+1}^{T} y_k}{d_k^{T} y_k}, \quad \beta_k^{LS} = -\frac{g_{k+1}^{T} y_k}{g_k^{T} d_k},$$
where $\|\cdot\|$ denotes the Euclidean norm and $y_k = g_{k+1} - g_k$. Although all nonlinear conjugate gradient methods should reduce to the linear conjugate gradient method when $f$ is a convex quadratic and the line search is exact, their convergence properties may be quite different for nonquadratic functions. For example, Al-Baali [1] established the convergence of the FR method if the step length $\alpha_k$ satisfies (1.4) and (1.6) with $\sigma < \frac{1}{2}$. Dai and Yuan [5] proved the global convergence of the DY method if the WLS is used with $\sigma < 1$. In contrast, the PRP and HS methods have the drawback that they may not be globally convergent even with the exact line search [22]. This problem inspired numerous researchers to study the global convergence of the above methods under inexact line searches. Wei et al. [25], based on the PRP method, gave a new CG formula, called the WYL method, where $\beta_k$ is given by:
$$\beta_k^{WYL} = \frac{g_{k+1}^{T}\left( g_{k+1} - \frac{\|g_{k+1}\|}{\|g_k\|} g_k \right)}{\|g_k\|^2}.$$
The WYL method can be considered as a modification of the PRP method, which inherits the good numerical behavior of the original method. Furthermore, Huang et al. [14] proved that the WYL method satisfies the sufficient descent condition and converges globally under the SWLS with the parameter $\sigma < \frac{1}{4}$. However, the Wei-Yao-Liu method may not be a descent method if the WLS is used. Yao et al. [26] extended this idea to the HS method, obtaining the MHS method, whose parameter $\beta_k$ is given by:
$$\beta_k^{MHS} = \frac{g_{k+1}^{T}\left( g_{k+1} - \frac{\|g_{k+1}\|}{\|g_k\|} g_k \right)}{d_k^{T} y_k}.$$
For the SWLS, under Lipschitz continuity of the gradient, Yao et al. [26] established the global convergence of this computational scheme. Soon afterward, Zhang [27] made a slight modification to the WYL method and constructed the NPRP method, in which the CG coefficient is computed by:
$$\beta_k^{NPRP} = \frac{\|g_{k+1}\|^2 - \frac{\|g_{k+1}\|}{\|g_k\|}\left| g_{k+1}^{T} g_k \right|}{\|g_k\|^2}.$$
The same author [27] also gave a modified HS method, in which $\beta_k$ is defined by:
$$\beta_k^{NHS} = \frac{\|g_{k+1}\|^2 - \frac{\|g_{k+1}\|}{\|g_k\|}\left| g_{k+1}^{T} g_k \right|}{d_k^{T} y_k}.$$
The NPRP and NHS methods possess the sufficient descent condition and converge globally if the SWLS is used and the parameter $\sigma$ is suitably restricted. Numerical results reported in [27] show that the NPRP method performs better than the WYL method and the NHS method performs better than the MHS method. Likewise, Du et al. [9] in 2016 gave two modified CG methods, the NVPRP* and NVHS* methods, whose explicit formulas are given in [9]. The NVPRP* and NVHS* methods satisfy the sufficient descent condition and are globally convergent if the SWLS is utilized with the parameter $\sigma < \frac{1}{4}$ and $\sigma < \frac{1}{3}$, respectively. In 2012, Dai and Wen [6] proposed two further modified CG methods, denoted the DPRP and DHS methods; the expressions of $\beta_k^{DPRP}$ and $\beta_k^{DHS}$ are given in [6]. The convergence of the two methods under the WLS was established, and numerical results show that these computational schemes are efficient [6]. Recently, Zhu et al. [28] gave a modified CG method, called DDY1, whose coefficient $\beta_k$ is defined in [28]. The authors proved that this method possesses the sufficient descent condition and the global convergence property when the SWLS is employed [28].
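For reference, the six classical coefficients translate directly into code. The sketch below is illustrative (not from the paper), with NumPy vectors `g_new` $= g_{k+1}$, `g_old` $= g_k$ and `d` $= d_k$:

```python
import numpy as np

def classical_betas(g_new, g_old, d):
    """Classical CG coefficients; g_new = g_{k+1}, g_old = g_k, d = d_k."""
    y = g_new - g_old                      # y_k = g_{k+1} - g_k
    gn2 = g_new @ g_new                    # ||g_{k+1}||^2
    go2 = g_old @ g_old                    # ||g_k||^2
    return {
        "FR":  gn2 / go2,
        "DY":  gn2 / (d @ y),
        "CD":  -gn2 / (g_old @ d),
        "PRP": (g_new @ y) / go2,
        "HS":  (g_new @ y) / (d @ y),
        "LS":  -(g_new @ y) / (g_old @ d),
    }
```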
The aim of this paper is to propose two new conjugate gradient methods. Under the SWLS, the convergence properties of the MCB1 and MCB2 CG methods are established. Numerical results show that the two modifications are efficient and robust. Finally, an application of these methods to a nonparametric mode estimator is also considered.
The most important novelty of this work is the application of these methods in nonparametric statistics; to the best of our knowledge, this is the first use of such methods in this field.
This work is organized as follows. In the next section, the two modified algorithms are introduced and the sufficient descent condition is proved. In section three, the global convergence of the two proposed methods under an SWLS is proved. The numerical results are contained in section four. In section five, an application of the new methods in nonparametric statistics is presented. Finally, a summary of the paper is given.

New conjugate gradient methods
In this section, the two novel parameters $\beta_k$ are introduced, denoted $\beta_k^{MCB1}$ and $\beta_k^{MCB2}$, where MCB stands for Mehamdia, Chaib and Bechouat. First, the conjugate gradient parameter of the MCB1 method is given by formula (2.1), together with its accompanying definitions. Second, the parameter $\beta_k$ of the MCB2 method is defined by formula (2.2), which involves a parameter lying in $[0, 1]$. The search direction $d_k$ of the MCB1 algorithm is given by (2.3), and the search direction $d_k$ of the MCB2 algorithm is defined by (2.4).

Algorithms
In this part, the MCB1 and MCB2 Algorithms with the SWLS are presented.

MCB2 Algorithm
The MCB2 Algorithm is the same as the MCB1 Algorithm, except that in Step 4 formula (2.1) is replaced by formula (2.2) and in Step 5 equation (2.3) is replaced by equation (2.4).
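For concreteness, the overall iteration can be sketched as follows. This is a generic skeleton, not the authors' exact step list: it uses SciPy's `line_search` (which enforces the strong Wolfe conditions) with the paper's parameters $\delta = 10^{-3}$, $\sigma = 10^{-1}$, and a placeholder callable `beta_mcb` standing in for formula (2.1) or (2.2). Note that the direction update shown is the generic rule (1.3); the MCB2 direction (2.4) modifies this update, so the sketch shows only the common structure.

```python
import numpy as np
from scipy.optimize import line_search

def cg_method(f, grad, x0, beta_mcb, tol=1e-6, max_iter=2000):
    """Generic CG iteration x_{k+1} = x_k + alpha_k d_k with a
    strong Wolfe line search; beta_mcb(g_new, g_old, d) supplies
    the CG coefficient, e.g. formula (2.1) or (2.2)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # d_0 = -g_0
    for k in range(max_iter):
        if np.max(np.abs(g)) < tol:          # stop when ||g_k||_inf < tol
            break
        # strong Wolfe line search with delta = 1e-3, sigma = 1e-1
        alpha = line_search(f, grad, x, d, gfk=g, c1=1e-3, c2=1e-1)[0]
        if alpha is None:                    # line search failed
            break
        x = x + alpha * d                    # iteration (1.2)
        g_new = grad(x)
        beta = beta_mcb(g_new, g, d)         # CG coefficient
        d = -g_new + beta * d                # generic direction update (1.3)
        g = g_new
    return x
```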

The sufficient descent direction
The following lemma is needed to prove the sufficient descent property of the proposed methods.
Lemma 2.1. The inequality (2.5) always holds.
Proof. Let $\theta_k$ denote the angle between the vectors $d_k$ and $g_{k+1}$, and let $\varphi_k$ denote the angle between the vectors $g_{k+1}$ and $g_k$. Writing the inner products $g_{k+1}^{T} d_k$ and $g_{k+1}^{T} g_k$ in terms of $\cos\theta_k$ and $\cos\varphi_k$ and bounding the cosines by one, the result is achieved.
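The identities behind this argument are the standard angle formulas, stated here for clarity (they are not numbered equations of the paper):
$$\cos\theta_k = \frac{g_{k+1}^{T} d_k}{\|g_{k+1}\|\,\|d_k\|}, \qquad \cos\varphi_k = \frac{g_{k+1}^{T} g_k}{\|g_{k+1}\|\,\|g_k\|}, \qquad |\cos\theta_k| \le 1, \quad |\cos\varphi_k| \le 1.$$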
First, the sufficient descent property of the MCB1 method is proved.
Theorem 2.1. Let the sequences $\{x_k\}_{k \ge 0}$ and $\{d_k\}_{k \ge 0}$ be generated by the MCB1 Algorithm. Then for a positive constant $c_1$ the sufficient descent condition (2.6) holds.
Proof. The proof is by induction. For $k = 0$, $g_0^{T} d_0 = -\|g_0\|^2$, so the sufficient descent condition holds for $k = 0$. Now assume that (2.6) holds for $k$; we prove it for $k + 1$. Multiplying (2.3) by $g_{k+1}^{T}$ from the left gives an expression for $g_{k+1}^{T} d_{k+1}$. From (1.6) and the induction hypothesis (2.6), a bound on $\left| g_{k+1}^{T} d_k \right|$ is obtained. Combining this bound with (2.5), (2.7) and (2.8) yields (2.6) for $k + 1$, which completes the proof.
Second, the sufficient descent property of the MCB2 method is proved.
Theorem 2.2. Let the direction $d_k$ be generated by the MCB2 Algorithm. Then the sufficient descent condition (2.11) holds.
Proof. The proof is by induction. For $k = 0$, $g_0^{T} d_0 = -\|g_0\|^2$, so the sufficient descent condition holds for $k = 0$. Now assume that (2.11) holds for $k$; we prove it for $k + 1$. Multiplying (2.4) by $g_{k+1}^{T}$ from the left yields (2.12). From (2.5) and (2.12), the required bound follows, so (2.11) holds with a constant $c_2 > 0$. Hence, Theorem 2.2 is proved.
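For clarity, the sufficient descent condition asserted in (2.6) and (2.11) has the standard form below, where the constants are those produced in the proofs above:
$$g_k^{T} d_k \le -c\,\|g_k\|^2 \qquad \text{for all } k \ge 0,$$
with $c = c_1 > 0$ for MCB1 and $c = c_2 > 0$ for MCB2. This is the usual strengthening of the plain descent condition $g_k^{T} d_k < 0$.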

Global convergence
To establish the global convergence of the proposed methods, the following basic assumptions on the objective function are needed.
Assumption 3.1. The level set $\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded, where $x_0$ is the starting point.
Assumption 3.2. In some open convex neighborhood $\mathcal{N}$ of $\Omega$, the function $f$ is continuously differentiable and its gradient is Lipschitz continuous, namely, there exists a constant $L > 0$ such that
$$\|g(x) - g(y)\| \le L\,\|x - y\|, \qquad \forall x, y \in \mathcal{N}.$$
From Assumptions 3.1 and 3.2, it is deduced that there exists a positive constant $\Gamma$ such that $\|g(x)\| \le \Gamma$ for all $x \in \mathcal{N}$. Dai et al. [7] proved the following sufficient condition for the convergence of CG methods with the strong Wolfe line search.
Lemma 3.1. Let Assumptions 3.1 and 3.2 hold. Consider the method (1.2) and (1.3), where $d_k$ is a descent direction and $\alpha_k$ is obtained by the SWLS. If
$$\sum_{k \ge 0} \frac{1}{\|d_k\|^2} = \infty,$$
then
$$\liminf_{k \to \infty} \|g_k\| = 0.$$
A further lemma is also needed to prove the convergence of the MCB1 and MCB2 methods; for its proof, see the proof of Lemma 3.2 in Liu and Li [16].

Numerical experiments
In this section, numerical experiments with the two newly proposed CG methods are presented. The test problems have been taken from the CUTE library [2,4]. All the algorithms were coded in MATLAB 2013 and run on a PC (2.5 GHz, 3.8 GB RAM) with the Windows XP operating system. The computational results of the MCB1 method are compared with the NHS [27], NVHS* [9], DDY1 [28] and DHS [6] methods. On the other hand, the computational results of the MCB2 method are compared with the NPRP [27], NVPRP* [9], DDY1 [28] and DPRP [6] methods. In these experiments, all algorithms implement the SWLS condition with $\delta = 10^{-3}$ and $\sigma = 10^{-1}$. The iteration is terminated if one of the following conditions is satisfied: (i) $\|g_k\|_\infty < 10^{-6}$, where $\|\cdot\|_\infty$ is the maximum absolute component of a vector; (ii) the number of iterations exceeds 2000; (iii) the computing time exceeds 500 s. The performance profile introduced by Dolan and Moré [8] is chosen to compare the methods according to the number of iterations and CPU time, as follows. Let $\mathcal{S}$ be the set of methods and $\mathcal{P}$ the set of test problems, with $n_p$ and $n_s$ the number of test problems and methods, respectively. For each problem $p \in \mathcal{P}$ and solver $s \in \mathcal{S}$, let $t_{p,s}$ denote the number of iterations or CPU time required to solve problem $p$ by solver $s$. A comparison between different solvers is then based on the performance ratio
$$r_{p,s} = \frac{t_{p,s}}{\min\{t_{p,s} : s \in \mathcal{S}\}}.$$
Choose a parameter $r_M$ such that $r_M \ge r_{p,s}$ for all problems and solvers, with $r_{p,s} = r_M$ if and only if solver $s$ does not solve problem $p$. The overall evaluation of the performance of the solvers is then given by the performance profile function
$$\rho_s(\tau) = \frac{1}{n_p}\,\mathrm{size}\{p : 1 \le p \le n_p,\ r_{p,s} \le \tau\},$$
where $\tau \ge 1$ and $\mathrm{size}\{p : 1 \le p \le n_p,\ r_{p,s} \le \tau\}$ is the number of elements in that set. This function is the distribution function of the performance ratio. The value of $\rho_s(1)$ is the probability that the solver will win over the rest of the solvers.
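The profile computation is easy to reproduce. The following is a small sketch (an assumed implementation, not the authors' script), where `T[p, s]` holds the iteration count or CPU time of solver `s` on problem `p` and `np.inf` marks failures:

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profile.
    T    : (n_p, n_s) array of costs t_{p,s}; np.inf marks a failure.
    taus : iterable of thresholds tau >= 1.
    Returns rho of shape (len(taus), n_s): rho[i, s] is the fraction of
    problems solver s solves within a factor taus[i] of the best solver."""
    best = T.min(axis=1, keepdims=True)        # best cost per problem
    with np.errstate(invalid="ignore"):        # inf/inf -> nan, fixed below
        r = T / best                           # performance ratios r_{p,s}
    r[np.isinf(T)] = np.inf                    # failed runs never qualify
    return np.array([(r <= tau).mean(axis=0) for tau in taus])
```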
In this numerical study, Dim denotes the dimension of the problem, ITR denotes the number of iterations, TIME denotes the CPU time, and Inf indicates that the algorithm failed to yield a solution for the problem.
Figure 1 gives a performance comparison of the MCB1 method versus the NHS, NVHS*, DDY1 and DHS methods. As this figure indicates, the new algorithm prevails over all the other methods with respect to CPU time, which clearly confirms the effectiveness of the MCB1 method. Generally, the DDY1 and DHS methods are better than the NHS and NVHS* methods. It can be seen from Figure 2 that the MCB1 curve is mostly at the top of the NHS, NVHS*, DDY1 and DHS curves, indicating that the MCB1 algorithm outperforms the NHS, NVHS*, DDY1 and DHS methods in terms of the number of iterations. In particular, the DHS method outperforms the other methods except for the DDY1 method.
From Table 1, it is clear that the average performance of the MCB1, NHS, NVHS*, DDY1 and DHS methods is consistent with the results shown in Figures 1 and 2.
On the other side, Figure 3 shows the performance profiles of the MCB2, NPRP, NVPRP*, DDY1 and DPRP CG methods. From this figure, it is concluded that the MCB2 method performs better than the NPRP, NVPRP*, DDY1 and DPRP methods from the viewpoint of CPU time. Furthermore, Figure 3 shows that the DPRP method is faster and more robust than the DDY1 method when $1.5 < \tau < 3.5$; generally, however, DDY1 is preferable to the DPRP, NVPRP* and NPRP methods. The NVPRP* method behaves like the NPRP method on the given test problems.
Figure 4 shows the performance profile for the number of iterations. Relative to this metric, MCB2 achieves the top performance, followed by DDY1 for $\tau \ge 9$, then DPRP. The NVPRP* method again behaves like the NPRP method.
From Table 2, it is clear that the average performance of the MCB2, NPRP, NVPRP*, DDY1 and DPRP methods is consistent with the results shown in Figures 3 and 4.

Application in mode function
The conjugate gradient method has played an important role in solving large-scale unconstrained optimization problems that may arise in regression analysis [30], portfolio selection [3] and image restoration problems [18].
Nonparametric estimation has received a great deal of attention in both the theoretical and applied statistics literature. For a historical and mathematical survey, we refer the reader to Sager [23]. In statistics, it is always interesting to study the central tendency of the data, which is usually quantified using location parameters (mean, mode, median). The problem of estimating the mode of a probability density function (p.d.f.) has received considerable attention in the past for both independent and dependent data, and a number of distinguished papers deal with this topic; see, for example, Parzen [19] and Eddy [10] for estimation of the unconditional mode in the independent and identically distributed (i.i.d.) case.
In this section, the problem of estimating the mode of a multivariate unimodal probability density $f$ with support in $\mathbb{R}^d$ is considered, from i.i.d. random variables $X_1, \ldots, X_n$ (standard normal in the simulations below) with common probability density function $f$. This problem has been investigated in numerous papers; to quote a few of them, Konakov [15] and Samanta [24]. We assume that the density $f$ has a unique mode, denoted by $\theta$ and defined by
$$f(\theta) = \max_{x \in \mathbb{R}^d} f(x). \tag{5.1}$$
A kernel estimator of the mode $\theta$ is defined as the random variable $\hat{\theta}_n$ which maximizes the kernel estimator $f_n(x)$ of $f(x)$, that is,
$$f_n(\hat{\theta}_n) = \max_{x \in \mathbb{R}^d} f_n(x), \tag{5.2}$$
where
$$f_n(x) = \frac{1}{n h_n^d} \sum_{i=1}^{n} K\!\left( \frac{x - X_i}{h_n} \right).$$
The bandwidth $(h_n)$ is a sequence of positive real numbers going to zero as $n$ goes to infinity, and the kernel $K$ is a p.d.f. on $\mathbb{R}^d$. In this simulation, we choose between two different kernels: the standard Gaussian kernel, defined by
$$K(u) = (2\pi)^{-d/2} \exp\!\left( -\tfrac{1}{2}\|u\|^2 \right),$$
and the Epanechnikov kernel, given by
$$K(u) = \frac{d+2}{2 V_d}\left( 1 - \|u\|^2 \right) \mathbb{1}_{\{\|u\| \le 1\}},$$
where $V_d$ denotes the volume of the unit ball in $\mathbb{R}^d$. The selection of the bandwidth $h$ is an important and basic problem in kernel smoothing techniques; in this simulation, the optimal bandwidth is chosen by the cross-validation method. In this context, the MCB1 and MCB2 methods are employed to solve problem (5.2) under the SWLS technique and are compared with the NHS [27], DDY1 [28] and NPRP [27] methods. According to Tables 3 and 4, it is clear that the MCB1 method is more efficient than the NHS [27] and DDY1 [28] methods, and the MCB2 method is superior to the NPRP [27] and DDY1 [28] methods, based on the number of iterations and CPU time for solving problem (5.2).
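A concrete version of the estimator (5.2) is easy to write down. The following sketch (an assumed implementation, not the paper's code) builds the Gaussian-kernel density estimator and its gradient, and hands their negatives to a generic CG minimizer such as the one sketched earlier; `cg_method` and `beta_mcb` are the hypothetical names used in that sketch:

```python
import numpy as np

def make_gaussian_kde(X, h):
    """Kernel estimator f_n(x) = (n h^d)^{-1} sum_i K((x - X_i)/h)
    with the standard Gaussian kernel K; X has shape (n, d)."""
    n, d = X.shape
    c = (2.0 * np.pi) ** (-d / 2.0)            # Gaussian normalizing constant
    def f_n(x):
        u = (np.asarray(x) - X) / h            # (n, d) scaled differences
        w = c * np.exp(-0.5 * (u * u).sum(axis=1))
        return w.sum() / (n * h ** d)
    def grad_f_n(x):
        u = (np.asarray(x) - X) / h
        w = c * np.exp(-0.5 * (u * u).sum(axis=1))  # kernel weights
        return -(w[:, None] * u).sum(axis=0) / (n * h ** (d + 1))
    return f_n, grad_f_n

# The mode estimate maximizes f_n, i.e. minimizes -f_n:
#   f_n, g_n = make_gaussian_kde(X, h)
#   theta_hat = cg_method(lambda x: -f_n(x), lambda x: -g_n(x),
#                         x0=X.mean(axis=0), beta_mcb=some_beta)
```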

Conclusion
This paper has presented two modified conjugate gradient methods, namely the MCB1 and MCB2 methods. Under the SWLS condition, the sufficient descent condition of the MCB1 method has been established. An attractive property of the MCB2 method is that it generates a direction satisfying the sufficient descent condition regardless of the line search. The global convergence properties of the MCB1 and MCB2 methods have been established under the SWLS conditions.
From the statistical results obtained by the first comparison technique in Figures 1 and 3, it is clear that the average CPU times of the MCB1 and MCB2 methods are approximately equal.
From Figures 2 and 4, the MCB1 method is slightly more effective than the MCB2 method with respect to the number of iterations.
The final conclusion is that the proposed methods are more efficient than some existing methods. The practical applicability of the MCB1 and MCB2 methods has also been explored in the nonparametric estimation of the mode function.

Figure 2. Performance profile on the number of iterations (MCB1).

Figure 4. Performance profile on the number of iterations (MCB2).
The following theorem establishes the global convergence of the MCB1 method with the SWLS.
Theorem 3.1. Suppose that Assumptions 3.1 and 3.2 hold. Consider any CG method of the form (1.2) and (1.3) with the parameter $\beta_k = \beta_k^{MCB1}$, in which the step length $\alpha_k$ is determined by the SWLS conditions (1.4) and (1.6) and $d_k$ is a descent search direction. Then this method converges in the sense that (3.3) is satisfied.
This indicates that the step length $\alpha_k$ obtained in the MCB1 and MCB2 methods is bounded away from zero, i.e., there exists a constant $\lambda > 0$ such that
$$\alpha_k \ge \lambda, \qquad \forall k \ge 0. \tag{3.4}$$
Lemma 3.3. Assume that $x_0$ is a starting point for which Assumptions 3.1 and 3.2 hold. Consider any method of the form (1.2) and (1.3), where $d_k$ is a descent direction and the step size $\alpha_k$ satisfies the WLS conditions (1.4) and (1.5). Then we have
$$\sum_{k \ge 0} \frac{\left( g_k^{T} d_k \right)^2}{\|d_k\|^2} < \infty.$$
The following theorem is used to prove the global convergence of the MCB2 method.
Theorem 3.2. Suppose that Assumptions 3.1 and 3.2 hold, and let the sequences $\{x_k\}_{k \ge 0}$ and $\{d_k\}_{k \ge 0}$ be generated by the MCB2 Algorithm. Then
$$\sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty. \tag{3.12}$$

Table 3. The simulation results of the MCB1, DDY1 and NHS methods for solving problem (5.2).