NEW ITERATIVE CONJUGATE GRADIENT METHOD FOR NONLINEAR UNCONSTRAINED OPTIMIZATION

Conjugate gradient (CG) methods are an important class of methods for solving unconstrained optimization problems, especially large-scale ones, and they have been studied extensively in recent years. In this paper, we propose a new conjugate gradient method for unconstrained optimization. Its parameter is a convex combination of the Fletcher–Reeves (FR), Polak–Ribiere–Polyak (PRP) and Dai–Yuan (DY) parameters. The new conjugate gradient method with the Wolfe line search is shown to ensure the descent property of each search direction. Some general convergence results are also established for this method. Numerical experiments are carried out to test the efficiency of the proposed method and confirm its promising potential.


Introduction
Optimization is the process of finding the minimum or maximum of an objective function. Unconstrained optimization is the branch of optimization in which we minimize an objective function depending on real variables, with no restrictions at all on the values of those variables. Consider the following nonlinear unconstrained optimization problem

$$\min \{ f(x) : x \in \mathbb{R}^n \}, \qquad (1.1)$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is a smooth function whose gradient $g(x) = \nabla f(x)$ is available [16]. Over the years, mathematicians have developed several numerical methods for solving problem (1.1), including the Steepest Descent (SD) method, Newton's method, CG methods and Quasi-Newton (QN) methods. In this paper, we focus on the CG method because of its simplicity, its low memory requirements [6], and especially its usability when the dimension $n$ is large.
In 1952, Hestenes and Stiefel [15] suggested a CG method for solving unconstrained linear optimization problems, as applied to quadratic functions. Then, in 1964, Fletcher and Reeves [13] extended the CG method to nonlinear unconstrained minimization problems. The basis of all CG methods is to generate a sequence starting from an initial estimate $x_0 \in \mathbb{R}^n$, using the recurrence

$$x_{k+1} = x_k + \alpha_k d_k, \qquad k = 0, 1, \ldots, \qquad (1.2)$$

where $\alpha_k > 0$ is a step size obtained by carrying out a one-dimensional search, known as the line search [21]. We usually use an inexact line search [22,26], because performing an exact line search at each iteration is hardly feasible in practice and is quite expensive in time and memory. Among the inexact rules, the so-called strong Wolfe line search conditions require that [23,24]

$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \qquad (1.3)$$
$$|g(x_k + \alpha_k d_k)^T d_k| \le \sigma |g_k^T d_k|, \qquad (1.4)$$

where the scalars $\delta$ and $\sigma$ satisfy $0 < \delta \le \sigma < 1$. The search direction $d_k$ is generated by the rule

$$d_0 = -g_0, \qquad d_{k+1} = -g_{k+1} + \beta_k d_k, \qquad (1.5)$$

where $g_k$ is the gradient of $f$ at the point $x_k$ and $\beta_k$ is known as the CG parameter. Different choices of the parameter $\beta_k$ correspond to different conjugate gradient methods. Table 1 summarizes some of the most popular formulas for $\beta_k$.

Table 1. Some well-known choices of the CG parameter $\beta_k$ (with $y_k = g_{k+1} - g_k$).
Hager and Zhang (HZ) [14]: $\beta_k^{HZ}$ as defined in [14]
Conjugate Descent (CD) [12]: $\beta_k^{CD} = \|g_{k+1}\|^2 / (-g_k^T d_k)$
Liu and Storey (LS) [17]: $\beta_k^{LS} = g_{k+1}^T y_k / (-g_k^T d_k)$
Al-Bayati and Al-Assady (BA) [2]: $\beta_k^{BA}$ as defined in [2]
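For readers who prefer code, the generic scheme (1.2) and (1.5) is easy to state concretely. The following is a minimal Python sketch, not the paper's implementation: the names `cg` and `beta` are ours, the CG parameter is passed in as a pluggable function, and the strong Wolfe step size is delegated to scipy.optimize.line_search, which implements conditions of the form (1.3)-(1.4).

```python
import numpy as np
from scipy.optimize import line_search

def cg(f, grad, x0, beta, tol=1e-6, max_iter=1000):
    """Generic nonlinear CG: x_{k+1} = x_k + alpha_k d_k, d_{k+1} = -g_{k+1} + beta_k d_k."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # d_0 = -g_0
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Strong Wolfe line search with 0 < c1 <= c2 < 1.
        alpha = line_search(f, grad, x, d, gfk=g, c1=1e-4, c2=0.4)[0]
        if alpha is None:                    # search failed: restart along -g
            d = -g
            alpha = line_search(f, grad, x, d, gfk=g)[0] or 1e-8
        x_new = x + alpha * d                # recurrence (1.2)
        g_new = grad(x_new)
        d = -g_new + beta(g_new, g, d) * d   # direction update (1.5)
        x, g = x_new, g_new
    return x, k
```

The restart along the steepest descent direction when the line search fails is a common practical safeguard, not a feature claimed by the paper.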
These methods coincide when $f$ is a strongly convex quadratic function and $\alpha_k$ is obtained by exact line search, since the parameters of these methods are then equal and the methods generate the same sequence $\{x_k\}_{k=0}^{\infty}$. In the opposite case, when applied to a general nonlinear function with inexact line searches, they generate different sequences $\{x_k\}_{k=0}^{\infty}$, yielding a range of different methods [16]. One of the most useful classes of CG methods is the hybrid methods, which combine the classical CG methods [5] in order to obtain a good practical conjugate gradient algorithm. Some well-known hybrid conjugate gradient methods are summarized in Table 2.
In this work we propose another hybrid CG method, based on a combination of the FR, PRP and DY conjugate gradient algorithms, for solving unconstrained optimization problems. The corresponding conjugate gradient parameters are

$$\beta_k^{FR} = \frac{\|g_{k+1}\|^2}{\|g_k\|^2}, \qquad (1.6)$$
$$\beta_k^{DY} = \frac{\|g_{k+1}\|^2}{d_k^T y_k}, \qquad (1.7)$$
$$\beta_k^{PRP} = \frac{g_{k+1}^T y_k}{\|g_k\|^2}, \qquad (1.8)$$

where $y_k = g_{k+1} - g_k$ and $\|\cdot\|$ stands for the Euclidean norm. These parameters correspond to the methods of Fletcher and Reeves [13], Dai and Yuan [8] and Polak-Ribiere-Polyak [19,20], respectively. The paper is organized as follows: in Section 2, we present the new hybrid CG method and derive its hybridization parameters; under mild conditions we prove that the proposed method with the Wolfe line search generates directions satisfying the sufficient descent condition. Section 3 presents the algorithm. Convergence properties of the new method are analyzed in Section 4. In Section 5, we demonstrate the efficiency of our method through numerical comparisons against the FR, PRP and DY methods using 30 different test problems from the CUTE library [7]. Finally, a brief conclusion is drawn in Section 6.
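For concreteness, the three classical parameters (1.6)-(1.8) translate directly into code. A minimal sketch in Python (the function names are ours, not the paper's); each function matches the `beta(g_new, g_old, d)` interface of the generic loop sketched above:

```python
import numpy as np

# Classical CG parameters (1.6)-(1.8); g_new = g_{k+1}, g_old = g_k, d = d_k.
def beta_fr(g_new, g_old, d):        # Fletcher-Reeves (1.6)
    return (g_new @ g_new) / (g_old @ g_old)

def beta_dy(g_new, g_old, d):        # Dai-Yuan (1.7)
    y = g_new - g_old
    return (g_new @ g_new) / (d @ y)

def beta_prp(g_new, g_old, d):       # Polak-Ribiere-Polyak (1.8)
    y = g_new - g_old
    return (g_new @ y) / (g_old @ g_old)
```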

Hybrid conjugate gradient algorithms
In this section, we describe a new CG method for unconstrained optimization in which the parameter of the proposed method, denoted $\beta_k^h$, is computed as a convex combination of $\beta_k^{FR}$, $\beta_k^{PRP}$ and $\beta_k^{DY}$, i.e.

$$\beta_k^h = \theta_k \beta_k^{FR} + \mu_k \beta_k^{PRP} + (1 - \theta_k - \mu_k) \beta_k^{DY}, \qquad (2.1)$$

where $\theta_k, \mu_k \in [0, 1]$, with $\theta_k + \mu_k \le 1$, are called the hybridization parameters; they will be determined in a specific way described later.
The direction $d_k^h$ is given by

$$d_0^h = -g_0, \qquad d_{k+1}^h = -g_{k+1} + \beta_k^h d_k^h. \qquad (2.2)$$

Discussing the possible values of the hybridization parameters $\theta_k$ and $\mu_k$, we obtain the following cases:
- If $\theta_k = 1$ and $\mu_k = 0$, then $\beta_k^h = \beta_k^{FR}$.
- If $\mu_k = 0$ and $0 < \theta_k < 1$, then $\beta_k^h = \theta_k \beta_k^{FR} + (1 - \theta_k) \beta_k^{DY}$, i.e. $\beta_k^h$ is a convex combination of $\beta_k^{FR}$ and $\beta_k^{DY}$. See [1].
- If $0 < \theta_k, \mu_k < 1$, then we have a new hybrid CG method defined as a convex combination of the three methods FR, PRP and DY. This is the case we focus on in this paper; a small computational sketch follows the list.
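As an illustration, the convex combination (2.1) is straightforward to implement on top of the classical parameters sketched earlier. This is a minimal sketch assuming fixed hybridization parameters `theta` and `mu`; in the paper they are chosen adaptively at each iteration, as derived below.

```python
def beta_hybrid(g_new, g_old, d, theta=0.4, mu=0.3):
    # Convex combination (2.1): requires theta, mu in [0, 1] and theta + mu <= 1.
    assert 0.0 <= theta <= 1.0 and 0.0 <= mu <= 1.0 and theta + mu <= 1.0
    return (theta * beta_fr(g_new, g_old, d)
            + mu * beta_prp(g_new, g_old, d)
            + (1.0 - theta - mu) * beta_dy(g_new, g_old, d))
```

Passed as the `beta` argument of the generic `cg` loop, this yields the hybrid iteration with frozen hybridization parameters.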
Substituting the relation (2.1) into (2.2), the direction takes the form

$$d_{k+1}^h = -g_{k+1} + \big( \theta_k \beta_k^{FR} + \mu_k \beta_k^{PRP} + (1 - \theta_k - \mu_k) \beta_k^{DY} \big) d_k. \qquad (2.4)$$

Considering again the relation (2.4) and adding and subtracting the value $(\theta_k + \mu_k) \beta_k^{DY} d_k$, we get

$$d_{k+1}^h = -g_{k+1} + \beta_k^{DY} d_k + \theta_k \big( \beta_k^{FR} - \beta_k^{DY} \big) d_k + \mu_k \big( \beta_k^{PRP} - \beta_k^{DY} \big) d_k.$$

Finally, we get (2.3).
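The regrouping step is a one-line algebraic identity; writing it out explicitly (our own verification of the step, not a relation taken from the paper):

```latex
\theta \beta^{FR} + \mu \beta^{PRP} + (1 - \theta - \mu)\,\beta^{DY}
  \;=\; \beta^{DY} + \theta\,\bigl(\beta^{FR} - \beta^{DY}\bigr)
        + \mu\,\bigl(\beta^{PRP} - \beta^{DY}\bigr),
```

since the terms $\theta \beta^{DY}$ and $\mu \beta^{DY}$ produced by expanding $(1 - \theta - \mu)\beta^{DY}$ cancel against those subtracted inside the two differences.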
Our motivation is to select the parameters $\theta_k, \mu_k$ in such a manner that the search direction $d_{k+1}$ satisfies the conjugacy condition, i.e. $y_k^T d_{k+1} = 0$.
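To see the role of this condition, note that for the generic direction (1.5) it pins down the CG parameter completely. The following short derivation is our own illustration; it recovers the classical Hestenes-Stiefel value:

```latex
0 = y_k^T d_{k+1} = y_k^T\!\left(-g_{k+1} + \beta_k d_k\right)
  \;\Longrightarrow\;
  \beta_k = \frac{y_k^T g_{k+1}}{y_k^T d_k} = \beta_k^{HS}.
```

In the hybrid setting the same condition, applied to the direction (2.3), yields one equation linking the two unknowns $\theta_k$ and $\mu_k$.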
Having in view the relations (1.6)-(1.8), the direction (2.3) can be rewritten explicitly in terms of $g_k$, $g_{k+1}$ and $d_k$; call the resulting relation (2.6). Multiplying (2.6) by $y_k^T$ from the left and using the conjugacy condition $y_k^T d_{k+1}^h = 0$, we obtain the equation from which the hybridization parameters $\theta_k$ and $\mu_k$ are determined. Supposing that $d_0 = -g_0$, so that the initial direction is a descent direction, for the algorithm given by (1.2) and (2.6) we can prove the following result.
From the strong Wolfe condition (1.4), we get $d_k^T y_k \ge (\sigma - 1) g_k^T d_k$; hence, since $g_k^T d_k < 0$ and $\sigma < 1$,

$$d_k^T y_k > 0. \qquad (2.11)$$

Multiplying by $\|g_{k+1}\|^2 > 0$, we obtain (2.13). On the other hand, we have the identity (2.14); according to the relation (2.14), the relation (2.13) becomes (2.15). Denoting the resulting coefficient, which involves $\theta_k$, $\mu_k$ and $\|g_{k+1}\|^2$, by $t_k$, we simplify it as shown in order to facilitate the theoretical proof.
Now we can prove that $d_{k+1}^h$ satisfies the sufficient descent condition, i.e. $g_{k+1}^T d_{k+1}^h \le -c \|g_{k+1}\|^2$ for some constant $c > 0$, in the following theorem.

Convergence properties
In this section, we study the global convergence properties of the proposed conjugate gradient method. For further considerations we need the following standard assumptions and lemmas.

Assumption 4.1. (i) The level set $\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded. (ii) In some neighborhood $N$ of $\Omega$, $f$ is continuously differentiable and its gradient is Lipschitz continuous, i.e. there exists a constant $L > 0$ such that $\|g(x) - g(y)\| \le L \|x - y\|$ for all $x, y \in N$.

Proof. It follows from (4.6), the Lipschitz condition and the Cauchy-Bunyakovsky-Schwarz inequality that the required bound holds.

Proof. Suppose that $g_k \neq 0$ for all $k$. We are going to prove that

$$\liminf_{k \to \infty} \|g_k\| = 0. \qquad (4.10)$$

Suppose by contradiction that (4.10) is false. Then there exists $\varepsilon > 0$ such that

$$\|g_k\| \ge \varepsilon \quad \text{for all } k \text{ sufficiently large}. \qquad (4.11)$$

From Theorem 2.3 above, we have the sufficient descent property (4.12). From the strong Wolfe conditions, we get (4.13); multiplying (4.13) by $\alpha_k > 0$ and using (4.9) and (4.12), we obtain (4.14). On the other side, we have $\|x_{k+1} - x_k\| \le D$, where $D$ is the diameter of the level set $\Omega$. Combining this with the Lipschitz condition gives the bounds (4.15)-(4.17), from which we can write (4.18). According to the relation (4.18), the relation (4.14) becomes a bound that forces $\|g_k\|$ to zero along a subsequence. This is a contradiction with (4.11), so we have finished the proof of (4.10).
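As an illustration of how the Lipschitz condition and the level-set diameter enter such arguments, the following chain is a standard step of this kind (our own sketch, under Assumption 4.1, not the paper's exact relations (4.15)-(4.17)):

```latex
\|y_k\| = \|g_{k+1} - g_k\| \;\le\; L\,\|x_{k+1} - x_k\|
        \;=\; L\,\alpha_k \|d_k\| \;\le\; L D,
```

since all iterates remain in the bounded level set $\Omega$, whose diameter is $D$. Bounds of this type are what turn the strong Wolfe inequalities into the summability estimates used in the contradiction argument.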

Numerical experiments
In this section we present the computational performance of an implementation of the new algorithm on a set of 450 unconstrained optimization test problem instances. The test problems are the unconstrained problems in the CUTE library [7], along with other large-scale optimization problems presented in [4]. We selected 30 large-scale unconstrained optimization problems in extended or generalized form; for each function we considered numerical experiments with an increasing number of variables $n = 2, 4, 10, \ldots, 25\,000$.

In order to assess the reliability of the new proposed method, we tested it against the FR, PRP and DY algorithms on the same test problems. All these algorithms implement the Wolfe line search conditions with line-search parameters $10^{-3}$ and $10^{-4}$, the same stopping criterion $\|g_k\|_\infty / \|g_0\|_\infty < 10^{-6}$, where $\|\cdot\|_\infty$ denotes the maximum absolute component of a vector, and the hybridization parameter set to $0.8$.

The comparison of the algorithms is carried out as follows. Let $f^{ALG1}(x^*)$ and $f^{ALG2}(x^*)$ be the optimal values found by ALG1 and ALG2 for problem $i = 1, \ldots, 450$, respectively. We say that, on the particular problem $i$, the performance of ALG1 was better than the performance of ALG2 if the optimal value found by ALG1 was at least as good and the number of iterations, the number of function-gradient evaluations, or the CPU time of ALG1 was less than the corresponding quantity of ALG2. All codes are written in Matlab and run on a PC with an Intel(R) Core(TM) i3-4030U CPU @ 1.90 GHz processor, 4 GB of RAM and the Windows 7 Professional operating system.

Figures 1 and 2 show the performance of these methods relative to the number of iterations (iter) and the CPU time (time), evaluated using the performance profiles of Dolan and Moré [11]. Benchmark results are generated by running a solver on the set of problems and recording the information of interest, namely the number of iterations and the CPU time, using parallel processing to run a different CG method on each processor and choosing at every step the result giving the least value of the function [18].

Let $S$ be the set of solvers in comparison and $P$ the set of problems; assume $S$ consists of $n_s$ solvers and $P$ consists of $n_p$ problems. For each problem $p \in P$ and solver $s \in S$, denote by $t_{p,s}$ the computing time (or the number of iterations) required to solve problem $p$ by solver $s$. The comparison between different solvers is based on the performance ratio

$$r_{p,s} = \frac{t_{p,s}}{\min\{t_{p,s} : s \in S\}},$$

and on the performance profile

$$\rho_s(\tau) = \frac{1}{n_p}\,\mathrm{size}\{p \in P : r_{p,s} \le \tau\},$$

where size means the number of elements in the set. Then $\rho_s(\tau)$ is the probability, for solver $s \in S$, that the performance ratio $r_{p,s}$ is within a factor $\tau \in \mathbb{R}$ of the best ratio. $\rho_s$ is the (cumulative) distribution function for the performance ratio, and the value of $\rho_s(1)$ is the probability that the solver will win over the rest of the solvers.
That is, for each method, we plot the fraction of problems for which the method is within a factor $\tau$ of the best time. The left side of each figure gives the percentage of the test problems for which a method is the fastest; the right side gives the percentage of the test problems that are successfully solved by each of the methods. The top curve corresponds to the method that solved the most problems in a time that was within a factor of the best time.
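A minimal sketch of how such Dolan-Moré profiles can be computed from a matrix of recorded timings (the function name, the grid of $\tau$ values and the plotting choices are ours; failed runs are assumed to be recorded as infinity):

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(T, labels, taus=np.linspace(1.0, 10.0, 200)):
    """T: (n_problems, n_solvers) array of times or iteration counts; np.inf = failure."""
    best = T.min(axis=1, keepdims=True)   # best result per problem
    R = T / best                          # performance ratios r_{p,s}
    for s, label in enumerate(labels):
        rho = [(R[:, s] <= tau).mean() for tau in taus]   # rho_s(tau)
        plt.plot(taus, rho, label=label)
    plt.xlabel("tau"); plt.ylabel("rho_s(tau)"); plt.legend()
    plt.show()

# Hypothetical usage (dummy timings, for illustration only):
# T = np.array([[1.0, 2.0, 1.5], [3.0, np.inf, 2.5]])
# performance_profile(T, ["NEW", "FR", "PRP"])
```

Entries equal to infinity never satisfy $r_{p,s} \le \tau$, so unsolved problems correctly depress the corresponding curve, matching the interpretation of the right side of the figures.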
From the two figures, we can see that the new method outperforms the other conjugate gradient methods on the test problems.

Conclusion
There are many conjugate gradient methods for solving unconstrained optimization problems, especially large-scale ones. One of the most useful classes is the hybrid methods, which combine the classical CG methods in order to create a new method that performs well. In this paper, we have proposed a new hybrid method in which the parameter $\beta_k^h$ is computed as a convex combination of the three parameters $\beta_k^{FR}$, $\beta_k^{PRP}$ and $\beta_k^{DY}$. The sufficient descent property and the global convergence of the method have been proved, and the numerical results show that the proposed method is faster and more efficient than the FR, PRP and DY methods it was compared against.