A SURVEY ON THE DAI–LIAO FAMILY OF NONLINEAR CONJUGATE GRADIENT METHODS

Abstract. At the beginning of this century, which is characterized by huge flows of emerging data, Dai and Liao proposed a pervasive conjugacy condition that triggered the interest of many optimization scholars. Recognized after about two decades as a sophisticated class of conjugate gradient (CG) algorithms, the method is surveyed here, where we share our visions and thoughts on it in the framework of a review study. In this regard, we first discuss the modified Dai–Liao methods based on the modified secant equations given in the literature, mostly with the aim of exploiting the objective function values in addition to the gradient information. Then, several adaptive, in a sense optimal, choices for the parameter of the method are studied. In particular, we devote a part of our study to modified versions of the Hager–Zhang and Dai–Kou CG algorithms, two well-known members of the Dai–Liao class of CG methods. Extensions of the classical CG methods based on the Dai–Liao approach are also reviewed. Finally, we discuss optimization models of practical disciplines that have been addressed by the Dai–Liao approach, including nonlinear systems of equations, image restoration and compressed sensing.


Introduction
It is needless to say that in the contemporary world we are grappling with huge volumes of data that make modeling a critical task. The large data sets that we are inundated and swarmed with call for efficient memoryless approaches, so that we do not carry their complexity along.
In a wide range of practical disciplines such as machine learning and signal processing, large scale continuous optimization models emerge, often as the unconstrained optimization problem

$$ \min_{x \in \mathbb{R}^n} f(x), \qquad (1.1) $$

where the objective function $f: \mathbb{R}^n \to \mathbb{R}$ is here assumed to be smooth. Scholarly studies reflect the value of the CG techniques among the various continuous optimization algorithms [95]. Initially founded by Hestenes and Stiefel (HS) [59] in the middle of the previous century for solving positive definite systems of linear equations, and then adopted by Fletcher and Reeves [51] for unconstrained optimization, CG algorithms squarely benefit from low memory storage and simple iterations, as well as, implicitly, from second order information [57, 83].
Successive approximations generated by the CG algorithms, starting from a given point $x_0 \in \mathbb{R}^n$, take the form

$$ x_{k+1} = x_k + \alpha_k d_k, \qquad k = 0, 1, \ldots, \qquad (1.2) $$

in which $\alpha_k > 0$ is a step length determined by a line search along the CG direction $d_k$, often defined by

$$ d_0 = -g_0, \qquad d_{k+1} = -g_{k+1} + \beta_k d_k, \qquad k = 0, 1, \ldots, \qquad (1.3) $$

where $\beta_k$ is the CG parameter and $g_k = \nabla f(x_k)$ [97]. There is a wide variety of formulas for $\beta_k$, with completely different computational outcomes [8, 16, 57].
In the line search procedure of the CG algorithms, polynomial interpolation schemes are mostly used with the Wolfe conditions as the stopping criterion [86]. The Wolfe conditions consist of the Armijo (sufficient decrease) condition

$$ f(x_k + \alpha_k d_k) - f(x_k) \le \delta \alpha_k g_k^T d_k, \qquad (1.4) $$

together with the curvature condition

$$ \nabla f(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k, \qquad (1.5) $$

with $0 < \delta < \sigma < 1$, for a descent direction $d_k$. Meanwhile, consisting of (1.4) together with the following (to some extent) strict version of (1.5),

$$ |\nabla f(x_k + \alpha_k d_k)^T d_k| \le -\sigma g_k^T d_k, $$

the strong Wolfe conditions are widely used to give more exact choices for $\alpha_k$. Moreover, modified versions of the Wolfe conditions have been developed for the CG algorithms [114, 117]. It is also worth noting that backtracking line search schemes, based upon Armijo-type conditions, are sometimes used in the CG algorithms as well [121, 122].
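For concreteness, the following minimal Python sketch shows how the tests (1.4)–(1.5) and their strong variant translate into code; the callables f and grad, as well as the default values of $\delta$ and $\sigma$, are illustrative assumptions rather than prescriptions from the literature.

```python
import numpy as np

def satisfies_wolfe(f, grad, x, d, alpha, delta=1e-4, sigma=0.9, strong=False):
    """Check the Wolfe conditions (1.4)-(1.5), or their strong variant,
    for a trial step length alpha along a descent direction d."""
    gd = grad(x) @ d                   # directional derivative at x (negative for descent d)
    gd_new = grad(x + alpha * d) @ d   # directional derivative at the trial point
    armijo = f(x + alpha * d) <= f(x) + delta * alpha * gd       # (1.4)
    if strong:
        curvature = abs(gd_new) <= -sigma * gd                   # strong version of (1.5)
    else:
        curvature = gd_new >= sigma * gd                         # (1.5)
    return armijo and curvature
```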
As known, conjugacy conditions are the basis of the CG algorithms. The linear CG methods, i.e. the CG methods for minimizing a strictly convex quadratic function with the Hessian $A \in \mathbb{R}^{n \times n}$ by the exact line search, or equivalently, the CG methods for solving a system of linear equations with the positive definite coefficient matrix $A$, generate a sequence of search directions $\{d_k\}_{k \ge 0}$ such that the following basic conjugacy condition holds [97]:

$$ d_i^T A d_j = 0, \qquad \forall i \ne j. $$

However, for general nonlinear functions, the mean value theorem ensures that there exists some $\xi_k \in (0, 1)$ such that

$$ d_{k+1}^T y_k = d_{k+1}^T \nabla^2 f(x_k + \xi_k s_k)\, s_k, $$

where $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$. Hence, the equation

$$ d_{k+1}^T y_k = 0 \qquad (1.6) $$

can be regarded as a conjugacy condition for general situations, which together with (1.3) yields the HS parameter

$$ \beta_k^{HS} = \frac{g_{k+1}^T y_k}{d_k^T y_k}. $$

Note that by the Wolfe line search conditions we have $d_k^T y_k > 0$, and so $\beta_k^{HS}$ is well-defined. At the beginning of the current century, Dai and Liao (DL) [40] put forward a conjugacy condition which gave rise to a great one-parameter class of CG algorithms. Their approach is established upon quasi-Newton (QN) aspects. As known, QN iterations are of the form (1.2) in which the search direction $d_{k+1}$ is computed as a solution of the linear system $B_{k+1} d = -g_{k+1}$, where $B_{k+1} \in \mathbb{R}^{n \times n}$ is a symmetric (and often positive definite) approximation of $\nabla^2 f(x_{k+1})$ [97]. The successive Hessian approximations $\{B_k\}_{k \ge 0}$ in the QN algorithms are classically updated based on the standard secant equation; that is,

$$ B_{k+1} s_k = y_k. \qquad (1.7) $$

Now, from the algorithmic features of the QN methods, we can write

$$ d_{k+1}^T y_k = -g_{k+1}^T B_{k+1}^{-1} y_k = -g_{k+1}^T s_k, \qquad (1.8) $$

being another conjugacy condition which reduces to (1.6) under the exact line search. In an effective extension scheme, Dai and Liao [40] embedded a parameter $t \ge 0$ in (1.8) and suggested the hybrid conjugacy condition

$$ d_{k+1}^T y_k = -t\, g_{k+1}^T s_k, \qquad (1.9) $$

which together with (1.3) yields

$$ \beta_k^{DL} = \frac{g_{k+1}^T y_k}{d_k^T y_k} - t\, \frac{g_{k+1}^T s_k}{d_k^T y_k}. \qquad (1.10) $$

It is worth mentioning that when $t = 0$ or the line search is performed exactly, then (1.9) reduces to (1.6). Alternatively, for $t = 1$, when the line search is approximately exact in the sense of $g_{k+1}^T d_k \approx 0$, the equation (1.9) can be regarded as a conjugacy condition that implicitly meets the QN features. After introducing (1.10), Dai and Liao [40] showed that an iterative method of the form (1.2)–(1.3) with $\beta_k^{DL}$ as the CG parameter is globally convergent for the uniformly (strongly) convex functions in (1.1). Then, to establish convergence for general functions, based on the analysis of [54], they proposed the following modified version of $\beta_k^{DL}$:

$$ \beta_k^{DL+} = \max\left\{ \frac{g_{k+1}^T y_k}{d_k^T y_k},\, 0 \right\} - t\, \frac{g_{k+1}^T s_k}{d_k^T y_k}. \qquad (1.11) $$

Many researchers have devoted their efforts to studying different aspects of the DL method, so that it can now be regarded as a sophisticated CG algorithm. Here, we plan to share our vision of the DL method in the framework of a review study, organized as follows. Firstly, in Section 2, we discuss the studies in which modified versions of the DL method have been proposed by employing the modified secant equations [106] in (1.9). Several adaptive, in a sense optimal, choices for the DL parameter $t$ are discussed in Section 3. As well-known members of the DL class of CG algorithms, modified versions of the Hager–Zhang [55, 56] and Dai–Kou [42] methods are the subject of Section 4. Considering the similar transition from $\beta_k^{HS}$ to $\beta_k^{DL}$, several DL-type extended versions of the classical CG parameters are studied in Section 5. To discuss practical applications of the method, in Section 6 we review research that has addressed nonlinear systems of equations, image restoration and compressed sensing by the DL method. Ultimately, concluding remarks are given in Section 7.
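Before proceeding, we note that gathering (1.2), (1.3) and (1.11) together yields a method that is straightforward to implement. The following is a minimal Python sketch; the callables f and grad, the fallback step and the stopping rule are illustrative assumptions rather than features of the original method, and SciPy's line_search is used as an off-the-shelf Wolfe procedure.

```python
import numpy as np
from scipy.optimize import line_search  # Wolfe line search

def dai_liao(f, grad, x0, t=0.1, tol=1e-6, max_iter=1000):
    """Sketch of the DL method (1.2)-(1.3) with the parameter (1.11)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g, np.inf) <= tol:
            break
        alpha = line_search(f, grad, x, d)[0]   # Wolfe step length
        if alpha is None:                       # line search failure: crude restart
            d, alpha = -g, 1e-4
        x_new = x + alpha * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        # beta_k^{DL+} of (1.11); d^T y > 0 under the Wolfe conditions
        beta = max((g_new @ y) / (d @ y), 0.0) - t * (g_new @ s) / (d @ y)
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x
```

For a quick trial, SciPy's built-in test pair rosen and rosen_der may be passed as f and grad.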

Improved Dai-Liao methods based on the modified secant equations
As the result of conscious efforts to enhance validity and reliability of the QN approximations of the Hessian, modified versions of the standard secant equation (1.7) have been suggested in the literature. A holistic model covering several essential modified secant equations has been given in [13] as follows:

$$ B_{k+1} s_k = z_k, \qquad z_k = y_k + \rho\, \frac{\theta_k}{s_k^T u_k}\, u_k + C \|g_k\|^r s_k, \qquad (2.1) $$

in which $\rho$, $C$ and $r$ are nonnegative constants, $u_k \in \mathbb{R}^n$ is a vector parameter satisfying $s_k^T u_k \ne 0$, $\|\cdot\|$ represents the Euclidean norm, and

$$ \theta_k = 2(f_k - f_{k+1}) + (g_k + g_{k+1})^T s_k, \qquad (2.2) $$

where $f_k = f(x_k)$.
A review of the literature reveals that the vector parameter $u_k$ is often set as $u_k = s_k$ or, under the Wolfe line search conditions, $u_k = y_k$. Now, to see how the classical (modified) secant equations are special cases of (2.1), firstly note that if $\rho = C = 0$, then (2.1) reduces to (1.7). Alternatively, when $C = 0$, the equation (2.1) yields the modified secant equations proposed by Wei et al. [104] for $\rho = 1$, Biglari et al. [32] for $\rho = 2$, and Zhang et al. [119, 120] for $\rho = 3$. All the equations given in [32, 104, 119, 120] have been obtained based on Taylor expansions, with the aims of enhancing the accuracy of the Hessian approximation as well as exploiting the available function values in addition to the gradient information. It can be (loosely) stated that the order of accuracy of the mentioned modified secant equations gradually increases with the growth of $\rho$. For $\rho = 0$, the equation (2.1) reduces to the modified secant equation proposed by Li and Fukushima [67, 68], which is capable of guaranteeing global convergence of the QN methods without convexity assumptions. Meanwhile, multi-step secant equations have been developed by Ford and Moghrabi [52], utilizing the information available from more than one previous iteration. Now, taking (1.8) into consideration, it can be observed that all the mentioned modified secant equations can be attached to the DL approach, as a measure to gain the mentioned merits of the modified secant equations. In this context, Li et al. [70] used a version of the modified secant equation proposed by Wei et al. [104] in the sense of imposing a nonnegative restriction on $\theta_k$ given by (2.2). More exactly, they let

$$ \theta_k^+ = \max\{\theta_k, 0\}, \qquad (2.3) $$

simply ensuring positiveness of $s_k^T z_k$ under the Wolfe conditions, which is crucial from both the theoretical and numerical viewpoints. Yabe and Takano [107] employed the modified secant equation suggested by Zhang and Xu [119] to get another modified DL method. Babaie-Kafaki et al. [30] dealt with an extended version of the modified secant equation of [119], together with a nonnegative restriction similar to (2.3), and proposed a modified DL algorithm with a larger parametric convergence interval than that of [107]. Peyghami et al. [89] studied improved versions of the DL-type methods of [30] by tuning $\rho$ adaptively and setting $u_k$ as a convex combination of $s_k$ and $y_k$ in (2.1) with $C = 0$; that is, in [89] an adaptive version of the modified secant equation of [119] has been used. Dehghani and Bidabadi [44] put forward another DL-type algorithm using the modified secant equation proposed by Yuan [112]. Inherited from the corresponding modified secant equations, all the modified DL methods suggested in [30, 44, 70, 89, 119] exploit the objective function values in addition to the gradient information. Also, they have been shown to be globally convergent for uniformly convex functions with the modified versions of (1.10), while global convergence regardless of convexity has been established with the modified versions of (1.11). To get global convergence for general functions without any restriction on the CG parameter (1.10), Zhou and Zhang [127] applied the modified secant equation of [67]. In a generalization scheme, to simultaneously take advantage of the objective function values as in [30, 44, 70, 89, 119], and to get global convergence without the convexity supposition or the mentioned nonnegativity restriction, Arazm et al. [13] proposed a modified DL method using the extended secant equation (2.1). Another study with a similar aim has been carried out by Dehghani et al.
[45], improving the method of [30] based on the modified secant equation of [67]. Using modified structured secant equations, Kobayashi et al. [64] addressed nonlinear least squares problems by CG algorithms within the DL framework as well.
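Under the reconstruction of (2.1)–(2.3) given above, the following sketch computes the modified difference vector $z_k$; the function name and default parameter values are illustrative assumptions.

```python
import numpy as np

def extended_secant_rhs(f_k, f_k1, g_k, g_k1, s_k, u_k, rho=1.0, C=0.0, r=1.0,
                        restrict=False):
    """Right-hand side z_k of the extended secant equation (2.1), so that
    B_{k+1} s_k = z_k.  rho = C = 0 recovers the standard secant equation
    (1.7); rho in {1, 2, 3} with C = 0 gives the equations of [104], [32]
    and [119, 120]; rho = 0 with C > 0 gives the Li-Fukushima equation
    [67, 68]."""
    y_k = g_k1 - g_k
    theta = 2.0 * (f_k - f_k1) + (g_k + g_k1) @ s_k     # theta_k of (2.2)
    if restrict:
        theta = max(theta, 0.0)                          # restriction (2.3) of [70]
    return y_k + rho * (theta / (s_k @ u_k)) * u_k + C * np.linalg.norm(g_k)**r * s_k
```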
In most of the DL-type methods discussed above, a version of the extended secant equation (2.1) was used, with $\theta_k$ or $\theta_k^+$ respectively defined by (2.2) or (2.3). However, Ford et al. [53] utilized the multi-step secant equations [52] in the DL approach to develop a class of multi-step nonlinear CG algorithms, employing information available from more than one previous iteration. Also, based on the concept of the spectral scaling secant equation, Liu et al. [76] developed a spectral DL technique with an adaptive choice for $t$.

Adaptive choices for the Dai-Liao parameter
At the beginning of the previous decade, Andrei [10] classified several problems in the CG algorithms that have remained open. The given list then served as the origin of significant studies in the field. Especially, the second item of the list is related to optimal choices of the DL parameter $t$, which mainly affects the theoretical aspects as well as the computational behavior of the method. Here, we review several studies that addressed adaptive, in a sense optimal, choices of $t$. As will be discussed, several efforts targeted the modified secant equations to get appropriate choices for the DL parameter. It should be noted that in [30, 40, 53, 70, 89, 127], promising computational outputs have been reported with constant settings of $t$ determined by simple trial and error schemes. Especially, in [14], based on the hybrid conjugacy condition (1.9), it has been numerically shown that for small values of $|g_{k+1}^T d_k|$, it is better to set $t = 1$ to benefit from the QN aspects, while otherwise the setting $t = 0$ is more reasonable.
To conduct convergence analysis of the CG algorithms, it is often of great necessity for the search directions to satisfy the descent condition [43]

$$ g_k^T d_k < 0, \qquad \forall k \ge 0. \qquad (3.1) $$

Besides, the sufficient descent condition may be pivotal to establish convergence of the methods [40, 54]; that is,

$$ g_k^T d_k \le -c \|g_k\|^2, \qquad \forall k \ge 0, \qquad (3.2) $$

where $c > 0$ is a constant. Also, (3.2) has been classically considered as a superiority of the CG methods [38, 57]. So, taking these facts into consideration and inspired by Shanno's matrix viewpoint on the CG algorithms [93], Babaie-Kafaki and Ghanbari [20] noted that the DL search directions can be written as $d_{k+1} = -D_{k+1} g_{k+1}$, for all $k \ge 0$, where

$$ D_{k+1} = I - \frac{s_k y_k^T}{s_k^T y_k} + t\, \frac{s_k s_k^T}{s_k^T y_k}, \qquad (3.3) $$

being nonsingular when $t > 0$ and $s_k^T y_k \ne 0$ [19]. Then, conducting an eigenvalue analysis on the symmetrized version of $D_{k+1}$ given by

$$ A_{k+1} = \frac{D_{k+1} + D_{k+1}^T}{2}, $$

they obtained the following two-parameter family of choices for $t$:

$$ t_k(p, q) = p\, \frac{\|y_k\|^2}{s_k^T y_k} - q\, \frac{s_k^T y_k}{\|s_k\|^2}, \qquad (3.4) $$

which ensures the descent condition (3.1) with $p > \frac{1}{4}$ and $q < \frac{1}{4}$. It is notable that the choices $(p, q) = (2, 0)$ and $(p, q) = (1, 0)$ respectively yield the CG parameters proposed by Hager and Zhang [55], and Dai and Kou [42]. Hence, the CG algorithms of [42, 55] lie within the DL family of CG methods. Moreover, as established in [15, 57], a certain setting of $(p, q)$ in (3.4) yields the minimizer of the Byrd–Nocedal measure function [34] of the matrix $A_{k+1}$.

In another research line, well-conditioning of the DL search direction matrix $D_{k+1}$ defined by (3.3) has been considered, in order to enhance stability of the method. As known, the condition number is a crucial factor in matrix computations which should preferably be small [99]. Initially, upon a singular value analysis of $D_{k+1}$ in the Euclidean matrix norm, in the sense of minimizing the spectral condition number, Babaie-Kafaki and Ghanbari [19] suggested the two adaptive choices

$$ t_k = \frac{s_k^T y_k}{\|s_k\|^2} + \frac{\|y_k\|}{\|s_k\|} \qquad \text{and} \qquad t_k = \frac{\|y_k\|}{\|s_k\|}, $$

the first of which is shown to be computationally outstanding. Then, carrying out simultaneous singular value and eigenvalue analyses, Zhang et al. [124] showed that another choice of $(p, q)$ in (3.4) minimizes a further upper bound of the spectral condition number of $D_{k+1}$, also ensuring the sufficient descent condition. Moreover, Babaie-Kafaki and Ghanbari [28] considered the DL parameter in two forms depending on a parameter $\tau_k > 0$, making $D_{k+1}$ similar to the scaled memoryless BFGS (Broyden–Fletcher–Goldfarb–Shanno) updating formula for the inverse Hessian [97], i.e.
$$ H_{k+1} = \tau_k I - \tau_k\, \frac{s_k y_k^T + y_k s_k^T}{s_k^T y_k} + \left(1 + \tau_k\, \frac{\|y_k\|^2}{s_k^T y_k}\right) \frac{s_k s_k^T}{s_k^T y_k}, \qquad (3.7) $$

in which $\tau_k > 0$ is called the scaling parameter. Through this initiative, the DL method may benefit from the second order information more explicitly. Then, studying the spectral condition number of $D_{k+1}$ under each of the two settings, they obtained corresponding optimal values of $t$. Also, in an attempt to minimize the condition number of $D_{k+1}$ in the Frobenius norm, Babaie-Kafaki and Ghanbari [22] achieved another choice for $t$. In another study, Aminifard and Babaie-Kafaki [2] derived a minimizer of an upper bound of the $\ell_1$-norm condition number of $D_{k+1}$, as well as a minimizer of an upper bound of the $\ell_\infty$-norm condition number of $D_{k+1}$.
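To make the conditioning discussion concrete, the following toy snippet, written under the reconstruction of (3.3) given above, forms $D_{k+1}$ for random data with $s_k^T y_k > 0$ and compares the spectral condition numbers produced by two of the reviewed choices of $t$; the helper name dl_matrix is ours.

```python
import numpy as np

def dl_matrix(s, y, t):
    """Search direction matrix D_{k+1} of (3.3): d_{k+1} = -D_{k+1} g_{k+1}."""
    return np.eye(s.size) - np.outer(s, y) / (s @ y) + t * np.outer(s, s) / (s @ y)

rng = np.random.default_rng(0)
s, y = rng.standard_normal(6), rng.standard_normal(6)
if s @ y < 0:                # enforce s^T y > 0, as the Wolfe conditions guarantee
    y = -y
choices = {"first choice of [19]":
               (s @ y) / (s @ s) + np.linalg.norm(y) / np.linalg.norm(s),
           "Hager-Zhang t = 2||y||^2/(s^T y)": 2 * (y @ y) / (s @ y)}
for label, t in choices.items():
    print(label, "-> cond_2(D) =", round(np.linalg.cond(dl_matrix(s, y, t)), 3))
```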
Several other studies focused on least squares models to get appropriate choices for the DL parameter, inspired by the approach of Dai and Kou [42]. In this regard, to take advantage of the merits of the three-term CG algorithm proposed by Zhang et al. (ZZL) [123], whose search directions satisfy an equality form of (3.2) with $c = 1$ regardless of the line search and of the objective function convexity, Babaie-Kafaki and Ghanbari [22] calculated $t$ as the solution of the least squares problem

$$ \min_t \left\| d_{k+1}^{DL}(t) - d_{k+1}^{ZZL} \right\|^2, $$

yielding

$$ t_k^{ZZL} = \frac{s_k^T y_k}{\|s_k\|^2}, \qquad (3.9) $$

which can also be obtained as the minimizer of the distance between the maximum and the minimum singular values of $D_{k+1}$ [17]. As put forward in [17], $t_k^{ZZL}$ is the minimizer of an upper bound of the spectral condition number of $D_{k+1}$ given by Piazza and Politi [90]. Also, Andrei [12] obtained $t_k^{ZZL}$ as a result of clustering the eigenvalues of $D_{k+1}$. In a similar scheme, Li et al. [72] used Andrei's adaptive three-term CG algorithm [11] and obtained (3.4) with an adaptive choice for $q$ besides the fixed choice $p = 1$. In another attempt, with the aim of taking advantage of the second order information provided by the BFGS update, by solving

$$ \min_t \left\| D_{k+1} - H_{k+1} \right\|_F, $$

where $\|\cdot\|_F$ stands for the Frobenius matrix norm and $H_{k+1}$ is defined by (3.7), Babaie-Kafaki and Ghanbari [25] obtained a one-parameter (in the scaling parameter $\tau_k$) choice for $t$. By considering the DL parameter in the two-parameter framework of (3.4), Li et al. [73] proposed an adaptive setting of $(p, q)$ in a similar approach. Moreover, Babaie-Kafaki [17] noted that it is reasonable to compute the DL parameter in a way that $D_{k+1}$ tends to an orthonormal matrix [97], being perfectly conditioned with respect to the Euclidean norm in the sense of having a unit spectral condition number. Thus, by solving

$$ \min_t \left\| D_{k+1} - D_{k+1}^{-T} \right\|_F, $$

in which $D_{k+1}^{-1}$ is obtained by the Sherman–Morrison formula [97], the Frobenius-norm-based choice of [22] has been regained in [17].
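The least squares origin of (3.9) is easy to verify numerically. In the sketch below, with toy data and the three-term ZZL direction of [123], the minimizer found by SciPy agrees with $t_k^{ZZL}$; all variable names are ours, and a toy step length is assumed.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
n = 6
g1, d, y = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)
if d @ y < 0:
    y = -y
s = 0.5 * d    # s_k = alpha_k d_k with a toy step length alpha_k = 0.5

def d_dl(t):   # DL direction from (1.3) and (1.10)
    return -g1 + ((g1 @ y) / (d @ y) - t * (g1 @ s) / (d @ y)) * d

d_zzl = -g1 + ((g1 @ y) / (d @ y)) * d - ((g1 @ d) / (d @ y)) * y  # ZZL [123]

t_star = minimize_scalar(lambda t: np.linalg.norm(d_dl(t) - d_zzl) ** 2).x
print(t_star, (s @ y) / (s @ s))   # both values agree with t^{ZZL} of (3.9)
```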
Aminifard and Babaie-Kafaki [3] noted that when the gradient approaches the direction of the maximum magnification [99] by the search direction matrix $D_{k+1}$, computational difficulties arise alongside undesirable convergence behavior of the DL method. To resolve the issue, they determined the parameter $t$ in a way that makes the gradient orthogonal to the direction of the maximum magnification by $D_{k+1}$. In the analysis of [3], a fixed point equation plays a pillar role. Thus, employing a popular technique for solving fixed point equations, Babaie-Kafaki and Aminifard [29] used the functional iteration method to improve the effectiveness of the adaptive choices of the DL parameter proposed in the literature. Based on the concept of the maximum magnification, Aminifard and Babaie-Kafaki [4] also proposed a restart strategy for the DL method which is capable of advancing its computational performance.
Fatemi [48] obtained another adaptive choice for $t$ based on a penalty model in which a penalty parameter weighs the sufficient descent condition (3.2) against the conjugacy condition (1.6) and the orthogonality of the gradient to the previous search directions, as in the linear CG algorithms [97], with $d_{k+1}$ defined by (1.3). He also argued that it is reasonable to set the DL parameter within a certain bounded interval. In a similar framework, Fatemi [49] dealt with a variant of the same penalty model to get another adaptive choice for the DL parameter as well.
In another initiative to employ higher order information of the objective function, Momeni and Peyghami [81] determined another adaptive formula for the DL parameter as a function of the step length, obtained by quadratic and/or cubic local models of the objective function.
Modified secant equations have also been employed to achieve proper choices for the DL parameter, benefiting from their advantages presented in Section 2. For example, Zheng [125] used the modified secant equation of Yabe and Takano [107] and proposed an adaptive choice for $t$ to be used in (1.11). Also, using the Newton direction in the sense of setting $d_{k+1}^{DL} = d_{k+1}^{Newton}$, Lotfi and Hosseini [78], and Lu et al. [79, 80], dealt with the equation

$$ -\nabla^2 f(x_{k+1})^{-1} g_{k+1} = -g_{k+1} + \beta_k^{DL} d_k. $$

Taking the inner product of both sides of the above equation with $s_k^T \nabla^2 f(x_{k+1})$, they obtained an explicit formula for $t$. Then, Lotfi and Hosseini [78] simplified this formula using the modified secant equation of [67], while Lu et al. [79, 80] applied the modified secant equations of [67, 70, 104] in the framework of (2.1).

Modified versions of the Hager-Zhang and Dai-Kou methods
As a deep study in the first years of the 21st century, Hager and Zhang (HZ) [56] founded a great algorithm, called CG_DESCENT, which is today regarded as a quite helpful tool for handling large scale continuous optimization models. In particular, CG_DESCENT can be regarded as a DL-type algorithm [75] with the judicious choice

$$ t_k^{HZ} = 2\, \frac{\|y_k\|^2}{s_k^T y_k}. $$
Although satisfying the sufficient descent condition [15], the DL method with $t = t_k^{HZ}$ fails to guarantee global convergence without the convexity supposition. To resolve this weak spot, Hager and Zhang [55, 56] proposed the following CG parameter:

$$ \bar\beta_k^{HZ} = \max\left\{ \beta_k^{HZ},\; \frac{-1}{\|d_k\| \min\{\eta, \|g_k\|\}} \right\}, $$

where $\eta > 0$ is a constant and $\beta_k^{HZ}$ is $\beta_k^{DL}$ with $t = t_k^{HZ}$. Afterwards, Dai and Kou (DK) [42] suggested another choice for the DL parameter as well, with

$$ \beta_k^{DK}(\tau_k) = \frac{g_{k+1}^T y_k}{d_k^T y_k} - \left( \tau_k + \frac{\|y_k\|^2}{s_k^T y_k} - \frac{s_k^T y_k}{\|s_k\|^2} \right) \frac{g_{k+1}^T d_k}{d_k^T y_k}, \qquad (4.1) $$

where, loosely speaking, $\tau_k$ represents the scaling parameter of the scaled memoryless BFGS updating formula (3.7). They found $t_k^{ZZL}$ given by (3.9) as the best choice for $\tau_k$. To get global convergence for general functions, they introduced the following restricted CG parameter:

$$ \bar\beta_k^{DK} = \max\left\{ \beta_k^{DK},\; \eta\, \frac{g_{k+1}^T d_k}{\|d_k\|^2} \right\}, $$

where $\eta \in [0, 1)$ is a parameter. The validity and reliability of the CG_DESCENT algorithm triggered the interest of several scholars. For example, Li and Huang [69] proposed a modified CG_DESCENT algorithm based on the Yabe–Takano modified secant equation [107]. Thus, the method of [69] fulfills the sufficient descent condition while using the objective function values. Then, based on a singular value analysis, Babaie-Kafaki and Ghanbari [23] noted that large absolute values of the second term of $\beta_k^{HZ}$ may lead to an ill-conditioned search direction matrix, and so, in such situations it is better to drop that term, in the sense of using $\beta_k^{HS}$ instead of $\beta_k^{HZ}$. As a result, a discrete hybridization of the HS and HZ methods has been suggested in [23].
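For reference, a hedged sketch of the two restricted parameters discussed above is given below, following (4.1) and the HZ truncation; the function names and the default values of $\eta$ are illustrative assumptions.

```python
import numpy as np

def beta_hz_restricted(g1, g, d, y, eta=0.01):
    """Hager-Zhang parameter with the truncation used in CG_DESCENT [55, 56];
    g and g1 denote g_k and g_{k+1}, respectively."""
    beta = (g1 @ y) / (d @ y) - 2.0 * (y @ y) * (g1 @ d) / (d @ y) ** 2
    return max(beta, -1.0 / (np.linalg.norm(d) * min(eta, np.linalg.norm(g))))

def beta_dk_restricted(g1, d, s, y, tau, eta=0.5):
    """Dai-Kou parameter (4.1) with the restriction proposed in [42]."""
    t = tau + (y @ y) / (s @ y) - (s @ y) / (s @ s)
    beta = (g1 @ y) / (d @ y) - t * (g1 @ d) / (d @ y)
    return max(beta, eta * (g1 @ d) / (d @ d))
```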
Improving the performance of the DK method has also attracted significant attention. As examples, Kou [65], and Faramarzi and Amini [47], developed improved versions of the DK method using the modified secant equations given in [30, 107]. Hence, the methods of [47, 65] possess the mentioned properties of the method of [69], i.e. benefiting from the function values while satisfying (3.2). To improve the orthogonality of the gradient vectors generated by the DK method, an advantageous feature of the linear CG methods, Liu et al. [77] developed a special scaled version of the DK method using a matrix obtained from a QN update. Huang and Liu [60] set $\tau_k$ in (4.1) as a convex combination of the Oren–Luenberger [87] and Oren–Spedicato [88] scaling parameters and developed modified DK algorithms based on some new line search conditions.

Extensions of the classical conjugate gradient parameters based on the Dai-Liao approach
In a further line of work, scholars waged significant studies to extend the DL approach to the classical CG parameters, achieving one-parameter extensions of those parameters. Such attempts have been devised to obtain the descent property or to enhance efficiency. Among such studies, Babaie-Kafaki and Ghanbari [18] proposed an extension of the Polak–Ribière–Polyak (PRP) [91, 92] parameter as follows:

$$ \beta_k^{EPRP} = \frac{g_{k+1}^T y_k}{\|g_k\|^2} - t\, \frac{g_{k+1}^T s_k}{\|g_k\|^2}, \qquad (5.1) $$

where $t$ is a nonnegative parameter. Then, in light of the eigenvalue analysis carried out in [20], they acquired a two-parameter choice for $t$ which ensures the descent property. It is worth mentioning that the DPRP method suggested by Yuan [113] is a member of the EPRP class of CG algorithms, ensuring the sufficient descent condition (3.2) for a constant setting of $t$ in (5.1). Global convergence of DPRP has been analyzed by Yu et al. [110] under some modified line searches. To employ the objective function values in the PRP framework, Yuan et al. [115] made a hybrid modification of the DPRP parameter using a version of the modified secant equation proposed by Li et al. [70]. Also, Babaie-Kafaki and Ghanbari [27] conducted a singular value analysis on a rank-two perturbation of the identity matrix, being a generalization of the DL and EPRP search direction matrices, to get an optimal choice for the EPRP parameter $t$ as a (sort of) minimizer of the spectral condition number of the updating formula. Making the EPRP search direction bend toward the direction of the efficient three-term CG algorithm proposed by Zhang et al. [121] in a least squares model, another adaptive choice for the EPRP parameter has been given in [27] as well. Performing an eigenvalue analysis in light of the concept of the maximum magnification by a symmetrized version of the EPRP search direction matrix, Aminifard and Babaie-Kafaki [5] proposed another formula for $t$ in (5.1). Their formula is capable of simultaneously improving the convergence and the numerical behavior of EPRP. Moreover, Babaie-Kafaki et al. [31] applied the modified PRP parameter proposed by Sun and Liu [98] to get another CG parameter in the DL framework. Andrei [9] studied a DL generalization of the Dai–Yuan CG parameter [39], together with an acceleration scheme, taking into account the optimal choice of $t$ given in [42]. A similar extension of the modified Liu–Storey (LS) [74] CG parameter proposed by Yao et al. [94] has been analyzed by Cheng et al. [37]. Also, Yao et al. [108] studied a DL generalization of the modified HS parameter given in [94]. Aminifard and Babaie-Kafaki [6] put forward such an extension for the effective hybrid CG parameter proposed by Jian et al. [63]. In addition, Zheng and Zheng [126] targeted the modified CG parameters given by Dai and Wen [41], being improved versions of the HS and LS parameters, to suggest other CG parameters with the DL structure. Nakamura et al. [82] developed extended versions of several classical CG parameters based on the DL scheme, taking into account the choices of $t$ in the HZ framework [57].

Three-term CG algorithms have also weighed in on acquiring generalized versions of the DL method, yielding the sufficient descent condition (3.2) by a simple but meaningful plan. Especially, Sugiki et al. [96] used the class of three-term CG algorithms established by Narushima et al.
[84] as a role model to get a three-term generalization of the Yabe–Takano [107] CG parameter, which is a member of the DL family of CG parameters. Then, founded upon the three-term CG method of [123], Babaie-Kafaki and Ghanbari [21] developed another three-term extension of the DL method in which the parameter $t$ is determined based on the standard secant equation. In light of a matrix point of view, Yao et al. [109] put forward another three-term extension of the DL method in which $t$ is determined based on the conjugacy condition (1.9). Fatemi and Babaie-Kafaki [50] developed a three-term version of the DL method based on a penalty model similar to that of [48]. Besides, a four-term generalization of the DL method has been studied by Babaie-Kafaki and Ghanbari [26], together with an adaptive formula for $t$ obtained by approaching the search direction matrix of the method to the scaled memoryless BFGS updating formula (3.7) in the Frobenius norm. In another guideline, Babaie-Kafaki and Ghanbari [24] proposed a special symmetrized version of the DL search direction matrix $D_{k+1}$ given by (3.3) that contains the memoryless BFGS updating formula as a special case. Then, performing an eigenvalue analysis, they gained two adaptive formulas for the parameter $t$, leading to two other generalized DL algorithms.

Applications of the Dai-Liao class of conjugate gradient methods in practical disciplines
Recently, as well-known, well-studied and well-developed memoryless algorithms, DL-type methods have been put to the test in solving several practical optimization problems. As a result, they are now technically recognized as an efficient tool for addressing real-world optimization models, even capable of reaching their full potential in an era defined by broad, inclusive growth in the size of data sets. Here, we briefly review several attempts highlighting practical aspects of the DL algorithms.
Nonlinear systems of monotone equations

Nonlinear systems of equations often directly emerge in engineering applications [58, 103, 114, 116]. Especially, monotone cases of the model appear as subproblems of the generalized proximal algorithms with Bregman distances [61]. In the following, several studies on developing the DL algorithms (by derivative-free schemes) to solve nonlinear systems of monotone equations are reviewed.
As an initial study on the issue, Abubakar and Kumam [1] applied the DL method given in [20] to solve the problem. Then, based on eigenvalue analyses, Waziri et al. [100, 103] developed several DL methods with parameter choices in the HZ framework [55, 57] for solving nonlinear systems of monotone equations. Especially, in [103] they assessed the efficiency of their algorithms in compressed sensing. Waziri et al. [101, 102] also used the DL algorithms given in [13, 30, 70, 107] to address the model. Based on a scalar approximation of the Jacobian matrix $F'(x)$ of the system, yielding an adaptive formula for $t$, Halilu et al. [58] proposed another CG algorithm of the DL family to solve the problem and evaluated its performance in robotics. Recently, in a similar scheme, an accelerated DL projection algorithm for solving systems of monotone equations has been developed by Ivanov et al. [62].
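As a hedged illustration of the derivative-free schemes reviewed above, the following sketch embeds a DL-type direction (with $g_k$ replaced by $F(x_k)$) in the classical Solodov–Svaiter hyperplane projection framework; the line search test, parameter defaults and safeguards are simplifying assumptions of ours, and the cited methods differ precisely in these ingredients (for instance, the denominator $d^T y$ may need additional safeguarding).

```python
import numpy as np

def dl_projection(F, x0, t=1.0, sigma=1e-4, rho=0.5, tol=1e-6, max_iter=500):
    """Sketch of a derivative-free DL-type method for monotone F(x) = 0,
    embedded in the Solodov-Svaiter hyperplane projection framework."""
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    d = -Fx
    for _ in range(max_iter):
        if np.linalg.norm(Fx) <= tol:
            return x
        alpha = 1.0                     # backtracking (derivative-free) line search
        while True:
            z = x + alpha * d
            Fz = F(z)
            if -(Fz @ d) >= sigma * alpha * (d @ d) or alpha < 1e-12:
                break
            alpha *= rho
        # project x onto the hyperplane {u : F(z)^T (u - z) = 0}, which
        # separates x from the solution set of the monotone system
        x_new = x - ((Fz @ (x - z)) / (Fz @ Fz)) * Fz
        Fx_new = F(x_new)
        s, y = x_new - x, Fx_new - Fx
        beta = (Fx_new @ y - t * (Fx_new @ s)) / (d @ y)  # DL formula, g -> F
        d = -Fx_new + beta * d
        x, Fx = x_new, Fx_new
    return x
```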

Image restoration
As a frequently reported issue, images may be corrupted by impulse noise, one of the common noise models, in which only a portion of the pixels is contaminated. Generally, image restoration (reconstruction) approaches adopt meaningful noise suppression schemes to recover the original image. A well-known image restoration model consists of minimizing a composite function [35], i.e. the sum of a smooth function and a convex, continuous but (often) nonsmooth function. The nonsmooth structure of the model makes the noise removal procedure challenging. So, smooth relaxations of the image restoration model have been devised in the literature, which can be addressed by large scale optimization algorithms [111]. As already mentioned, Babaie-Kafaki et al. [6, 31], Lu et al. [79, 80], Waziri et al. [103] and Ivanov et al. [62] tackled the impulse noise removal problem by CG algorithms of the DL family.

Compressed sensing
As known, compressed sensing is a signal processing strategy for effectively acquiring and reconstructing signals that admit a sparse representation, which makes memoryless compact storage of the signals possible [33]. The importance of compressed sensing in real-world applications such as machine learning, compressive imaging and radar, wireless sensor networks, medical imaging, astrophysical signals and video coding has been pointed out in [7].
Also known as sparse recovery, compressed sensing principally deals with sparse solutions of an extremely underdetermined system $Ax = b$, with $A \in \mathbb{R}^{m \times n}$ ($m \ll n$) and $b \in \mathbb{R}^m$, for which it is most common to address the following (composite) unconstrained optimization model:

$$ \min_{x \in \mathbb{R}^n} \frac{1}{2} \|Ax - b\|_2^2 + \lambda \|x\|_1, \qquad (6.1) $$

where $\lambda > 0$ is a penalty parameter, embedded to balance the sparsity and the reconstruction quality of the solution [46]. The model (6.1) for compressed sensing is called the basis pursuit denoising (BPD) problem, which has been analyzed extensively in the literature. Technically, the presence of the nonsmooth $\ell_1$ penalty term in (6.1) makes the problem challenging to some extent. So, taking Nesterov's smoothing strategy [85] as a role model, Zhu et al. [128] proposed a relaxation of BPD which can be effectively solved by classic optimization tools. As already mentioned, Waziri et al. [103] developed two modified HZ methods for monotone nonlinear systems of equations and then, as a case study, investigated the capacity of the methods for compressed sensing. Also, Aminifard and Babaie-Kafaki [6] investigated the efficiency of some DL extensions of the hybrid CG parameter of [63] for solving the compressed sensing problem.
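As one common smoothing device (not necessarily the particular relaxation of [128]), the $\ell_1$ term of (6.1) can be replaced by $\sum_i \sqrt{x_i^2 + \varepsilon}$, after which any DL-type CG method applies to the relaxed model; a minimal sketch follows, with the function name and the default $\varepsilon$ being our assumptions.

```python
import numpy as np

def bpd_smoothed(x, A, b, lam, eps=1e-8):
    """Value and gradient of a smooth relaxation of the BPD model (6.1),
    with ||x||_1 replaced by sum_i sqrt(x_i^2 + eps); as eps -> 0 the
    relaxation approaches the original l1 penalty."""
    r = A @ x - b
    val = 0.5 * (r @ r) + lam * np.sum(np.sqrt(x * x + eps))
    grad = A.T @ r + lam * x / np.sqrt(x * x + eps)
    return val, grad
```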

Conclusions
At the turn of the century, one of the most outstanding conjugacy conditions was proposed by Dai and Liao [40], and it has since been the wellspring of broad-ranging, deep studies. Actually, they laid the foundation of a meaningful one-parameter class of conjugate gradient algorithms which contains the efficient CG_DESCENT algorithm. Continuing to evolve and to expose their potential, the Dai–Liao methods are principally enriched by quasi-Newton aspects. Although the computational evidence is generally convincing, Dai–Liao algorithms technically face some major challenges, such as generating uphill search directions and being vulnerable to improper settings of their parameter. To advance the algorithms in both theoretical and practical respects, researchers set out to improve the Dai–Liao method along several baselines. Here, we have classified such attempts to make them crystal clear and to unveil their borders, in a way that readers can get an instant evaluation of the progress delivered on the Dai–Liao algorithms in their different ramifications.
As a first step, because of the close connection between the Dai–Liao algorithms and quasi-Newton aspects, we showed how the modified secant equations have been attached to the Dai–Liao algorithms. The main part of our study, however, has been devoted to reviewing the origins of the optimal values of the Dai–Liao parameter given in the literature. We also depicted how the scope of the Dai–Liao scheme has been widened to the other classical conjugate gradient parameters, yielding one-parameter extensions of the traditional parameters. Finally, we emphasized the practical efficiency of the Dai–Liao algorithms when dealing with practical issues such as signal and image processing models, as real-world case studies.
This review is meaningfully capable of planting a seed for devising new modifications of the Dai–Liao algorithms. Future studies on the Dai–Liao methods may target global convergence under nonmonotone line searches, or promote the methods for nonsmooth optimization problems. As a final note, we encourage researchers to put a practical spin on their studies of the Dai–Liao algorithms, in the sense of targeting state-of-the-art real-world issues. Alongside the mentioned practical models, as examples, nonnegative matrix factorization [71] and the Muskingum model [118] can also be addressed by the Dai–Liao algorithms. Moreover, the algorithms can be used for designing kernel methods [105] as well as adaptive filtering techniques [36, 66], which are significantly employed in signal processing and machine learning disciplines such as the support vector machine, support vector regression and extreme learning machine.