ON THE DERIVATIVE-FREE QUASI-NEWTON-TYPE ALGORITHM FOR SEPARABLE SYSTEMS OF NONLINEAR EQUATIONS

A derivative-free quasi-Newton-type algorithm whose search direction is the product of a positive definite diagonal matrix and a residual vector is presented. The algorithm is simple to implement and is able to solve large-scale nonlinear systems of equations with separable functions. The diagonal matrix is obtained in a quasi-Newton manner at each iteration. Under suitable conditions, global and R-linear convergence results for the algorithm are established. Numerical tests on some benchmark separable nonlinear equations problems reveal the robustness and efficiency of the algorithm. Mathematics Subject Classification. 65K05, 65H10, 90C30, 90C53. Received December 27, 2020. Accepted October 9, 2021.


Introduction
Consider the problem of finding a solution of the nonlinear system of equations g(x) = 0, (1) where g = (g_1, g_2, . . . , g_n)^T : R^n → R^n is a separable function. Separability here means that each component g_i depends on only one or a few components of the vector x. This structure has been studied, under the name partial separability, by Griewank and Toint in [6][7][8].
Problem (1) may arise from an unconstrained optimization problem: for example, let f(x) = ∑_{i=1}^{n} g_i(x). Then the nonlinear system of equations (1) is equivalent to the unconstrained optimization problem min f(x), x ∈ R^n.
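To make the setting concrete, the following toy example (our own illustration, not one of the paper's test problems, and using the standard sum-of-squares merit function rather than the f above) shows a separable system whose roots are exactly the global minimizers of a merit function:

```python
import math

# Toy separable system (our own example): g_i(x) = exp(x_i) - 1,
# so each component depends only on x_i and the unique root is x* = 0.
def g(x):
    return [math.exp(xi) - 1.0 for xi in x]

# Sum-of-squares merit function f(x) = sum_i g_i(x)^2: it is nonnegative
# and vanishes exactly at roots of g, so solving g(x) = 0 amounts to
# finding a global minimizer of f.
def f(x):
    return sum(gi * gi for gi in g(x))

print(f([0.0, 0.0, 0.0]) == 0.0)   # merit vanishes at the root
print(f([0.5, -0.2, 1.0]) > 0.0)   # and is positive elsewhere
```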
For finding solutions of general nonlinear equations, quasi-Newton methods are well known and commonly used because they can be implemented in a derivative-free manner [17,21]. However, some of these methods are unsuitable for large-scale problems due to their matrix storage requirements. Consequently, methods designed for nonlinear equations with structured functions have received much attention, and several matrix-free alternatives have been proposed over the last decade (see, for example, [14,18,20,23,25]). The spectral gradient method, initially introduced by Barzilai and Borwein, has been successfully used as a derivative-free approach for solving large-scale nonlinear equations by La Cruz, Martínez and Raydan in [11,13]. Specifically, La Cruz et al. [11] presented a derivative-free spectral residual method (dfsane) for solving large-scale nonlinear equations; the algorithm uses a scalar multiple of the identity to estimate the Jacobian of the function g. Moreover, some algorithms that use a diagonal matrix to approximate the Jacobian of the residual function g have been studied in the literature; for details, interested readers may refer to [5,10,14,24,26,27].
In this paper, we combine the diagonal Hessian approximation approach studied by Deng and Wan [2] with the spectral residual approach presented in [11] to propose, analyze and implement a derivative-free algorithm for separable problems. It can be seen as an improved version of the dfsane algorithm in which a positive definite diagonal matrix approximates the Jacobian of the function g. A derivative-free line search is employed to analyze the convergence of the proposed algorithm.
The paper is organized as follows. Section 2 describes some preliminaries and the algorithm. Section 3 addresses the global convergence and rate of convergence of the algorithm. Section 4 presents the numerical experiments, and conclusions are given in Sect. 5. Unless otherwise stated, throughout this paper u_k^i denotes the i-th component of a vector u_k. Also, ‖·‖ stands for the Euclidean norm of vectors and the induced 2-norm of matrices.

Preliminaries and Algorithm
In this section, we present the derivative-free quasi-Newton-type algorithm. We begin by briefly reviewing the conference paper by Deng and Wan [2].
Based on the idea of Shi and Sun in [22], Deng and Wan presented a spectral conjugate gradient method for solving the unconstrained optimization problem (2), in which the spectral parameter is a specific diagonal matrix chosen so that it satisfies a quasi-Newton property. They considered a diagonal matrix Q_k = diag(q_k^1, q_k^2, . . . , q_k^n) and solved a constrained optimization problem (3) in which L_k and U_k are given lower and upper bounds for q_k^i with 0 < L_k ≤ q_k^i ≤ U_k, so that Q_k is a safely positive definite matrix. The solution of problem (3) is given in [2] with L_k = c_1‖g_k‖ and U_k = c_1‖g_k‖ + c_2, where c_1, c_2 > 0. Unfortunately, the authors of [2] did not present a numerical implementation of the method. Next, to build our proposed algorithm, we begin by assembling a diagonal matrix similar to the one proposed by Deng and Wan; the difference between the two lies in the safeguard that ensures positive definiteness of the diagonal matrix. To construct the diagonal matrix of the proposed algorithm, we make use of the following lemma (Lemma 1 in [19]). Lemma 2.1 Let D = diag(d) be a diagonal matrix in R^{n×n}, and let u and v be vectors in R^n. Then the solution of the constrained linear least-squares problem with simple bounds min_{d ∈ R^n, d ≥ 0} ‖diag(d)u − v‖² is given componentwise by d_i = max{0, u_i v_i}/(u_i)² if u_i ≠ 0, and d_i = 0 otherwise. Based on Lemma 2.1, the resulting diagonal matrix is positive semi-definite. However, to obtain a descent direction that can be used with a suitable line search technique, we define a positive definite diagonal matrix D_k (k ≥ 1) whose entries (5) are obtained from Lemma 2.1 with the quasi-Newton pair u = s_{k−1} and v = y_{k−1}. The search direction of the diagonal derivative-free method is obtained as the solution of the linear system D_k p_k = −g(x_k), where D_k is a diagonal matrix whose entries are computed using equation (5). Furthermore, we safeguard D_k against very small and very large values by projecting its entries into a given scalar interval [d, d̄] such that 0 < d < 1 and d̄ ≥ 1.
Hence, the i-th entry of the matrix D_k is given by (8). It can be seen from equation (8) that the sequence {d_k^i} is uniformly bounded for each i and k; in fact, d ≤ d_k^i ≤ d̄ for all i and k. Consequently, D_k is invertible for each k ≥ 0. In contrast to the diagonal matrix proposed by Deng and Wan [2], the safeguard procedure here is simple: it sets the nonpositive entries of the generated diagonal matrix to the positive parameter d, and the undefined entries to 1. Thus, at any iterate where some entries of the diagonal matrix become undefined, those entries are set to 1. This differs from Deng and Wan's proposal, where the undefined entries are set to the average of the lower and upper bounds L_k and U_k. The detailed steps of the derivative-free quasi-Newton-type approach are given below.
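In code, the safeguard described above might look as follows; this is a minimal sketch assuming the quasi-Newton quotient d_i = y_i/s_i and using the parameter names lo and hi for the bounds d and d̄:

```python
def diag_entries(s, y, lo=0.1, hi=10.0):
    """Safeguarded diagonal entries (sketch of the rule described above).

    Assumed quasi-Newton quotient d_i = y_i / s_i; then:
      - undefined entries (s_i == 0) are set to 1,
      - nonpositive entries are set to the small positive bound lo,
      - all remaining entries are projected into [lo, hi].
    """
    d = []
    for si, yi in zip(s, y):
        if si == 0.0:           # undefined quotient -> 1
            d.append(1.0)
        else:
            di = yi / si
            if di <= 0.0:       # nonpositive -> lower safeguard
                di = lo
            d.append(min(max(di, lo), hi))  # project into [lo, hi]
    return d

print(diag_entries([1.0, 0.0, 2.0, 1.0], [3.0, 5.0, -4.0, 100.0]))
# -> [3.0, 1.0, 0.1, 10.0]
```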
Algorithm 1 (dfnwt).
Step 1 : If ‖g(x_k)‖ is sufficiently small, stop; otherwise form the diagonal matrix D_k.
Step 2 : Compute the search direction p_k = −D_k^{-1} g(x_k).
Step 3 : Let α_k = ρ^j, where j is the least non-negative integer satisfying (9). Compute s_k = α_k p_k, x_{k+1} = x_k + s_k, and y_k = g(x_{k+1}) − g(x_k).
Step 4 : Set k = k + 1 and go to Step 1.
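The steps above can be sketched end to end in a few lines. Since we do not rely on the exact form of the line-search condition (9), the sketch substitutes a dfsane-style sufficient-decrease test with a summable forcing sequence; D_0 = I, the forcing sequence, and all parameter values are our own assumptions:

```python
import math

def norm2(v):
    """Squared Euclidean norm."""
    return sum(vi * vi for vi in v)

def dfnwt_sketch(g, x0, tol=1e-8, max_iter=500,
                 delta=1e-4, rho=0.5, lo=1e-6, hi=1e6):
    """Sketch of the four steps; D_0 = I and all parameters are assumptions."""
    x = list(x0)
    gx = g(x)
    eta0 = norm2(gx)                 # scale of the forcing sequence (assumed)
    d = [1.0] * len(x)               # diagonal entries of D_0 (assumed identity)
    for k in range(max_iter):
        if math.sqrt(norm2(gx)) <= tol:          # Step 1: stopping test
            return x, k
        p = [-gi / di for gi, di in zip(gx, d)]  # Step 2: solve D_k p_k = -g(x_k)
        eta = eta0 / (k + 1) ** 2                # summable forcing term (assumed)
        alpha = 1.0                              # Step 3: backtracking line search
        while True:
            xn = [xi + alpha * pi for xi, pi in zip(x, p)]
            gn = g(xn)
            if norm2(gn) <= norm2(gx) - delta * alpha ** 2 * norm2(p) + eta:
                break
            alpha *= rho
        s = [alpha * pi for pi in p]
        y = [gni - gi for gni, gi in zip(gn, gx)]
        # safeguarded diagonal update: quotient y_i/s_i projected into [lo, hi],
        # nonpositive quotients -> lo, undefined quotients (s_i = 0) -> 1
        d = [1.0 if si == 0.0 else
             min(max(yi / si if yi / si > 0.0 else lo, lo), hi)
             for si, yi in zip(s, y)]
        x, gx = xn, gn                           # Step 4: next iteration
    return x, max_iter

# Usage on a toy separable system g_i(x) = x_i^3 + x_i with root x* = 0:
sol, iters = dfnwt_sketch(lambda x: [xi ** 3 + xi for xi in x], [0.5, -0.3, 1.0])
print(iters < 500 and max(abs(v) for v in sol) < 1e-6)
```

Because each diagonal entry is a one-dimensional difference quotient, the update costs O(n) per iteration and stores only a vector, which is what makes the approach attractive for large n.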

Remark 2.2
Since the matrix D_k is diagonal, the product at Step 2 of Algorithm 1 is, for any k ≥ 0, simply the componentwise product of the reciprocals of the diagonal elements of D_k with the corresponding components of g(x_k), computed in O(n) operations.

Remark 2.3 By the definition of the search direction in Step 2, it can easily be deduced that g(x_k)^T p_k = −g(x_k)^T D_k^{-1} g(x_k) ≤ −(1/d̄)‖g(x_k)‖² < 0, since the entries of D_k lie in [d, d̄].

Remark 2.4
The line search condition (9) is similar to the one used in [29]. The right-hand side of the present line search (9) has an additional term, a positive sequence that guarantees the well-definedness of the inequality. In fact, for each k, the inequality (9) holds once the stepsize is small enough, since its left-hand side approaches the current merit value as α_k → 0⁺. Thus, α_k can be obtained by a backtracking approach such as Step 3 of Algorithm 1.
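A tiny check illustrates why the additive positive term makes the backtracking well defined: even along a poor (ascent) direction, the condition is eventually satisfied, because its left-hand side tends to the current merit value, which lies strictly below the right-hand side once α is small. The condition used here is an assumed dfsane-style form, not necessarily the paper's (9):

```python
def backtrack(merit, x, p, eta, delta=1e-4, rho=0.5, max_back=100):
    """Backtracking for an assumed condition of the form
    merit(x + a*p) <= merit(x) - delta*a^2*||p||^2 + eta, with eta > 0."""
    p2 = sum(pi * pi for pi in p)
    m0 = merit(x)
    for j in range(max_back):
        a = rho ** j
        trial = [xi + a * pi for xi, pi in zip(x, p)]
        if merit(trial) <= m0 - delta * a * a * p2 + eta:
            return a, j
    raise RuntimeError("backtracking failed")

# Merit ||g||^2 for the scalar g(x) = x^3 + x; p = [1.0] is an ascent
# direction at x = 0.2, yet a positive step is still accepted since eta > 0.
merit = lambda x: sum((xi ** 3 + xi) ** 2 for xi in x)
a, j = backtrack(merit, [0.2], [1.0], eta=1e-3)
print(0.0 < a < 1.0)  # a strictly positive stepsize was found
```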

Convergence Results
In this section, we prove the global and R-linear convergence of Algorithm 1. First, we assume that g(x_k) ≠ 0 for every k ≥ 0 unless x_k is a solution. Furthermore, we assume the following.
Assumption 1.
i. The function g is continuously differentiable on an open convex set Θ containing the iterates.
ii. The Jacobian of g at x, denoted by J(x), is bounded and uniformly nonsingular on Θ, i.e., there exist positive scalars ε_1, ε_2 such that ε_1‖u‖ ≤ ‖J(x)u‖ ≤ ε_2‖u‖ for all x ∈ Θ and u ∈ R^n.
iii. The Jacobian J is Lipschitz continuous with Lipschitz constant γ on Θ, that is, ‖J(x) − J(y)‖ ≤ γ‖x − y‖ for all x, y ∈ Θ.
Assumption 1 implies that there are constants M ≥ m > 0 such that m‖x − y‖ ≤ ‖g(x) − g(y)‖ ≤ M‖x − y‖ for all x, y ∈ Θ.

Assumption 2
The diagonal matrix D_k approximates the Jacobian J of the function g at x_k along the direction p_k; therefore, D_k can be regarded as a good approximation of J(x_k). That is, ‖(D_k − J(x_k))p_k‖ ≤ r‖p_k‖, where r ∈ (0, 1) is a very small constant.
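Assumption 2 is plausible for separable problems because each diagonal entry is a difference quotient of the corresponding component; for a linear separable map the quotient reproduces the Jacobian exactly. The check below is our own illustration with made-up values:

```python
# Linear separable map g_i(x) = a_i * x_i has Jacobian J = diag(a).
# The quotient d_i = y_i / s_i then reproduces a_i exactly (up to rounding),
# so ||(D_k - J(x_k)) p_k|| is essentially zero and the bound holds for
# any r in (0, 1).  The values of a, x, s, p are made up for illustration.
a = [2.0, 0.5, 3.0]
x = [1.0, -1.0, 0.25]
s = [0.1, -0.2, 0.3]
p = [1.0, 2.0, -1.0]

g = lambda z: [ai * zi for ai, zi in zip(a, z)]
x_new = [xi + si for xi, si in zip(x, s)]
y = [gn - gi for gn, gi in zip(g(x_new), g(x))]
d = [yi / si for yi, si in zip(y, s)]      # diagonal approximation D_k

err = sum(((di - ai) * pi) ** 2 for di, ai, pi in zip(d, a, p)) ** 0.5
print(err < 1e-12)  # ||(D_k - J) p|| vanishes for linear g
```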
Lemma 3.1 Let the sequence {x_k} be generated by Algorithm 1; then, for all k ≥ 0, inequality (13) holds. Combining (16) and (17), we obtain (13).
The following lemma is from [3]. Lemma 3.2 Let {a_k} and {e_k} be nonnegative sequences satisfying a_{k+1} ≤ a_k + e_k and ∑_{k=0}^{∞} e_k < ∞; then the sequence {a_k} has a limit in R.

Lemma 3.3
Let {x_k} be the sequence generated by Algorithm 1; then the following hold. Proof (a) Setting a_k = ‖g(x_k)‖² and e_k = η_k in Lemma 3.2, we conclude that {‖g(x_k)‖²} has a limit. Since ‖g(x_k)‖² ≥ 0 and (9) holds, we get, for any k, inequality (18). Summing both sides of (18), and noting that ∑ η_k is convergent and δ is a positive constant, it follows that lim_{k→∞} α_k‖p_k‖ = 0. Lemma 3.4 Suppose Assumptions 1 and 2 hold, and let {x_k} be a sequence of iterates generated by Algorithm 1. Then inequality (24) holds for all sufficiently large k.
We now present the R-linear convergence of Algorithm 1.
Theorem 3.6 Suppose Assumption 1 holds. If the sequence {x_k} generated by Algorithm 1 converges to x*, then for sufficiently large k there exist constants C > 0 and μ ∈ (0, 1) such that (25) holds. Proof From the line search condition (9) it follows that the chain of inequalities in (26) holds, where the second and third inequalities follow from (13) and (24), respectively. Since η_k → 0, without loss of generality we assume that η_k ≤ δᾱ₁²/(2d̄) for all k, so that (26) simplifies accordingly. Inequality (26) and an inductive argument yield (27), where μ = 1 − δᾱ₁²/(2d̄) < 1. Using (11) together with (27), we obtain that (25) holds with C = ‖g(x_0)‖/m. This means that Algorithm 1 converges R-linearly.
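R-linear convergence only requires the errors to be dominated by a geometric sequence Cμ^k; they need not decrease monotonically. A minimal numerical check on a damped scalar iteration (our own illustration, unrelated to Algorithm 1's analysis):

```python
# Damped residual iteration x_{k+1} = x_k - 0.25 * g(x_k) with g(x) = 2x
# contracts the error by the factor mu = 0.5 per step toward x* = 0, so
# |x_k - x*| <= C * mu^k with C = |x_0| and mu = 0.5.
mu, x, xs = 0.5, 8.0, []
for k in range(20):
    xs.append(x)
    x = x - 0.25 * (2.0 * x)

C = abs(xs[0])
ok = all(abs(xk) <= C * mu ** k + 1e-15 for k, xk in enumerate(xs))
print(ok)  # the error sequence is dominated by C * mu^k
```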

Numerical Experiments
In this section we report the results obtained with a preliminary MATLAB implementation of the proposed algorithm on a selection of test problems. The problem set consists of ten almost separable nonlinear equations and can be found in Appendix A; the detailed numerical results of this section can be found in Appendix B. Computations were carried out on a personal computer with an Intel Core i7 processor at 2.30 GHz and 8.00 GB of RAM. A failure (denoted by 'F') is reported if the number of iterations exceeds 1000. We used five different dimensions with ten different initial points as follows: • dimensions: n = 1000, 5000, 10000, 50000, 100000.
In Tables 2-11 of Appendix B, we report the number of iterations (#iter), the number of function evaluations (#fval), the CPU time in seconds (time) and the norm of the residual at the termination point (Fnorm) for all ten tested problems. In Table 2, dfnwt has the smallest #iter and #fval on all the problems. There is a tie between dfnwt and dfsane in Tables 3-6, 8 and 9, except for the initial point x_0^9 and a few cases in Tables 8 and 9. In Table 7, dfsane has the best performance in terms of #iter and #fval, except for a few cases where dfnwt performs better. The dfnwt algorithm records 17 failures in Table 10; however, it outperforms the dfsane and msgp algorithms in the remaining cases. Lastly, in Table 11, unlike dfsane and msgp, dfnwt managed to solve almost all the problems; however, in the few cases where msgp solved a problem, it had the smallest #iter and #fval. In addition, a summary of Tables 2-11 is reported in Table 1. To visualize the numerical behaviour of the algorithms, we plotted three figures using the popular Dolan and Moré [4] performance profile based on the #iter, #fval and CPU time metrics. In Fig. 1, we compare the performance of the dfnwt algorithm with the dfsane and msgp algorithms with respect to the #iter metric; Fig. 1 shows that dfnwt performs better than msgp and dfsane, with almost 70% success. In Fig. 2, the performance of the three algorithms is compared on the #fval metric; the figure shows that dfnwt performs better than msgp and dfsane, with over 70% success. Fig. 3 shows that dfsane is faster than dfnwt and msgp for τ ≤ 4; however, for τ > 4, dfnwt is faster than msgp and dfsane. Based on the experiments performed, we observe that the good performance of the dfnwt algorithm may be due to the diagonal approximation of the Jacobian matrix associated with the search direction; a similar argument applies to the msgp algorithm.
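For reference, the Dolan and Moré performance profile used in Figs. 1-3 can be computed as sketched below, with made-up cost data rather than the paper's results: for each solver s and problem p one forms the ratio r_{p,s} = t_{p,s}/min_s t_{p,s}, and ρ_s(τ) is the fraction of problems with r_{p,s} ≤ τ:

```python
INF = float("inf")  # failures get an infinite ratio

def perf_profile(times, tau):
    """times[s][p]: cost of solver s on problem p (INF = failure).
    Returns rho_s(tau), the fraction of problems each solver solves
    within a factor tau of the best solver on that problem."""
    n_solvers, n_probs = len(times), len(times[0])
    best = [min(times[s][p] for s in range(n_solvers)) for p in range(n_probs)]
    rhos = []
    for s in range(n_solvers):
        ratios = [times[s][p] / best[p] for p in range(n_probs)]
        rhos.append(sum(r <= tau for r in ratios) / n_probs)
    return rhos

# Hypothetical data: 2 solvers, 4 problems (INF marks a failure).
times = [[1.0, 2.0, 3.0, INF],   # solver A
         [2.0, 1.0, 3.0, 5.0]]   # solver B
print(perf_profile(times, 1.0))  # -> [0.5, 0.75]  (wins/ties at tau = 1)
print(perf_profile(times, 2.0))  # -> [0.75, 1.0]
```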

Conclusions
We have presented, analyzed, and implemented a derivative-free quasi-Newton-type algorithm (dfnwt) for solving nonlinear systems of equations with separable functions. Unlike existing algorithms such as dfsane, which approximates the Jacobian of g by a scalar multiple of the identity at each iteration, the proposed dfnwt algorithm uses a diagonal matrix updated in a quasi-Newton manner for this approximation. Among the attractive features of the presented algorithm is that it requires neither the gradient nor an approximation of the gradient, which makes it well suited to large-scale separable problems. Furthermore, the global and R-linear convergence of the sequence generated by the dfnwt algorithm is established. The numerical results show that the proposed dfnwt competes with dfsane, a well-known and efficient algorithm for solving nonlinear equations. The good efficiency of the dfnwt algorithm is due to the additional information captured by the diagonal matrix used to approximate the Jacobian. Investigating better approximations that exploit the structure of the problem, together with extensive numerical experiments to further assess the effectiveness of the approach, will be an interesting topic for future research.

Appendix A: List of test problems
We list below the details of the test problems used in Section 4, where g = (g_1, g_2, . . . , g_n)^T.