ON DERIVATIVE BASED BOUNDING FOR SIMPLICIAL BRANCH AND BOUND ∗

Simplicial based Global Optimization branch and bound methods require tight bounds on the objective function value. Recently, renewed interest has appeared in bound calculation based on Interval Arithmetic by Karhbet and Kearfott (2017) and in exploiting second derivative bounds by Mohand (2021). The question investigated here is how partial derivative ranges can be used to provide bounds of the objective function value over the simplex. Moreover, we provide theoretical properties of how this information can be used from a monotonicity perspective to reduce the search space in simplicial branch and bound.


Introduction
Deterministic global optimization based on the concept of branch and bound looks for the set of global minimum points up to a certain accuracy by rejecting regions where it is guaranteed that the global minimum cannot be found. An overview of the knowledge in the nineties is the handbook [5] of Reiner Horst and Hoang Tuy in 1990. It describes various bounding techniques and partition sets based on cones, boxes and simplices. The latter is rather natural in blending type of problems [2], where the feasible set is given as the standard simplex. The focus of the further study of Reiner Horst has been on the convergence of several subdivision methods. Meanwhile, Paulavičius and Žilinskas elaborated on practical implementations and provide a nice summary in the handbook [10]. Their focus is among others on the simplicial division of the box into n! simplices as introduced in the seventies by Mike Todd [11] and bounds mainly based on Lipschitz constants.
* This work has been funded by grant RTI2018-095993-B-I00 from the Spanish Ministry.
Recently, there appeared a renewed interest in different ways to provide tight bounds over simplices. In [6], Karhbet and Kearfott take the point of view from Interval Arithmetic, where the paper provides a nice illustration based on 7 instances. The same illustration has been used in a contribution by Ouanes Mohand to this journal [9], that studies different ways to use a bound on the second derivative to create objective function bounding over a simplex.
As we have been working on simplicial bounds for some time, our curiosity is how first derivative ranges can be used to create objective function bounds over the simplex and how the different ways compare to earlier results for the 7 instances. The rest of this paper is organised as follows. Section 2 describes the mathematical question and instances we are investigating. Section 3 describes several ways to derive bounds of a function over a simplex based on derivative ranges. Section 4 describes how monotonicity considerations may play a role for practical simplicial branch and bound algorithms. Section 5 describes the findings of our investigation.

Mathematical description of the question
One of the questions in simplicial branch and bound algorithms as outlined in [10] is how to obtain good bounds over the simplicial partition set. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice continuously differentiable function over a simplicial set. Let $V = \{v_0, v_1, \ldots, v_n\}$ be a set of $n + 1$ affinely independent vertices of the $n$-simplex $\Delta = \mathrm{conv}(V)$. Let the function range be denoted by $[f^*, \overline{f}] := [\min_{x \in \Delta} f(x), \max_{x \in \Delta} f(x)]$. Then, the question posed in [6,9] is to find a sharp enclosure $F \supset [f^*, \overline{f}]$. Considering the minimization of function $f$, our question will focus on finding a tight lower bound $\phi \le f^*$.
We also introduce the minimum enclosing box $X = [\underline{x}, \overline{x}] \supset \Delta$, given componentwise by $\underline{x}_i = \min_{v \in V} v_i$ and $\overline{x}_i = \max_{v \in V} v_i$. For the notation, we will use as much as possible the index $i$ for the component and index $j$ for an element of a set when necessary. The set of vertices of the box $X$ is denoted by $W$, i.e.
$$W := \{ w \in \mathbb{R}^n : w_i \in \{\underline{x}_i, \overline{x}_i\},\ i = 1, \ldots, n \},$$
and we use $w$ for any vertex of $X$. Our specific question is what happens when the function values at the vertices are available, but also the gradient range defined by $[\underline{\nabla f}, \overline{\nabla f}]$. Karhbet and Kearfott [6] discuss some variants for the enclosure $F$ taking derivative bounds over box $X$ rather than simplex $\Delta$. Our focus is on how this information can be used to provide a lower bound $\phi \le f^*$. We will describe various methods in Section 3 and illustrate them on the instances introduced in [6,9].
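The enclosing box and its vertex set are simple to compute from the simplex vertices. The following is a minimal sketch in Python; the function names `enclosing_box` and `box_vertices` are our own, and the vertex set is the one that also appears in Example 1.

```python
from itertools import product

def enclosing_box(vertices):
    """Componentwise minimum and maximum over the simplex vertices."""
    lower = tuple(min(coords) for coords in zip(*vertices))
    upper = tuple(max(coords) for coords in zip(*vertices))
    return lower, upper

def box_vertices(lower, upper):
    """All 2^n corner points w of the box [lower, upper]."""
    return [tuple(w) for w in product(*zip(lower, upper))]

# Vertex set of the simplex used in Example 1.
V = [(-1, -1), (2, 0), (0, 2)]
lower, upper = enclosing_box(V)
W = box_vertices(lower, upper)
print(lower, upper)  # (-1, -1) (2, 2)
print(W)             # [(-1, -1), (-1, 2), (2, -1), (2, 2)]
```

If the box is degenerate in some coordinate (lower and upper bound coincide), this sketch lists the corresponding vertices twice.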

Illustrative instances
In total, 7 instances were introduced, based on 3 different 2-dimensional functions and varying vertex sets.
[...], convex
It is typical that the polynomial functions introduced in paper [6], which focuses on Interval Arithmetic, have fractional coefficients which can be represented exactly in a computer. This means that function evaluations are also exact when the evaluated points (vertices) have the same property. We can observe this effect when we consider the data in Table 1. Except for the last instance, all vertices are numbers represented exactly in a computer, and likewise the minima over the simplices. We first determined analytically the derivative ranges over the enclosing box X. One can observe that apart from the first instance, all functions are monotonic over the corresponding simplex ∆ = conv(V) given in Table 1. This implies there is no interior minimum point. However, one cannot conclude from the enclosures of the gradients that the minimum is attained at a vertex; one can only conclude that it can be found on the boundary, which for the 2D cases is an edge of the simplex, see [4]. In a practical branch and bound context, one can reduce the dimension and work with lower dimensional simplices as partition sets. Specifically for all instances, the minimum f * is attained at a vertex. We will discuss the theoretical background for these observations in Section 4.
As mentioned in [6], the gradient range is of course tighter over the simplex ∆ than over the enclosing box X ⊃ ∆. In Table 1, the ranges over the simplex are given, computed in an analytical way.
Table 1. Instances from [6] with the corresponding minimum function value f * and the gradient range over the enclosing box X and simplex ∆.

1-norm Lipschitz relaxation
In [10] Chapter 2, an overview is given of the use of lower bounds based on the so-called Lipschitz constant. In the context of differentiable functions, the tightest Lipschitz constant over the simplex is given by
$$L_{\Delta p} = \max_{x \in \Delta} \|\nabla f(x)\|_p \qquad (1)$$
for different norms represented by the parameter $p$. From any vertex $v \in V$, we have the relaxation
$$f(x) \ge f(v) - L_{\Delta p} \|x - v\|_{\frac{p}{p-1}}, \qquad (2)$$
where $\frac{p}{p-1}$ gives the dual norm of $p$, e.g. $p = \infty$ pairs with the 1-norm distance.
A lower bound of $f$ over $\Delta$ can be found by solving
$$\min_{x \in \Delta} \max_{v \in V} \left\{ f(v) - L_{\Delta p} \|x - v\|_{\frac{p}{p-1}} \right\}. \qquad (3)$$
As (3) involves the minimization of the maximum over concave functions, this implies in general solving a global optimization problem. Instead of explicitly solving problem (3), one often takes as lower bound the best relaxation over all the vertices, e.g. [3]. Written in terms of 1-norm distance [10], this is
$$\phi_L = \max_{v \in V} \left\{ f(v) - L_{\Delta\infty} \max_{u \in V} \|u - v\|_1 \right\}. \qquad (4)$$
This is one of the bounds used in [10] to derive simplicial function bounds. Notice that for the 7 instances, the expression in (1) to be maximized is a convex function, such that $L_{\Delta\infty}$ can be found by evaluating the infinity norm of the gradient at the vertices. The corresponding lower bound $\phi_L$ is given in Table 2 as decimal number for the 7 instances.
For a box shaped partition set as $X$, Meewella and Mayne in [7] use the 1-norm distance, where (3) becomes an LP problem. Notice that the Lipschitz constant should then also be taken over $X$:
$$L_{X\infty} = \max_{x \in X} \|\nabla f(x)\|_\infty. \qquad (5)$$
We add to the lower bounding LP of [7] the restriction of optimizing over the simplex $\Delta$. To capture the 1-norm distance from a box vertex without using the absolute value function, we introduce the vector $I_w$ for all vertices $w \in W$:
$$I_{iw} = \begin{cases} 1 & \text{if } w_i = \underline{x}_i, \\ -1 & \text{if } w_i = \overline{x}_i, \end{cases} \qquad i = 1, \ldots, n.$$
Using the function values $f(w)$ on the vertices $w \in W$ and the Lipschitz constant $L_{X\infty}$ over the box (5), the lower bound is given by
$$\phi_M = \min_{x, z} \; z \quad \text{s.t.} \quad z \ge f(w) - L_{X\infty} I_w^T (x - w), \; w \in W, \quad x \in \Delta. \qquad (6)$$
The obtained lower bounds for the 7 instances are given in Table 2. Notice that the determination of the lower bound $\phi_M$ requires solving a small LP problem, which may imply calling an external routine. The computation of lower bound $\phi_L$ will be less time consuming, although the efficiency depends on the implementation of course. One can observe that both lower bounds are far from the minimum and that in all cases $\phi_L$ is worse than the more sophisticated $\phi_M$, i.e. $\phi_L < \phi_M$. This tendency is general, but it is not a mathematical truth. It is possible to design degenerate cases (see Example 1) where $\phi_L > \phi_M$, which requires the gradient range over the simplex $\Delta$ to be much smaller in absolute value than the gradient range over the box $X$.
Table 2. The minimum f * over ∆ rounded to a decimal number, ∞-Lipschitz constant over the enclosing box X and simplex ∆ and lower bounds ϕ L based on (4), ϕ M based on (6).
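The vertex-based Lipschitz bound ϕ L can be checked numerically on the instance of Example 1. The following Python sketch takes, for each vertex, the function value minus the Lipschitz constant times the largest 1-norm distance to any vertex, and keeps the best of these relaxations. The function name `vertex_lipschitz_bound` is our own, and the constant 64/27 ≈ 2.37 is an assumption: it is our hand-derived exact value of the largest infinity norm of the gradient over that simplex.

```python
from fractions import Fraction

def one_norm(a, b):
    """1-norm distance between two points."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def vertex_lipschitz_bound(f, vertices, lipschitz):
    """Best single-vertex relaxation:
    max over v of f(v) - L * (largest 1-norm distance from v to a vertex)."""
    return max(
        f(v) - lipschitz * max(one_norm(u, v) for u in vertices)
        for v in vertices
    )

f = lambda x: x[0] ** 2 * x[1] ** 2
V = [(-1, -1), (2, 0), (0, 2)]
# Assumption: 64/27 (about 2.37) is the infinity-norm Lipschitz constant
# over the simplex, derived by hand for this instance.
L_simplex = Fraction(64, 27)

phi_L = vertex_lipschitz_bound(f, V, L_simplex)
print(phi_L, float(phi_L))  # -229/27, about -8.48
```

The best relaxation comes from vertex (−1, −1), where f = 1 and the farthest vertex is at 1-norm distance 4.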

Example 1.
Consider the function $f(x) = x_1^2 x_2^2$ over the simplex defined by vertex set $\{(-1, -1)^T, (2, 0)^T, (0, 2)^T\}$. The minimum of 0 is attained at a minimizer set with infinitely many elements, where $x_1 = 0$ or $x_2 = 0$. The bounds for this instance are given in Table 3. For this specific case, we have that $\phi_L = 1 - 2.37 \times 4 = -8.48$. Solving the LP (6) with the data of this instance provides $\phi_M = -39.5$.

What we observed in the Lipschitz way of thinking is that the same global derivative bound is taken in all directions. Another way of thinking is to focus on the directional derivative and to use the gradient range explicitly over the enclosing box as suggested in [8] and apply this over the simplex. Given any base point $b \in X$, a valid lower bound is given by the Taylor view
$$f(x) \ge f(b) + \underline{(x - b)^T \nabla f}. \qquad (7)$$
The difficulty here is that, given the gradient range $[\underline{\nabla f}, \overline{\nabla f}]$, the lower bound of the directional derivative in direction $x - b$,
$$\underline{(x - b)^T \nabla f} = \sum_{i=1}^{n} \min\left\{ (x_i - b_i)\underline{\nabla_i f},\ (x_i - b_i)\overline{\nabla_i f} \right\}, \qquad (8)$$
is not very tight. The complexity of the overestimation becomes less if we know that, given base point $b$ and variable $x$, the term $x_i - b_i$ is always positive or always negative. This is the case if we consider the box vertices $w \in W$ as base points like in [7] and [8]. To extend the idea for a simplicial lower bound, we again introduce directions of interest for each box vertex. Let vectors $G_w$ be defined by
$$G_{iw} = \begin{cases} \underline{\nabla_i f} & \text{if } w_i = \underline{x}_i, \\ \overline{\nabla_i f} & \text{if } w_i = \overline{x}_i, \end{cases} \qquad i = 1, \ldots, n.$$
Using a linear relaxation from each box vertex and extending towards the simplex, we obtain the following LP:
$$\phi_F = \min_{x, z} \; z \quad \text{s.t.} \quad z \ge f(w) + G_w^T (x - w), \; w \in W, \quad x \in \Delta. \qquad (9)$$
The underestimation is illustrated in Figure 1 by a simplex around the minimum point of $f_3$ along with the contours of the underestimating function $z(x)$. The reached lower bounds for the 7 instances are reported in Table 4. As one can observe, they are always tighter than the Lipschitzian lower bounds. This is not a coincidence.

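Proposition 1.
Consider an instance of $f^* := \min_{x \in \Delta} f(x)$ with enclosing box $X \supset \Delta$ and derivative range $[\underline{\nabla f}, \overline{\nabla f}]_X$. The lower bounds $\phi_M$ according to (6) and $\phi_F$ according to (9) satisfy $\phi_M \le \phi_F \le f^*$.
Proof. Since both bounds are based on the maximum of linear lower bounding functions, $\phi_M$ and $\phi_F$ are valid lower bounds. Given that $L_{X\infty} \ge \max\{|\underline{\nabla_i f}|, |\overline{\nabla_i f}|\}$, we have $-L_{X\infty} I_{iw}(x_i - w_i) \le G_{iw}(x_i - w_i)$ for all $x \in X$, $w \in W$, $i = 1, \ldots, n$. Thus, $\forall x \in X$,
$$f(w) - L_{X\infty} I_w^T (x - w) \le f(w) + G_w^T (x - w).$$
Given the maximum is taken over $w \in W$ using the same function evaluations $f(w)$ in both underestimations, we have that $\phi_F \ge \phi_M$.
Figure 1. [...] as red square. At the right, contours of lower bounding function z of (9).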
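The ordering of Proposition 1 can be checked on the instance of Example 1. The sketch below is an illustration only: instead of calling an LP solver, it enumerates candidate vertices of the small LP in exact rational arithmetic, which is a brute-force substitute that is only reasonable for such tiny problems. The helper names `solve3` and `minimax_bound` are our own, and the gradient range [−8, 16] per component and the constant 16 over the box [−1, 2]² are hand-derived assumptions for this instance.

```python
from fractions import Fraction
from itertools import combinations, product

def solve3(A, b):
    """Solve a 3x3 linear system exactly (Gauss-Jordan); None if singular."""
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(3):
        piv = next((r for r in range(col, 3) if M[r][col] != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(3):
            if r != col:
                M[r] = [a - M[r][col] * p for a, p in zip(M[r], M[col])]
    return [M[r][3] for r in range(3)]

def minimax_bound(V, pieces):
    """min over the triangle conv(V) of max_w (f(w) + g_w . (x - w)).

    x is written as v0 + l1*(v1 - v0) + l2*(v2 - v0); the LP in (l1, l2, z)
    is solved by checking every intersection of three constraints.
    """
    v0, v1, v2 = V
    dot = lambda p, q: p[0] * q[0] + p[1] * q[1]
    d1 = (v1[0] - v0[0], v1[1] - v0[1])
    d2 = (v2[0] - v0[0], v2[1] - v0[1])
    # Constraints c . (l1, l2, z) <= d; the first three describe the triangle.
    cons = [((-1, 0, 0), 0), ((0, -1, 0), 0), ((1, 1, 0), 1)]
    for w, fw, g in pieces:
        offset = fw + dot(g, (v0[0] - w[0], v0[1] - w[1]))
        cons.append(((dot(g, d1), dot(g, d2), -1), -offset))
    best = None
    for trio in combinations(cons, 3):
        y = solve3([c for c, _ in trio], [d for _, d in trio])
        if y is None:
            continue
        if all(sum(ci * yi for ci, yi in zip(c, y)) <= d for c, d in cons):
            best = y[2] if best is None else min(best, y[2])
    return best

f = lambda x: x[0] ** 2 * x[1] ** 2
V = [(-1, -1), (2, 0), (0, 2)]
lo, hi = (-1, -1), (2, 2)        # enclosing box X
g_lo, g_hi = (-8, -8), (16, 16)  # assumption: gradient range over X, derived by hand
L_box = 16                       # assumption: infinity-norm Lipschitz constant over X

lipschitz_pieces, gradient_pieces = [], []
for w in product(*zip(lo, hi)):
    at_lower = [wi == li for wi, li in zip(w, lo)]
    # Slope of f(w) - L * ||x - w||_1 inside the box.
    slope_L = tuple(-L_box if a else L_box for a in at_lower)
    # Lower derivative bound where x_i >= w_i, upper bound where x_i <= w_i.
    slope_G = tuple(l if a else h for a, l, h in zip(at_lower, g_lo, g_hi))
    lipschitz_pieces.append((w, f(w), slope_L))
    gradient_pieces.append((w, f(w), slope_G))

phi_M = minimax_bound(V, lipschitz_pieces)
phi_F = minimax_bound(V, gradient_pieces)
print(phi_M, phi_F)  # -79/2 -26
```

The value −79/2 reproduces the ϕ M = −39.5 reported in Example 1, and the gradient-range bound of −26 is tighter, in line with the proposition.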
In [6], several variants are elaborated around what is called the center form, or mean value form. In this concept, one considers for base point $b$ the center or midpoint $m = \frac{1}{2}(\underline{x} + \overline{x})$ of the enclosing box in (7). As the corresponding lower bounding function $z(x) = f(m) + \underline{(x - m)^T \nabla f}$ is concave over the enclosing box, its minimum over the simplex is attained in one of the vertices, defining a lower bound
$$\phi_m = \min_{v \in V} \left\{ f(m) + \underline{(v - m)^T \nabla f} \right\}. \qquad (10)$$
Notice that using such a lower bound implies evaluating $f$ at the midpoint of an enclosing box and not at its vertices. Due to the concave shape of the underestimating function, such a lower bound may work well for instances where $f$ is concave over the simplex, in contrast to the convex shaped underestimating function in (9). The distance of the base point to the farthest vertex is smaller, and consequently the underestimation may be less, when using the centroid $c := \frac{1}{n+1} \sum_{j=0}^{n} v_j$. This provides lower bound
$$\phi_c = \min_{v \in V} \left\{ f(c) + \underline{(v - c)^T \nabla f} \right\}. \qquad (11)$$
Table 4 shows the evaluation of the center form lower bounds $\phi_m$ and $\phi_c$ using the directional derivative lower bound (8). The corresponding bounds are not tight for the 7 instances, as the instances are mostly convex, and one can observe that $\phi_m$ and $\phi_c$ are less tight than $\phi_F$. Also here, one can construct an instance where $\phi_m > \phi_F$, although this is not the general tendency.
[...] concave and has a minimum of $f^* = -1$ at the two vertices. Now we have that $\phi_m = 0 - 1 \times 1 = -1 = f^*$ and $\phi_F = -2$, such that $\phi_m > \phi_F$. Notice that this function $f$ fits exactly the underestimating function (7), such that the lower bound equals the minimum.
A similar phenomenon can be observed in [9], where a careful quadratic lower bounding function is constructed. For instance 7, it even coincides with $f_3$, such that the lower bound $\phi = f^*$ for this case. Notice that there are some numerical errors, as several bounds $\phi$ reported in [9] even have $\phi > f^*$, which is impossible.
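The center form bounds can be sketched in a few lines of Python. The code below is an illustration under stated assumptions, not the implementation used for Table 4: it applies the componentwise lower bound of the directional derivative to the instance of Example 1, with the gradient range [−8, 16] per component over the enclosing box [−1, 2]² derived by hand. The function names are our own.

```python
from fractions import Fraction

def directional_lower(d, g_lo, g_hi):
    """Lower bound of d . grad f when each partial derivative
    lies in [g_lo[i], g_hi[i]]."""
    return sum(min(l * di, h * di) for di, l, h in zip(d, g_lo, g_hi))

def center_form_bound(f, base, vertices, g_lo, g_hi):
    """min over vertices v of f(base) + lower bound of (v - base) . grad f."""
    f_base = f(base)
    return min(
        f_base + directional_lower([vi - bi for vi, bi in zip(v, base)], g_lo, g_hi)
        for v in vertices
    )

f = lambda x: x[0] ** 2 * x[1] ** 2
V = [(-1, -1), (2, 0), (0, 2)]
g_lo, g_hi = (-8, -8), (16, 16)  # assumption: gradient range over the box, by hand

m = (Fraction(1, 2), Fraction(1, 2))  # midpoint of the enclosing box [-1, 2]^2
c = tuple(sum(Fraction(v[i]) for v in V) / len(V) for i in range(2))  # centroid

phi_m = center_form_bound(f, m, V, g_lo, g_hi)
phi_c = center_form_bound(f, c, V, g_lo, g_hi)
print(phi_m, phi_c)  # -767/16 -3455/81
```

For this instance, the centroid gives the less pessimistic of the two values (about −42.7 against about −47.9), consistent with the remark that the base point is then closer to the farthest vertex.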
We compare all the discussed lower bounds with the best results from [6,9] in Table 4. For the used instances, the lower bounds reached by exploiting the derivative range and function evaluations in vertices or midpoints of partition sets are in general tighter than the ones reported in [6], which use less information. For instance 1, [6] shows that a naive interval enclosure provides a tighter bound than using other information. We observe the same in Table 4. Using more information on second derivatives may lead to even sharper bounds as sketched in [9], although most lower bounds based on derivative ranges reported in Table 4 are tighter.
Table 4. The minimum f * over ∆ rounded to a decimal number, the best lower bound ϕ K found in [6] and the value ϕ O found in [9] using second order information. Lower bounds ϕ L based on (4), ϕ M based on (6) and ϕ F based on (9), ϕ m according to (10) and ϕ c based on (11). In grey, the sharpest lower bounds are emphasized.
The instances used all exhibit a form of monotonicity. When the simplices represent a partition set, then usually lower bounding is not necessary, because the monotonicity information can be used to either eliminate the partition set or reduce it to lower dimensional faces. We discuss some theoretical results in Section 4 and illustrate them with the provided instances.

Monotonicity
In case we have monotonicity over a simplicial partition set, we may consider a dimension reduction towards a facet of the simplex which is contained in a face of the feasible area. Dimension reduction is of interest, because it tightens bounds on the function value and its derivatives. We first introduce some notation in Section 4.1. Then Section 4.2 discusses theoretical results on monotonicity questions and Section 4.3 on the directional derivatives. An illustration with the instances is provided in Section 4.4.

Consider box constrained domain
We first discuss the concepts of faces, facets and relative interior for this domain. Consider one of the $3^n - 1$ faces $b_k$ of box $D$. A face is determined by the assignment of the variables (component-wise) represented by index $i \in \{1, \ldots, n\}$ to one of three sets $L_k$, $U_k$ and $J_k$ defined in the following way.
• Indices $i \in L_k$ represent variables $x_i$ on the lower bound $x_i = \underline{x}_i$.
• Variable coordinate $i \in U_k$ means it takes the upper bound $x_i = \overline{x}_i$.
• The free coordinates are given by $J_k = \{1, \ldots, n\} \setminus (L_k \cup U_k)$.
The relative dimension (degree of freedom) of box face $b_k$ is $m_k = |J_k|$. The relative interior of $b_k$ is given by
$$\mathrm{rint}(b_k) = \{ x \in b_k : \underline{x}_i < x_i < \overline{x}_i,\ i \in J_k \}.$$
A facet $b$ of face $b_k$ is a face that has one degree of freedom less; typically it is defined by adding one of the free variables $i$ to either $L_k$ or $U_k$. We will denote the set of facets of face $b_k$ by $\mathcal{F}(b_k)$. A global minimum point can be in the interior of $D$, in one of the vertices of $D$ or in a relative interior of one of its other faces. The challenge for boundary solutions is that beforehand, we do not know in which (relative interior of) faces the minimum points can be found.
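The assignment of coordinates to the three index sets can be enumerated directly. The sketch below is a small Python illustration with our own function name `box_faces`; it assumes that the count of 3^n − 1 faces excludes the assignment in which every coordinate is free, i.e. the box itself.

```python
from collections import Counter
from itertools import product

def box_faces(n):
    """Yield (L, U, J) index sets (1-based) for the faces of an n-dimensional box.

    Assumption: the assignment with all coordinates free (the box itself)
    is the one left out of the 3^n - 1 count.
    """
    for labels in product("LUJ", repeat=n):
        J = [i + 1 for i, s in enumerate(labels) if s == "J"]
        if len(J) == n:
            continue
        L = [i + 1 for i, s in enumerate(labels) if s == "L"]
        U = [i + 1 for i, s in enumerate(labels) if s == "U"]
        yield L, U, J

faces = list(box_faces(3))
by_dimension = Counter(len(J) for _, _, J in faces)
print(len(faces))                    # 26
print(sorted(by_dimension.items()))  # [(0, 8), (1, 12), (2, 6)]
```

For n = 3 this gives the familiar 8 vertices, 12 edges and 6 two-dimensional faces of a cube.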
We consider a simplicial partition P of D, i.e. ∪ ∆∈P ∆ = D and the intersection of two simplices can only consist of a face of both simplices; ∆, Ξ ∈ P implies Ξ ∩ ∆ = ∂∆ ∩ ∂Ξ. The analysis becomes more complicated when we consider simplices as subsets which are not full dimensional, i.e. they can be m-simplices with m < n. Moreover, we focus on a further dimension reduction towards facets of the subset ∆.
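A facet $F = \mathrm{conv}(V \setminus \{u\})$ of $m$-simplex $\Delta = \mathrm{conv}(V)$ is given by removing a vertex $u$ from vertex set $V$. The relative interior of a simplex $\Delta$ is given by
$$\mathrm{rint}(\Delta) = \Big\{ x = \sum_{v_j \in V} \lambda_j v_j : \lambda_j > 0,\ \sum_{j} \lambda_j = 1 \Big\}.$$
The relative boundary of simplex $\Delta$ is $\partial\Delta := \Delta \setminus \mathrm{rint}(\Delta)$. Notice that the relative boundary consists of the set of facets $\mathcal{F}(\Delta)$, that is $\partial\Delta = \cup_{F \in \mathcal{F}(\Delta)} F$, $|\mathcal{F}(\Delta)| = |V|$.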

Monotonicity properties
A simplicial branch and bound on the box D keeps a list Λ of simplices, where the optimum can still be located; min x∈D f (x) = min ∆∈Λ min x∈∆ f (x). The idea is to refine partition sets ∆ and to eliminate ∆ if it can be shown that it cannot contain a global minimum point. So far, this paper focused on lower bounding to eliminate partition sets. Our question here is how derivative range [∇f, ∇f ] ∆ can be used to help the refinement process. The first observation is close to the concept of a monotonicity test.
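Proposition 2. Let $\Delta := \mathrm{conv}(V) \subset b_k$ with $|V| = m_k + 1$ be a partition set in list Λ. If $\exists i \in J_k$ such that $0 \notin [\underline{\nabla_i f}, \overline{\nabla_i f}]_\Delta$, then $\mathrm{rint}(\Delta)$ does not contain a minimum point.
Proof. A necessary condition for a minimum point $x^*$ in the relative interior of face $b_k$ is that $\frac{\partial f}{\partial x_i}(x^*) = 0$ for all $i \in J_k$. The condition of the proposition implies that $\frac{\partial f}{\partial x_i}(x) \ne 0$ for all $x \in \Delta$, i.e. in each point $x \in \mathrm{rint}(\Delta)$ there exists an improving direction in component $i$, such that $x$ cannot be a minimum point.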
Basically, for a practical use, this means we can remove the relative interior of ∆ and refine by storing its facets in Λ. However, if facets F ∈ F(∆) are in the relative interior of b k we are not interested in keeping them either.
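In an implementation, this monotonicity test amounts to checking whether zero lies outside the partial derivative range for at least one free coordinate. The following Python sketch uses our own function name and hypothetical derivative ranges; it assumes the ranges over the simplex are already available, for instance from Interval Arithmetic.

```python
def excludes_interior_minimum(grad_lower, grad_upper, free):
    """Monotonicity test: True if for some free coordinate i the range
    [grad_lower[i], grad_upper[i]] does not contain zero."""
    return any(grad_lower[i] > 0 or grad_upper[i] < 0 for i in free)

# Hypothetical derivative ranges over a simplex (0-based coordinate indices).
print(excludes_interior_minimum([-1.0, 0.5], [2.0, 3.0], free=[0, 1]))   # True
print(excludes_interior_minimum([-1.0, -0.5], [2.0, 3.0], free=[0, 1]))  # False
print(excludes_interior_minimum([-1.0, 0.5], [2.0, 3.0], free=[0]))      # False
```

The last call shows that a coordinate fixed at a bound of the box face does not count: only the free coordinates can certify that the relative interior holds no minimum point.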
Corollary 1. [...] then $F$ can be removed from consideration in branch and bound list Λ. Consequently, if $\forall F \in \mathcal{F}(\Delta)$, $\forall b \in \mathcal{F}(b_k)$, $F \not\subset b$, then $\Delta$ can be removed from consideration in branch and bound list Λ.
The consequence for a practical refinement is that if $f$ appears monotonic in $\Delta$ in the sense of Proposition 2, then we only have to save those facets in $\mathcal{F}(\Delta)$ that are included in a facet of $b_k$, i.e. $\exists b \in \mathcal{F}(b_k)$ such that $F \subset b$. It also means that we can eliminate all of $\Delta$ in some circumstances: $\Delta$ may have a local optimum on its boundary, but this boundary is also included in other partition sets and can therefore be removed. On the other hand, we still should keep facets $F$ that are in a facet $b$ of $b_k$.
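There is an interesting case where it can be shown that the relative interior of $F$ cannot contain a minimum point. For this we introduce the so-called normal vector $p$ with respect to facet $F$. Let $\Delta = \mathrm{conv}(V)$, $w \in V$, $W = V \setminus \{w\}$, facet $F = \mathrm{conv}(W)$. Consider $u \in W$ and construct a set of vectors $Y := \{v - u, v \in W\} \setminus \{0\}$ with corresponding matrix representation $Y$. The normal vector of $F$ in the direction of $w$ can be expressed by defining (see e.g. [1]) [...]
Vector $p$ points into the same half-space where $\Delta$ is located. From each point in $\mathrm{rint}(F)$, direction $p$ points into $\mathrm{rint}(\Delta)$.
Proposition 3. Consider $\Delta \subset b_k$ with facet $F$ and normal vector $p$ as above, and let $\exists i \in J_k$, $(\overline{\nabla_i f})_\Delta < 0$. If $p_i > 0$, then $\mathrm{rint}(F)$ cannot contain a minimum point of $\min_{x \in \Delta} f(x)$.
Proof. For any point $y \in \mathrm{rint}(F)$ we have that coordinate unit vector $e_i$ is a feasible direction according to $e_i^T p = p_i > 0$, in which we know that the function value goes down.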
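A normal vector of a facet pointing towards the removed vertex can be computed without matrix inversion. The Python sketch below rests on an assumption of ours: it takes the normal to be the component of w − u that is orthogonal to all edge directions v − u of the facet, obtained by Gram-Schmidt orthogonalization. This is one common construction and is not necessarily the exact expression used in the paper. The function name `facet_normal` is our own.

```python
from fractions import Fraction

def facet_normal(facet_vertices, w):
    """Component of w - u orthogonal to the facet conv(facet_vertices),
    with u the first facet vertex (Gram-Schmidt, exact arithmetic)."""
    u = facet_vertices[0]
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    basis = []  # orthogonal basis of span{v - u : v in the facet}
    for v in facet_vertices[1:]:
        y = [Fraction(vi - ui) for vi, ui in zip(v, u)]
        for b in basis:
            coef = dot(y, b) / dot(b, b)
            y = [yi - coef * bi for yi, bi in zip(y, b)]
        if any(y):
            basis.append(y)
    p = [Fraction(wi - ui) for wi, ui in zip(w, u)]
    for b in basis:
        coef = dot(p, b) / dot(b, b)
        p = [pi - coef * bi for pi, bi in zip(p, b)]
    return p

# Simplex of Example 1: facet conv{(2, 0), (0, 2)}, removed vertex w = (-1, -1).
p = facet_normal([(2, 0), (0, 2)], (-1, -1))
print([int(pi) for pi in p])  # [-2, -2]
```

Here both components of p are negative, so for this facet only a coordinate with a strictly positive lower bound on its partial derivative could exclude the relative interior of the edge, in the sense of Corollary 2.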
For the ease of reasoning, Proposition 3 is posed in an increasing partial derivative way. The same applies of course the other way around.
Corollary 2. Consider the same conditions of Proposition 3, but having ∃i ∈ J k , (∇ i f ) ∆ > 0. If p i < 0, then rint(F ) cannot contain a minimum point of min x∈∆ f (x).

Directional derivatives
A bound on the directional derivative as in (8) can be used as follows.
[...] So, a minimum point $x$ either does not exist, or $z$ is also a minimum point of $\min_{x \in \Delta} f(x)$ and located on facet $F$.
This proposition implies an extension of Corollary 1. We can do a dimension reduction from $\Delta$ to $F$, or alternatively remove $\Delta$ from the list.
Corollary 3. Let $\Delta := \mathrm{conv}(V) \subset b_k$ with $|V| = m_k + 1$ be a partition set in list Λ and $u \in V$ and $\exists b \in \mathcal{F}(b_k)$ such that facet $F := \mathrm{conv}(V \setminus \{u\}) \subset b$. If $\exists y \in F$ such that $\forall x \in \Delta$ the directional derivative $(u - y)^T \nabla f(x) \ge 0$, then $\Delta$ can be replaced in Λ by $F$.
Corollary 4. Let $\Delta := \mathrm{conv}(V) \subset b_k$ with $|V| = m_k + 1$ be a partition set in list Λ and $u \in V$ and $\nexists b \in \mathcal{F}(b_k)$ such that facet $F := \mathrm{conv}(V \setminus \{u\}) \subset b$. If $\exists y \in F$ such that $\forall x \in \Delta$ the directional derivative $(u - y)^T \nabla f(x) \ge 0$, then $\Delta$ can be removed from Λ.
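A simple sufficient check for the condition in these corollaries is to try the facet vertices as candidates for y and to bound the directional derivative from below using the partial derivative ranges. The Python sketch below is only such a sufficient test, with our own function names and hypothetical derivative ranges; a point y in the interior of the facet may succeed where all facet vertices fail.

```python
def directional_lower(d, grad_lower, grad_upper):
    """Lower bound of d . grad f given componentwise derivative ranges."""
    return sum(min(l * di, h * di) for di, l, h in zip(d, grad_lower, grad_upper))

def witness_on_facet(u, facet_vertices, grad_lower, grad_upper):
    """Return a facet vertex y for which the lower bound of
    (u - y) . grad f is nonnegative, or None if no vertex qualifies."""
    for y in facet_vertices:
        d = [ui - yi for ui, yi in zip(u, y)]
        if directional_lower(d, grad_lower, grad_upper) >= 0:
            return y
    return None

# Hypothetical data: unit simplex, u = (0, 1), facet conv{(0, 0), (1, 0)}.
facet = [(0, 0), (1, 0)]
print(witness_on_facet((0, 1), facet, (-1, 2), (1, 3)))   # (0, 0)
print(witness_on_facet((0, 1), facet, (-1, -1), (1, 1)))  # None
```

In the first call the second partial derivative is known to be at least 2, so moving from (0, 0) towards u cannot decrease f; in the second call the ranges contain zero and the test is inconclusive.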
A special case of the lowest dimensional faces with a relative interior is an edge.
The use of this theoretical result to generate a test is not easy when only the partial derivative ranges $[\underline{\nabla_i f}, \overline{\nabla_i f}]_\Delta$ are available, as (8) is not very tight. Focusing on the direction $u - y$ with $y = \sum_{j=1}^{m} \lambda_j v_j$, we can demonstrate there is a (maximum) positive directional derivative by solving the LP and checking that the result is positive. Notice that for $j \in L_k \cup U_k$, we have that $u_j = v_{ij}$, i.e. the coordinates have the same values and we do not have to take the corresponding partial derivative into account.

Monotonicity on the illustrative instances
For the analysis of simplicial partition sets over a box constrained region, it is good to realize that all initial partition sets in the subdivision introduced by [11] have facets in faces of the box. In future implementations, we will make use of that. Here we go step-wise through the consequences of the theoretical results for the 7 instances. A minimum can be attained at a relative interior point of a face, or at a vertex. Notice that all instances attain their minimum at a vertex.
Consider that in fact $[\underline{\nabla f}, \overline{\nabla f}]_\Delta$ in Table 1 shows us that all 7 instances have boundary optima, given that $\nabla_2 f \ge 0$. For instance 1, this means that it could contain an interior optimum point, but that the optimum is also attained on the boundary. Proposition 2 shows that in fact for all instances, $\Delta$ can be replaced by its facets in reduced dimension in the list of subsets Λ before considering the lower bound over $\Delta$.
Table 5. Monotonicity analysis of the 7 instances. Faces left over expressed as vertex set after using theoretical results. Columns: Inst., Proposition 3, Proposition 4, Corollary 5.
The next step is to consider what Proposition 3, or in our case Corollary 2, concludes about which faces we do not have to consider. Focusing on the normal vector (14), it can be shown that the (normalized) normal vectors $p$ among instances 1-3 and among instances 4-6 are the same. A sharp test may conclude that for instance 1, we do not have to consider $\mathrm{rint}(\mathrm{conv}(\{v_0, v_2\}))$, and for instances 2-6, $\mathrm{rint}(\mathrm{conv}(\{v_0, v_2\}))$ and $\mathrm{rint}(\mathrm{conv}(\{v_1, v_2\}))$ can be left out of consideration. Due to the numbering of vertices, this is $\mathrm{rint}(\mathrm{conv}(\{v_0, v_1\}))$ and $\mathrm{rint}(\mathrm{conv}(\{v_1, v_2\}))$ for instance 7.
The last step is to exploit Proposition 4 on the directional derivative. For instances 1-6 the focus on u = v 2 provides the insight, that the minimum is attained at facet conv({v 0 , v 1 }). For instances 1-3 one can take y = v 1 and for instances 4-6 consider y = v 0 . For instance 7, the facet with a minimum is conv({v 0 , v 2 }), so u = v 1 and one can consider y = v 0 .
Continuation of this monotonicity analysis using Corollary 5 is only successful for instance 7 to indicate the optimal vertex v 0 . For the other instances, the underestimation of the directional derivative using (8) over the edge does not provide a proof of an increasing direction over the edge.

Conclusions and discussion
Bounding in simplicial branch and bound has received renewed attention recently. Our contribution focuses on the question how the knowledge of the derivative range of either the simplicial partition set or the enclosing box can be used to derive bounds and to perform a monotonicity analysis. We presented several theoretical results and used a set of 7 instances introduced earlier in literature to illustrate them.
From the analysis we learned that bounds are in general sharper when more detailed information is used; in our case gradient ranges are more detailed than a Lipschitz constant. Bounds are in general tighter for the 7 instances than the ones published before in literature. We also remark that all instances have a certain monotonicity. We show that the simplicial sets can be reduced in dimension using the introduced theoretical monotonicity results.
Bounds based on solving an LP in general require more computational time for a partition set. In our experience, sharper bounds pay off in a branch and bound run, as one avoids the management of larger trees. In future work, we will use the derived theory for tests in a simplicial branch and bound over a large set of numerical instances. However, it is known that the efficiency of various bound and monotonicity calculations depends on the order in which tests are done and also on the phase the algorithm is in; the beginning has a global search character, whereas in the end, when the algorithm approaches the minimum points, the algorithm behaves in another way. Such an experimental investigation requires a systematic design of experiments to evaluate the bound and monotonicity calculations introduced here.