INVENTORY MANAGEMENT THROUGH ONE STEP AHEAD OPTIMAL CONTROL BASED ON LINEAR PROGRAMMING

. A one step ahead optimal strategy is proposed for the inventory control and management problem, and rewritten as a linear programming problem, permitting practical implementation. Important novel aspects of the proposed solution are that it uses economic value added (EVA), a comprehensive performance index commonly used in business management, instead of regulation to a set point or to a interval of stock values; it does not require knowledge or prediction of the demand distribution; it achieves good efficiency with respect to a globally optimal value, defined in this paper, and no significant bullwhip effect, while being robust to demand and lead time variations. The proposed one step ahead optimal controller is compared with the classical ( 𝑠, 𝑆 ) controller, as well as with a representative of the inventory and order-based production controller family. In order to make a fair comparison, this paper also proposes a tuning method for the latter two controllers. Numerical experiments based on average performance of the three controllers for a set of normally distributed demands show the superiority of the proposed one step ahead optimal controller, in terms of EVA as well as in terms of other measures proposed in the paper


Introduction
This paper considers a model of a single divisible good inventory management problem with an economic value added (EVA) performance index, inspired by a similar model presented in Page 179-187 of [14].A one step ahead optimal (OSAO) strategy, for the EVA index, is proposed for this problem and it is shown that it can be rewritten as a linear programming problem, which permits practical implementation.The interesting novel feature of the proposed OSAO strategy is that it does not need knowledge of the demand distribution and neither does it require demand forecasting, which is usually the case [8,27,29].The related problem of finding the optimal ordering sequence, that maximizes EVA, under the hypothesis that the entire demand is known over a given simulation horizon, can also be recast as a linear programming problem.The solution to this hypothetical optimal control problem in which the demand is known for all time is referred to in this paper as the omniscient solution and its main purpose is to serve as a benchmark.It should, of course, be kept in mind that the benchmark value of optimal omniscient performance index is unattainable by any implementable controller, since the omniscient solution is computed under the assumption that future demand is known.Thus the value of the optimal omniscient index should be thought of as an upper bound or benchmark for the value of the performance index attainable by any implementable control policy.In addition, the efficiency and robustness of the proposed one step ahead optimal controller will be compared to commonly employed control techniques in supply chain management (SCM), namely the (, ) policy introduced by Arrow et al. [1] and a generalized version of the APIOBPCS controller presented in [29], which is chosen as a representative of the class of Inventory and Order-based Production Control Systems (IOBPCS) introduced by Simon et al. [21].We use a control engineering definition of robustness."A system is robust when the system has acceptable changes in performance due to model or parameter changes and moderate modeling errors [6].So, a robust supply chain should be designed to function properly even in the presence of uncertain parameters (for instance, the lead-time)" [22].There is a vast literature on inventory control and management and a review is given in the following section, focusing only on the papers judged to be most relevant, in the interests of brevity.This paper also presents a novel methodology to make a fair comparison of robustness between the proposed controller, which is parameter-free, with classical controllers, which depend on certain tunable parameters.This methodology is inspired by the training set-testing set paradigm common in machine learning.The training set may be composed of recorded past demands or, if these are unavailable, of randomly generated demands from some hypothesized distribution (usually normal) and serves to optimize the tunable parameters of the classical controllers at some central point of a grid in the mean-standard deviation demand space.Performance of all controllers is then compared at grid points within a neighborhood of the central point, and the resulting surface portrays the robustness of each controller.The parameter tuning problem is nonconvex, thus the optimization involves using black box or pattern search methods.In a different context, the Simultaneous Perturbation Stochastic Algorithm (SPSA) is used to optimize model predictive controllers in [19] and a Particle Swarm Optimization (PSO) algorithm is used to tune the parameters of an inventory controller in [27].

Literature review
Ivanov et al. [9] identify one of the main contributions of control theory as being the application of dynamic feedback control to production-inventory systems and point out that a wide range of control-theoretic tools have been used in this context, ranging from the classical PID control to model predictive control.The paper by Lin et al. [12] also surveys control-theoretic approaches to the inventory control problem.
An early idea for inventory control, that is still commonly used, was proposed in [1] and, roughly speaking, consists of monitoring the inventory level and replenishing the stock, whenever it falls below a certain level.Since this control, usually referred to as (, ) control, uses the stock or inventory level itself, it can be thought of as feedback control, based on the threshold levels  and , as explained in Section 4.2.
Most control-theoretic methods are based on linear time-invariant mathematical models of a single echelon supply chain, using either transfer function or state space methods.This permits the use of classical ideas such as set point control, in which targets are set for certain variables, such as inventory level and work in progress.Many of these methods can be interpreted as parametrized versions of standard algorithms from the SCM literature, such as order-up-to-inventory control.The acronym IOBPCS, which stands for inventory and order based production control system identifies a popular class of such controllers, with subclasses denoted by prefixes, such as AP (for automatic pipeline) and MP (for matched parameter).A major focus of papers using this paradigm is to study the effect of the controller on the so called bullwhip effect (= amplification of system variables due to demand uncertainties) and on stability with regard to step changes in the demand: see, for example, [8,11,13] and the references therein.
Another approach that has been proposed more recently is based on model predictive control (MPC), or, if a performance index based on economic objective is specified, the terminology becomes economic MPC.A model of the plant is used, together with available demand data, to predict future demand and use the predictions to find an ordering sequence that optimizes the specified performance index.Finally, one or more of the computed optimal controls is applied, before a new predict-optimize-control sequence is embarked upon.The reader is referred to [17,26,31] and the references therein, for examples of MPC and economic MPC.This paper uses an approach that is related to the MPC approach; however, it uses only past values of the demand and system state, does not use a predictor, and uses a stage cost that is economic, which is the so called Economic Value Added (EVA) performance index, commonly used in the business and financial world [14].A slight extension of the greedy control idea [10,18] is necessary for the inventory control problem on account of the lead time or delay present in the system.The proposed OSAOC (One Step Ahead Optimal Control) approach is to optimize EVA using past demands, for just one step ahead of the current time, apply this optimal control to the system, update its state and repeat the cycle of one step ahead optimization of EVA, thus generating a real-time feedback control.The main differences between the approach of this paper and existing results are as follows: (i) an economic stage cost function is used, similar to the one in [26], but also including discounting of the profit to its present value and not including any target or tracking costs, (ii) using only past demands to compute the present value of the control, in contrast with [17,31] and other papers using the MPC approach which use more elaborate demand predictors, (iii) a study of the robustness of the proposed controller, with respect to uncertainty in demand as well as lead time uncertainty.This is another novel contribution of this paper, since very few papers on this topic discuss the question of robustness of control to uncertainties [28,32]; in particular, time-varying lead time uncertainty.The result is a control that is implementable in real time, computationally cheap and, as will be shown, efficient when compared with the globally optimal omniscient solution, which will be defined in the sequel.Comparisons are also made with a classical (, ) controller and with a performance index, different from EVA, that is also used in the control literature, and are favorable to the proposed one step ahead controller.
This paper is organized as follows.Section 2 introduces the basic notation, the discrete-time dynamical system model of the single echelon supply chain used as the main example in the paper and the economic value added as the objective function, to be maximized, showing how to formulate the omniscient optimal control problem as a linear programming problem.Section 3 shows how to transit from the omniscient optimal control problem to an OSAOC problem, also linear.Section 4 makes a brief introduction of the APIOBPCS controller and the (, ) policy.Section 5 starts with a simple example of a demand with two step changes in order to exhibit the differences between the omniscient and one step ahead controllers, as well as the classical APIOBPCS and (, ) controllers.A data driven optimization based methodology that uses training and test sets of normally distributed demands in order to tune the APIOPBCS and (, ) controllers is proposed and is another novel contribution of this paper.The tuned APIOPBCS and (, ) controllers are then compared with the proposed OSAO controller.Section 6 showcases the greater robustness of the OSAO controller to demand uncertainty while Section 7 expands the single echelon supply chain model to consider variable lead time and also highlights the greater robustness of the OSAO controller to delay uncertainty.Section 8 contains concluding remarks and suggestions for future work.

A single echelon supply chain model
This section presents a single echelon supply chain (SESC) model in order to define the inventory control problem precisely.The model uses the notation in [4] and is based on [14], which refers to the SESC as a store supply chain.However, the model presented in this section is quite general and represents any single echelon supply chain.This model essentially refers to a single source shipping a single divisible good, with a certain delay or lead time, to some location, which shall be referred to generically as a warehouse, in which there is a stock (inventory).Sales of this good, in response to a demand, are assumed to occur at the warehouse.
A single echelon supply chain thus has two main components: a delay between ordering and receiving, corresponding to a block named conveyor or pipeline and an accumulator, which receives the ordered good after the stipulated delay, corresponding to the warehouse (Fig. 1).It is assumed that there are capacity constraints, both on the warehouse and on shipping.Selling of the good occurs in accordance with the demand and it is assumed that all but the quantity  of the good (the amount in the safety stock) can be sold.Note that since it is assumed that the good is divisible, it is described by a real-valued variable, instead of an integer-valued variable.This is a reasonable approximation when the quantity of the good is large.The manager's objective is to choose Figure 1.A: Block diagram of single echelon supply chain to define the inventory control problem, together with the one step ahead optimizer, showing how the overall scheme functions as a feedback controller.Continuous lines with arrows denote the flow of the good through the system, while dash-dotted lines show the flow of information regarding system variables.The optimizer finds the ordering that maximizes the economic value added, (•), in accordance with (35) (one step ahead case) or (32) (omniscient case).B: The shipping pipeline in detail: it can be thought of as a conveyor, causing a delay of  units between ordering a certain quantity of the good and receiving it at the warehouse.
an ordering sequence, in response to the demand, that maximizes cumulative profit over a given time horizon, assuming that there are costs and constraints associated to handling, shipping, storage and stockouts.It is also assumed that there is no backordering; thus stockouts imply that customer demand is left unfulfilled, and the negative effect of this on sales is modeled by a high cost for lost sales.Finally, profits (= sales revenues minus costs) are discounted using a fixed interest rate and then added over the given horizon to obtain the cumulative profit, referred to as Economic Value Added (EVA) [24].The notation used to formulate the mathematical model is summarized in Table 1.
The delay between ordering and receiving is written by introducing a string of  unit delays, which can be thought as a conveyor, as follows: Thus the quantity of material in transit (i.e., being shipped) at time , denoted by  ℎ (), can be written as: while the quantity of material being received at the warehouse is  1 (), which is no longer considered as material being shipped, and corresponds to an order placed  time units in the past, so that ordering, denoted () is just  +1 ().
Evolution of the stock level in the warehouse at time  is determined by the quantity of material being received ( 1 ()) plus the quantity currently in the warehouse   () (setting aside a safety stock  not to be sold) less the quantity sold (  ()) in response to the demand.The corresponding equation is as follows.
In order to describe the sales   () at time , observe that sales can only occur if the inventory level   () plus the received material  1 () less the safety stock  is greater than the demand ().Thus sales can be written as follows: Thus the quantity of sales matches the demand () if it is less than the effective available inventory level   () −  +  1 (), otherwise it is just equal to the latter.The costs are expressed as follows: st () =  st   () (storage cost) (7) () =   () +  st () +  ℎ () +  sh () (total cost).(10) Discounted profit, also referred to as economic value added (EVA), is the revenue from sales less the total cost (all at time ), appropriately discounted: The cumulative profit (), also called cumulative EVA, accumulates the discounted profit: The objective function to be maximized is the cumulative profit at the end of the time horizon and thus, given the demand (),  = , . . .,   − 1 over the whole simulation horizon, the optimization problem to be solved is in fact an optimal control problem, which will be referred to as the omniscient optimal control problem (since the demand over the entire simulation horizon is assumed to be known), and can be written as follows: maximize (  ) In optimal control terminology, this is the Mayer problem [20] of choosing an optimal ordering sequence that maximizes the final cumulative profit (  ).More realistically, the following constraints are also needed: () ≤  ,max ,  = 0, . . .,   − 1.
Observe that equations (1) through ( 12) are all linear with the exception of (5).Since (5) is piecewise linear, it will now be shown how to write the optimal control problem described in (13), as a linear program.The function min can be written equivalently as: The variable  can be written as the difference of two nonnegative functions  + and  − so that: Denoting the effective available inventory level less the demand as (): and using (21), equation ( 5) is rewritten as follows: where  − () is defined from (22) i.e., which is now recognizably a linear equation involving the additional nonnegative variables: In order to ensure that equation ( 20) is satisfied, this quadratic constraint is modeled as the two mutually exclusive logical statements: Let  be any upper bound for  + ,  − .Then, it is easy to check that statements ( 27) and ( 28) are equivalent to the following linear constraints: where  1 and  2 are binary variables.
For future reference, the state vector of the single echelon supply chain at instant  is denoted as z() and defined as: From ( 1), (2) it follows that the state vector z() can be written equivalently as:  (20) and, thus, transcribing the omniscient optimal control problem into a simpler LP problem yielded exactly the same results obtained from (32).Furthermore, constraint (20) was verified to be true for all  instants despite not being explicitly enforced by the LP problem.This was also the case for the OSAOC problem discussed in Section 3.
Remark 2.3.The omniscient cumulative EVA, even though it is unattainable, serves as a global optimum and as a reference against which the cumulative EVA achieved by causal ordering sequences, obtained by implementable controllers, can be evaluated.The omniscient optimal solution requires knowledge of future peaks and troughs in the demand and uses it, when required, to make anticipative (i.e., noncausal) ordering, ensuring that cumulative profit is maximized.Such noncausal behavior is not possible for any real world controller, which must necessarily be causal or nonanticipative, since it cannot have access to future demands.
Figure 2. The one step ahead optimal control scheme in the case where the actual delays and demands differ from the values used in the one step ahead optimizer.In Section 7, the actual delays are chosen from a random distribution, while the one step ahead optimizer uses upper bounds.Similarly, the actual demands are used to update the supply chain state z(), defined in (34), while the one step ahead optimizer uses only past demands and estimates the current demand d() = ( − 1).

One step ahead optimal control
In practice, of course, only past demands are known to the supply chain manager, who has to make ordering decisions based on this information and observation of the current variables, such as inventory level, amount of goods in the shipping pipeline, sales levels, etc.This section formulates the so called one step ahead optimal control (OSAOC) problem, that respects practical information constraints, and is shown schematically in Figure 2.
As pointed out in (34), the state z() can also be expressed as z() = [(−), . . ., (−1),   ()].Thus, after the optimization step is carried out, the state is updated to z( +1) = [( − +1), . . ., ( −1),  * (),   ( +1)], where   ( + 1) is obtained by evaluating (4).From this new initial condition, a new optimization step can be performed.With this notation in place, the iterative one step ahead control (OSAOC) scheme over the backward horizon can be written as shown (Algorithm 1).Discard all computed optimal orderings except  * ( − ) Evaluate RHS of (4) to obtain ( + 1) z0 ← [( −  + 1), . . ., ( − 1), osao(), ( + 1)] 10: ←  + 1 11: end while Remark 3.1.Algorithm 1 relies only on past values of the demand and the current demand estimate d() = ( − 1): at each instant , it solves a linear program with ( + 3)( + 1) decision variables, ( + 4)( + 1) equality constraints describing the SESC dynamics, 3( + 1) nonnegativity constraints, 4( + 1) upper bound constraints and ( + 1) equality constraints that ensure the binary variables are mutually exclusive.This is a linear program that is significantly smaller in terms of the number of decision variables and constraints than the corresponding omniscient optimal control problem (32), whenever the simulation horizon   is much larger than the lead time .For example, assuming  = 7 there would be a total of 80 decision variables, 88 equality constraints describing the dynamics, 24 nonnegative constraints, 32 upper bound constraints and 8 equality constraints that ensure the binary variables are mutually exclusive.In contrast, for   = 30, the omniscient optimal control problem would have 230 decision variables, 253 equality constraints, 69 nonnegative constraints, 92 upper bound constraints and 23 equality constraints involving the binary variables.Thus, even though OSAOC has to be solved at each step, it is a computationally tractable method, which is desirable for real time inventory control.Furthermore, if ordering is made only at the end of the current day, then, of course, current demand () is known and the estimate d() = ( − 1) does not need to be used.

Classical inventory control schemes.
This section summarizes the classical control theory based APIOBPCS controller generalized in [29] and the well known (, ) control policy [1], chosen as candidates for comparison with the proposed OSAOC controller.These two controllers are chosen because the first is a representative of a family of controllers much discussed and cited in the literature [21], while the second is widely used by practitioners, and commonly discussed in textbooks, e.g., [15].Another reason for the choice is that they are feedback controllers with roughly the same level of simplicity as the proposed controller, although, in contrast to it, the first does require demand forecasting as well as tuning of three controller parameters (  ,   ,   ) and the second requires tuning of the low () and high () stock level parameters.It is crucial to note, however, that good or optimal parameter tuning for a given demand, will not, in general, be optimal for some other demand.Thus, for fair comparisons, a suitable approach to parameter tuning for the APIOBPCS and (, ) controllers is required and one is proposed in Section 5.3.

The APIOBPCS controller
The ordering policy of the APIOBPCS controller Page 2888, Equation (6) of [29] is rewritten as follows (using the notation presented in Section 2): where d() is the estimated demand. ref () is the inventory level reference, the choice of which is based on some heuristic.Parameter   is the time constant associated to the adjustment of inventory level, parameter   is the lead time or shipping delay, and parameter   is the time constant associated to shipping level adjustment.
Remark 4.1.The APIOBPCS controller was originally designed for WIP (Work-In-Process) instead of pipeline inventory i.e., shipping.Note, however, that both WIP and shipping share similar dynamics, so there is no loss of generality in using shipping instead of WIP.

The (s, S) inventory controller
The classical (, ) controller [1] is also chosen for purposes of comparison with the proposed one step ahead optimal control.This is an inventory control policy that is extremely simple to state, and has also proven to be practical and effective, in the seventy years since it was first proposed [2].It involves the choice of two levels of stock, denoted by  and  >  and proposes ordering to bring the inventory level back in the vicinity of  as soon as it drops below , and not ordering when it exceeds .This can be expressed as follows: making it clear that it is a threshold based output feedback control policy (i.e., based only on the component   of the entire state feedback vector z), switching between 0 and the deficit between  (desired stock level) and the actual current stock level   , whenever the latter goes below the threshold value .It is also obvious that, for a given demand, the performance of an (, ) controller depends on the choice of the parameters  and .This implies, in turn, that these parameters can be optimized for a given demand, but, of course, there is no guarantee that these values will remain optimal for a different demand.This is discussed further in Section 5.3, where comparisons between the different controllers are made.

Numerical examples of OSAOC versus classical controllers
A demand sequence is generated from a normal distribution, "by far the most popular distribution for making inventory management decisions" [16,23].In the first numerical example, this normally distributed demand is used to show the solution of the one step ahead optimal control.For the purpose of a fair comparison, both OSAO and APIOBPCS controllers use the same (and very simple) demand forecast d() = ( − 1).
Next, the parameters of the APIOBPCS and (, ) controllers are optimized using a data-driven methodology.The previous numerical example is repeated using the optimized values obtained for these parameters.
In all cases, the cumulative profit  osao obtained for the OSAOC is compared to the final value  omni attained by the omniscient solution for the same demand sequence.Furthermore, these results are also compared to the cumulative profits  apiobpcs and  ss attained respectively by the APIOBPCS controller proposed in [21], derived from classical control theory, and the (, ) controller proposed in [1].
The ratio between the cumulative profits   / omni is referred to as the efficiency   , where the subscript  specifies the control applied (i.e.,  ∈ {, , }).
The supply chain to be controlled is the single echelon plant described by the dynamics presented in Section 2. Ordering is constrained to be nonnegative, meaning that stock can not be returned to the supplier.The shipping delay  is taken to be 7 time units and   is chosen to be equal to 44 time units.All comparisons are carried out over the interval [,   − 1 − ] to exclude the effect of the final transient due to the turnpike effect [30] that occur for the omniscient solution i.e., the comparison interval is [7,36], which is equivalent to a period of 30 time units.
For all numerical examples, the pipeline was considered initially empty and the other parameters were set as indicated in Table 2.The values of parameters   ,   ,   (of the APIOBPCS controller),  and  (of the (, ) controller) are specified in each related subsection since their values change for each numerical example.

Software implementation details
The omniscient solution and the OSAO, APIOBPCS and (, ) controllers were implemented in the scientific computing language Julia [3] using JuMP [7], an acronym for Julia Mathematical Programming, which is an algebraic modeling language for mathematical optimization embedded in Julia with a wrapper for the CLP (Coin-OR linear programming) and CBC (Coin-OR branch and cut) solvers.JuMP allows users to express a wide range of optimization problems (linear, mixed-integer, quadratic, conic-quadratic, semidefinite, and nonlinear) since it supports different types of solvers.It also defines variables, constraints and the objective using a simple syntax that closely resembles the mathematical formulation shown in Algorithm 1.
The optimization of the APIOBPCS and (, ) tuning parameters discussed in Section 5.3 was obtained by MATLAB implementations of these two controllers that uses the Optimization Toolbox, which is set up to carry out black box optimization of a function, using interior-point, sequential quadratic programming (SQP) and active set methods, amongst others.The black box in question is the system model ( 1)-( 5) of the SESC dynamics, and the fmincon command was used to optimize the tuning parameters, with the algorithm being chosen as interior point.

Illustrative example: single normally distributed demand
Experiments in this subsection are carried out with a demand  that has a normal distribution with  = 10 and  = 3.This demand is generated by the Julia command d = rand(Normal(10,3),K f ) using the Distributions package.Note that for this distribution there is a 99.7% probability of 1 ≤ () ≤ 19 for  = 0, . . .,   .If, however, a demand with negative values is generated, it is discarded and replaced with a new one until a nonnegative demand sequence is generated.The OSAO controller uses the approach detailed by Algorithm 1.Both OSAO and APIOBPCS controllers use d() = ( − 1) as an estimate of the unknown current demand.For the APIOBPCS, the gains were set as:   =   =   =  = 7, which is based on the setting used by [29].Lastly, the setpoint  ref was chosen to be equal to 10, which is the average demand over the interval [7,36].For the (, ) controller, the stock levels were set as  =  −  +  = 8 and  =  +  +  = 14, which are the deviation from the mean plus the safety stock, noting that this presupposes knowledge of the demand distribution used in the numerical experiment.Such prior knowledge is not required for the proposed OSAO controller.
The attained cumulative profits (12) and the respective efficiencies with regard to the global optimum (omniscient solution) for each controller are presented in Table 3. Figures 3-10 show plots of the important variables for each of the three controllers, in order to make a comparison.Further observations are made in Subsection 5.2.1 Figure 3. Ordering and selling for the omniscient solution with  = 7, assuming that demands from  = 0 to  = 6 are used as the initial past demands, to  = (  − 1 − ) = 36, in order to remove the turnpike effect, for a normally distributed demand.Thus, the simulated horizon is equivalent to a period of 30 time units.

Discussion of numerical example 1
Figure 3 depicts the ordering and selling for the omniscient solution.Since it assumes knowledge of the demand over the entire simulation horizon, it can anticipate the ordering of stock by  instants and overcome the potentially adverse effects of shipping delay.Note that the demand is not matched at instants  = 9 to  = 14 due to the initially empty pipeline, but, if the warehouse inventory level is sufficiently large, lost sales (Fig. 9) can be avoided.
Figures 4-6 show the ordering and selling responses for the APIOBPCS, OSAO and (, ) policies respectively.As expected, all controllers exhibit the same pattern of lost sales due to initialization as the omniscient solution.The OSAO and (, ) controllers also have lost sales after instant  = 14.The APIOBPCS was the only causal (nonanticipative) controller to avoid lost sales after instant  = 14, but this was achieved at the cost of a higher usage of the storage space and violation of both maximum warehouse ( ,max = 50) and shipping ( ℎ,max = 100) capacities as can be seen in Figures 7 and 8. Unlike the anticipative noncausal omniscient solution, the causal controllers are not able to keep the inventory level at the minimum, which is the specified safety stock of  = 1, after the initial pipeline transient.
Figure 10 depicts the evolution of the cumulative profits.The efficiencies of the causal controllers, at the end of the simulation horizon, with respect to the global optimum (noncausal omniscient solution) are:   apiobpcs = 62.80%,   osao = 85.69% and   ss = 57.84%.

Data driven methodology to optimize the APIOBPCS and (𝑠, 𝑆) controllers
The tuning parameters of the APIOBPCS controller and the stock levels of the (, ) controller chosen heuristically in Section 5.2 yielded results that were considerably inferior to the OSAO controller, raising the question of tuning or optimizing these parameters to improve their performance.Note that, in order to compare the controllers with regard to a specific normally distributed demand  spec , it would not be fair to optimize the APIOBPCS and (, ) controllers with respect to  spec , because this amounts to giving these two controllers omniscient information (i.e., the demand over the whole period is assumed to be known and is used for the parameter optimization), while the proposed OSAO controller does not use this omniscient information.A reasonable alternative is to optimize the parameters of the APIOBPCS and (, ) controllers using a set of normally distributed demands similar to the demand  spec .The specific demand  spec , however, can not be included in this set.The following methodology is used.First, a training set, consisting of  randomly generated and normally distributed nonnegative demands with a given mean  and standard deviation  is generated.Evidently, if a dataset of real demands is known it should be used and the optimization is then data driven, with real data, as opposed to synthetic data.With the training set in hand, the parameters of the APIOBPCS and (, ) controllers are optimized for the demands in the training set using the Matlab Optimization Toolbox and the solver fmincon, set to the default interior-point algorithm.
The problem of optimizing the parameter triple (  ,   ,   ) of the APIOBPCS-based controller and the stock levels of the (, ) controller is now formulated as a black box optimization problem.The black box, which contains the dynamics and constraints, receives either the triple (  ,   ,   ) (when optimizing the APIOBPCSbased controller) or the pair (, ) (when optimizing the (, ) controller), in addition to a given demand sequence, and returns the cumulative profit  for this demand.The goal of the optimization is to find the solution (triple (  ,   ,   ) or pair (, )) that maximizes the average cumulative profit  of the given demand.For the APIOBPCS controller, the optimization problem to be solved is unconstrained with respect to its parameters.However, given the definition of the (, ) controller, which requires  > , it is necessary to impose the inequality constraint:  The capacity constraints on shipping and the warehouse are treated using a penalty function,  , that adds a cost to the cumulative functions  and  whenever   () >  ,max or  ℎ () >  ℎ,max and is defined as follows: where Even though it is possible to select the weights  1 and  2 to be large enough in order to find an optimal solution that generates feasible ordering sequences for all training demands, there are no guarantees that this will remain true for any demand with the same distribution, but outside the training set.Also note that this methodology not only requires the type of distribution to be known in advance, but also its parameters, i.e., the mean () and standard deviation () in the case of a normal distribution, and such information may not always be available Figure 8. Shipping for all controllers with  = 7, assuming that demands from  = 0 to  = 6 are used as the initial past demands, to  = (  − ) = 37 for a normally distributed demand.Note that the APIOBPCS controller violates the maximum shipping capacity ( ℎ,max = 100) at instant  = 14.

Figure 9.
Lost sales for all controllers over the simulated horizon, for a normally distributed demand.All controllers lead to lost sales in the initial segment, up to  = 14, because the pipeline is initially empty.Both the noncausal omniscient solution and the APIOBPCS controller manage to avoid lost sales after  = 14.
in practice.It is important to emphasize that, in contrast, no information about the demand distribution is required for the OSAO controller.Moreover, if the value of the demand distribution parameters expected in the training set differ from the ones in the test set, then the solution found by this method does not guarantee good results for the APIOBPCS and (, ) controllers, which depend on good tuning of their parameters unlike the parameter-free OSAO controller.See Section 6.
The previous example in Section 5.2 considered a normally distributed demand of  = 10 and  = 3.By randomly generating a training set of  = 10 demands with the same  = 10 and  = 3, the fixed gains of the APIOBPCS-based controller that maximized the average profit  of the demands in the training set were found to be:   = 5.03,   = 16.71 and   = 15.63.Analogously, the optimal stock levels for the (, ) controller were found to be  = 9.27 and  = 10.06.The weights were set as  1 =  2 = 20.All other parameters remained the same as indicated in Table 2.  show the results of replacing the parameters used for the APIOBPCS and (, ) controllers in the example of Section 5.2 with the optimal values obtained in this section.Table 4 summarizes the results Figure 10.Cumulative profit for all controllers over the simulated horizon, for a normally distributed demand.The efficiencies of the causal controllers, at the end of the simulation horizon, with respect to the global optimum (noncausal omniscient solution) are:  apiobpcs = 62.80%,  osao = 85.69% and  ss = 57.84%.and demonstrates the significant improvement in performance of the APIOBPCS and (, ) controllers tuned by the proposed method.The result for the OSAO control, which is parameter free, is, of course, the same as in Table 3 and is repeated in Table 4 only for the purposes of convenient comparison.Further comments and observations are made in Subsection 5.3.1 below.

Discussion of numerical results
Figures 11 and 12 show the ordering and selling for the optimized APIOBPCS and (, ) controllers respectively.There is a significantly lower bullwhip effect in the inventory (stock) level produced by the optimized APIOBPCS controller (compare Figs. 7 and 13).Furthermore, Figures 13 and 14 show that the APIOBPCS controller reaches lower peaks in the warehouse inventory and shipping levels and no longer violates the upper bounds in maximum warehouse ( ,max = 50) and shipping ( ℎ,max = 100) capacities, in contrast to Figures 7 and 8 respectively.As for the optimized (, ) controller, a reduction in the warehouse inventory level is noticeable in Figure 13, compared to Figure 7. Unlike the anticipative omniscient solution, the implementable controllers are not able to keep the inventory level at the minimum, which is the specified safety stock of  = 1, after the initial transient.Figure 15 shows a slight decrease in the lost sales of the optimized (, ) controller.Lastly, Figure 16 shows that the efficiencies increased from the previous  apiobpcs = 62.80% to 83.44% for the optimized APIOBPCS and from  ss = 57.84% to 66.06% for the optimized (, ) controller, showing the effectiveness of the proposed optimization (tuning) method.Ordering and selling for the (, ) controller with  = 7, assuming that demands from  = 0 to  = 6 are used as the initial past demands, to  = (  − 1 − ) = 36 for a normally distributed demand.Compare with the results obtained with the unoptimized control in Figure 6.

Comparison of average performances for test sets of normally distributed demand
A common problem associated with many inventory control policies is the bullwhip effect of demand uncertainty resulting in larger amplitudes of system variables, such as inventory level.To examine this behavior, two measures are proposed in this paper: the normalized average used capacity  st and the normalized peak stock level  st .The normalized average used capacity  st measures the average level of stock in the warehouse and is defined as the ratio of the mean of the warehouse inventory level   over the simulation horizon to the maximum storage capacity  ,max :  st = average use of warehouse capacity maximum warehouse capacity • The normalized peak warehouse level  st measures the peak level of stock in the warehouse and is defined as the ratio of the peak value of the warehouse inventory level   over the simulation horizon to the maximum Figure 13.Inventory level for all controllers with  = 7, assuming that demands from  = 0 to  = 6 are used as the initial past demands, to  = (  − ) = 37 for a normally distributed demand.Compare with the results obtained with the unoptimized control in Figure 7.
Figure 14.Shipping for all controllers with  = 7, assuming that demands from  = 0 to  = 6 are used as the initial past demands, to  = (  − ) = 37 for a normally distributed demand.Compare with the results obtained with the unoptimized control in Figure 8.
storage capacity  ,max : Both these coefficients are calculated over the interval [2 + 1,   − ], so as to exclude the initial transient due to pipeline initialization as well as the turnpike effect, and are used to quantify the effect of the variation of demand on the inventory level.Low  st and  st indicate a maximum storage capacity that is larger than required by the current demand level.Low  st and high  st indicate a bullwhip effect in the inventory.Lastly, high  st and  st indicate the inventory is being used close to its full capacity.These inferences are collected in Table 5.
Remark 5.1.The bullwhip effect is commonly defined in inventory control literature as the ratio of the coefficients of variation of the ordering and the demand.Measures greater than the unity are an indication of bullwhip effect due to the amplification of demand variability.However, in our experiments, we found some of these measures to be occasionally misleading, in particular, for the omniscient solution, whose optimal ordering Lost sales for all controllers over the simulated horizon, with a normally distributed demand.Compare with the results obtained with the unoptimized control in Figure 9.
Figure 16.Cumulative profit for all controllers over the simulated horizon, with a normally distributed demand.The efficiencies of the causal controllers, at the end of the simulation horizon, with respect to the global optimum (noncausal omniscient solution) are:  apiobpcs = 83.44%, osao = 85.69% and  ss = 66.06%.Compare with the results obtained with the unoptimized control in Figure 10.anticipates the demand by  time units (as discussed in Sect.5.2).For this reason, we decided to characterize the bullwhip effect using indexes ( 43) and ( 44), as explained in Table 5.

Training and test set methodology
This section utilizes the approach illustrated in the previous section, for a single test demand, to a set of test demands, in order to compare average performances.Ten samples of normally distributed demands with  = 10 and  = 3 were generated and grouped in a separate set, referred to as test set.The training set, discussed in the previous section (and henceforth denoted as  1 ), and the test set (henceforth denoted as  2 ) are two distinct sets and it is ensured that  1 ∩  2 = ∅.The parameters for the APIOBPCS controller were optimized using the methodology described in Section 5.3 and the solutions found were   = 5.03,   = 16.71,  = 15.63.Analogously, the optimal stock levels for the (, ) controller were found to be  = 9.27 and  = 10.06.The average efficiencies (resp.standard deviation) considering the cumulative profit with respect to the globally optimum (noncausal omniscient solution) are presented in Table 6, along with the average used capacities and peak warehouse values for these ten normally distributed demands.These results indicate that all controllers are operating well below the lower bound specified on the right hand side of (16) and are not being subjected to stressful operating conditions.
Thus, in order to test the controllers under more extreme circumstances, a test set,  2 , comprised of ten samples of normally distributed demands with  = 20 and  = 3 was generated, thus increasing the mean demand by one hundred percent.Once again, the tuning parameters of the APIOBPCS controller and the stock levels of the (, ) controller were adjusted using the methodology described in 5.3.Using the same parameters in Table 2 and defining the setpoint  ref = 20 i.e., the mean of the normal distribution, led to the stock levels  = 16.57and  = 17.67 of the (, ) controller and the fixed gains   = 2.86,   = 174.42,  = 10.63 of the APIOBPCS controller.Table 7 summarizes the average results (± standard deviation) obtained for a test set of 10 normally distributed demands with  = 20 and  = 3.The proposed OSAO controller maintains a higher efficiency than the others (after they have been tuned).

Robustness to demand uncertainty
This section reports on an experimental investigation of the robustness of the APIOBPCS, OSAO and (, ) controllers with respect to normally distributed demands of uncertain mean and standard deviation.The experiment is designed as follows.The means (resp.standard deviations) of the normally distributed demands are allowed to vary in the intervals [10,20] (resp.[1,3]), in steps of size 1 (resp.0.25).At each point on this grid in the (, )-space, a test set of ten normally distributed demands is generated.The parameters of the APIOBPCS and (, ) controllers are optimized, using a training set of ten normally distributed demands, at the central point (15,2) of the (, )-grid and held fixed for the entire experiment.As before, the test and training sets are disjoint.The average efficiency over the test set of each of the four controllers is computed at each grid point and gives rise to the average efficiency surfaces plotted in Figure 17.
At the central point  = 15,  = 2 of the (, ) grid, the optimized stock levels of the (, ) controller were found to be  = 14.44 and  = 15.21while, the optimized parameters of the APIOBPCS-based controller were found to be   = 2.52,   = 172.29 and   = 80.98, and these optimized parameters were kept fixed throughout the experiment.The setpoint was defined as  ref = 15, the central value of , and all other parameters used in this numerical experiment were the same as indicated in Table 2.As can be seen in Figure 17, the (, ) controller proved to be less robust as the value of the mean decreased ( < 15).On the other hand, the APIOBPCS-based controller proved to be less robust as the value of the mean increased ( > 15).Since the OSAO controller does not require the setting of control parameters (such as stock levels, fixed gains or setpoint), it is seen to be robust over the entire space defined by the intervals of  and , attaining an average profit of at least 80% of the (unattainable) omniscient global maximum profit, and this is one of the significant advantages of the scheme being proposed in this paper.

Robustness to lead time (delay) uncertainty
So far, all experiments in this paper assumed a constant lead time .However, the time elapsed from the placement of an order up to the instant the good is received at the warehouse can vary in real-life inventory control problems.Unlike the case with demand uncertainty, the introduction of lead time (delay) uncertainty requires a few adjustments to be made in the single echelon supply chain dynamics discussed in Section 2.
First, a sequence of lead times, denoted  var , is defined as follows: where the -th element of  var , denoted   , corresponds to the actual lead time for an order placed at time instant , for  = 1, . . .,   .As with the demand sequence,  var is assumed to be known in advance only while computing the omniscient solution and not known by the implemented controllers OSAO, APIOBPCS and (, ).
The length of the conveyor, described by equations ( 1) and ( 2), needs to be increased taking into account the maximum lead time, denoted  max and defined as: which can be greater than the expected constant lead time  used previously.
Remark 7.1.As noted in Page 507 of [25], knowledge of lead times and lead time variation is needed for management to be effective.Therefore, the assumption of  max being known beforehand is reasonable.Furthermore, equation ( 46) provides the minimum required length of the conveyor (i.e., the number of conveyor states) in order for the SESC dynamics to be able to account for all lead times in  var .For real-life applications where lead time variation may not be known,  max only needs to be greater than the maximum expected lead time i.e.,  max >  [max ( var )].The simple choice of  max = ( var ) is made below.
The number of goods being shipped at a time can be rewritten as: Hence, the omniscient solution can be obtained by solving the following MILP problem:
For the particular case where  =  max , equations ( 52) and (53) become identical to (1) and ( 2) respectively, since the position  +1 () lies outside of the conveyor's boundary and, therefore, is equal to zero.
The OSAOC problem adapted for variable delay, and denoted as (z(), ), can be transcribed in the following MILP problem: maximize ( + 1) {()}  =− subject to (52),(53),( 4),( 12),( 24),(25) ∀  ∈ [ − , ], and ( 14),( 15),( 16), (26) ∀  ∈ [ − , ] and ( 29),( 30), (31) ∀  ∈ [ − , ] and z 0 = z() (specified initial conditions). (54) The optimal ordering sequence for the OSAO controller can be obtained by following similar steps as the ones described in Section 3. By solving (54) at a given instant , we find the optimal ordering  osao ().This procedure corresponds to the OSA optimizer block in Figure 2 and assumes no knowledge of  var , using the conveyor state dynamics described by (52) and (53).After the optimal ordering  osao () has been placed, and its value can no longer be changed, the actual conveyor is updated according to equations (47) and (48), using the now known information about the actual lead time   ∈ [1,  max ] at time instant .Lastly, the SESC dynamics are evaluated using the now known actual demand and the state vector z( + 1) is updated.From this new initial condition, a new optimization step can be performed.Algorithm 2 summarizes the necessary changes made to the OSAO controller in order to account for lead time uncertainty.
This section reports on an experimental investigation of the robustness of the APIOBPCS, OSAO and (, ) controllers with respect to ten randomly generated and uniformly distributed delay sequences, with values in the interval [5,9] i.e.,   ∈ [5,9] for  = 1, . . .,   .In this experiment, each controller has its cumulative profit Evaluate RHS of (4) to obtain ( + 1)

9:
Update the conveyor state (47, 48) according to   10: z0 ← [1( + 1), . . .,  max ( + 1), ( + 1)]  and efficiency   evaluated for all ten different delay sequences.The overall performance of each controller is, then, calculated by taking the average (and standard deviation) of the ten cumulative profits obtained for each delay sequence.Throughout the experiment, the system parameters are set to the values indicated in Table 2 and the same demand profile, used in the first three numerical examples of Section 5, is adopted for all performance evaluations.The tuning parameters of the APIOBPCS and (, ) controllers are set to the optimized values obtained in Section 5.3 for  = 7 time units, which are   = 5.03,   = 16.71,  = 15.63 for the APIOBPCS controller and  = 9.27 and  = 10.06 for the (, ) controller.
Table 8 shows the performance (average efficiency and standard deviation) evaluated for each controller with respect to ten uniformly distributed delay sequences.As can be seen, the OSAO controller achieved both the highest average efficiency (above 70%) and lowest standard deviation (below 5%), meaning that its performance suffered less from different delay sequences.For the purpose of convenient comparison, Table 8 also shows the efficiency of the controllers with respect to the constant lead time  = 7 (first presented in Tab. 4).

Conclusions
This paper proposed a novel one step ahead optimal control (OSAOC) scheme for a single echelon supply chain model, that does not require prediction.The proposed control scheme requires the solution of a lowdimensional (in terms of the number of decision variables) optimization problem at each time step, thus being perfectly adequate for real time inventory control.The one step ahead control is shown to be efficient, robustly attaining an average profit of at least 80% of the (unattainable) omniscient global optimum for different normally distributed demands.Furthermore, the one step ahead control proved to be robust with respect to delay (lead time) uncertainty, attaining 70% of the globally optimal omniscient solution for different delay sequences.In addition, it should be emphasized that the use of a controller with a fixed structure such as the APIOBPCS controller requires the tuning of parameters (setpoints and gains) associated to its structure.It is difficult to find fixed choices of parameters that work well over a range of demands.In other words, if a set of parameters is chosen to be adequate or even optimal for a given nominal demand  nom , it will, in general, be worse for some other demand , even if  is close, in some appropriate metric, to the demand  nom .The degradation of performance will increase as the demand  moves further away from  nom in the given metric.This is to be contrasted with the proposed OSAO controller, which is always locally optimal and does not need any tuning, as changes occur in the lead time or in the demand, for example, in its distribution or in its mean and standard deviation values.This is one of the main advantages of the proposed approach.It is also important to observe that, for the single echelon case, it was shown how to rewrite the OSAOC optimization problem as linear programming problem with a relatively small number of decision variables.From a computational viewpoint, this implies that the simple examples in Section 5 can easily be scaled to much larger dimension.It should also be noted that a large number of supply chain problems, including the multi-echelon case, are in fact described by linear or piecewise linear dynamics.Thus, the proposed scheme is scalable and can: (i) be generalized to large multi-echelon supply chains and (ii) used in a planning mode, to choose inventory size, shipping capacity requirements and prices that ensure profit margins, since these choices can be recast as feasibility problems.It should also be pointed out that, although this paper used the example of a single echelon supply chain, with dynamics such that the resulting OSAO problems were linear programs, the OSAO approach is not limited to dynamics and objective functions that necessarily lead to linear programming problems.The generalization of the model with a single good presented in this paper to a model with multiple goods is straightforward, if it is assumed that some upper bound on the stock of each good is given, together with some lower bounds on their safety stocks.The upper bound would be constrained by the physical capacity of the warehouse.Similarly, if several goods are shipped together, the model could put a common upper bound on the shipping of these goods, and so on.These topics are the subject of ongoing work.The fact that the proposed OSAOC approach does not require demand prediction makes it a strong contender for a cheap and universal control scheme that could be adopted by practitioners who might not be comfortable with the considerably greater sophistication of schemes that require prediction, such as the class of model predictive control schemes.

Figure 4 .
Figure 4. Ordering and selling for the APIOBPCS controller with d = ( − 1) and  = 7, assuming that demands from  = 0 to  = 6 are used as the initial past demands, to  = (  − 1 − ) = 36 for a normally distributed demand.Note that selling tracks demand exactly after the initial transient i.e., for  ≥ 15.

Figure 5 .
Figure 5. Ordering and selling for the OSAO controller with  = 7, assuming that demands from  = 0 to  = 6 are used as the initial past demands, to  = (  − 1 − ) = 36 for a normally distributed demand.

Figure 6 .
Figure 6.Ordering and selling for the (, ) controller with  = 7, assuming that demands from  = 0 to  = 6 are used as the initial past demands, to  = (  − 1 − ) = 36 for a normally distributed demand.

Figure 7 .
Figure 7. Inventory level for all controllers with  = 7, assuming that demands from  = 0 to  = 6 are used as the initial past demands, to  = (  − ) = 37 for a normally distributed demand.Note that the APIOBPCS controller violates the maximum warehouse capacity ( ,max = 50) at instant  = 27.

Figure 11 .
Figure 11.Ordering and selling for the APIOBPCS controller with  = 7, assuming that demands from  = 0 to  = 6 are used as the initial past demands, to  = (  − 1 − ) = 36 for a normally distributed demand.Compare with the results obtained with the unoptimized control in Figure 4.

Figure 12 .
Figure 12.Ordering and selling for the (, ) controller with  = 7, assuming that demands from  = 0 to  = 6 are used as the initial past demands, to  = (  − 1 − ) = 36 for a normally distributed demand.Compare with the results obtained with the unoptimized control in Figure6.

Figure 15 .
Figure 15.Lost sales for all controllers over the simulated horizon, with a normally distributed demand.Compare with the results obtained with the unoptimized control in Figure9.

Figure 17 .
Figure 17.Efficiency surfaces for each controller  with respect to the objective function .Each point on a surface represents the average efficiency    for 10 normally distributed demands from the test set.Observe that the efficiency of the OSAO controller stay at or above levels of 80% of the (unattainable) omniscient global optimum profit for all combinations of 10 ≤  ≤ 20 and 1 ≤  ≤ 3, while the efficiencies of the (, ) and APIOBPCS controllers fall off considerably in certain areas of the (, )-space.

Figure 18 .
Figure 18.Conveyor update.The ordering (), placed at time instant , is inserted at the position () of ( + 1) according to the lead time value   ∈ [1,  max ], yielding equation (48).The first position of the conveyor,  1 , informs the quantity of goods that arrive at the warehouse while the other positions,  2 up to  max inform the goods currently in transit (being shipped) and their respective stages.

Table 1 .
Notation adopted to formulate the mathematical model.

Table 3 .
Cumulative profits and efficiencies.

Table 4 .
Cumulative profits and efficiencies (normal demand of Section 5.2) after optimizing APIOBPCS and (, ) controllers.Compare with the results of Table3.

Table 5 .
Inferences from combinations of the average used capacity  st and peak warehouse level  st values for a given set of demands and simulation horizon