A RANKING FRAMEWORK BASED ON INTERVAL SELF AND CROSS-EFFICIENCIES IN A TWO-STAGE DEA SYSTEM

The performance of a decision-making unit (DMU) can be measured by its own optimistic and pessimistic multipliers, leading to an interval self-efficiency score. While this concept has been thoroughly studied with regard to single-stage systems, there is still a gap when it is extended to two-stage tandem structures, which better correspond to a real-world scenario. In this paper, we argue that in this context, a meaningful ranking of the DMUs is obtained; this outcome simultaneously considers the optimistic and pessimistic viewpoints within the self-appraisal context, and the most favourable and unfavourable weight sets of each of the other DMUs in a peer-appraisal setting. We initially extend the optimistic-pessimistic Data Envelopment Analysis (DEA) models to the specifications of such a two-stage structure. The two opposing self-efficiency measures are merged into a combined self-efficiency measure via the geometric average. Under this framework, the DMUs are further evaluated in a peer setting via the interval cross-efficiency (CE). This methodological tool is applied to evaluate the target DMU in relation to the most favourable and unfavourable weight profiles of each of the other DMUs, while maintaining the combined self-efficiency measure. We thus determine an interval individual CE score for each DMU and flow. By treating the interval CE matrix as a multi-criteria decision making problem and by utilising several well-established approaches from the literature, we delineate its remaining elements; we show how these lead us to a meaningful ultimate ranking of the DMUs. A numerical example on the efficiency evaluation of ten bank branches in China illustrates the applicability of our modelling approaches.


Introduction
Data Envelopment Analysis (DEA) is a benchmarking technique for comparing the relative efficiency of a Decision Making Unit (DMU) with the best observed efficiency [7]. The evaluation of a DMU is based on the comparison between the amount of input(s) consumed and the amount of output(s) produced [8] by DMUs.
One of the undeniably attractive features of DEA is its weight flexibility. This allows each DMU to select the set of input and output weights that is most favourable for determining its relative efficiency.
Hence, in conventional DEA, the overall assessment of a DMU is based on the optimistic viewpoint and on the concept of the efficiency frontier [56]. On the other hand, if the performance of a DMU is based on the pessimistic viewpoint, then an inefficiency frontier is defined.
Optimistic and pessimistic perspectives illustrate two extreme cases for each DMU. Taking only one scenario into account limits the examination of the performance of a unit, and the obtained results might be unreasonable [3]. Therefore, it is valuable to consider the two distinct efficiencies together.
Research on exploring both aspects of viewing the efficiency of a DMU within a single-stage structure is relatively extensive. Wang and Luo [48] evaluated each DMU in terms of the optimistic and pessimistic viewpoint, by introducing an input-oriented virtual Ideal DMU (IDMU) and an output-oriented virtual Anti-ideal DMU (ADMU). The two separate efficiencies were combined into the Relative Closeness (RC) index to obtain a unique ranking order. Wu [51] identified a weakness in Wang and Luo's [48] paper dealing with the ADMU for DEA modelling. Wu argued that it is inconsistent to aggregate an input-oriented IDMU and an output-oriented ADMU into the RC index.
Wang and Yang [49] proposed an alternative way of measuring the performance of DMUs. The efficiencies of DMUs are measured within an interval, in which the upper bound is 1 and the lower bound equals the performance of a virtual ADMU, which is the worst among all DMUs. This approach, which only considers the performance of the lower bound, was extended by Azizi and Jahed [4], who suggested a pair of improved bounded models for the target DMU. Wang et al. [50] combined optimistic and pessimistic efficiencies into a geometric average efficiency to measure the overall performance of a DMU. The geometric average efficiency was deemed effective, as it was simultaneously an efficiency measure and a ranking index. Toloo and Tichy [45] proposed a multiplier model to identify the maximum efficiency scores and applied the envelopment model to attain the maximum discrimination among efficient DMUs. Khodabakhshi and Aryavash [23] used a double frontier DEA procedure to introduce a new cross-efficiency method; the merit of their approach lay in not requiring any alternative secondary goal. Based on the ideal and anti-ideal DMUs, Liu and Wang [29] developed the normalised efficiency metric and then formulated two DEA models to obtain its lower and upper bounds. Örkcü et al. [35] proposed a non-cooperative, game-like iterative optimistic-pessimistic DEA approach to fully rank the DMUs. Badiezadeh et al. [5] were, to our knowledge, the first to consider optimistic-pessimistic DEA models in a network DEA context to evaluate the performance of sustainable supply-chain management.
With the exception of Badiezadeh et al. [5], the majority of the existing studies on double frontier DEA models are concerned with a system handled as a whole unit, ignoring its internal structure. Several studies illustrate that this condition might produce misleading results [22]. In reality, systems can be composed of two sub-stages operating interdependently. In this paper, we extend our selected optimistic-pessimistic ranking procedure to a two-stage tandem system to measure not only the efficiency of the overall system but also the efficiencies of its individual stages; thus, the stage that causes inefficiencies can be identified.
The optimistic and pessimistic self-efficiency scores can be unified via the geometric average efficiency. As shown in Wang et al. [50], this score has a better discriminating power than either of the opposing efficiencies. Yet, this feature has not been explored in a network environment, implying the possible existence of a non-unique ranking. It also considers the effects of the optimistic and pessimistic standpoints only within the self-appraisal context. The integration of the geometric average score in a peer-appraisal context would contribute to the assessment of a DMU in terms of the weight sets of other players, leading to a more logical ranking. These points suggest that this framework could be further extended by the use of the cross-efficiency (CE) to ensure more fairness in the evaluation outcomes.
The CE concept is based on the peer-evaluation notion [42]. As stressed by Anderson et al. [1], CE improves the probability of obtaining a unique ranking. A shortcoming of the CE is the non-uniqueness of DEA optimal weights, leading to the non-uniqueness of cross-efficiencies. Remedial actions have been suggested towards the adoption of secondary goals with the aim of selecting unique optimal multipliers [10, 26–28, 52, 57]. The non-uniqueness issue is also critical in a two-stage (network) system [18, 22, 32–34]. Kao and Liu [22], for instance, developed an aggressive CE model to measure the efficiency in two basic network structures. Örkcü et al. [34] came up with a neutral CE model in a two-stage system, which is indifferent to the preference choice between the aggressive and benevolent formulations. Doyle and Green [10] introduced an aggressive and a benevolent secondary goal model to remedy the non-uniqueness of the optimal weights. The former ensures the minimisation and the latter the maximisation of the cross-efficiencies of all other DMUs, whilst both maintain the optimistic self-efficiency of the target DMU. The choice between the two formulations may be subject to individual judgement, possibly leading to an irrational selection of either model. There is also no confirmation that these formulations will result in the same ranking or that their optimal sets of multipliers are unique [46].
To alleviate these deficiencies, Yang et al. [53] suggested the "interval CE" for the exploration of the cross-efficiencies in a weight space considering all the weight profiles, within the single-stage DEA structure. In such a peer-appraisal setting, the base DMU is assessed regarding the most unfavourable and favourable weight profiles of each of the other DMUs. The aggressive and benevolent models of this process, however, kept only the optimistic self-efficiency value of each DMU fixed.
In summary, this paper adapts an optimistic-pessimistic DEA approach to the two-stage tandem system, in order to then support the interval CE method in such a network system. Using the proposed framework as shown in Figure 1, a meaningful evaluation and ranking of the considered DMUs is attained. Decision makers will be enabled to simultaneously consider: (i) both the optimistic and the pessimistic viewpoints within the self-appraisal context, and (ii) the most favourable and unfavourable weight sets of each of the other DMUs in a peer-appraisal setting. We believe that the combination of the methods that compose our framework has not been considered before in the literature; in our view, this could lead to a meaningful ranking that is also adjusted to a two-stage tandem DEA structure.
The procedures implemented in the first three steps of our proposed framework (Fig. 1) have been applied in several studies (e.g., [50]) that focus on double frontier DEA models to evaluate DMUs in a self-appraisal context in a single-stage structure. In these steps, our study differs in that our optimistic-pessimistic DEA models, which are inspired by the studies of Wang and Luo [48] and Wu [51], are built for the two-stage tandem (network) system.
The remaining steps of the proposed framework aim to support the peer-evaluation of the considered DMUs by customising the interval CE method to the specifications of the two-stage tandem structure while embedding the respective combined self-efficiency measure (which considers the effects of both opposing standpoints). To rank the DMUs in the interval CE matrix of the corresponding flow, this paper views this matrix as a multi-criteria decision-making problem. To solve this problem, we implement the goal programming method of Wang and Elhag [47] to obtain the interval local weight of each criterion. To delineate the interval global weight of each alternative, we use a pair of linear programming models introduced by Entani and Tanaka [13]. Finally, we apply grey relational analysis [25] to rank the interval global weights. To our knowledge, these well-established approaches have not previously been considered for extracting valuable information from an interval CE matrix. We also show that our proposed framework offers a more informative assessment of the units under consideration than particular existing methods in the network DEA literature.
The remainder of the paper is organised as follows. Section 2 briefly describes the preliminaries and the methodological background. Section 3 proposes the framework to meaningfully rank DMUs. Section 4 illustrates the methods with a numerical example. Section 5 presents conclusions and further research.

Methodological background
We assume that each DMU$_j$ ($j = 1, 2, \ldots, n$) uses $m$ inputs ($i = 1, 2, \ldots, m$) to produce $s$ outputs ($r = 1, 2, \ldots, s$). Let $x_{ij}$ be the input value of $i \in I$ for DMU $j \in J$ and $y_{rj}$ be the output value of $r \in R$ for DMU $j \in J$. We estimate the optimistic self-efficiency for each DMU by determining an optimal set of the most favourable input and output weights. The conventional input-oriented CCR DEA model [7], which assesses the efficiency of the target DMU$_k$, is as follows:

$$
\begin{aligned}
\max \quad & \theta_{kk} = \sum_{r=1}^{s} u_{rk}\, y_{rk} \\
\text{s.t.} \quad & \sum_{i=1}^{m} v_{ik}\, x_{ik} = 1, \\
& \sum_{r=1}^{s} u_{rk}\, y_{rj} - \sum_{i=1}^{m} v_{ik}\, x_{ij} \le 0, \quad j = 1, 2, \ldots, n, \\
& u_{rk},\ v_{ik} \ge 0, \quad \forall r, i,
\end{aligned}
\tag{2.1}
$$

where $u_{rk}$, $v_{ik}$ are the $r$th output and the $i$th input weights for DMU$_k$, respectively. If the optimal (optimistic) self-efficiency $\theta^{*}_{kk} = 1$, then DMU$_k$ is called DEA efficient; otherwise it is said to be DEA inefficient. A significant challenge of the conventional single-stage DEA model is to discriminate among the efficient DMUs and thus to acquire a unique ranking of the DMUs. A potential remedy to overcome this inability is the implementation of the CE concept [42]. Let $u^{*}_{rk}$ and $v^{*}_{ik}$ be the optimal set of multipliers of model (2.1). Then, $\theta^{*}_{kk} = \sum_{r=1}^{s} u^{*}_{rk}\, y_{rk}$ is the optimal self-efficiency score of DMU$_k$ and reflects its desire to be assessed only on the basis of its own most favourable weights. On the other hand, CE, in which peer-appraisal is the main notion, evaluates each DMU considering the weight profiles of all DMUs. The ratio $\theta_{kj} = \sum_{r=1}^{s} u^{*}_{rk}\, y_{rj} / \sum_{i=1}^{m} v^{*}_{ik}\, x_{ij}$ denotes the individual cross-efficiency of DMU$_j$, based on the optimal weight scheme of DMU$_k$. A CE matrix (Tab. 1) can be a valuable tool to integrate both the peer-efficiency scores $\theta_{kj}$ ($k, j = 1, 2, \ldots, n$) and the self-efficiency scores $\theta_{kk}$ (on the leading diagonal). The ultimate cross-efficiency can be defined by averaging all individual cross-efficiencies of the corresponding DMU being evaluated. The ultimate score in this case is $\bar{\theta}_{j} = \frac{1}{n} \sum_{k=1}^{n} \theta_{kj}$.
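The peer-appraisal step described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the data, and the weight profiles are hypothetical, and the weights are simply feasible values chosen by hand rather than optima obtained from model (2.1).

```python
from typing import List

Matrix = List[List[float]]


def cross_efficiency_matrix(x: Matrix, y: Matrix, v: Matrix, u: Matrix) -> Matrix:
    """Return theta with theta[k][j] = (u[k] . y[j]) / (v[k] . x[j]).

    Row k uses the weight profile of the rating DMU k; column j is the rated DMU.
    """
    n = len(x)
    theta = []
    for k in range(n):
        row = []
        for j in range(n):
            weighted_output = sum(u_r * y_r for u_r, y_r in zip(u[k], y[j]))
            weighted_input = sum(v_i * x_i for v_i, x_i in zip(v[k], x[j]))
            row.append(weighted_output / weighted_input)
        theta.append(row)
    return theta


def ultimate_cross_efficiency(theta: Matrix) -> List[float]:
    """Average each column, i.e. the scores one DMU receives from all raters."""
    n = len(theta)
    return [sum(theta[k][j] for k in range(n)) / n for j in range(n)]


# Hypothetical data: 3 DMUs, 1 input, 2 outputs.
x = [[1.0], [1.0], [1.0]]
y = [[4.0, 1.0], [2.0, 3.0], [1.0, 4.0]]
# Hypothetical (assumed, not solved) weight profiles, one row per rating DMU.
v = [[1.0], [1.0], [1.0]]
u = [[0.25, 0.0], [0.1, 0.2], [0.0, 0.25]]

theta = cross_efficiency_matrix(x, y, v, u)
ultimate = ultimate_cross_efficiency(theta)
```

The diagonal entries `theta[k][k]` play the role of the self-efficiency scores, while each entry of `ultimate` is the column average used as the ultimate cross-efficiency.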
The existence of multiple optimal weights from model (2.1) can deteriorate the theoretical usefulness of the results obtained via the cross-efficiency concept. To tackle this issue, Doyle and Green [10] proposed two opposed secondary goals to choose their weights, favourable or unfavourable, among the optimal solutions.
Considering the DEA-related literature, there is no well-established methodological approach to guide the decision-maker in reasonably selecting either the benevolent or the aggressive strategy. In addition, the selection of either the former or the latter model might not provide the same ranking or a unique optimal set of weights. To overcome these obstacles, Yang et al. [53] suggested the simultaneous use of the two extreme cases in the context of a single-stage structure. Model (2.2) is an aggressive-based model to obtain an optimal set of multipliers and thus to identify the minimum individual cross-efficiency value of DMU $j$ based on DMU $k$; its benevolent counterpart, model (2.3), identifies the corresponding maximum value. Models (2.2) and (2.3) make use of unfavourable and favourable multipliers, respectively, to identify the individual cross-efficiencies towards the single-stage structure. In either case, only the optimistic self-efficiency measure is involved to accommodate their purpose.
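For orientation, a pair of aggressive and benevolent models of this kind is commonly written as sketched below. The notation is an assumption made for illustration ($\theta_{kj}$ for the cross-efficiency of the rated DMU $j$ under the weights of the rating DMU $k$, $u_{rk}$ and $v_{ik}$ for output and input weights, $\theta^{*}_{kk}$ for the optimistic self-efficiency of DMU $k$) and may differ from that of Yang et al. [53].

```latex
\begin{aligned}
\min / \max \quad & \theta_{kj} = \sum_{r=1}^{s} u_{rk}\, y_{rj} \\
\text{s.t.} \quad & \sum_{i=1}^{m} v_{ik}\, x_{ij} = 1, \\
& \sum_{r=1}^{s} u_{rk}\, y_{rk} - \theta^{*}_{kk} \sum_{i=1}^{m} v_{ik}\, x_{ik} = 0, \\
& \sum_{r=1}^{s} u_{rk}\, y_{rl} - \sum_{i=1}^{m} v_{ik}\, x_{il} \le 0, \quad l = 1, \ldots, n, \\
& u_{rk},\ v_{ik} \ge 0 .
\end{aligned}
```

Minimising gives the aggressive (lower) value and maximising the benevolent (upper) value; the second constraint is what keeps the optimistic self-efficiency of the rating DMU fixed in both cases.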
In Section 3.1, a combined self-efficiency score is obtained, indicating the merger of the optimistic and pessimistic self-efficiencies. That score is embedded in the adjusted cross-efficiency models (Sect. 3.2) to explore the effect of both opposing viewpoints. The above-mentioned processes are part of a broader framework presented herein to reasonably rank DMUs within the two-stage tandem structure.

Models development
The exploration of the internal processes taking place in the core of a DMU sets the foundation for the transition from a single-stage to a two-stage DEA structure. Each DMU$_j$ ($j = 1, 2, \ldots, n$) consumes $m$ inputs ($i = 1, 2, \ldots, m$) in the first stage to generate $q$ intermediate products ($d = 1, 2, \ldots, q$). The outputs (intermediate measures) of the first stage are converted into inputs in the second stage to produce $s$ final outputs ($r = 1, 2, \ldots, s$). Let $x_{ij}$ be the input value of $i \in I$, $z_{dj}$ be the intermediate product of $d \in D$, and $y_{rj}$ be the output value of $r \in R$, for DMU $j \in J$ [21]. The above process is illustrated in Figure 2.
According to the relational model of Kao and Hwang [21], to measure the performance of the overall system it is necessary to consider not only its operations, but also the operations of its individual sub-stages. In model (3.1), these operations are described by the constraints, which indicate that the aggregate output cannot exceed the aggregate input. It is obvious that the overall efficiency is the product of the two stage efficiencies.
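The multiplicative decomposition can be checked numerically with a short sketch. The data and weights below are hypothetical; the only structural assumption, taken from the relational model of Kao and Hwang [21], is that the same weights are attached to the intermediate products whether they act as outputs of stage 1 or as inputs of stage 2.

```python
from typing import Sequence, Tuple


def weighted_sum(weights: Sequence[float], values: Sequence[float]) -> float:
    """Aggregate a vector of measures with the given multipliers."""
    return sum(w * val for w, val in zip(weights, values))


def two_stage_efficiencies(
    x: Sequence[float], z: Sequence[float], y: Sequence[float],
    v: Sequence[float], w: Sequence[float], u: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (overall, stage 1, stage 2) efficiencies for one DMU.

    x, z, y are inputs, intermediate products and final outputs;
    v, w, u are the corresponding multipliers (w is shared by both stages).
    """
    stage_1 = weighted_sum(w, z) / weighted_sum(v, x)
    stage_2 = weighted_sum(u, y) / weighted_sum(w, z)
    overall = weighted_sum(u, y) / weighted_sum(v, x)
    return overall, stage_1, stage_2


# Hypothetical DMU with 3 inputs, 2 intermediate products, 2 outputs.
overall, stage_1, stage_2 = two_stage_efficiencies(
    x=[10.0, 5.0, 8.0], z=[6.0, 4.0], y=[3.0, 2.0],
    v=[0.05, 0.04, 0.0375], w=[0.05, 0.05], u=[0.06, 0.06],
)
```

Because the aggregated intermediate measure cancels, `overall` equals `stage_1 * stage_2` for any admissible weights, which is why an inefficient stage can be traced directly.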

Optimistic & pessimistic models in basic two-stage structure
The above model can set the basis for the exploration of the optimistic and pessimistic self-efficiencies and, in turn, their integration into a geometric average efficiency score within the two-stage tandem system.
Sub-stage 1 consumes inputs to generate intermediate products.The following input-oriented CCR model (3.2) [21] examines the performance of sub-stage 1: With reference to sub-stage 1 of a basic two-stage DEA structure, two fundamental concepts, the IDMU and the ADMU, are introduced, following the principles of Wang and Luo [48] [16], the performance of the IDMU cannot be worse than any of the actual DMUs, and the performance of the ADMU cannot be better than that of the worst performing actual DMU.
The best and worst relative efficiency scores in terms of sub-stage 1 can be defined by the following two CCR models, respective to the IDMU and the ADMU; they are related to Wang and Luo [48] and Wu's [51] models: where  IDMU(1)* is the optimal optimistic score of IDMU in terms of sub-stage 1, obtained in model (3.3).Model (3.4) ensures that the best relative efficiency of sub-stage 1 is fixed at a value greater than or equal to  IDMU(1)* .By the same token, we establish the definitions as well as formulate the appropriate optimisation models for the IDMU and the ADMU, regarding sub-stage 2 of the basic two-stage structure.Note that sub-stage 2 focuses on the consumption of intermediate products for the generation of the final outputs.
The next stage concerns the determination of the optimistic and pessimistic efficiency scores of the IDMU and the ADMU, respectively, in terms of the overall system.The reference model is the relational twostage DEA model (3.1).The efficiency of the IDMU for the entire system can be defined as . The factor weights   and   are assigned to the th output and the th input, respectively.We thus construct the following LP model that aims to maximise the efficiency of the IDMU.
The efficiency of the ADMU for the entire system can be illustrated as ).The associated optimisation model is formulated as follows: Model (3.6) aims to minimise the pessimistic efficiency measure of the ADMU, while keeping the optimistic efficiency of the IDMU for the overall system no less than  IDMU()* .It should be noted that the second and third sets of constraints in both models imply that the overall efficiency of DMU cannot exceed 1.
The next point to focus on in this paper is the examination of the highest and the lowest relative efficiency of each DMU, considering their self-evaluation.In model (3.7), the optimistic relative efficiency of DMU  for the sub-stage 1 is examined while  IDMU(1)* is kept fixed; it is related to Wang and Luo's [48] framework: In the same manner, we construct the counterpart model for measuring the highest relative efficiency of DMU  for the sub-stage 2, considering  IDMU(2)* as the fixed parameter.
The overall optimistic efficiency score of DMU  can be determined as It is clear that this measure is the product of the optimistic efficiencies of the DMU  of the two sub-stages, adopting the principle of the multiplicative efficiency decomposition approach [21].Thus, we propose model (3.8), that maximises the above ratio.
The fourth and fifth constraints indicate that  IDMU(1)* and  IDMU(2)* , respectively, remain unchanged.Let , be an optimal solution to model (3.8).For DMU  , , which are referred to as optimistic self-efficiency measures with respect to the overall system and its sub-stages, respectively.
Then, model (3.9) evaluates the worst relative efficiency of DMU  , in terms of sub-stage 1, while the parameter  ADMU(1)* takes the value as determined previously from model (3.4).This model is related to Wu's [51] framework.
Similarly, we formulate the counterpart model for measuring the lowest relative efficiency of DMU  for the sub-stage 2, considering  ADMU(2)* as the unchanged parameter.
The overall pessimistic score of DMU  can be determined as and denotes the product of the pessimistic efficiencies of the DMU  of the two sub-stages.Thus, we suggest model (3.10), whose purpose is to minimise the above ratio. ADMU(1)* and  ADMU(2)* are maintained.
, be an optimal solution to model (3.10).For DMU  , , which are referred to as pessimistic self-efficiency measures with respect to the overall system and its constituent parts, respectively.
Consequently, in a two-stage DEA structure, a self-efficiency interval is formulated for each DMU under consideration, both for the overall system and its constituent stages.For instance, considering the overall system, an efficiency interval denoted by There is a clear need to integrate both optimistic and pessimistic self-efficiency measures to provide an overall assessment of the performance of each DMU in a two-stage DEA process.This study adopts the geometric average efficiency measure, proposed and verified by Wang et al. [50], to meet this requirement.Let be the combined self-efficiency measure of DMU  , where  =  (overall system) or 1 (sub-stage 1) or 2 (sub-stage 2).We easily prove that the combined self-efficiency score of DMU  for the overall system is the product of the combined self-efficiency measures of DMU  for the two sub-stages: . The geometric average efficiency is an approachable efficiency measure that leads to a fairer ranking index [50].However, we should consider that it sheds light on the effects of the optimistic and pessimistic standpoints only within the self-appraisal context.In other words, each DMU is assessed, based on its own most favourable and unfavourable weights, without considering the weight scheme of each of the other DMUs.This score also ensures a better discriminating power than either of the optimistic and pessimistic efficiencies [50].Yet, this feature has not been explored in a more complex network structure.To this end, in the next section, the double frontier DEA models are further extended by the use of the interval CE within a two-stage tandem system, to ensure a more logical ranking order.
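The geometric aggregation and its product property can be illustrated with a minimal sketch. All scores below are hypothetical; the sketch assumes, as stated above, that the overall optimistic and pessimistic scores are the products of the corresponding stage scores.

```python
import math


def combined_efficiency(optimistic: float, pessimistic: float) -> float:
    """Geometric average of an optimistic and a pessimistic efficiency score."""
    return math.sqrt(optimistic * pessimistic)


# Hypothetical stage-level self-efficiency scores of one DMU.
opt_stage_1, pes_stage_1 = 0.9, 0.4
opt_stage_2, pes_stage_2 = 0.8, 0.2

# Overall scores follow the multiplicative decomposition.
opt_system = opt_stage_1 * opt_stage_2
pes_system = pes_stage_1 * pes_stage_2

combined_stage_1 = combined_efficiency(opt_stage_1, pes_stage_1)
combined_stage_2 = combined_efficiency(opt_stage_2, pes_stage_2)
combined_system = combined_efficiency(opt_system, pes_system)
```

Since the square root of a product is the product of the square roots, `combined_system` coincides with `combined_stage_1 * combined_stage_2`, so the combined measure keeps the same decomposition as its two components.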

Interval cross-efficiencies in basic two-stage structure
In this section, we propose the customisation and simultaneous use of the traditional aggressive and benevolent secondary models in the context of the basic two-stage DEA structure with the combined self-efficiencies obtained in Section 3.1. Their purpose is the determination of the minimum and maximum individual cross-efficiencies of DMU $j$, respectively, with respect to the optimal weight scheme of DMU $k$ ($k, j = 1, 2, \ldots, n$). A fruitful aspect, we believe, is the integration of the combined self-efficiency score for the corresponding system/stage within the CE process. This is irrespective of the type of multipliers, favourable for a benevolent or unfavourable for an aggressive strategy, that are used to capture the cross-efficiencies.
We initially adopt an aggressive strategy to establish the following minimisation model: are the crisp combined self-efficiency measures of the system and the sub-stage 2 for DMU  , respectively, obtained from Section 3.1.The second and third constraint maintain combined system and sub-stage efficiencies for DMUs.Model (3.11) pursues to minimise the cross-efficiency value of DMU  under the condition that the combined self-efficiency scores for the overall system and its constituent parts remain unchanged.At optimality, the minimum individual cross-efficiencies of DMU  based on DMU  ( ̸ = ) for the overall system, the stage 1, and the stage 2, are determined as  )︁ , respectively.By the same token, a benevolent strategy is implemented to construct the following maximisation model: subject to the same constraints as in model (3.11).(3.12)This model seeks to maximise the cross-efficiency of DMU  given that the combined self-efficiency measures are kept fixed for the overall system and its sub-stages.Similarly, we define the maximum individual cross-efficiencies of DMU  for the system and its stages.
For each $\epsilon$, where $\epsilon = s$ (overall system), 1 (stage 1) or 2 (stage 2), the cross-efficiency of DMU $j$ rated by DMU $k$ lies in the interval $[\theta^{L(\epsilon)}_{kj}, \theta^{U(\epsilon)}_{kj}]$, where $\theta^{L(\epsilon)}_{kj}$ is the lower bound and $\theta^{U(\epsilon)}_{kj}$ is the upper bound. Therefore, three generalised interval CE matrices (based on the concept of Tab. 2) are shaped for the $n$ DMUs, in regard to the overall system, stage 1, and stage 2, respectively. The diagonal of each of these matrices demonstrates the special case in which $\theta^{L(\epsilon)}_{kk} = \theta^{U(\epsilon)}_{kk}$, $\forall k$, where $\epsilon = s$, 1 or 2. The resulting interval CE matrices can be viewed as MCDM problems. Taking that into consideration, we set the scene for the determination of the interval local weights of criteria and the interval global weights of alternatives (ultimate cross-efficiencies) to fully rank the DMUs in a basic two-stage DEA structure.

Interval cross-efficiencies and MCDM context
Each generalised interval CE matrix (see Sect. 3.2) can be treated as a multi-criteria decision making (MCDM) problem with $j = 1, 2, \ldots, n$ DMUs that act as alternatives. Each DMU $j$ is assessed considering the weight profile of $k = 1, 2, \ldots, n$ DMUs that act as criteria. Interestingly, this intuition is attributed to the novel study of Cook et al. [8], according to which each DEA-related problem could be viewed as a multi-criteria evaluation problem. This has also been consolidated by Rakhshan [37], who argues that the combination of the MCDM and the DEA tools could mitigate their drawbacks when applied as stand-alone techniques.
Our primary target is to estimate the interval ultimate cross-efficiency scores, which are the interval global weights for the evaluated DMUs.To this end, our approach is twofold as it requires not only the local weights of alternatives with respect to a certain criterion, but also the local weights of criteria.The former are the elements  ]︁ .These elements have been obtained in Section 3.2.The latter illustrates the local weight of criterion , that is manifested as an interval value with lower bound  L  and upper bound  U  .The existence of this interval value is due to dealing with two diametrically opposed strategies for the overall system and its constituent stages.
Wang and Elhag [47] suggest a goal programming (GP) method to elicit normalised interval local weights from an interval comparison matrix. In our scenario, the interval CE matrix takes the role of the interval comparison matrix. Their method captures the lower and upper limits of the local weight of criterion $k$ ($k = 1, 2, \ldots, n$) without ignoring the interval individual cross-efficiencies and the potential existence of uncertainty. We apply their optimisation model within the basic two-stage series structure. Their approach is suitable for our study for two reasons. It has a greater scope for action due to its compatibility with any interval comparison matrix, and it involves fewer constraints than other methods such as that of Sugihara et al. [43], which makes it easier to use for the DM. The smaller number of constraints results from its emphasis on the matrix as a whole rather than on each element individually. Wang and Elhag's [47] technique has, to our knowledge, not previously been used to elicit interval local weights from an interval CE matrix. Therefore, this section uses their approach to achieve this goal.

Taking the interval local weight for each criterion $k$ and the interval local weight of each alternative $j$ with respect to criterion $k$ into account, we determine the interval ultimate cross-efficiencies for the alternatives. We recommend using the practical method of Entani and Tanaka [13], which is based on a pair of linear programming (LP) models. Their approach treats the local weights of criteria as decision variables to be optimised and determines the global weights for each DMU. The pair of LP models yields the interval ultimate cross-efficiency of each DMU for the entire system and its sub-stages. Table 3 illustrates the synthesis of the interval cross-efficiencies.
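The synthesis step can be sketched as follows. This is our reading of the idea rather than a reproduction of the models of Entani and Tanaka [13]: we assume that the lower (upper) global score of a DMU is the minimum (maximum) of the weighted sum of its lower (upper) cross-efficiencies, with the criterion weights restricted to their intervals and summing to one. An LP of this special form can be solved greedily, so no solver is needed; the function name and all numbers are hypothetical.

```python
from typing import List, Sequence


def extreme_weighted_sum(
    scores: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    maximise: bool,
) -> float:
    """Optimise sum_k w_k * scores_k s.t. lower_k <= w_k <= upper_k, sum_k w_k = 1.

    Start every weight at its lower bound, then hand out the remaining mass
    to the best-scoring (or worst-scoring) criteria first.
    """
    if sum(lower) > 1.0 + 1e-12 or sum(upper) < 1.0 - 1e-12:
        raise ValueError("weight intervals cannot be normalised to sum to 1")
    weights: List[float] = list(lower)
    remaining = max(0.0, 1.0 - sum(lower))
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=maximise)
    for k in order:
        step = min(upper[k] - lower[k], remaining)
        weights[k] += step
        remaining -= step
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical interval local weights of three criteria (rating DMUs).
w_lower = [0.2, 0.3, 0.1]
w_upper = [0.5, 0.6, 0.4]
# Hypothetical lower and upper cross-efficiencies received by one DMU.
theta_lower = [0.4, 0.6, 0.5]
theta_upper = [0.7, 0.9, 0.8]

global_lower = extreme_weighted_sum(theta_lower, w_lower, w_upper, maximise=False)
global_upper = extreme_weighted_sum(theta_upper, w_lower, w_upper, maximise=True)
```

The pair `(global_lower, global_upper)` then serves as the interval ultimate cross-efficiency of the DMU under these assumptions.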

Grey relational analysis for ranking DMUs
In Section 3.3, we obtained an interval ultimate cross-efficiency score for each DMU. It is apparent that there is a significant need to identify a simple yet efficient ranking approach for comparing and ranking different DMUs, whose performance is expressed in the form of interval values. In this study, the Grey Relational Analysis (GRA) is applied to obtain a unique ranking order for the DMUs, whose ultimate cross-efficiencies are illustrated within certain boundaries, and thus to determine the most desirable alternative. GRA is based on the grey system theory proposed by Julong [20]. It has proved to be a worthy methodological tool when uncertain information emerges. GRA has fruitfully examined complex interconnections among several factors [6] as well as obtained the optimal alternative among several alternatives [25,41].
GRA consists of four main steps: grey relational generating, reference sequence definition, grey relational coefficient calculation, and grey relational grade (GRG) calculation. In the first step, GRA translates the existing performance of all alternatives into a comparability sequence. According to the comparability sequence, an ideal target sequence (reference sequence) is defined in the second step. In the third step, a grey relational coefficient is calculated to illustrate the distance between the comparability and the reference sequence. In the final step, the GRG between the reference and every comparability sequence is calculated, based on the grey relational coefficient. If the comparability sequence of an alternative has the highest grey relational grade, then this alternative is deemed the most desirable one [25]. Below, we provide an overview of the GRA as we would apply it to ranking interval ultimate cross-efficiencies.
To start with, we collect the data to be evaluated from the mathematical viewpoint. The interval ultimate cross-efficiency scores, defined in Section 3.3, are gathered into an $n \times 2$ matrix, setting out the appropriate conditions for translating the DMUs into alternatives and the two extreme cases (lower bound, upper bound) into criteria. Hence, we form another MCDM problem with $j = 1, 2, \ldots, n$ alternatives that are assessed by $t = 1, 2$ attributes. Finally, the GRG $\Gamma_j$, which is the weighted average of the grey relational coefficients, is estimated as $\Gamma_j = \sum_{t=1}^{2} w_t\, \gamma_{jt}$, $\forall j$, where $\gamma_{jt}$ is the grey relational coefficient of alternative $j$ on criterion $t$ and $w_t$ is the weight of criterion $t$, which can be prone to subjective modifications by a DM. Nevertheless, it is possible to delineate it with the use of an objective method [19]. Besides, $\sum_{t=1}^{2} w_t = 1$. We should emphasise that GRG only ranks the alternatives; thus, it is not an efficiency measure. The DMU with the highest GRG is placed first.
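The four GRA steps can be sketched compactly. The sketch follows the commonly used larger-the-better formulation with a reference value of 1 on each attribute; the distinguishing coefficient `zeta = 0.5`, the equal criterion weights, and the interval scores are assumptions made for illustration and are not taken from the numerical application.

```python
from typing import List, Sequence


def grey_relational_grades(
    bounds: Sequence[Sequence[float]],
    weights: Sequence[float] = (0.5, 0.5),
    zeta: float = 0.5,
) -> List[float]:
    """Return one grey relational grade per alternative (row of `bounds`)."""
    columns = list(zip(*bounds))
    # Step 1: grey relational generating (larger-the-better normalisation).
    normalised = []
    for row in bounds:
        new_row = []
        for value, column in zip(row, columns):
            spread = max(column) - min(column)
            new_row.append(1.0 if spread == 0 else (value - min(column)) / spread)
        normalised.append(new_row)
    # Step 2: the reference sequence is 1 on every attribute.
    deltas = [[1.0 - value for value in row] for row in normalised]
    delta_min = min(min(row) for row in deltas)
    delta_max = max(max(row) for row in deltas)
    if delta_max == 0:
        return [1.0 for _ in bounds]  # all alternatives coincide with the reference
    grades = []
    for row in deltas:
        # Step 3: grey relational coefficients.
        coefficients = [
            (delta_min + zeta * delta_max) / (d + zeta * delta_max) for d in row
        ]
        # Step 4: grey relational grade as the weighted average.
        grades.append(sum(w * c for w, c in zip(weights, coefficients)))
    return grades


# Hypothetical interval ultimate cross-efficiencies (lower, upper) of 3 DMUs.
intervals = [(0.30, 0.60), (0.20, 0.50), (0.10, 0.40)]
grades = grey_relational_grades(intervals)
ranking = sorted(range(len(grades)), key=lambda j: grades[j], reverse=True)
```

Here `ranking` lists the DMU indices from the most to the least desirable alternative; as noted above, the grades only order the DMUs and are not efficiency scores.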
To conclude, GRA is considered an efficient ranking tool not only for traditional MCDM problems [25], but also for efficiency evaluation DEA problems as an MCDM context in disguise [41]. Nevertheless, GRA has, to our knowledge, not yet received explicit attention for ranking interval values and, in particular, interval ultimate cross-efficiencies within an interval CE matrix. Hence, this section has aspired to attain this target, with a view to a meaningful prioritisation of the DMUs.

Numerical application
This section illustrates the use of the mathematical models presented in Section 3 to meaningfully evaluate and rank the DMUs.There are two salient factors that evaluate each DMU within the two-stage tandem structure herein: (i) the optimistic and pessimistic efficiency scores within a self-evaluation context, and (ii) the most favourable and unfavourable weight sets of each of the other DMUs, in a peer-appraisal setting that integrates the combined self-efficiency measure.
The numerical example drawn from Zhou et al. [55] is used for illustrative purposes. In Table 5, ten branches of China Construction Bank in Anhui are assessed within the two-stage tandem structure (see Fig. 2). The employees ($x_1$), the fixed assets ($x_2$), and the expenses ($x_3$) are the input resources consumed in the first stage to produce the intermediate products: the credit ($z_1$) and the inter-bank loans ($z_2$). The latter are used as inputs in the second stage to generate the final outputs: the loan ($y_1$) and the profit ($y_2$). For modelling, running, and analysing our data, we used Python 3.7.6 and, in particular, version 2.1 of PuLP as the free linear programming library. The experiment ran on a computer with 16 GB RAM.
In our framework, we first determine the best and worst relative efficiencies of the IDMU and the ADMU, respectively, for the overall system and its individual stages. Table 6 exhibits the corresponding scores obtained by solving models (3.3)-(3.6), introduced in Section 3.1.
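As a minimal sketch of how the two virtual units can be formed for stage 1, the snippet below assumes that the IDMU takes the smallest observed amount of every input and the largest observed amount of every intermediate product, and that the ADMU takes the reverse. The function name `virtual_units` and all data values are hypothetical; they are not the values of Table 5.

```python
from typing import List, Tuple

def virtual_units(inputs: List[List[float]],
                  outputs: List[List[float]]) -> Tuple[tuple, tuple]:
    """Return (IDMU, ADMU), each as an (input vector, output vector) pair.

    inputs[j][i]  : amount of input i consumed by DMU j
    outputs[j][p] : amount of output p produced by DMU j
    """
    min_in = [min(col) for col in zip(*inputs)]
    max_in = [max(col) for col in zip(*inputs)]
    min_out = [min(col) for col in zip(*outputs)]
    max_out = [max(col) for col in zip(*outputs)]
    idmu = (min_in, max_out)   # least inputs, most outputs
    admu = (max_in, min_out)   # most inputs, least outputs
    return idmu, admu

# Hypothetical stage-1 data for three branches (not the values of Table 5).
x = [[10.0, 4.0, 2.0], [12.0, 3.0, 2.5], [9.0, 5.0, 1.5]]   # inputs
z = [[100.0, 20.0], [90.0, 35.0], [120.0, 15.0]]            # intermediate products
idmu, admu = virtual_units(x, z)
print("IDMU:", idmu)
print("ADMU:", admu)
```

The same helper could be reused for stage 2 by passing the intermediate products as inputs and the final outputs as outputs.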
Then, models (3.7)-(3.10) are used to obtain the highest and the lowest relative efficiency scores of the target DMU in terms of the overall system, stage 1, and stage 2. These scores are given in Table 7. Recall that these relative self-efficiency scores indicate their distance from the respective IDMU and ADMU efficiencies, presented in Table 6. These scores are also accompanied by the combined self-efficiency ratings for each DMU and system/stage. The numbers in parentheses illustrate the rankings of the corresponding bank branches for each type of efficiency measure.

Table 6. Highest and lowest relative efficiency scores for the overall system, stage 1, and stage 2.
System/stage      IDMU (highest)   ADMU (lowest)
Stage 1           2.41405          0.05162
Stage 2           10.92813         0.00550
Overall system    2.41405          0.00469
In Table 7, regardless of the viewpoint from which efficiency is measured, DMU 1 is the best unit and DMU 10 is the worst unit in terms of the entire system (second expanded column). Considering stage 1, regardless of the viewpoint, DMU 1 and DMU 3 are the most and least desirable units, respectively (third expanded column). In stage 2 (fourth expanded column), DMU 10 is deemed the least promising unit. However, the optimistic and pessimistic perspectives do not agree on the best unit. Notably, none of the 10 bank branches performs efficiently in both stages and viewpoints. This is seen, for instance, in the overall optimistic self-efficiency scores, none of which is efficient: the highest score is 0.8132 at DMU 1, followed by 0.3490 at DMU 6.

The next focal point of the framework is the geometric aggregation of the optimistic and pessimistic perspectives to build a combined self-efficiency measure for each DMU with respect to the system. In Table 7, DMU 1 has the best performance among all units, reflecting the two opposed standpoints. This is fully supported by the consistent results for the overall system and stage 1. Nevertheless, regarding stage 2, there is a significant inconsistency between the optimistic and the pessimistic efficiency. In detail, DMU 1 receives a moderate rating (0.8132) with respect to the optimistic aspect. This rating is compensated by its exceptional pessimistic performance (0.0760). The overall performance of bank branch 1 is also markedly higher than that of all the others. For instance, in stage 1 the combined self-efficiency score of DMU 1 is approximately 0.51, whereas the corresponding rating of DMU 2 (in second place) equals 0.2733. The geometric average efficiency also indicates that DMU 10 has the worst performance in terms of the overall system and stage 2.
The combined self-efficiencies calculated for every DMU satisfy the sound mathematical property that the overall system combined self-efficiency score is the product of those of the two sub-stages, as discussed in Section 3.1. As an example, the combined self-efficiencies calculated for DMU 1 satisfy $0.1267 \approx 0.5099 \times 0.2486$, up to rounding. Since this property is satisfied, the overall combined self-efficiency score of every DMU is no greater than its corresponding stage-level scores. However, after implementing Wilcoxon's matched-pairs signed-ranks test [9], we found insufficient evidence to conclude that the average efficiency measures of the two sub-stages differ. This may be due to the limited sample under examination. In addition, it is noteworthy that the combined self-efficiency ratings and ranks differ considerably across the system and stages for several bank branches. For instance, the rank of DMU 3 for the overall system, stage 1, and stage 2 is 8, 10, and 2, respectively, so its stage 2 rank differs from the other two by at least 6 places. A large difference may reveal the source of the inefficiency of the overall system. For example, DMUs 2 and 5 perform satisfactorily in stage 1 (as compared to stage 2), whereas DMUs 3 and 8 perform satisfactorily in stage 2 (as compared to stage 1). Decomposing the overall system combined self-efficiency score into the product of its two sub-stage scores may help the respective bank branch identify the sub-stage that triggers inefficiency.
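The two properties discussed above can be checked numerically with the figures quoted in the text for DMU 1. The sketch below assumes that the combined measure is the square root of the product of the optimistic and pessimistic scores (the geometric average); the function name `combined_efficiency` is hypothetical.

```python
import math

def combined_efficiency(optimistic: float, pessimistic: float) -> float:
    """Geometric average of the optimistic and pessimistic scores."""
    return math.sqrt(optimistic * pessimistic)

# Stage-2 optimistic and pessimistic scores of DMU 1, as quoted in the text.
stage2 = combined_efficiency(0.8132, 0.0760)

# Stage-1 combined score of DMU 1, as quoted in the text.
stage1 = 0.5099

# Multiplicative decomposition: overall = stage 1 x stage 2.
overall = stage1 * stage2

print(f"stage 2 combined: {stage2:.4f}")
print(f"overall combined: {overall:.4f}")
```

Because the inputs are already rounded to four decimals, the recomputed overall score can differ from the tabulated 0.1267 in the last digit.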
The combined self-efficiency measures obtained with our proposed method (see the respective columns of Table 7) are also compared with the respective scores (Table 8) obtained with Kao and Hwang's [21] approach. As mentioned in Section 3, the latter approach explores the efficiency decomposition in a two-stage production process by taking into consideration the series relationship of the two sub-stages. Their relational model (see model (3.1)) was found to be reliable in measuring overall and division efficiencies and in better identifying the causes of inefficiency. Our study has applied their relational model to further analyse and validate the dataset provided in Table 5, by measuring the efficiencies of the whole process and its constituent sub-stages for the ten DMUs. In Table 8, the self-efficiency scores of the overall system, stage 1, and stage 2, along with their ranks, are given in the second, third, and fourth column, respectively. The rankings of the two models with respect to the overall system are quite similar: the largest rank difference is 1, occurring at bank branches 2, 3, and 8. The rankings of the two models with respect to sub-stage 1 are also quite close to each other. In this case, the largest difference occurs at DMU 7 with a rank difference of 4. The second largest difference occurs at DMUs 9 and 10 with a rank difference of 2. For the remaining 7 bank branches, the rank differences are less than 2. The rank differences are of a similar magnitude for sub-stage 2.
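For reference, the relational model of Kao and Hwang is commonly stated in the literature in the multiplier form below for a target DMU $k$. The notation ($X_{ij}$ for inputs, $Z_{pj}$ for intermediate products, $Y_{rj}$ for outputs, multipliers $v_i$, $w_p$, $u_r$, and a small constant $\varepsilon$) is an assumption made here for illustration and may differ from the notation of model (3.1).

```latex
\begin{aligned}
E_k = \max\ & \sum_{r} u_r Y_{rk} \\
\text{s.t.}\ & \sum_{i} v_i X_{ik} = 1, \\
& \sum_{r} u_r Y_{rj} - \sum_{i} v_i X_{ij} \le 0, \quad \forall j, \\
& \sum_{p} w_p Z_{pj} - \sum_{i} v_i X_{ij} \le 0, \quad \forall j, \\
& \sum_{r} u_r Y_{rj} - \sum_{p} w_p Z_{pj} \le 0, \quad \forall j, \\
& u_r,\ v_i,\ w_p \ge \varepsilon, \quad \forall r, i, p.
\end{aligned}
\qquad
E_k^{1} = \frac{\sum_{p} w_p^{*} Z_{pk}}{\sum_{i} v_i^{*} X_{ik}}, \quad
E_k^{2} = \frac{\sum_{r} u_r^{*} Y_{rk}}{\sum_{p} w_p^{*} Z_{pk}}, \quad
E_k = E_k^{1} \times E_k^{2}.
```

Under this form, the same multipliers $w_p$ price the intermediate products in both stages, which is what makes the overall efficiency the product of the two stage efficiencies.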
Correlation analysis suggests a very strong association between the ranks of these two approaches, as indicated by the Spearman coefficients [9] of 0.985 (overall system), 0.806 (stage 1), and 0.841 (stage 2), which are significant at the 0.01 level (two-tailed). This is also reflected in the fact that both our method and Kao and Hwang's method identify DMU 1 as the best performer. However, our approach is more informative within the self-appraisal context, in that it considers not only the most favourable self-efficiency scores (as in [21]) but also the most unfavourable ones, to obtain a more accurate and less misleading overall assessment for each DMU and flow. As a result, it emphasises both sides of the same coin simultaneously. The above points further validate the rationale of our approach.
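A rank comparison of this kind can be sketched in a few lines. The snippet below computes the Spearman coefficient as the Pearson correlation of the ranks, with tied values receiving their average rank. The function names and the two rankings are hypothetical; they are not the ranks reported in Tables 7 and 8.

```python
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values (1 = smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks

def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman coefficient = Pearson correlation of the (average) ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5

# Hypothetical rankings of ten branches under two methods.
ranks_method_a = [1, 2, 8, 5, 4, 3, 7, 6, 9, 10]
ranks_method_b = [1, 3, 7, 5, 4, 2, 6, 8, 9, 10]
rho = spearman(ranks_method_a, ranks_method_b)
print(f"Spearman coefficient: {rho:.3f}")
```

Handling ties through average ranks matters here because tied combined self-efficiency scores do occur in this kind of data.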
As discussed in Section 1, the geometric average efficiency is an easy-to-use measure with good discriminating power amongst the evaluated DMUs. However, it may not be sufficiently strong to yield a unique ranking in this two-stage process. In fact, some inefficient DMUs cannot be fully discriminated by the combined self-evaluation results at each stage, presented in Table 7. In particular, in the overall system DMUs 2 and 6 are tied (0.0528) in second place. Similarly, at stage 2 DMUs 3 and 6 are also tied (0.2094), sharing second place. In such results, each DMU is self-assessed, ignoring the weight profile of each of the other DMUs. Embedding the geometric average score into a peer context could contribute to a more comprehensive ranking. To this end, the proposed framework is further extended by the use of the interval CE.
The next step in our proposed approach concerns the implementation of the interval CE in the evaluated network structure, as discussed in Section 3.2. Tables 9-11 showcase the interval individual cross-efficiencies of DMU $j$ based on the optimal weight scheme of DMU $d$ for the overall system, stage 1, and stage 2, respectively. In this case, each DMU is evaluated considering simultaneously an aggressive (model (3.11)) and a benevolent (model (3.12)) strategy; this creates a neutral evaluation setting from the outset.
To make the content of Tables 9-11 comprehensible to the reader, a few examples are helpful. In the second column of Table 9, DMU 1 is assessed based on the weight profiles of all the other DMUs. The minimum and maximum individual cross-efficiencies of DMU 1 based on the optimal weight scheme of DMU 2 are 0.1216 and 0.2371, respectively, for the overall system. In the fifth column of Table 10, DMU 4 is also peer-appraised with respect to the weight profiles of all the other DMUs. The minimum and maximum individual cross-efficiencies of DMU 4 based on the weight profile of DMU 10 are 0.1475 and 0.2281, respectively, for sub-stage 1. Table 11 reports, in a similar manner, the individual cross-efficiencies of each DMU for sub-stage 2. The leading diagonal of each of these three matrices corresponds to the special case in which the evaluating and the evaluated DMU coincide, for the overall system, stage 1, or stage 2. These are the combined self-efficiency scores. Clearly, the property of maintaining the combined self-efficiency measure for each DMU is satisfied both for the overall system and for its individual stages; this supports a more reasoned peer-appraisal setting that reflects the effects of the optimistic and pessimistic viewpoints.

Table 8. Self-efficiency ratings and ranks of the overall system, the stage 1, and the stage 2, with Kao and Hwang's [21] method.

Recalling the discussion in Section 3.3, we view each interval CE matrix as an MCDM problem. In Tables 9-11, the ten DMUs (alternatives), located in the last ten columns, are evaluated by the weight profiles of the ten DMUs (criteria) presented in the first column. To obtain the interval global weights (interval ultimate cross-efficiencies) in the last row of each of these matrices, the interval weight per criterion must be determined in addition to the known interval individual cross-efficiencies. To start with, the interval weight of each criterion is reported in the second, third, and fourth column of Table 12, with respect to the overall system, stage 1, and stage 2, respectively. The interval weights are obtained via the GP model (3.13), and the interval global weights via the pair of optimisation models (3.14) and (3.15), as stated in Section 3.3.
For instance, in the second column of the last row of Table 9, we obtain the interval ultimate CE of DMU 1: [0.1000, 0.2611], where 0.1000 is the minimum and 0.2611 is the maximum CE score. The minimum score of DMU 1 for the overall system is estimated by solving model (3.14). This model requires the minimum individual cross-efficiencies of DMU 1 based on the weight profiles of all ten DMUs (left side of column 2 of Table 9) and the interval weights per criterion for the overall system (column 2 of Table 12). The maximum ultimate cross-efficiency of DMU 1 for the overall system is estimated by solving model (3.15). This model requires the maximum individual cross-efficiencies of DMU 1 based on the weight profiles of all ten DMUs (right side of column 2 of Table 9) and the interval weights per criterion for the overall system (column 2 of Table 12).
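The computation of one interval ultimate cross-efficiency can be sketched as follows. The snippet assumes that the pair of models reduces to minimising (respectively maximising) a weighted sum of the individual cross-efficiencies, subject to each criterion weight lying within its interval and all weights summing to 1; the exact constraints of the paper's models may differ. Under that assumption the LP has one equality constraint plus box bounds, so a greedy allocation reaches the optimum without a solver. The function name and all data values are hypothetical; they are not taken from Tables 9 and 12.

```python
from typing import List, Tuple

def extreme_weighted_sum(coeffs: List[float],
                         bounds: List[Tuple[float, float]],
                         maximise: bool) -> float:
    """Optimise sum(w[d] * coeffs[d]) s.t. lo[d] <= w[d] <= hi[d], sum(w) = 1.

    Start every weight at its lower bound, then hand the remaining mass to
    the most attractive coefficients first (largest when maximising,
    smallest when minimising).
    """
    lows = [lo for lo, _ in bounds]
    highs = [hi for _, hi in bounds]
    if sum(lows) > 1 + 1e-12 or sum(highs) < 1 - 1e-12:
        raise ValueError("weight bounds are incompatible with sum(w) = 1")
    weights = lows[:]
    remaining = 1.0 - sum(lows)
    order = sorted(range(len(coeffs)), key=lambda d: coeffs[d], reverse=maximise)
    for d in order:
        step = min(highs[d] - lows[d], remaining)
        weights[d] += step
        remaining -= step
    return sum(w * c for w, c in zip(weights, coeffs))

# Hypothetical data for one DMU judged by three criteria.
min_ce = [0.10, 0.12, 0.08]                             # minimum individual CEs
max_ce = [0.20, 0.25, 0.18]                             # maximum individual CEs
weight_bounds = [(0.2, 0.5), (0.3, 0.4), (0.2, 0.4)]    # interval weights

lower = extreme_weighted_sum(min_ce, weight_bounds, maximise=False)
upper = extreme_weighted_sum(max_ce, weight_bounds, maximise=True)
print(f"interval ultimate CE: [{lower:.4f}, {upper:.4f}]")
```

With a general-purpose LP library such as PuLP the same two problems could be stated directly; the greedy version is used here only to keep the sketch free of dependencies.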
The final step of our methodological approach seeks a unique and reasonable prioritisation of the interval ultimate cross-efficiencies via the established GRA, as discussed in Section 3.4. In this step, the DMUs located in the columns of the interval CE matrices mentioned above continue to act as alternatives and are assessed by two attributes: the first attribute concerns the minimum (worst condition) and the second attribute the maximum (best condition) ultimate CE of each DMU, for the corresponding system/stage. The interval ultimate cross-efficiencies (last row of each of Tables 9-11) form the appropriate matrix, as shown in Table 4.
The performance values of the two attributes are subsequently normalised through the greater-the-better equation (3.16); this choice reflects the necessity of pushing up the peer-efficiency of each DMU. The results are depicted in the second column of Appendix A for the overall system, of Appendix B for stage 1, and of Appendix C for stage 2. The grey relational distance is then calculated to measure the distance between the reference sequence and the comparability sequence (normalised values); see the third column of each of the appendices. In addition, we compute the grey relational coefficient to explore how close the reference and the comparability sequences are. In this formula, the value of the distinguishing coefficient $\zeta$ may affect the size of the correlation degree distribution interval, thereby affecting the results of the correlation analysis. The value of $\zeta$ can be determined considering the DMU's tendency towards optimism or pessimism. Following [21], we have set $\zeta = 0.5$, implying that the DMU has neither an optimistic nor a conservative attitude. The respective results are portrayed in the last column of each of the appendices.
The GRG and the rank of each DMU with respect to the overall system, stage 1, and stage 2 are given in the second, third, and fourth column of Table 13, respectively. Two remarks about the process of obtaining the GRG are important: firstly, the relative importance weights of the two performance attributes were assumed to be equal ($w_1 = w_2 = 0.5$), meaning that the two extremes are of the same importance; secondly, the GRG is just an index that captures the rank, not an efficiency measure. The unique final rank in Table 13 reflects the improvement in discriminating power, as compared to the original rank derived from the combined self-efficiency measures in Table 7. In practice, this means that the non-dominated bank branches, which cannot be fully discriminated under self-evaluation, can be discriminated by the methods applied under peer-evaluation. In detail, DMU 10 is clearly the least desirable unit in all three cases. DMU 1 is the most promising bank branch for the overall system and stage 1, while DMU 3 is the best unit in stage 2. Generally, one can deduce that the ranking results for all branches (except DMU 10) are not consistent across the system and its stages, and may indicate a higher degree of uncertainty and inefficiency in specific stages.
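The GRA steps described above can be sketched as follows, assuming the usual larger-the-better normalisation, a reference sequence equal to 1 on every attribute, and the standard grey relational coefficient with distinguishing coefficient 0.5; the paper's equation (3.16) and related formulas may be written differently. The function name is hypothetical. The first interval is the one quoted in the text for DMU 1; the other two rows are hypothetical and are not taken from Table 9.

```python
from typing import List

def grey_relational_grades(matrix: List[List[float]],
                           weights: List[float],
                           zeta: float = 0.5) -> List[float]:
    """GRG per alternative for larger-the-better attributes.

    matrix[j][k] is the performance of alternative j on attribute k.
    Assumes every attribute takes at least two distinct values.
    """
    cols = list(zip(*matrix))
    # Larger-the-better normalisation to [0, 1] per attribute.
    norm = [[(v - min(col)) / (max(col) - min(col)) for v, col in zip(row, cols)]
            for row in matrix]
    # Distance to the reference sequence, taken here as 1 on every attribute.
    delta = [[abs(1.0 - v) for v in row] for row in norm]
    d_min = min(min(row) for row in delta)
    d_max = max(max(row) for row in delta)
    # Grey relational coefficient, then its weighted average (the GRG).
    coef = [[(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
            for row in delta]
    return [sum(w * c for w, c in zip(weights, row)) for row in coef]

# Interval ultimate cross-efficiencies as [lower bound, upper bound].
intervals = [[0.1000, 0.2611],   # DMU 1, as quoted in the text
             [0.0500, 0.1500],   # hypothetical
             [0.0200, 0.0900]]   # hypothetical
grades = grey_relational_grades(intervals, weights=[0.5, 0.5])

# The alternative with the highest GRG is placed first.
ranking = sorted(range(len(grades)), key=lambda j: grades[j], reverse=True)
print([round(g, 4) for g in grades], [j + 1 for j in ranking])
```

An alternative that is best on both bounds obtains a grade of 1, which illustrates why the grade is a ranking index and not an efficiency score.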
The GRG grades obtained with our proposed framework (see Table 13) are also compared with the respective ultimate cross-efficiency ratings (Table 14) obtained via Kao and Liu's [22] approach. In their study, they applied the concept of cross-evaluation to measure the efficiency of basic (parallel and series) network structures. Their proposed aggressive-based secondary goal model was, in particular, able to decompose the cross-efficiency score of the overall system into the product of those of the internal sub-stages for the series structure. Our study has applied their aggressive-based model under the two-stage tandem series structure and the peer-appraisal setting to further analyse the dataset provided in Table 5. In Table 14, the peer-efficiency scores of the overall system, stage 1, and stage 2, along with their ranks, are presented in the second, third, and fourth column, respectively. Firstly, we have noticed that the multiplicative relationship between the overall system and its sub-stage efficiencies is indeed satisfied. For example, the ultimate cross-efficiency score of DMU 6 (0.446) is equal, up to rounding, to the product of its sub-stage 1 (0.574) and sub-stage 2 (0.778) efficiencies. Secondly, the rankings of the two methods with respect to the overall system and stage 1 are not significantly different, based on a Spearman rank order correlation test with statistics of 0.964 and 0.830, respectively. These are significant at the 0.01 level (two-tailed). However, it is worth mentioning that DMU 10 has a difference of 3 ranks in the evaluation of stage 1. Thirdly, as for stage 2, the rankings from the two methods are not so close. Bank branch 2 is the extreme case with a rank difference of 6. The second largest difference occurs at DMU 8 with a rank difference of 4. All the remaining bank branches have a rank difference of no more than 3. Statistically, this is reflected in the Spearman coefficient of 0.503, which implies a moderate association between the rankings of the two methods. Finally, Kao and Liu's [22] approach only considers the most unfavourable weight sets of each of the other DMUs, while keeping the optimistic self-efficiency score constant. In contrast, our study is more multi-dimensional, since it simultaneously takes into account the most favourable and unfavourable weight sets of each of the other DMUs, while integrating the respective combined self-efficiency measure.

Table 13. Grey Relational Grade and ranks of the overall system, the stage 1, and the stage 2.

Lastly, it can be statistically inferred that the rankings of the DMUs obtained from the combined self-efficiency measures (self-appraisal) and from the grey relational grades (peer-appraisal) are similar with respect to the overall system and its sub-stages. As an example, for the overall system, the Spearman correlation coefficient [9] is 0.948. This indicates that, at the 0.01 significance level, there is a strong positive association between the rankings of the DMUs obtained under the two separate conditions (self-appraisal and peer-appraisal), confirming the validity of our framework. Exceptions are DMUs 1, 6, and 8 in the evaluation of the second sub-stage, where there is a larger rank difference of 3. This could be explained by the nature of the self-appraisal setting, in which each bank branch is evaluated only from its own (favourable and unfavourable) standpoint, whereas the peer-appraisal setting evaluates the bank branches from the (favourable and unfavourable) standpoint of all branches.

Conclusions & future research
This paper has provided new insight into the attainment of a meaningful and unique ranking of DMUs under a two-stage tandem (network) structure. In particular, it extends the selected optimistic-pessimistic DEA models to the two-stage tandem system and then complements them with the interval CE method within such a system. Decision makers are offered the chance to evaluate the performance of the DMUs by considering: (i) the optimistic and pessimistic self-efficiency scores, and (ii) the most favourable and unfavourable weight profiles of each of the other DMUs in a peer-appraisal setting. In this study, we have introduced a 7-step methodological approach, as shown in Figure 1, which combines existing methods from the literature in a novel way. This approach supports the aforementioned conditions and ensures more multi-dimensional evaluation outcomes.
The procedures implemented in the first three steps of our framework indicate how the optimistic and pessimistic DEA models, which are inspired by the studies of Wang and Luo [48] and Wu [51], are built towards the more realistic two-stage tandem system that better reflects the complex interconnections among its internal sub-systems.The DMUs are initially evaluated, based on their own most favourable (optimistic) and unfavourable (pessimistic) optimal multipliers, and then are aggregated into a combined self-efficiency measure via the geometric average.
The remaining steps of our framework ensure the peer-evaluation of the DMUs via the customisation of the interval CE method to the specifications of the two-stage tandem structure, while keeping the combined self-efficiency measure unchanged. To rank all DMUs in the interval CE matrix of the corresponding flow, the study introduces a novel use of the GP method of Wang and Elhag [47], the LP models of Entani and Tanaka [13], and the GRA of Kuo et al. [25]. The combination of such well-established techniques for extracting valuable insights from an interval CE matrix has not been considered before. This combination underpins the wider MCDM context to which the elements of the interval CE matrix belong.
We envisage that our study could be applicable in several areas. In the non-life insurance industry [21], for example, operations consist of the insurance service and the capital investment. Customers pay direct written and reinsurance premiums, which are then invested in a portfolio to earn underwriting profit. Another promising area would be the performance evaluation of the high-technology industry, which is decomposed into technology development and economic application [54]. In this two-stage tandem network, raw data and knowledge are processed into technological achievements, which are then transformed into economic development. A third application connects our methodological framework with the operational activities of the international shipping industry; these could be divided into the supervision of ship dispatching management and the control of the working time in the port [15]. Finally, the efficiency evaluation of two-stage (food) supply chains of different factories or farming communities [24] could also serve the goals of our paper. For instance, the refinement of selected cocoa beans into milk or dark chocolate, and the production of black tea through withering, fermentation, drying, and sieving across a number of specialised factory branches, could further highlight the importance of our evaluation and ranking framework.
This paper treats the two sub-stages of a DMU equally. In reality, however, there might be a certain degree of leader-follower relationship between the upstream and downstream stages of a particular DMU. We acknowledge this as a limitation of our study, and we believe that introducing relative weights for the different stages when calculating overall efficiency could accommodate this issue. In addition, one of the main steps of the grey relational analysis methodology, used to rank the interval ultimate cross-efficiencies within an interval cross-efficiency matrix, is the calculation of the grey relational grade. It is defined as the weighted average of the grey relational coefficients, where the weight of the respective criterion is subjectively determined by the decision maker. To better reflect reality, an existing multi-criteria decision-making method, such as the analytic network process [39] or the best-worst method [38], could be used to identify the weights in an objective manner. We have also recognised that the grey relational grade is just an index that captures the rank, not an efficiency measure. In other words, there is insufficient information to identify the DEA-efficient DMUs that constitute the best-practice frontier. However, we acknowledge that the GRA technique has not previously received attention for ranking interval cross-efficiencies within an interval cross-efficiency matrix.
[...], which act as lower-level and upper-level local weights of an alternative in reference to a criterion, for the overall system, stage 1, or stage 2, respectively, and overall compose [...]

[...] is an $n \times n$ unit matrix whose elements on the diagonal are 1, and [...] are the minimum (superscript L) and maximum (superscript U) individual cross-efficiency matrices, respectively. The deviation vectors $\Delta^+$, $\Delta^-$, $\Gamma^+$, $\Gamma^-$, which appear in the first two constraint sets, serve to eliminate the uncertainty and connect the lower-level criteria [...] with the upper-level criteria [...]. The third and fourth sets of constraints ensure the normalisation of the local interval weights, whereas the fifth constraint set determines their lower and upper bounds. Model (3.13) should, in effect, be run three times, once for the interval CE matrix of each system and stage, to compose [...]

[...] subject to the same constraints as in model (3.14), (3.15) where [...] is the decision variable of the local criterion weight, for the overall system, stage 1, or stage 2. The above pair of LP models (3.14) and (3.15) results in the interval global weight for each alternative, denoted by [...]

[...] to be noted is that most bank branches have a smaller [...]

Table 2. Interval cross-efficiency matrix [53].

[...] of the basic properties of the traditional CE concept intact. Overall, in this peer-evaluation procedure an interval individual cross-efficiency score of one DMU in terms of another DMU is formed and lies in the [...]

The IDMU is a hypothetical DMU that utilises the least amount of inputs to generate the most intermediate products. An ADMU, on the other hand, uses the most inputs to produce the least intermediate products. The IDMU can be expressed with the vectors $(x^{\min}, z^{\max})$, where $x_i^{\min} = \min_j \{x_{ij}\}$ and $z_p^{\max} = \max_j \{z_{pj}\}$, $\forall i, p$. The ADMU can be determined with the vectors $(x^{\max}, z^{\min})$, where $x_i^{\max} = \max_j \{x_{ij}\}$ and $z_p^{\min} = \min_j \{z_{pj}\}$, $\forall i, p$. As stressed in Hatami-Marbini et al. [...]

Table 3. Synthesis of interval cross-efficiencies.

Table 5. The numerical application of Zhou et al. [55].

Table 4 depicts what we described above. The $j$th alternative can be expressed as [...]

[...] is used to expand or squeeze the range of the grey relational coefficient.

Table 7. Self-efficiency ratings and ranks of the overall system, the stage 1, and the stage 2, with the proposed method.

Table 9. Interval cross-efficiencies for the overall system.

Table 10. Interval cross-efficiencies for the stage 1.

Table 11. Interval cross-efficiencies for the stage 2.

Table 12. Interval weights per criterion for the overall system, the stage 1, and the stage 2.

Table 14. Peer-efficiency ratings and ranks of the overall system, the stage 1, and the stage 2, with Kao and Liu's [22] method.