Leveraging reinforcement learning and evolutionary algorithm to solve Bi-level combinatorial optimization problem

CoDIT 2024-DO_TAP

Open Access

Issue		RAIRO-Oper. Res. Volume 60, Number 2, March-April 2026 CoDIT 2024-DO_TAP


Page(s)		1103 - 1126
DOI		https://doi.org/10.1051/ro/2026026
Published online		15 April 2026

M. Abbassi, A. Chaabani, L.B. Said and N. Absi, An approximation-based chemical reaction algorithm for combinatorial multi-objective bi-level optimization problems, in Proceedings of 2021 IEEE Congress on Evolutionary Computation (CEC) (2021) 1627–1634. https://doi.org/10.1109/CEC45853.2021.9504711.
L.G. Acuna, D.R. Rios, C.P. Arboleda and E.G. Ponzon, Cooperation model in the electricity energy market using bi-level optimization and Shapley value. Oper. Res. Perspect. 5 (2018) 161–168.
V. Audigier, F. Husson and J. Josse, MIMCA: multiple imputation for categorical variables with multiple correspondence analysis. Stat. Comput. 27 (2017) 501–518.
A. Azzouz, A. Chaabani, M. Ennigrou and L.B. Said, Handling sequence-dependent setup time flexible job shop problem with learning and deterioration considerations using evolutionary bilevel optimization. Appl. Artif. Intell. 34 (2020) 433–455.
H. Bai, R. Cheng and Y. Jin, Evolutionary reinforcement learning: a survey. Intell. Comput. 2 (2023) 0025. https://doi.org/10.34133/icomputing.0025.
L. Breiman, Random forests. Mach. Learn. 45 (2001) 5–32.
H.I. Calvete, C. Gale and M.-J. Oliveros, A hybrid algorithm for solving a bilevel production-distribution planning problem, in Modeling and Simulation in Engineering, Economics, and Management MS 2013. Vol. 145 of Lecture Notes in Business Information Processing. Springer, Berlin (2013) 138–144.
A. Chaabani, EVORL CODBA: a new efficient evolutionary reinforcement learning algorithm for bi-level combinatorial optimization problem. GitHub repository (2025). https://github.com/AbirUser/JavaFBLOP/tree/main/approaches/QCODBA.
A. Chaabani, S. Bechikh and L.B. Said, A co-evolutionary decomposition-based algorithm for bi-level combinatorial optimization, in 2015 IEEE Congress on Evolutionary Computation (CEC) (2015) 1659–1666. https://doi.org/10.1109/CEC.2015.7257084.
A. Chaabani, S. Bechikh and L.B. Said, A memetic evolutionary algorithm for bi-level combinatorial optimization: a realization between Bi-MDVRP and Bi-CVRP, in 2016 IEEE Congress on Evolutionary Computation (CEC). IEEE (2016) 1666–1673. https://doi.org/10.1109/CEC.2016.7743988.
A. Chaabani, S. Bechikh and L. Ben Said, A co-evolutionary hybrid decomposition-based algorithm for bi-level combinatorial optimization problems. Soft Comput. 24 (2020) 7211–7229.
L. Cheng, Q. Tang, L. Zhang and Z. Zhang, Multi-objective Q-learning-based hyper-heuristic with bi-criteria selection for energy-aware mixed shop scheduling. Swarm Evol. Comput. 69 (2022) 100985.
A. Cheraghalipour, M.M. Paydar and M. Hajiaghaei-Keshteli, Designing and solving a bi-level model for rice supply chain using evolutionary algorithms. Comput. Electron. Agric. 162 (2019) 651–668.
S.S. Choong, L.P. Wong and C.P. Lim, Automatic design of hyper-heuristic based on reinforcement learning. Inf. Sci. 436 (2018) 89–107.
C. Cortes and V. Vapnik, Support-vector networks. Mach. Learn. 20 (1995) 273–297.
P. Cunningham and S.J. Delany, K-nearest neighbour classifiers-a tutorial. ACM Comput. Surv. 54 (2021) 1–25.
J. Derrac, S. García, D. Molina and F. Herrera, A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput. 1 (2011) 3–18.
J. Gui, T. Chen, J. Zhang, Q. Cao, Z. Sun, H. Luo and D. Tao, A survey on self-supervised learning: algorithms, applications, and future trends. IEEE Trans. Pattern Anal. Mach. Intell. (2024). https://doi.org/10.1109/TPAMI.2024.3415112.
M. Hammami, S. Bechikh, A. Louati, M. Makhlouf and L.B. Said, Feature construction as a bi-level optimization problem. Neural Comput. Appl. 32 (2020) 13783–13804.
M. Hammami, S. Bechikh, C.C. Hung and L.B. Said, Class-dependent weighted feature selection as a bi-level optimization problem, in Neural Information Processing, ICONIP 2020, edited by H. Yang, K. Pasupa, A.C.S. Leung, J.T. Kwok, J.H. Chan and I. King. Vol. 1333 of Communications in Computer and Information Science. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-63823-8-32.
T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2009).
Z. Hu, W. Gong and S. Li, Reinforcement learning-based differential evolution for parameters extraction of photovoltaic models. Energy Rep. 7 (2021) 916–928.
Z. Hu, L. Wang, J. Qin, B. Lev and L. Gan, Optimization of facility location and size problem based on bi-level multi-objective programming. Comput. Oper. Res. 145 (2022) 105860.
M. Jerbi, Z.C. Dagdia, S. Bechikh and L.B. Said, Android malware detection as a bi-level problem. Comput. Secur. 121 (2022) 102825.
M. Käärik and K. Pärna, On the quality of k-means clustering based on grouped data. J. Stat. Plan. Infer. 139 (2009) 3836–3841.
K. Kontolati, D. Loukrezis, D.G. Giovanis, L. Vandanapu and M.D. Shields, A survey of unsupervised learning methods for high-dimensional uncertainty quantification in black-box-type problems. J. Comput. Phys. 464 (2022) 111313.
H.W. Kuhn and A.W. Tucker, Nonlinear programming, in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press (1951) 481–492.
F. Legillon, A. Liefooghe and E.-G. Talbi, CoBRA: a cooperative coevolutionary algorithm for bi-level optimization, in 2012 IEEE Congress on Evolutionary Computation (CEC). IEEE (2012) 1–8. https://doi.org/10.1109/CEC.2012.6256620.
K. Li, T. Zhang and R. Wang, Deep reinforcement learning for multiobjective optimization. IEEE Trans. Cybern. 51 (2020) 3103–3114.
T. Li, Y. Meng and L. Tang, Scheduling of continuous annealing with a multi-objective differential evolution algorithm based on deep reinforcement learning. IEEE Trans. Autom. Sci. Eng. 21 (2023) 1767–1780.
Y. Li, C. Liao, L. Wang, Y. Xiao, Y. Cao and S. Guo, A reinforcement learning-artificial bee colony algorithm for flexible job-shop scheduling problem with lot streaming. Appl. Soft Comput. 146 (2023) 110658.
Y. Lin, F. Lin, G. Cai, H. Chen, L. Zou and P. Wu, Evolutionary reinforcement learning: a systematic review and future directions. Preprint arXiv:2402.13296 (2024).
Z.J. Liu and H.N. Wu, Driver behavior modeling via inverse reinforcement learning based on particle swarm optimization, in Proceedings of the 2020 Chinese Automation Congress (CAC). IEEE (2020) 7232–7237.
H. Louati, S. Bechikh, A. Louati, C.C. Hung and L.B. Said, Deep convolutional neural network architecture design as a bilevel optimization problem. Neurocomputing 439 (2021) 44–62.
Y. Marinakis, A. Migdalas and P. Pardalos, A new bilevel formulation for the vehicle routing problem and a solution method using a genetic algorithm. J. Glob. Optim. 38 (2007) 555–580.
N. Mazyavkina, S. Sviridov, S. Ivanov and E. Burnaev, Reinforcement learning for combinatorial optimization: a survey. Comput. Oper. Res. 134 (2021) 105400.
J.A. Mejia-De-Dios, A. Rodriguez-Molina and E. Mezura-Montes, Multi-objective bilevel optimization: a survey of the state-of-the-art. IEEE Trans. Syst. Man Cybern. Syst. 53 (2023) 5478–5490.
A. Moreira, M.Y. Santos and S. Carneiro, Density-Based Clustering Algorithms-DBSCAN and SNN. University of Minho, Portugal (2005) 1–18.
L. Peng, Z. Yuan, G. Dai, M. Wang and Z. Tang, Reinforcement learning-based hybrid differential evolution for global optimization of interplanetary trajectory design. Swarm Evol. Comput. 81 (2023) 101351.
I. Rish, An empirical study of the naive Bayes classifier, in Proceedings of IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence. Vol. 3 (2001) 41–46.
P.J. Ross, Taguchi Techniques for Quality Engineering: Loss Function, Orthogonal Experiments, Parameter and Tolerance Design. McGraw-Hill, New York (1988).
I.H. Sarker, Machine learning: algorithms, real-world applications and research directions. SN Comput. Sci. 2 (2021) 160.
G. Savard and J. Gauvin, The steepest descent direction for the nonlinear bilevel programming problem. Oper. Res. Lett. 15 (1994) 265–272.
K. Sindhu Meena and S. Suriya, A survey on supervised and unsupervised learning techniques, in Proceedings of International Conference on Artificial Intelligence, Smart Grid and Smart City Applications (AISGSC 2019), edited by L. Kumar, L. Jayashree and R. Manimegalai. Springer, Cham (2020) 613–623. https://doi.org/10.1007/978-3-030-24051-6_58.
A. Sinha, P. Malo and K. Deb, Efficient evolutionary algorithm for single-objective bilevel optimization. Preprint arXiv:1303.3901 (2013).
A. Sinha, P. Malo and K. Deb, A review on bilevel optimization: from classical to evolutionary approaches and applications. IEEE Trans. Evol. Comput. 22 (2018) 276–295.
Y. Song, L. Wei, Q. Yang, J. Wu, L. Xing and Y. Chen, RL-GA: a reinforcement learning-based genetic algorithm for electromagnetic detection satellite scheduling problem. Swarm Evol. Comput. 77 (2023) 101236.
Y. Song, Y. Wu, Y. Guo, R. Yan, P.N. Suganthan, Y. Zhang and Q. Feng, Reinforcement learning-assisted evolutionary algorithm: a survey and research opportunities. Swarm Evol. Comput. 86 (2024) 101517.
J. Sun, X. Liu, T. Bäck and Z. Xu, Learning adaptive differential evolution algorithm from optimization experiences by policy gradient. IEEE Trans. Evol. Comput. 25 (2021) 666–680.
H. Xia, C. Li, S. Zeng, Q. Tan, J. Wang and S. Yang, A reinforcement-learning-based evolutionary algorithm using solution space clustering for multimodal optimization problems, in 2021 IEEE Congress on Evolutionary Computation (CEC). IEEE (2021) 1938–1945. https://doi.org/10.1109/CEC45853.2021.9504896.
N. Yorino, M. Abdillah, Y. Sasaki and Y. Zoka, Robust power system security assessment under uncertainties using bi-level optimization. IEEE Trans. Power Syst. 33 (2017) 352–362.
X. Zhang, S. Xia, X. Li and T. Zhang, Multi-objective particle swarm optimization with multi-mode collaboration based on reinforcement learning for path planning of unmanned air vehicles. Knowl. Based Syst. 250 (2022) 109075.
D. Zhang, Y. Chen and G. Zhu, Multi-objective hole-making sequence optimization by genetic algorithm based on Q-learning. IEEE Trans. Emerg. Top. Comput. Intell. 8 (2024) 1–14.
F. Zhao, S. Di and L. Wang, A hyperheuristic with Q-learning for the multiobjective energy-efficient distributed blocking flow shop scheduling problem. IEEE Trans. Cybern. 53 (2023) 3337–3350.
B. Zhou and Z. Zhao, An adaptive artificial bee colony algorithm enhanced by deep Q-learning for milk-run vehicle scheduling problem based on supply hub. Knowl. Based Syst. 264 (2023) 110367.
