Abstract
AutoML systems help Artificial Intelligence users find the best configurations for machine learning models. In this line, the area of Automated Reinforcement Learning (AutoRL) has recently become increasingly relevant, given the growing number of applications of reinforcement learning algorithms. However, the literature still lacks AutoRL systems designed for combinatorial optimization, especially for the Sequential Ordering Problem (SOP). This paper therefore presents a new AutoRL approach for the SOP. Two new methods are proposed, based on hyperparameter optimization and metalearning: AutoRL-SOP and AutoRL-SOP-MtL. The proposed AutoRL techniques enable the combined tuning of three SARSA hyperparameters: the ϵ-greedy exploration rate, the learning rate, and the discount factor. Furthermore, the new metalearning approach enables the transfer of hyperparameters between two combinatorial optimization domains: TSP (source) and SOP (target). The results show that applying metalearning reduces the computational cost of hyperparameter optimization. Moreover, the proposed AutoRL methods achieved the best solutions in 23 out of 28 simulated TSPLIB instances when compared with recent studies in the literature.
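The three SARSA hyperparameters that the proposed AutoRL methods tune jointly can be illustrated with a minimal tabular SARSA sketch. This is not the paper's implementation; the hyperparameter values, environment, and function names below are illustrative assumptions only.

```python
import random

# Illustrative values for the three hyperparameters tuned by AutoRL-SOP:
EPSILON = 0.1   # epsilon-greedy exploration rate
ALPHA = 0.5     # learning rate
GAMMA = 0.9     # discount factor

def epsilon_greedy(Q, state, actions, epsilon=EPSILON):
    """Pick a random action with probability epsilon, else the greedy action."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=ALPHA, gamma=GAMMA):
    """One on-policy SARSA step: Q(s,a) += alpha*(r + gamma*Q(s',a') - Q(s,a))."""
    q = Q.get((s, a), 0.0)
    q_next = Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = q + alpha * (r + gamma * q_next - q)
    return Q
```

An AutoRL search over (ϵ, α, γ) would run many training episodes with candidate triples and keep the configuration yielding the best tour cost; the metalearning variant would seed that search with triples found on TSP instances.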
Keywords
Reinforcement Learning / AutoML / Sequential Ordering Problem / Hyperparameter Optimization / Metalearning
Cite this article
André Luiz Carvalho Ottoni.
Automated reinforcement learning for sequential ordering problem using hyperparameter optimization and metalearning.
Autonomous Intelligent Systems, 2025, 5(1). DOI: 10.1007/s43684-025-00103-2
Funding
Pró-Reitoria de Pesquisa e Pós-Graduação, Universidade Federal de Ouro Preto (BR)(Edital PROPPI 18/2024)
Rights & Permissions
© The Author(s)