Multi-agent reinforcement learning for autonomous vehicles: a survey
Joris Dinneweth, Abderrahmane Boubezoul, René Mandiau, Stéphane Espié
In the near future, autonomous vehicles (AVs) may cohabit with human drivers in mixed traffic. This cohabitation raises serious challenges, both in terms of traffic flow and individual mobility, as well as from a road safety point of view. Mixed traffic may fail to meet expected safety requirements due to the heterogeneity and unpredictability of human drivers; alternatively, autonomous vehicles may eventually make up the entirety of traffic. Using multi-agent reinforcement learning (MARL) algorithms, researchers have attempted to design autonomous vehicles for both scenarios, and this paper surveys their recent advances. We focus on articles tackling decision-making problems and identify four paradigms: some authors address mixed-traffic problems with or without socially-desirable AVs, while others tackle the case of fully-autonomous traffic. Whereas the latter case is essentially a communication problem, most authors addressing mixed traffic acknowledge limitations: the human driver models found in the literature are too simplistic, since they do not capture the heterogeneity of drivers' behaviors, and as a result they fail to generalize over the wide range of possible behaviors. For each paper investigated, we analyze how the authors formulated the MARL problem in terms of observations, actions, and rewards to match the paradigm they apply.
Keywords: Multi-agent reinforcement learning / Simulation / Autonomous vehicles
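To make the per-paper analysis concrete, the sketch below illustrates the generic Markov-game interface that MARL formulations of driving problems assume: each agent receives its own observation, chooses its own action, and collects an individual reward. This is a minimal toy example written for this survey's context; the environment, its state variables, and its reward shape (speed minus a safe-gap penalty) are illustrative assumptions, not taken from any of the surveyed papers.

import random

class TwoVehicleMergeEnv:
    """Toy two-agent driving environment: each agent observes its own
    speed and the gap to the other vehicle, chooses an acceleration,
    and is rewarded for making progress while keeping a safe gap.
    All names and numbers here are hypothetical."""

    AGENTS = ("av_0", "av_1")
    ACTIONS = (-1.0, 0.0, 1.0)  # brake, coast, accelerate (m/s^2)

    def reset(self):
        # Random initial speeds; fixed initial positions along one lane.
        self.speed = {a: random.uniform(8.0, 12.0) for a in self.AGENTS}
        self.pos = {"av_0": 0.0, "av_1": 15.0}
        return self._observations()

    def _observations(self):
        gap = abs(self.pos["av_1"] - self.pos["av_0"])
        # Per-agent observation: own speed plus the inter-vehicle gap.
        return {a: (self.speed[a], gap) for a in self.AGENTS}

    def step(self, actions):
        # Each agent applies its own acceleration for one time step.
        for a, accel in actions.items():
            self.speed[a] = max(0.0, self.speed[a] + accel)
            self.pos[a] += self.speed[a]
        gap = abs(self.pos["av_1"] - self.pos["av_0"])
        # Individual reward: progress (speed), penalized if the gap is unsafe.
        rewards = {a: self.speed[a] - (10.0 if gap < 5.0 else 0.0)
                   for a in self.AGENTS}
        return self._observations(), rewards

env = TwoVehicleMergeEnv()
obs = env.reset()
for _ in range(3):
    # Random policies stand in for the learned MARL policies.
    actions = {a: random.choice(env.ACTIONS) for a in env.AGENTS}
    obs, rewards = env.step(actions)
    print(obs, rewards)

The design choices a paper makes at exactly these three points (what goes into each agent's observation, how actions are discretized or kept continuous, and whether rewards are individual, shared, or socially shaped) are what distinguish the four paradigms identified in this survey.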