Multi-agent reinforcement learning for autonomous vehicles: a survey

Joris Dinneweth, Abderrahmane Boubezoul, René Mandiau, Stéphane Espié

Autonomous Intelligent Systems ›› 2022, Vol. 2 ›› Issue (1) : 27. DOI: 10.1007/s43684-022-00045-z
Review



Abstract

In the near future, autonomous vehicles (AVs) may cohabit with human drivers in mixed traffic. This cohabitation raises serious challenges for traffic flow and individual mobility, as well as for road safety. Mixed traffic may fail to meet expected safety requirements due to the heterogeneity and unpredictability of human drivers, and autonomous vehicles could eventually come to dominate traffic entirely. Using multi-agent reinforcement learning (MARL) algorithms, researchers have attempted to design autonomous vehicles for both scenarios, and this paper surveys their recent advances. We focus on articles tackling decision-making problems and identify four paradigms. While some authors address mixed-traffic problems with or without socially desirable AVs, others tackle the case of fully autonomous traffic. The latter case is essentially a communication problem, whereas most authors addressing mixed traffic acknowledge limitations of their approaches: the human driver models found in the literature are too simplistic, since they do not capture the heterogeneity of drivers' behaviors, and as a result they fail to generalize over the wide range of possible behaviors. For each paper investigated, we analyze how the authors formulated the MARL problem in terms of observations, actions, and rewards to match the paradigm they apply.
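The surveyed works all cast driving as a MARL problem by choosing per-agent observations, actions, and rewards. Purely as an illustration (this is not the formulation of any specific paper surveyed), a minimal toy ring-road environment might expose these three ingredients as follows; the class name, state variables, and numeric choices are all hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ToyMixedTrafficEnv:
    """Hypothetical single-lane ring road with n_agents vehicles.

    Observation (per agent): own speed, gap to the vehicle ahead,
        and relative speed of that vehicle.
    Action (per agent): a discrete acceleration choice.
    Reward (per agent): a mix of an egoistic speed term and a
        'social' term rewarding the mean speed of all vehicles,
        sketching the socially desirable AV paradigm.
    """
    n_agents: int = 4
    road_length: float = 200.0
    dt: float = 0.5
    social_weight: float = 0.3   # 0.0 -> purely egoistic agents
    positions: List[float] = field(default_factory=list)
    speeds: List[float] = field(default_factory=list)

    def reset(self) -> Dict[int, Tuple[float, float, float]]:
        # Place vehicles evenly around the ring at a common speed.
        spacing = self.road_length / self.n_agents
        self.positions = [i * spacing for i in range(self.n_agents)]
        self.speeds = [10.0] * self.n_agents
        return self._observations()

    def _observations(self) -> Dict[int, Tuple[float, float, float]]:
        obs = {}
        for i in range(self.n_agents):
            lead = (i + 1) % self.n_agents
            gap = (self.positions[lead] - self.positions[i]) % self.road_length
            obs[i] = (self.speeds[i], gap, self.speeds[lead] - self.speeds[i])
        return obs

    def step(self, actions: Dict[int, int]):
        accel = {0: -2.0, 1: 0.0, 2: 2.0}  # brake / keep / accelerate
        for i, a in actions.items():
            self.speeds[i] = max(0.0, self.speeds[i] + accel[a] * self.dt)
            self.positions[i] = (
                self.positions[i] + self.speeds[i] * self.dt
            ) % self.road_length
        mean_speed = sum(self.speeds) / self.n_agents
        rewards = {
            i: (1 - self.social_weight) * self.speeds[i]
               + self.social_weight * mean_speed
            for i in range(self.n_agents)
        }
        return self._observations(), rewards
```

Varying `social_weight` is one simple way to interpolate between the egoistic and socially desirable paradigms discussed in the survey; real formulations in the literature are considerably richer (lane changes, continuous control, safety terms).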

Keywords

Multi-agent reinforcement learning / Simulation / Autonomous vehicles

Cite this article

Joris Dinneweth, Abderrahmane Boubezoul, René Mandiau, Stéphane Espié. Multi-agent reinforcement learning for autonomous vehicles: a survey. Autonomous Intelligent Systems, 2022, 2(1): 27. https://doi.org/10.1007/s43684-022-00045-z

Funding
Horizon 2020 Framework Programme (815001)
