Multi-agent reinforcement learning for cooperative lane changing of connected and autonomous vehicles in mixed traffic
Wei Zhou, Dong Chen, Jun Yan, Zhaojian Li, Huilin Yin, Wanchen Ge
Autonomous driving has attracted significant research interest in the past two decades, as it offers many potential benefits, including relieving drivers of exhausting driving tasks and mitigating traffic congestion, among others. Despite promising progress, lane changing remains a great challenge for autonomous vehicles (AVs), especially in mixed and dynamic traffic scenarios. Recently, reinforcement learning (RL) has been widely explored for lane-changing decision making in AVs, with encouraging results demonstrated. However, the majority of those studies focus on a single-vehicle setting, and lane changing in the context of multiple AVs coexisting with human-driven vehicles (HDVs) has received scarce attention. In this paper, we formulate the lane-changing decision making of multiple AVs in a mixed-traffic highway environment as a multi-agent reinforcement learning (MARL) problem, where each AV makes lane-changing decisions based on the motions of both neighboring AVs and HDVs. Specifically, a multi-agent advantage actor-critic (MA2C) method is proposed with a novel local reward design and a parameter-sharing scheme. In particular, a multi-objective reward function is designed to incorporate fuel efficiency, driving comfort, and the safety of autonomous driving. A comprehensive experimental study shows that our proposed MARL framework consistently outperforms several state-of-the-art benchmarks in terms of efficiency, safety, and driver comfort.
Multi-agent deep reinforcement learning / Lane-changing / Connected autonomous vehicles / Mixed traffic
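As a rough illustration of the multi-objective reward design described in the abstract, the Python sketch below combines efficiency, comfort, and safety terms into a single per-agent reward. All weights, term shapes, and variable names here are illustrative assumptions, not the authors' actual formulation; under a parameter-sharing MA2C scheme, every AV would evaluate the same local reward of this kind and contribute gradients to one shared actor-critic network.

```python
import math

# Hypothetical per-agent reward combining the three objectives named in the
# abstract (fuel efficiency, driving comfort, safety). Weights and term
# shapes are assumptions for illustration only.
def local_reward(speed, target_speed, accel, jerk, headway,
                 collided, w_eff=1.0, w_comfort=0.2, w_safety=4.0):
    # Efficiency: reward staying near a target speed (a common proxy
    # for fuel-efficient driving).
    r_eff = 1.0 - abs(speed - target_speed) / target_speed
    # Comfort: penalize harsh acceleration and jerk.
    r_comfort = -(accel ** 2 + jerk ** 2)
    # Safety: hard penalty on collision, soft penalty for short headways.
    r_safety = -1.0 if collided else -math.exp(-headway / 10.0)
    return w_eff * r_eff + w_comfort * r_comfort + w_safety * r_safety

# Example: an AV cruising near its target speed with a comfortable gap.
print(local_reward(speed=28.0, target_speed=30.0, accel=0.3,
                   jerk=0.1, headway=25.0, collided=False))
```

Weighting safety most heavily, as in this sketch, reflects the usual design choice in mixed-traffic MARL: collisions must dominate the return so that efficiency and comfort terms never incentivize unsafe lane changes.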