Joint Optimization of Passenger Flow Control and Train Skip-Stopping for Overcrowded Metro Lines: A Multi-Agent Reinforcement Learning Approach
Siqiao Xing , Jianyuan Guo , Yong Qin , Limin Jia , Yueyue Wang , Shuning Jiang
Urban Rail Transit: 1-29.
During rush hours, the capacity of metro systems in megacities is insufficient to meet travel demand, resulting in oversaturation and high safety risks on station platforms, especially at transfer stations. This paper addresses the problem through the joint optimization of passenger flow control and train skip-stopping, aiming to alleviate passenger overloads while maintaining travel efficiency. To make the model more realistic, the stochastic characteristics of passengers are considered, including the probability distributions of passenger arrival times and of inbound and transfer walking times. To provide a high-quality solution to this heavily constrained model, three cooperative agents (governing passenger inflow, transfer flows, and the train skip-stopping mode) are built on an improved Double Deep Q-Network (IDDQN) to form a multi-agent reinforcement learning solution. Empirical validation on Beijing Metro Line 13 and the Changping Line demonstrates that the proposed multi-agent framework eliminates 100% of over-limit passenger flow while reducing the average passenger waiting time. It also significantly reduces the impact of the stochastic characteristics and accelerates convergence.
Urban rail transit / Multi-agent reinforcement learning / Train skip-stopping mode / Passenger flow control / Joint optimization
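The IDDQN agents described in the abstract build on the Double Deep Q-Network idea: the online network selects the next action while a separate target network evaluates it, which reduces the overestimation bias of standard Q-learning. A minimal sketch of that target computation is shown below; the function name and batch layout are illustrative, not taken from the paper.

```python
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, gamma=0.99, dones=None):
    """Compute Double DQN regression targets for a batch of transitions.

    rewards:       shape (B,)   immediate rewards
    next_q_online: shape (B, A) online-network Q-values at the next state
    next_q_target: shape (B, A) target-network Q-values at the next state
    dones:         shape (B,)   True where the episode terminated
    """
    if dones is None:
        dones = np.zeros_like(rewards, dtype=bool)
    # Action selection by the online network ...
    best_actions = np.argmax(next_q_online, axis=1)
    # ... but action evaluation by the target network (the Double DQN decoupling)
    next_values = next_q_target[np.arange(len(rewards)), best_actions]
    # Terminal transitions contribute only the immediate reward
    return rewards + gamma * next_values * (~dones)

# Toy batch of 2 transitions with 3 actions
rewards = np.array([1.0, 0.5])
next_q_online = np.array([[0.2, 0.9, 0.1], [0.4, 0.3, 0.8]])
next_q_target = np.array([[0.3, 0.7, 0.2], [0.5, 0.1, 0.6]])
targets = double_dqn_targets(rewards, next_q_online, next_q_target,
                             gamma=0.9, dones=np.array([False, True]))
# → [1.63, 0.5]: online argmax picks action 1, target net values it at 0.7
```

In the multi-agent setting sketched in the abstract, each of the three agents would maintain its own pair of online and target networks over its own action space (inflow rates, transfer rates, or skip/stop decisions), with the shared simulation providing rewards.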