Fast UAV path planning in urban environments based on three-step experience buffer sampling DDPG

Shasha Tian , Yuanxiang Li , Xiao Zhang , Lu Zheng , Linhui Cheng , Wei She , Wei Xie

2024, Vol. 10, Issue (4): 813-826. DOI: 10.1016/j.dcan.2023.02.016

Research article

Abstract

Path planning for Unmanned Aerial Vehicles (UAVs) is a critical issue in emergency communication and rescue operations, especially in adversarial urban environments. Due to the continuity of the flying space, complex building obstacles, and the aircraft's high dynamics, traditional algorithms cannot find the optimal collision-free flying path between the UAV station and the destination. Accordingly, in this paper, we study the fast UAV path planning problem in a 3D urban environment from a source point to a target point and propose a Three-Step Experience Buffer Deep Deterministic Policy Gradient (TSEB-DDPG) algorithm. We first build a 3D model of a complex urban environment with buildings and project the 3D building surfaces onto many 2D geometric shapes. After this transformation, we propose Hierarchical Learning Particle Swarm Optimization (HL-PSO) to obtain an empirical path. Then, to ensure the accuracy of the obtained paths, the empirical path, collision information, and fast-transition information are stored in the three experience buffers of the TSEB-DDPG algorithm as dynamic guidance information, and the sampling ratio of each buffer is adapted dynamically to the training stage. Moreover, we design a reward mechanism to improve the convergence speed of the DDPG algorithm for UAV path planning. The proposed TSEB-DDPG algorithm is compared experimentally with three widely used competitors, and the results show that TSEB-DDPG achieves the fastest convergence speed and the highest accuracy. We also conduct experiments in real scenarios and compare the real paths planned by the HL-PSO, DDPG, and TSEB-DDPG algorithms. The results show that the TSEB-DDPG algorithm performs best overall in terms of accuracy, average planning time, and success rate.
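The three-buffer replay with stage-adaptive sampling described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the buffer names mirror the abstract (empirical-path, collision, and fast-transition experience), but the linear ratio schedule, capacities, and transition format are assumptions made for the example.

```python
import random
from collections import deque

class ThreeStepExperienceBuffer:
    """Illustrative sketch of a three-buffer experience replay.

    The ratio schedule below (lean on HL-PSO empirical experience early,
    shift toward the agent's own fast transitions later) is an assumed
    linear schedule, not the paper's exact adaptation rule.
    """

    def __init__(self, capacity=10000):
        self.empirical = deque(maxlen=capacity)  # transitions along the HL-PSO empirical path
        self.collision = deque(maxlen=capacity)  # transitions that ended in a collision
        self.fast = deque(maxlen=capacity)       # fast-transition experience from the agent

    def add(self, transition, kind):
        # transition: e.g. a (state, action, reward, next_state) tuple
        {"empirical": self.empirical,
         "collision": self.collision,
         "fast": self.fast}[kind].append(transition)

    def ratios(self, progress):
        """Sampling ratios as a function of training progress in [0, 1]."""
        e = max(0.1, 0.6 * (1.0 - progress))  # empirical-path share decays over training
        c = 0.2                               # collision share held constant
        f = 1.0 - e - c                       # remainder from fast transitions
        return e, c, f

    def sample(self, batch_size, progress):
        """Draw a mini-batch mixed across the three buffers."""
        batch = []
        for buf, frac in zip((self.empirical, self.collision, self.fast),
                             self.ratios(progress)):
            n = min(len(buf), int(round(batch_size * frac)))
            if n > 0:
                batch.extend(random.sample(list(buf), n))
        return batch
```

Early in training the sampled batches are dominated by the empirical-path buffer, which guides exploration toward the HL-PSO path; as training progresses the mix shifts toward the agent's own experience, which is the intuition behind adapting the sampling ratio to the training stage.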

Keywords

Unmanned aerial vehicle / Path planning / Deep deterministic policy gradient / Three-step experience buffer / Particle swarm optimization

Cite this article

Shasha Tian, Yuanxiang Li, Xiao Zhang, Lu Zheng, Linhui Cheng, Wei She, Wei Xie. Fast UAV path planning in urban environments based on three-step experience buffer sampling DDPG. 2024, 10(4): 813-826. DOI: 10.1016/j.dcan.2023.02.016


References

[1]

Y. Dong, E. Camci, E. Kayacan, Faster RRT-based nonholonomic path planning in 2D building environments using skeleton-constrained path biasing, J. Intell. Rob. Syst. 89 (3) (2018) 387-401.

[2]

U. Orozco-Rosas, O. Montiel, R. Sepúlveda, Mobile robot path planning using membrane evolutionary artificial potential field, Appl. Soft Comput. 77 (2019) 236-251.

[3]

C. Cai, S. Ferrari, Information-driven sensor path planning by approximate cell decomposition, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39 (3) (2009) 672-689.

[4]

T.P. Tunggal, A. Supriyanto, N.M.Z. Rochman, I. Faishal, I. Pambudi, I. Iswanto, Pursuit algorithm for robot trash can based on fuzzy-cell decomposition, Int. J. Electr. Comput. Eng. 6 (6) (2016) 2863.

[5]

Y. Xinyi, Z. Yichen, L. Liang, O. Linlin, Dynamic window with virtual goal (DW-VG): a new reactive obstacle avoidance approach based on motion prediction, Robotica 37 (8) (2019) 1438-1456.

[6]

J. Kennedy, R. Eberhart, Particle swarm optimization, in: Proceedings of ICNN'95-international Conference on Neural Networks vol. 4, IEEE, 1995, November, pp. 1942-1948.

[7]

X. Cai, Z. Hu, Z. Fan, A novel memetic algorithm based on invasive weed optimization and differential evolution for constrained optimization, Soft Comput. 17 (10) (2013) 1893-1910.

[8]

M. Clerc, J. Kennedy, The particle swarm-explosion, stability, and convergence in a multidimensional complex space, IEEE Trans. Evol. Comput. 6 (1) (2002) 58-73.

[9]

P. Sun, R. Shan, Predictive control with velocity observer for cushion robot based on PSO for path planning, J. Syst. Sci. Complex. 33 (4) (2020) 988-1011.

[10]

S. Shao, Y. Peng, C. He, Y. Du, Efficient path planning for UAV formation via comprehensively improved particle swarm optimization, ISA Trans. 97 (2020) 415-430.

[11]

Y. Dong, X. Zou, Mobile robot path planning based on improved DDPG reinforcement learning algorithm, in: 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS), IEEE, 2020, October, pp. 52-56.

[12]

G.T. Tu, J.G. Juang, Path planning and obstacle avoidance based on reinforcement learning for UAV application, in: 2021 International Conference on System Science and Engineering (ICSSE), IEEE, 2021, August, pp. 352-355.

[13]

E.H. Houssein, A.G. Gad, K. Hussain, P.N. Suganthan, Major advances in particle swarm optimization: theory, analysis, and application, Swarm Evol. Comput. 63 (2021) 100868.

[14]

Y. Shi, R. Eberhart, A modified particle swarm optimizer, in: 1998 IEEE International Conference on Evolutionary Computation Proceedings. IEEE World Congress on Computational Intelligence (Cat. No. 98TH8360), IEEE, 1998, May, pp. 69-73.

[15]

P.Y. Yang, F.I. Chou, J.T. Tsai, J.H. Chou, Adaptive-uniform-experimental-design-based fractional-order particle swarm optimizer with non-linear time-varying evolution, Appl. Sci. 9 (24) (2019) 5537.

[16]

C. Yang, W. Gao, N. Liu, C. Song, Low-discrepancy sequence initialized particle swarm optimization algorithm with high-order nonlinear time-varying inertia weight, Appl. Soft Comput. 29 (2015) 386-394.

[17]

M.S. Nobile, P. Cazzaniga, D. Besozzi, R. Colombo, G. Mauri, G. Pasi, Fuzzy Self-Tuning PSO: a settings-free algorithm for global optimization, Swarm Evol. Comput. 39 (2018) 70-85.

[18]

X. Ye, B. Chen, L. Jing, B. Zhang, Y. Liu, Multi-agent hybrid particle swarm optimization (MAHPSO) for wastewater treatment network planning, J. Environ. Manag. 234 (2019) 525-536.

[19]

R. Lan, L. Zhang, Z. Tang, Z. Liu, X. Luo, A hierarchical sorting swarm optimizer for large-scale optimization, IEEE Access 7 (2019) 40625-40635.

[20]

L.T. Al-Bahrani, J.C. Patra, A novel orthogonal PSO algorithm based on orthogonal diagonalization, Swarm Evol. Comput. 40 (2018) 1-23.

[21]

X. Xia, Y. Tang, B. Wei, L. Gui, Dynamic multi-swarm particle swarm optimization based on elite learning, IEEE Access 7 (2019) 184849-184865.

[22]

J. Ding, Q. Wang, Q. Zhang, Q. Ye, Y. Ma, A hybrid particle swarm optimization-cuckoo search algorithm and its engineering applications, Math. Probl Eng. (2019) 1-12.

[23]

C.X. Yang, J. Zhang, M.S. Tong, A hybrid quantum-behaved particle swarm optimization algorithm for solving inverse scattering problems, IEEE Trans. Antenn. Propag. 69 (9) (2021) 5861-5869.

[24]

X. Xia, L. Gui, G. He, et al., An expanded particle swarm optimization based on multi-exemplar and forgetting ability, Inf. Sci. 508 (2020) 105-120.

[25]

X. Chen, H. Tianfield, C. Mei, W. Du, G. Liu, Biogeography-based learning particle swarm optimization, Soft Comput. 21 (24) (2017) 7519-7541.

[26]

H. Chen, C. Wang, J. Huang, J. Gong, Efficient use of heuristics for accelerating XCS-based policy learning in Markov games, Swarm Evol. Comput. 65 (2021) 100914.

[27]

T.P. Lillicrap, J.J. Hunt, A. Pritzel, et al., Continuous control with deep reinforcement learning, arXiv preprint arXiv:1509.02971, 2015.

[28]

L.J. Lin, Reinforcement Learning for Robots Using Neural Networks, Ph.D. thesis, Carnegie Mellon University, 1992.

[29]

Y. Hou, L. Liu, Q. Wei, X. Xu, C. Chen, A novel DDPG method with prioritized experience replay, in: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), IEEE, 2017, October, pp. 316-321.

[30]

J. Li, T. Yu, X. Zhang, F. Li, D. Lin, H. Zhu, Efficient experience replay based deep deterministic policy gradient for AGC dispatch in integrated energy system, Appl. Energy 285 (2021) 116386.

[31]

Z. Zhang, J. Chen, Z. Chen, W. Li, Asynchronous episodic deep deterministic policy gradient: toward continuous control in computationally complex environments, IEEE Trans. Cybern. 51 (2) (2019) 604-613.

[32]

T. de Bruin, J. Kober, K. Tuyls, R. Babuška, Improved deep reinforcement learning for robotics through distribution-based experience retention, in: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2016, October, pp. 3947-3952.

[33]

T. Rojanaarpa, I. Kataeva, Density-based data pruning method for deep reinforcement learning, in: 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, 2016, December, pp. 266-271.

[34]

A.W. Moore, C.G. Atkeson, Prioritized sweeping: reinforcement learning with less data and less time, Mach. Learn. 13 (1) (1993) 103-130.

[35]

T. Schaul, J. Quan, I. Antonoglou, D. Silver, Prioritized experience replay, arXiv preprint arXiv:1511.05952, 2015.

[36]

H. Xiang, J. Cheng, Q. Zhang, J. Liu, Synthesized prioritized data pruning based deep deterministic policy gradient algorithm improvement, in: 2018 IEEE International Conference on Information and Automation (ICIA), IEEE, 2018, August, pp. 121-126.

[37]

P. Li, X. Ding, H. Sun, S. Zhao, R. Cajo, Research on Dynamic Path Planning of Mobile Robot Based on Improved DDPG Algorithm, Mobile Information Systems, 2021.

[38]

X. Zhang, L. Duan, Energy-saving deployment algorithms of UAV swarm for sustainable wireless coverage, IEEE Trans. Veh. Technol. 69 (9) (2020) 10320-10335.

[39]

J.J. Liang, A.K. Qin, P.N. Suganthan, S. Baskar, Comprehensive learning particle swarm optimizer for global optimization of multimodal functions, IEEE Trans. Evol. Comput. 10 (3) (2006) 281-295.

[40]

B. Niu, H. Huang, L. Tan, Q. Duan, Symbiosis-based alternative learning multi-swarm particle swarm optimization, IEEE ACM Trans. Comput. Biol. Bioinf. 14 (1) (2015) 4-14.

[41]

Y. Liu, W. Zhang, F. Chen, J. Li, Path planning based on improved deep deterministic policy gradient algorithm, in: 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), IEEE, 2019, March, pp. 295-299.
