Abstract
To address real-time path planning requirements for multi-unmanned aerial vehicle (multi-UAV) collaboration in complex environments, this study proposes an improved multi-agent deep deterministic policy gradient algorithm with prioritized experience replay (PER-MADDPG). By designing a multi-dimensional state representation incorporating relative positions, velocity vectors, and obstacle distance fields, we construct a composite reward function integrating safe obstacle avoidance, formation maintenance, and energy efficiency for environment perception and multi-objective collaborative optimization. The prioritized experience replay mechanism dynamically adjusts sampling weights based on temporal difference (TD) errors, enhancing learning efficiency for high-value samples. Simulation experiments demonstrate that our method generates real-time collaborative paths in 3D complex obstacle environments, reducing training time by 25.3% and 16.8% compared to the traditional MADDPG and multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithms, respectively, while achieving smaller path length variances among UAVs. Results validate the effectiveness of prioritized experience replay in multi-agent collaborative decision-making.
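The abstract's TD-error-based sampling corresponds to the standard proportional prioritization scheme, where each transition's priority is (|δ| + ε)^α and importance-sampling weights correct the resulting bias. Below is a minimal sketch of such a prioritized replay buffer; the class name, hyperparameter values (alpha, beta, eps), and ring-buffer layout are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (a sketch): transitions are
    sampled with probability proportional to (|TD error| + eps)^alpha, and
    importance-sampling weights correct the induced bias."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha, self.beta, self.eps = alpha, beta, eps
        self.buffer = []      # stored transitions
        self.priorities = []  # one priority per transition
        self.pos = 0          # ring-buffer write index

    def add(self, transition, td_error):
        # New samples enter with a priority derived from their TD error.
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(priority)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights, normalized by the max for stability.
        weights = (len(self.buffer) * probs[idx]) ** (-self.beta)
        weights = weights / weights.max()
        batch = [self.buffer[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Refresh priorities after each learning step with the new TD errors.
        for i, d in zip(idx, td_errors):
            self.priorities[i] = (abs(d) + self.eps) ** self.alpha
```

In a MADDPG-style training loop, each critic update would call sample() to draw a batch, scale the TD loss by the returned weights, and then call update_priorities() with the fresh TD errors so that high-error (high-value) samples keep being replayed more often.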
Keywords
multi-unmanned aerial vehicle (multi-UAV) / path planning / deep deterministic policy gradient / prioritized experience replay
Cite this article
Cailong Wu, Caiyi Chen, Zhengyu Guo, Jian Zhang, Delin Luo.
Multi-UAV Cooperative Path Planning Based on the Improved MADDPG.
Journal of Beijing Institute of Technology, 2026, 35(1): 31-43. DOI: 10.15918/j.jbit1004-0579.2025.039