Abstract
To guide an unmanned aerial vehicle (UAV) flying in complex three-dimensional (3D) environments with unknown obstacles, a novel UAV path planning algorithm named IRRT∗-C2TD3 is proposed. The algorithm combines the rapidly-exploring random tree star (RRT∗) algorithm with the twin delayed deep deterministic policy gradient (TD3) algorithm, a deep reinforcement learning algorithm. By employing exploration strategies from reinforcement learning, IRRT∗-C2TD3 improves the RRT∗ algorithm. IRRT∗-C2TD3 is a two-stage path planning algorithm comprising pre-planning and real-time planning. In the first stage, paths are pre-planned by generating them from geometric connections toward the goal and smoothing them with cubic B-spline curves. In the second stage, the network architecture and reward function of the TD3 algorithm are designed so that real-time planning in unknown environments is performed based on the pre-planned path from the first stage. Simulation results show that IRRT∗-C2TD3 achieves better path planning performance in 3D partially unknown environments than the RRT∗-C2TD3, M-C2TD3 and MOD-RRT∗ algorithms.
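As a rough illustration of the pre-planning step described above, the sketch below smooths a coarse 3D waypoint path with a cubic B-spline using SciPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the waypoints, the function name smooth_path, and the sampling density are illustrative placeholders.

```python
# Minimal sketch (not the authors' code): smoothing a pre-planned 3D waypoint
# path with a cubic B-spline, as described for the pre-planning stage.
import numpy as np
from scipy.interpolate import splprep, splev

def smooth_path(waypoints, num_samples=100):
    """Fit a cubic (k=3) B-spline through 3D waypoints and resample it."""
    pts = np.asarray(waypoints, dtype=float)      # shape (N, 3)
    tck, _ = splprep(pts.T, k=3, s=0.0)           # interpolating cubic B-spline
    u = np.linspace(0.0, 1.0, num_samples)        # uniform parameter samples
    x, y, z = splev(u, tck)
    return np.column_stack([x, y, z])             # smoothed path, shape (num_samples, 3)

# Example: a coarse pre-planned path (e.g., RRT* waypoints after geometric shortcuts)
raw_path = [(0, 0, 0), (2, 1, 1), (4, 3, 1.5), (6, 4, 2), (8, 6, 2.5)]
smoothed = smooth_path(raw_path)
```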
Keywords
3D path planning / deep reinforcement learning / rapidly-exploring random tree (RRT) / UAV
Cite this article
Yanxi HE, Jie QI, Nailong WU.
A Hybrid of RRT∗ and TD3 Deep Reinforcement Learning Algorithm for UAV Path Planning in 3D Partially Unknown Environments.
Journal of Donghua University (English Edition), 2025, 42(6): 639-649. DOI: 10.19884/j.1672-5220.202407004
Funding
National Natural Science Foundation of China (62173084)
Foundation of Shanghai Committee of Science and Technology, China (23ZR1401800)
Foundation of Shanghai Committee of Science and Technology, China (22JC1401403)