A Hybrid of RRT and TD3 Deep Reinforcement Learning Algorithm for UAV Path Planning in 3D Partially Unknown Environments

Yanxi HE, Jie QI, Nailong WU

Journal of Donghua University (English Edition), 2025, Vol. 42, Issue (6): 639-649. DOI: 10.19884/j.1672-5220.202407004
Information Technology and Artificial Intelligence

Abstract

To guide an unmanned aerial vehicle (UAV) flying in complex three-dimensional (3D) environments with unknown obstacles, a novel UAV path planning algorithm named IRRT-C2TD3 is proposed. The algorithm combines the rapidly-exploring random tree star (RRT∗) algorithm with the twin delayed deep deterministic policy gradients (TD3) algorithm, a deep reinforcement learning algorithm. By employing exploration strategies from reinforcement learning, IRRT-C2TD3 improves the RRT∗ algorithm. IRRT-C2TD3 is a two-stage path planning algorithm comprising pre-planning and real-time planning. It re-plans paths by generating them on the basis of geometric connections toward the goal and smoothing them with cubic B-spline curves. By designing the network architecture and the reward function of the TD3 algorithm, real-time planning in unknown environments is achieved on the basis of the pre-planned path from the first stage. Simulation results show that IRRT-C2TD3 demonstrates better path planning performance in 3D partially unknown environments than the RRT-C2TD3, M-C2TD3 and MOD-RRT∗ algorithms.
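The cubic B-spline smoothing mentioned in the abstract can be illustrated with a short sketch. The following Python snippet is a minimal, hypothetical example rather than the authors' implementation: it assumes a list of 3D waypoints produced by a sampling-based planner and uses SciPy's parametric spline routines splprep and splev with degree k = 3 to return a smooth, densely sampled path.

```python
# Minimal sketch (not the authors' code): smooth a 3D waypoint path with a
# cubic B-spline, as the abstract describes for the path re-planning step.
# The waypoints and parameters below are hypothetical.
import numpy as np
from scipy.interpolate import splprep, splev

def smooth_path_bspline(waypoints, num_samples=200, smoothing=0.0):
    """Fit a cubic B-spline through 3D waypoints and resample it densely.

    waypoints   : (N, 3) array-like of x, y, z points from a sampling-based planner
    num_samples : number of points returned on the smoothed curve
    smoothing   : splprep's s parameter; 0 interpolates the waypoints, larger
                  values trade closeness to the waypoints for a smoother curve
    """
    pts = np.asarray(waypoints, dtype=float)
    # splprep expects a list of coordinate arrays: [x, y, z]; k=3 gives a cubic spline.
    tck, _ = splprep([pts[:, 0], pts[:, 1], pts[:, 2]], k=3, s=smoothing)
    u = np.linspace(0.0, 1.0, num_samples)
    x, y, z = splev(u, tck)
    return np.column_stack([x, y, z])

# Hypothetical waypoints from a start at (0, 0, 0) to a goal at (10, 10, 5).
raw_path = [(0, 0, 0), (2, 1, 1), (3, 4, 2), (6, 5, 3), (8, 9, 4), (10, 10, 5)]
smoothed = smooth_path_bspline(raw_path)
print(smoothed.shape)  # (200, 3)
```

Increasing the smoothing parameter s relaxes the fit to the raw waypoints, a common way to trade path fidelity for curvature a UAV can track; how the paper tunes this step is not specified on this page.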

Keywords

3D path planning / deep reinforcement learning / rapidly-exploring random tree (RRT) / UAV

Cite this article

Yanxi HE, Jie QI, Nailong WU. A Hybrid of RRT and TD3 Deep Reinforcement Learning Algorithm for UAV Path Planning in 3D Partially Unknown Environments. Journal of Donghua University (English Edition), 2025, 42(6): 639-649. DOI: 10.19884/j.1672-5220.202407004

References

[1] LIU Z H, LIU Q, XU W J, et al. Robot learning towards smart robotic manufacturing: a review[J]. Robotics and Computer-Integrated Manufacturing, 2022, 77: 102360.
[2] HU Z J, GAO X G, WAN K F, et al. Relevant experience learning: a deep reinforcement learning method for UAV autonomous motion planning in complex unknown environments[J]. Chinese Journal of Aeronautics, 2021, 34(12): 187-204.
[3] SENTHILNATH J, KANDUKURI M, DOKANIA A, et al. Application of UAV imaging platform for vegetation analysis based on spectral-spatial methods[J]. Computers and Electronics in Agriculture, 2017, 140: 8-24.
[4] SHAHSAVANI H. An aeromagnetic survey carried out using a rotary-wing UAV equipped with a low-cost magneto-inductive sensor[J]. International Journal of Remote Sensing, 2021, 42(23): 8805-8818.
[5] TANG G, TANG C Q, CLARAMUNT C, et al. Geometric A-star algorithm: an improved A-star algorithm for AGV path planning in a port environment[J]. IEEE Access, 2021, 9: 59196-59210.
[6] HART P E, NILSSON N J, RAPHAEL B. A formal basis for the heuristic determination of minimum cost paths[J]. IEEE Transactions on Systems Science and Cybernetics, 1968, 4(2): 100-107.
[7] XU H Q, XING H X, LIU Y. Path planning of UAV by combining improved ant colony system and dynamic window algorithm[J]. Journal of Donghua University (English Edition), 2023, 40(6): 676-683.
[8] KARAMAN S, FRAZZOLI E. Sampling-based algorithms for optimal motion planning[J]. The International Journal of Robotics Research, 2011, 30(7): 846-894.
[9] LAVALLE S M, KUFFNER J J Jr. Randomized kinodynamic planning[J]. International Journal of Robotics Research, 2001, 20(5): 378-400.
[10] GAMMELL J D, SRINIVASA S S, BARFOOT T D. Informed RRT∗: optimal sampling-based path planning focused via direct sampling of an admissible ellipsoidal heuristic[C]//2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. New York: IEEE, 2014: 2997-3004.
[11] LI Y J, WEI W, GAO Y, et al. PQ-RRT∗: an improved path planning algorithm for mobile robots[J]. Expert Systems with Applications, 2020, 152: 113425.
[12] WANG J K, LI T G, LI B P, et al. GMR-RRT∗: sampling-based path planning using Gaussian mixture regression[J]. IEEE Transactions on Intelligent Vehicles, 2022, 7(3): 690-700.
[13] ESHTEHARDIAN S A, KHODAYGAN S. A continuous RRT∗-based path planning method for non-holonomic mobile robots using B-spline curves[J]. Journal of Ambient Intelligence and Humanized Computing, 2023, 14(7): 8693-8702.
[14] SUN Z Y, SHEN B, PAN A Q, et al. A modified self-adaptive sparrow search algorithm for robust multi-UAV path planning[J]. Journal of Donghua University (English Edition), 2024, 41(6): 630-643.
[15] QI J, YANG H, SUN H X. MOD-RRT∗: a sampling-based algorithm for robot path planning in dynamic environment[J]. IEEE Transactions on Industrial Electronics, 2021, 68(8): 7244-7251.
[16] VASHISTH A, RÜCKIN J, MAGISTRI F, et al. Deep reinforcement learning with dynamic graphs for adaptive informative path planning[J]. IEEE Robotics and Automation Letters, 2024, 9(9): 7747-7754.
[17] LI W J, YUE M, SHANGGUAN J Y, et al. Navigation of mobile robots based on deep reinforcement learning: reward function optimization and knowledge transfer[J]. International Journal of Control, Automation and Systems, 2023, 21(2): 563-574.
[18] LEE M H, MOON J. Deep reinforcement learning-based model-free path planning and collision avoidance for UAVs: a soft actor-critic with hindsight experience replay approach[J]. ICT Express, 2023, 9(3): 403-408.
[19] ANDRYCHOWICZ M, WOLSKI F, RAY A, et al. Hindsight experience replay[EB/OL]. (2018-02-23)[2024-07-20]. https://arxiv.org/abs/1707.01495.
[20] WANG J K, CHI W Z, LI C M, et al. Neural RRT∗: learning-based optimal path planning[J]. IEEE Transactions on Automation Science and Engineering, 2020, 17(4): 1748-1758.
[21] WANG J K, JIA X, ZHANG T Y, et al. Deep neural network enhanced sampling-based path planning in 3D space[J]. IEEE Transactions on Automation Science and Engineering, 2022, 19(4): 3434-3443.
[22] URAIN J, LE A T, LAMBERT A, et al. Learning implicit priors for motion optimization[C]//2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). New York: IEEE, 2022: 7672-7679.
[23] LIU B L, JIANG G D, ZHAO F, et al. Collision-free motion generation based on stochastic optimization and composite signed distance field networks of articulated robot[J]. IEEE Robotics and Automation Letters, 2023, 8(11): 7082-7089.
[24] FUJIMOTO S, HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[C]//International Conference on Machine Learning. [S.l.]: PMLR, 2018: 1587-1596.
[25] MIENYE I D, SUN Y X. A survey of ensemble learning: concepts, algorithms, applications, and prospects[J]. IEEE Access, 2022, 10: 99129-99149.
[26] YANG C P, ZHAO Y Q, CAI X, et al. Path planning algorithm for unmanned surface vessel based on multiobjective reinforcement learning[J]. Computational Intelligence and Neuroscience, 2023, 2023(1): 2146314.
[27] HUANG S Q, WU X R, HUANG G M. Deep reinforcement learning-based multi-objective 3D path planning for vehicles[C]//Proceedings of 2023 Chinese Intelligent Systems Conference. Singapore: Springer, 2023: 867-875.
[28] LIU X F, ZHANG P, FANG H, et al. Multiobjective reactive power optimization based on improved particle swarm optimization with ε-greedy strategy and Pareto archive algorithm[J]. IEEE Access, 2021, 9: 65650-65659.
[29] QU C Z, GAI W D, ZHONG M Y, et al. A novel reinforcement learning based grey wolf optimizer algorithm for unmanned aerial vehicles (UAVs) path planning[J]. Applied Soft Computing, 2020, 89: 106099.
[30] JOHNSON D. The triangular distribution as a proxy for the beta distribution in risk analysis[J]. Journal of the Royal Statistical Society: Series D (The Statistician), 1997, 46(3): 387-398.

Funding

National Natural Science Foundation of China (62173084)

Foundation of Shanghai Committee of Science and Technology, China (23ZR1401800)

Foundation of Shanghai Committee of Science and Technology, China (22JC1401403)
