Communication-aided multi-UAV collision detection and avoidance based on two-stage curriculum reinforcement learning

Guanzheng Wang, Xiangke Wang, Zhiqiang Miao, Zhihong Liu, Xinyu Hu

Biomimetic Intelligence and Robotics ›› 2025, Vol. 5 ›› Issue (4): 100253 DOI: 10.1016/j.birob.2025.100253

Research Article


Abstract

Multi-UAV collision detection and avoidance currently faces many challenges, such as navigating cluttered environments with dynamic obstacles while relying on low-cost perception devices with a limited field of view (FOV). To this end, we propose a communication-aided collision detection and avoidance method based on curriculum reinforcement learning (CRL). The method integrates perception and communication data to improve environmental understanding, allowing UAVs to handle potential collisions that would otherwise go unnoticed. Furthermore, because the substantial difference in scale between perception and communication data complicates policy learning, we employ a two-stage training approach that expands the trained network from part to whole. In the first stage, a partial policy network is trained in an obstacle-free environment for inter-UAV collision avoidance. In the second stage, the full network is trained in a complex environment with obstacles, enabling both inter-UAV collision avoidance and obstacle avoidance. Experiments with PX4 software-in-the-loop (SITL) simulations and real flights demonstrate that our method outperforms state-of-the-art baselines, including a DRL-based method and NH-ORCA (Non-Holonomic Optimal Reciprocal Collision Avoidance), in reliability of collision avoidance. Moreover, the proposed method achieves zero-shot transfer from simulation to real-world environments never experienced during training.
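The part-to-whole expansion described above can be illustrated with a minimal sketch. This is not the paper's implementation: the network structure, dimensions, and zero-initialization of the added branch are illustrative assumptions. Stage 1 uses only a communication branch (inter-UAV states) in an obstacle-free world; stage 2 enables a perception branch (FOV sensor features) and trains the full network in clutter.

```python
import numpy as np

rng = np.random.default_rng(0)

class TwoStagePolicy:
    """Hypothetical policy net expanded from part (stage 1) to whole (stage 2)."""

    def __init__(self, comm_dim=8, percep_dim=16, hidden=32, act_dim=3):
        # Partial network: communication branch -> shared action head.
        self.W_comm = rng.normal(0.0, 0.1, (comm_dim, hidden))
        self.W_act = rng.normal(0.0, 0.1, (hidden, act_dim))
        # Perception branch, dormant until stage 2; zero-initialized so
        # enabling it does not change the stage-1 behavior at the switch.
        self.W_percep = np.zeros((percep_dim, hidden))
        self.stage = 1

    def expand(self):
        """Enter stage 2: the full network now consumes perception data."""
        self.stage = 2

    def forward(self, comm_obs, percep_obs=None):
        h = np.tanh(comm_obs @ self.W_comm)
        if self.stage == 2 and percep_obs is not None:
            h = h + np.tanh(percep_obs @ self.W_percep)
        return np.tanh(h @ self.W_act)  # velocity command in [-1, 1]^act_dim

policy = TwoStagePolicy()
comm = rng.normal(size=8)     # e.g. relative positions/velocities of neighbors
percep = rng.normal(size=16)  # e.g. depth features from a limited-FOV sensor

a1 = policy.forward(comm)          # stage 1: obstacle-free curriculum
policy.expand()
a2 = policy.forward(comm, percep)  # stage 2: cluttered environment
# Because W_percep starts at zero, the expanded network reproduces the
# stage-1 policy exactly at the moment of the stage switch.
```

Zero-initializing the new branch is one common way to make such an expansion behavior-preserving, so stage-2 training starts from the stage-1 policy rather than from scratch; whether the paper uses this exact mechanism is not stated in the abstract.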

Keywords

Collision detection and avoidance / End-to-end / Multi-UAV / Two-stage / Curriculum reinforcement learning

Cite this article

Download citation ▾
Guanzheng Wang, Xiangke Wang, Zhiqiang Miao, Zhihong Liu, Xinyu Hu. Communication-aided multi-UAV collision detection and avoidance based on two-stage curriculum reinforcement learning. Biomimetic Intelligence and Robotics, 2025, 5(4): 100253 DOI:10.1016/j.birob.2025.100253


