Multi-Agent Reinforcement Learning Autonomous Task Planning for Deep Space Probes

SUN Zeyi1, WANG Bin1,2, HU Xinyue1, XIONG Xin1,2, JIN Huaiping1,2

Journal of Deep Space Exploration, 2024, Vol. 11, Issue 3: 244-255. DOI: 10.15982/j.issn.2096-9287.2024.20230159
Special Issue: Intelligent Landing on Small Celestial Bodies

Abstract

To meet the requirements for autonomy, rapidity, and adaptability in the collaborative planning of the subsystems during the attachment mission of a deep space probe, a collaborative planning strategy based on the proximal policy optimization method and multi-agent reinforcement learning was proposed. By combining the single-agent proximal policy optimization algorithm with a hybrid multi-agent collaboration mechanism, a multi-agent autonomous task planning model was designed. The noise-regularized advantage value was introduced to address overfitting of the collaborative strategy under multi-agent centralized training. Simulation results show that the multi-agent reinforcement learning collaborative autonomous task planning method can intelligently optimize the collaboration strategy of small celestial body attachment missions according to real-time environmental changes and, compared with the previous algorithm, improves the success rate of task planning and the quality of planning solutions while shortening the task planning time.
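
The abstract names two ingredients, the proximal policy optimization (PPO) clipped surrogate and a noise-regularized advantage value, without giving formulas. The following is a minimal Python sketch of how the two might fit together, not the authors' implementation. The clipped surrogate follows the standard PPO definition; the noise form `A_i = (1 - alpha) * A + alpha * x_i * A` with `x_i ~ N(0, 1)`, the weight `alpha`, and all function names are assumptions made for illustration.

```python
import random


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for one sample:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def noisy_advantages(shared_advantage, n_agents, alpha=0.05, rng=None):
    """Give each agent its own perturbed copy of a shared advantage value.

    Assumed form (not stated in the abstract):
        A_i = (1 - alpha) * A + alpha * x_i * A,  x_i ~ N(0, 1)
    with the noise drawn independently for each agent.
    """
    rng = rng or random.Random()
    return [
        (1.0 - alpha) * shared_advantage
        + alpha * rng.gauss(0.0, 1.0) * shared_advantage
        for _ in range(n_agents)
    ]


def ppo_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate over a batch (a quantity to be maximized)."""
    terms = [clipped_surrogate(r, a, clip_eps) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    # Hypothetical numbers: one shared advantage, three cooperating agents.
    advantages = noisy_advantages(1.5, n_agents=3, alpha=0.05, rng=rng)
    ratios = [1.3, 0.9, 1.0]  # new-policy / old-policy probability ratios
    print("per-agent advantages:", advantages)
    print("clipped objective:", ppo_objective(ratios, advantages))
```

Under this assumed form, `alpha = 0` gives every agent the identical shared advantage, which is the setting in which a centrally trained policy may overfit; a small positive `alpha` gives each agent a slightly different training signal.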

Keywords

multi-agent reinforcement learning / autonomous task planning of deep space exploration / proximal policy optimization / small celestial body attachment

Cite this article

SUN Zeyi, WANG Bin, HU Xinyue, XIONG Xin, JIN Huaiping. Multi-Agent Reinforcement Learning Autonomous Task Planning for Deep Space Probes. Journal of Deep Space Exploration, 2024, 11(3): 244-255. https://doi.org/10.15982/j.issn.2096-9287.2024.20230159
