Multi-Agent Reinforcement Learning Autonomous Task Planning for Deep Space Probes

SUN Zeyi1, WANG Bin1,2, HU Xinyue1, XIONG Xin1,2, JIN Huaiping1,2

Journal of Deep Space Exploration ›› 2024, Vol. 11 ›› Issue (3) : 244-255. DOI: 10.15982/j.issn.2096-9287.2024.20230159

Abstract

To meet the requirements for autonomy, rapidity, and adaptability in the collaborative planning of the subsystems of a deep space probe during a small celestial body attachment mission, a collaborative planning strategy based on proximal policy optimization and multi-agent reinforcement learning was proposed. By combining the single-agent proximal policy optimization algorithm with a hybrid multi-agent cooperation mechanism, a multi-agent autonomous task planning model was designed. A noise-regularized advantage value was introduced to mitigate the overfitting that arises in the centralized training of the multi-agent collaborative strategy. Simulation results show that the multi-agent reinforcement learning collaborative autonomous task planning method can intelligently optimize the collaboration strategy of small celestial body attachment missions according to real-time environmental changes; compared with the previous algorithm, it improves the success rate of task planning and the quality of planning solutions, and shortens the task planning time.
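The two training ingredients named in the abstract can be sketched compactly: the PPO clipped surrogate objective (ref. [9]) and a noise-regularized advantage that perturbs the shared advantage signal so that agents trained centrally do not co-adapt to an identical gradient (ref. [8]). The following is a minimal illustrative sketch, not the authors' implementation; the function names and the noise scale `sigma` are assumptions for the example.

```python
import numpy as np

def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO clipped objective: min(r*A, clip(r, 1-eps, 1+eps)*A) (ref. [9])."""
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

def noise_regularized_advantage(advantage, sigma=0.05, rng=None):
    """Add Gaussian noise to the shared advantage so each agent's policy
    update sees a slightly different signal, reducing overfitting of the
    collaborative strategy under centralized training (ref. [8])."""
    rng = np.random.default_rng() if rng is None else rng
    return advantage + rng.normal(0.0, sigma, size=np.shape(advantage))
```

In a multi-agent update, each agent would call `noise_regularized_advantage` on the centrally computed advantage before forming its own clipped surrogate loss.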

Keywords

multi-agent reinforcement learning / autonomous task planning of deep space exploration / proximal policy optimization / small celestial body attachment

Cite this article

SUN Zeyi, WANG Bin, HU Xinyue, XIONG Xin, JIN Huaiping. Multi-Agent Reinforcement Learning Autonomous Task Planning for Deep Space Probes. Journal of Deep Space Exploration, 2024, 11(3): 244‒255 https://doi.org/10.15982/j.issn.2096-9287.2024.20230159

References

[1] 徐瑞,李朝玉,朱圣英,等. 深空探测器自主规划技术研究进展[J]. 深空探测学报(中英文),2021,8(2):111-123.
XU R,LI Z Y,ZHU S Y,et al. Progress in deep space explorer autonomous planning[J]. Journal of Deep Space Exploration,2021,8(2):111-123.
[2] 姜啸. 基于约束可满足的深空探测器任务规划方法[D]. 北京:北京理工大学,2018.
JIANG X. Mission planning method for deep space probes based on constraint satisfaction [D]. Beijing:Beijing Institute of Technology,2018.
[3] 赵宇庭,徐瑞,李朝玉,等. 基于动态智能体交互图的深空探测器任务规划方法[J]. 深空探测学报(中英文),2021,8(5):519-527.
ZHAO Y T,XU R,LI Z Y,et al. Mission planning method for deep space probe based on dynamic agent interaction diagram[J]. Journal of Deep Space Exploration ,2021,8(5):519-527.
[4] 史兼郡. 基于深度强化学习的空间站短期任务规划方法研究[D]. 长沙:国防科技大学,2020.
SHI J J. Research on short-term task planning method for space station based on deep reinforcement learning[D]. Changsha:National University of Defense Technology,2020.
[5] 柳景兴,王彬,毛维杨,等. 深空探测器任务规划认知图谱及多属性约束冲突检测[J]. 深空探测学报(中英文),2023,10(1):88-96.
LIU J X,WANG B,MAO W Y,et al. Cognitive graph for autonomous deep space mission planning and multi-constraints collision detection[J]. Journal of Deep Space Exploration,2023,10(1):88-96.
[6] 毛维杨,王彬,柳景兴,等. 基于强化学习的深空探测器自主任务规划方法[J]. 深空探测学报(中英文),2023,10(2):220-230.
MAO W Y,WANG B,LIU J X,et al. An autonomous planning method for deep space exploration tasks in reinforcement learning based on dynamic rewards[J]. Journal of Deep Space Exploration,2023,10(2):220-230.
[7] YU C,VELU A,VINITSKY E,et al. The surprising effectiveness of MAPPO in cooperative,multi-agent games[EB/OL]. (2022-11-04)[2023-11-03]. https://arxiv.org/abs/2103.01955v1.
[8] WANG S Y,CHEN W Y,HU J,et al. Noise-regularized advantage value for multi-agent reinforcement learning[J]. Mathematics, 2022,10(15):2728.
[9] SCHULMAN J,WOLSKI F,DHARIWAL P,et al. Proximal policy optimization algorithms[EB/OL]. (2017)[2023-11-03]. https://arxiv.org/abs/1707.06347.
[10] 司雪圆. 基于约束可满足的航天器自主任务规划方法研究[D]. 北京:北京理工大学,2015.
SI X Y. Autonomous mission planning method of spacecraft based on the constraint satisfaction[D]. Beijing:Beijing Institute of Technology,2015.
[11] 徐雅男. 小行星附着机构的整机构型设计与动力学分析[D]. 南京:南京航空航天大学,2023.
XU Y N. Whole mechanism design and dynamic analysis of asteroid attachment mechanism[D]. Nanjing:Nanjing University of Aeronautics and Astronautics,2023.
[12] 崔平远,徐瑞,朱圣英,等. 深空探测器自主技术发展现状与趋势[J]. 航空学报,2014,35(1):13-28.
CUI P Y,XU R,ZHU S Y,et al. Development status and trend of deep space probe autonomous technology[J]. Acta Aeronautica et Astronautica Sinica,2014,35(1):13-28.
[13] KOOTBALLY Z,SCHLENOFF C,LAWLER C,et al. Towards robust assembly with knowledge representation for the planning domain definition language (PDDL)[J]. Robotics and Computer-Integrated Manufacturing,2014,33(C):42-45.
[14] GHALLAB M,NAU D S,TRAVERSO P. Automated planning:theory & practice[M]. San Francisco:Morgan Kaufmann,2004.
[15] 徐文明. 深空探测器自主任务规划方法研究与系统设计[D]. 哈尔滨:哈尔滨工业大学,2006.
XU W M. Research and system design of autonomous mission planning methods for deep space probes [D]. Harbin:Harbin Institute of Technology,2006.
[16] 冯小恩,李玉庆,杨晨,等. 面向自主运行的深空探测航天器体系结构设计及自主任务规划方法[J]. 控制理论与应用,2019,36(12):2035-2041.
FENG X E,LI Y Q,YANG C,et al. Architecture design and autonomous mission planning for autonomous deep space exploration spacecraft[J]. Control Theory and Application,2019,36(12):2035-2041.