An Autonomous Planning Method for Deep Space Exploration Tasks in Reinforcement Learning Based on Dynamic Rewards

doi:10.15982/j.issn.2096-9287.2023.20220049

Journal of Deep Space Exploration, 2023, Vol. 10, Issue (2): 220-230. DOI: 10.15982/j.issn.2096-9287.2023.20220049

Research Papers

MAO Weiyang¹, WANG Bin¹,², LIU Jingxing¹, XIONG Xin¹

Abstract

To address the multi-system parallelism and the many constraints that must be satisfied in autonomous mission planning for deep space probes, a method for constructing a reinforcement learning model for autonomous task planning based on dynamic rewards was proposed, and a deep space probe agent was established. In the interactive environment, a policy network and a loss function integrating resource constraints, time constraints, and timing constraints were constructed, and a dynamic reward mechanism was proposed to improve the traditional policy gradient learning method. Simulation results show that the proposed method can realize autonomous task planning. Compared with the static-reward policy gradient algorithm, the planning success rate and planning efficiency were significantly improved, and the method could start planning from any state without changing the model structure, which improved the accuracy of the algorithm. The method provides a new solution for autonomous mission planning and decision-making of deep space probes.
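
The abstract names the ingredients (a policy gradient learner, constraint-aware feedback, and a reward that changes during planning) but gives no formulas. The following is therefore only an illustrative sketch of a REINFORCE-style update with a step-dependent reward, not the authors' model. The function names, the state-independent softmax policy, and in particular the form of `dynamic_reward` (a feasible action earns more as the plan nears completion, a constraint violation costs a fixed penalty) are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step t."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def dynamic_reward(feasible, step, horizon):
    """Hypothetical dynamic reward (an assumption, not the paper's formula).

    A feasible action earns 1 + step / horizon, so later steps pay more;
    an action that violates a constraint costs a fixed penalty of -1.
    """
    if not feasible:
        return -1.0
    return 1.0 + step / horizon


def reinforce_step(theta, episode, lr=0.1, gamma=0.99):
    """One REINFORCE update for a state-independent softmax policy.

    theta   : list of action logits (a new, updated list is returned)
    episode : list of (action_index, reward) pairs in time order
    """
    returns = discounted_returns([r for _, r in episode], gamma)
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for (action, _), g in zip(episode, returns):
        for k in range(len(theta)):
            # d/d theta_k of log pi(action) = 1[k == action] - pi(k)
            indicator = 1.0 if k == action else 0.0
            grad[k] += g * (indicator - probs[k])
    # Gradient ascent on the expected return.
    return [t + lr * d for t, d in zip(theta, grad)]


if __name__ == "__main__":
    # Two-step toy episode: a feasible action 0, then an infeasible action 1.
    episode = [
        (0, dynamic_reward(True, 0, 3)),
        (1, dynamic_reward(False, 1, 3)),
    ]
    print(reinforce_step([0.0, 0.0], episode))
```

A static-reward baseline in this sketch would simply return a constant for every feasible action; swapping that in for `dynamic_reward` leaves the update rule unchanged, which is the sense in which a reward mechanism can be altered without touching the model structure.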

Keywords

deep space exploration / task planning / policy gradient / reinforcement learning / dynamic reward

Cite this article

MAO Weiyang, WANG Bin, LIU Jingxing, XIONG Xin. An Autonomous Planning Method for Deep Space Exploration Tasks in Reinforcement Learning Based on Dynamic Rewards. Journal of Deep Space Exploration, 2023, 10(2): 220-230. https://doi.org/10.15982/j.issn.2096-9287.2023.20220049
