Hierarchical reinforcement learning for enhancing stability and adaptability of hexapod robots in complex terrains

Shichang Huang, Zhihan Xiao, Minhua Zheng, Wen Shi

Biomimetic Intelligence and Robotics, 2025, Vol. 5, Issue 3: 100231. DOI: 10.1016/j.birob.2025.100231

Research Article

Abstract

In the field of hexapod robot control, central pattern generators (CPGs) and deep reinforcement learning (DRL) are increasingly widely applied. Compared with traditional control methods that rely on dynamic models, both CPG-based and end-to-end DRL approaches greatly simplify the design of the control model. However, relying solely on DRL has drawbacks, such as slow convergence and low exploration efficiency. Moreover, although a CPG can produce rhythmic gaits, its control strategy is relatively rigid, limiting the robot's ability to adapt to complex terrains. To overcome these limitations, this study proposes a three-layer DRL control architecture: a high-level reinforcement learning controller learns the parameters of the middle-level CPG and the low-level mapping functions, while the middle- and low-level controllers coordinate joint movements within and between legs. By integrating the learning capability of DRL with the gait-generation characteristics of the CPG, this method significantly enhances the stability and adaptability of hexapod robots in complex terrains. Experimental results show that, compared with pure DRL approaches, the method markedly improves learning efficiency and control performance; compared with pure CPG control, it considerably enhances the robot's stability and adaptability on complex terrains.
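The layered structure described above can be made concrete with a minimal sketch. The fragment below is illustrative only and assumes details not stated in the abstract: Hopf oscillators for the middle-level CPG, a shared linear output mapping for the low-level layer, and a random action standing in for the trained high-level policy (which the paper learns with reinforcement learning). All class and function names here are hypothetical.

```python
import numpy as np

class HopfCPG:
    """Middle level: one Hopf oscillator per leg (a hypothetical choice; the
    paper's exact oscillator model and coupling topology are not given here)."""

    def __init__(self, n_legs=6, dt=0.01):
        self.n = n_legs
        self.dt = dt
        self.x = np.random.uniform(-0.1, 0.1, n_legs)
        self.y = np.random.uniform(-0.1, 0.1, n_legs)

    def step(self, mu, omega, k, phase):
        """One Euler step. mu: squared amplitude, omega: frequency (rad/s),
        k: inter-leg coupling gain, phase: desired phase lag per leg."""
        dx = np.zeros(self.n)
        dy = np.zeros(self.n)
        for i in range(self.n):
            r2 = self.x[i] ** 2 + self.y[i] ** 2
            dx[i] = (mu - r2) * self.x[i] - omega * self.y[i]
            dy[i] = (mu - r2) * self.y[i] + omega * self.x[i]
            for j in range(self.n):  # diffusive phase coupling between legs
                th = phase[i] - phase[j]
                dy[i] += k * (np.cos(th) * self.y[j] - np.sin(th) * self.x[j])
        self.x += dx * self.dt
        self.y += dy * self.dt
        return self.x.copy()  # one rhythmic signal per leg


def joint_targets(rhythm, gain, offsets):
    """Low level: map each leg's oscillator output to its three joint angles.
    A linear mapping is assumed purely for illustration."""
    return offsets + gain * rhythm[:, None]  # shape (6 legs, 3 joints)


# High level: the RL policy would emit CPG and mapping parameters from
# observations each control cycle; a random action stands in for policy(obs).
cpg = HopfCPG()
action = np.random.uniform(0.0, 1.0, size=4)
mu, omega = 0.5 + action[0], 2.0 * np.pi * (0.5 + action[1])
k, gain = 0.2 * action[2], 0.4 + 0.2 * action[3]
tripod = np.array([0.0, np.pi, 0.0, np.pi, 0.0, np.pi])  # tripod-gait phase lags
for _ in range(100):
    rhythm = cpg.step(mu, omega, k, tripod)
    q = joint_targets(rhythm, gain, offsets=np.zeros((6, 3)))  # joint commands
```

In the full system the action vector would come from a trained policy (the paper's high-level controller), and a reward scoring stability and forward progress over the terrain would drive learning of the CPG and mapping parameters.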

Keywords

Hexapod robot / Central pattern generation / Reinforcement learning / Complex terrains

Cite this article

Shichang Huang, Zhihan Xiao, Minhua Zheng, Wen Shi. Hierarchical reinforcement learning for enhancing stability and adaptability of hexapod robots in complex terrains. Biomimetic Intelligence and Robotics, 2025, 5(3): 100231. DOI: 10.1016/j.birob.2025.100231


CRediT authorship contribution statement

Shichang Huang: Writing - review & editing, Writing - original draft, Methodology, Conceptualization. Zhihan Xiao: Writing - review & editing, Writing - original draft, Conceptualization. Minhua Zheng: Resources, Funding acquisition. Wen Shi: Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Beijing Natural Science Foundation - Xiaomi Innovation Joint Fund (L243013) and the National Natural Science Foundation of China (62172392).

Appendix A. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.birob.2025.100231.

