From knowing to doing: Learning diverse motor skills through instruction learning

Linqi Ye, Yi Cheng, Jiayi Li, Xianhao Wang, Bin Liang, Yan Peng

Biomimetic Intelligence and Robotics ›› 2026, Vol. 6 ›› Issue (1) : 100286

Research Article

Abstract

Recent years have witnessed many successful applications of robot learning. For contact-rich robotic tasks, however, learning coordinated motor skills through reinforcement learning alone remains challenging. Imitation learning addresses this by using a mimic reward that encourages the robot to track a given reference trajectory, but it is relatively inefficient and may constrain the learned motion. In this paper, we propose instruction learning, an approach inspired by the human learning process that is highly efficient, flexible, and versatile for robot motion learning. Instead of embedding the reference signal in the reward, instruction learning applies it directly as a feedforward action, which is combined with a feedback action learned by reinforcement learning to control the robot. In addition, we propose an action bounding technique and remove the mimic reward, which proves crucial for efficient and flexible learning. Comparisons with imitation learning show that instruction learning greatly speeds up training and ensures that the desired motion is learned correctly. Its effectiveness is validated through a series of motion learning examples on a biped robot and a quadruped robot, where skills are typically learned within several million steps. We also conduct sim-to-real transfer and online learning experiments on a real quadruped robot. Instruction learning shows great merit and potential, making it a promising alternative to imitation learning.
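The action composition described in the abstract — a reference trajectory applied directly as a feedforward action, plus a bounded feedback correction learned by reinforcement learning — can be illustrated with a minimal sketch. All names, the gait frequency, and the symmetric clipping bound below are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

# Assumed limit on the feedback correction (rad); the paper's action
# bounding technique is sketched here as a simple symmetric clip.
ACTION_BOUND = 0.3

def reference_action(t, n_joints=12):
    """Hypothetical periodic reference trajectory (e.g., a gait pattern)."""
    phase = 2.0 * np.pi * 1.5 * t  # 1.5 Hz gait frequency (assumed)
    return 0.5 * np.sin(phase + np.linspace(0.0, np.pi, n_joints))

def compose_action(t, feedback):
    """Feedforward reference plus a bounded feedback correction."""
    feedback = np.clip(feedback, -ACTION_BOUND, ACTION_BOUND)  # action bounding
    return reference_action(t) + feedback

# Usage: a raw policy output far outside the bound is clipped before
# being added to the reference, so the composed action stays close to
# the instructed trajectory.
raw_feedback = np.full(12, 1.0)  # unbounded policy output
action = compose_action(0.0, raw_feedback)
print(action.shape)
```

In this scheme the reference supplies the coarse motion ("knowing") while the policy only learns a small residual correction ("doing"), which is one way to read the paper's claim that removing the mimic reward and bounding the action keeps learning both efficient and flexible.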

Keywords

Reinforcement learning / Legged locomotion / Instruction learning / Quadruped robots / Biped robots

Cite this article

Linqi Ye, Yi Cheng, Jiayi Li, Xianhao Wang, Bin Liang, Yan Peng. From knowing to doing: Learning diverse motor skills through instruction learning. Biomimetic Intelligence and Robotics, 2026, 6(1): 100286 DOI:10.1016/j.birob.2026.100286


CRediT authorship contribution statement

Linqi Ye: Writing – review & editing, Writing – original draft, Visualization, Validation, Supervision, Investigation, Formal analysis, Conceptualization. Yi Cheng: Writing – original draft, Validation, Methodology, Data curation. Jiayi Li: Writing – original draft, Methodology. Xianhao Wang: Validation, Investigation. Bin Liang: Project administration, Conceptualization. Yan Peng: Supervision, Resources.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (62003188 and 92248304) and in part by the Science and Technology Commission of Shanghai Municipality (24511103304).

The authors thank Professor Andy Ruina for insightful discussions, and the National Demonstration Center for Experimental Engineering Training Education, Shanghai University, for providing the experimental facilities.

Appendix A. Supplementary data

The code is available at https://github.com/Lr-2002/raisimLib and https://github.com/loongOpen/Unity-RL-Playground. Supplementary material related to this article can be found online at https://doi.org/10.1016/j.birob.2026.100286.

References

[1] J. Hwangbo, J. Lee, A. Dosovitskiy, et al., Learning agile and dynamic motor skills for legged robots, Sci. Robot. 4 (26) (2019) eaau5872.

[2] J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, M. Hutter, Learning quadrupedal locomotion over challenging terrain, Sci. Robot. 5 (47) (2020) eabc5986.

[3] A. Loquercio, E. Kaufmann, R. Ranftl, M. Müller, V. Koltun, D. Scaramuzza, Learning high-speed flight in the wild, Sci. Robot. 6 (59) (2021) eabg5810.

[4] O.M. Andrychowicz, B. Baker, M. Chociej, et al., Learning dexterous in-hand manipulation, Int. J. Robot. Res. 39 (1) (2020) 3-20.

[5] T.P. Lillicrap, J.J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, et al., Continuous control with deep reinforcement learning, in: International Conference on Learning Representations, ICLR, San Juan, Puerto Rico, 2016, pp. 1-14.

[6] T. Haarnoja, A. Zhou, P. Abbeel, S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, in: International Conference on Machine Learning, ICML, Stockholm, Sweden, 2018, pp. 1861-1870.

[7] X.B. Peng, E. Coumans, T. Zhang, T.W. Lee, J. Tan, S. Levine, Learning agile robotic locomotion skills by imitating animals, in: Robotics: Science and Systems, RSS, Corvallis, Oregon, USA, 2020, pp. 1-12.

[8] L.K. Ma, Z. Yang, X. Tong, B. Guo, K. Yin, Learning and exploring motor skills with spacetime bounds, Comput. Graph. Forum 40 (2) (2021) 251-263.

[9] H. Duan, J. Dao, K. Green, T. Apgar, A. Fern, J. Hurst, Learning task space actions for bipedal locomotion, in: 2021 IEEE International Conference on Robotics and Automation, ICRA, Xi'an, China, 2021, pp. 1276-1282.

[10] J. Siekmann, Y. Godse, A. Fern, J. Hurst, Sim-to-real learning of all common bipedal gaits via periodic reward composition, in: 2021 IEEE International Conference on Robotics and Automation, ICRA, Xi'an, China, 2021, pp. 7309-7315.

[11] G. Ji, J. Mun, H. Kim, J. Hwangbo, Concurrent training of a control policy and a state estimator for dynamic and robust legged locomotion, IEEE Robot. Autom. Lett. 7 (2) (2022) 4630-4637, https://doi.org/10.1109/LRA.2022.3151396.

[12] G.B. Margolis, G. Yang, K. Paigwar, T. Chen, P. Agrawal, Rapid locomotion via reinforcement learning, Int. J. Robot. Res. 43 (4) (2022) 572-587.

[13] T. Miki, J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, M. Hutter, Learning robust perceptive locomotion for quadrupedal robots in the wild, Sci. Robot. 7 (62) (2022) eabk2822.

[14] G.B. Margolis, P. Agrawal, Walk these ways: Tuning robot control for generalization with multiplicity of behavior, in: Conference on Robot Learning, CoRL, 2023, pp. 22-31.

[15] Y. Kim, et al., Not only rewards but also constraints: Applications on legged robot locomotion, IEEE Trans. Robot. 40 (2024) 2984-3003.

[16] N. Rudin, D. Hoeller, P. Reist, M. Hutter, Learning to walk in minutes using massively parallel deep reinforcement learning, in: Conference on Robot Learning, CoRL, 2022, pp. 91-100.

[17] V. Makoviychuk, L. Wawrzyniak, Y. Guo, et al., Isaac Gym: High performance GPU-based physics simulation for robot learning, in: Conference on Neural Information Processing Systems, NeurIPS, Virtual, Online, 2021.

[18] X.B. Peng, P. Abbeel, S. Levine, M. Van de Panne, DeepMimic: Example-guided deep reinforcement learning of physics-based character skills, ACM Trans. Graph. 37 (4) (2018) 1-14.

[19] Y. Jin, X. Liu, Y. Shao, H. Wang, W. Yang, High-speed quadrupedal locomotion by imitation-relaxation reinforcement learning, Nat. Mach. Intell. 4 (12) (2022) 1198-1208.

[20] J. Hua, L. Zeng, G. Li, Z. Ju, Learning for a robot: Deep reinforcement learning, imitation learning, transfer learning, Sensors 21 (4) (2021) 1278.

[21] F. Torabi, G. Warnell, P. Stone, Behavioral cloning from observation, in: International Joint Conference on Artificial Intelligence, IJCAI, Stockholm, Sweden, 2018.

[22] S. Arora, P. Doshi, A survey of inverse reinforcement learning: Challenges, methods and progress, Artif. Intell. 297 (2021) 103500.

[23] J. Ho, S. Ermon, Generative adversarial imitation learning, Adv. Neural Inf. Process. Syst. 29 (2016).

[24] I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., Generative adversarial nets, Adv. Neural Inf. Process. Syst. 27 (2014).

[25] A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, A.A. Bharath, Generative adversarial networks: An overview, IEEE Signal Process. Mag. 35 (1) (2018) 53-65, https://doi.org/10.1109/MSP.2017.2765202.

[26] N. Baram, O. Anschel, I. Caspi, S. Mannor, End-to-end differentiable adversarial imitation learning, in: International Conference on Machine Learning, ICML, 2017, pp. 390-399.

[27] X.B. Peng, Z. Ma, P. Abbeel, S. Levine, A. Kanazawa, AMP: Adversarial motion priors for stylized physics-based character control, ACM Trans. Graph. 40 (4) (2021) 1-20.

[28] X.B. Peng, Y. Guo, L. Halper, S. Levine, S. Fidler, ASE: Large-scale reusable adversarial skill embeddings for physically simulated characters, ACM Trans. Graph. 41 (4) (2022) 1-17.

[29] E. Vollenweider, M. Bjelonic, V. Klemm, N. Rudin, J. Lee, M. Hutter, Advanced skills through multiple adversarial motion priors in reinforcement learning, in: 2023 IEEE International Conference on Robotics and Automation, ICRA, 2023, pp. 5120-5126.

[30] X.B. Peng, M. Andrychowicz, W. Zaremba, P. Abbeel, Sim-to-real transfer of robotic control with dynamics randomization, in: 2018 IEEE International Conference on Robotics and Automation, ICRA, 2018, pp. 3803-3810.

[31] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, P. Abbeel, Domain randomization for transferring deep neural networks from simulation to the real world, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS, 2017, pp. 23-30.

[32] A. Kumar, Z. Fu, D. Pathak, J. Malik, RMA: Rapid motor adaptation for legged robots, in: Robotics: Science and Systems, RSS, 2021.

[33] Z. Zhuang, et al., Robot parkour learning, in: Conference on Robot Learning, CoRL, Atlanta, United States, 2023.

[34] W. Tan, X. Fang, W. Zhang, R. Song, T. Chen, Y. Zheng, Y. Li, A hierarchical framework for quadruped locomotion based on reinforcement learning, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS, 2021, pp. 8462-8468.

[35] S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, D. Quillen, Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection, Int. J. Robot. Res. 37 (4-5) (2018) 421-436.

[36] P. Wu, A. Escontrela, D. Hafner, K. Goldberg, P. Abbeel, DayDreamer: World models for physical robot learning, in: Conference on Robot Learning, PMLR, 2023, pp. 2226-2240.

[37] T. Haarnoja, S. Ha, A. Zhou, J. Tan, G. Tucker, S. Levine, Learning to walk via deep reinforcement learning, in: Robotics: Science and Systems, RSS, Freiburg im Breisgau, Germany, 2019.

[38] S. Ha, P. Xu, Z. Tan, S. Levine, J. Tan, Learning to walk in the real world with minimal human effort, in: Conference on Robot Learning, CoRL, Virtual, Online, 2020.

[39] Y. Yang, K. Caluwaerts, A. Iscen, T. Zhang, J. Tan, V. Sindhwani, Data efficient reinforcement learning for legged robots, in: Conference on Robot Learning, PMLR, 2020, pp. 1-10.

[40] M. Bloesch, et al., Towards real robot learning in the wild: A case study in bipedal locomotion, in: Conference on Robot Learning, PMLR, 2022, pp. 1502-1511.

[41] A. Gupta, et al., Reset-free reinforcement learning via multi-task learning: Learning dexterous manipulation behaviors without human intervention, in: 2021 IEEE International Conference on Robotics and Automation, ICRA, 2021, pp. 6664-6671.

[42] L. Smith, J.C. Kew, X. Bin Peng, S. Ha, J. Tan, S. Levine, Legged robots that keep on learning: Fine-tuning locomotion policies in the real world, in: International Conference on Robotics and Automation, ICRA, 2022, pp. 1593-1599.

[43] L. Smith, I. Kostrikov, S. Levine, A walk in the park: Learning to walk in 20 minutes with model-free reinforcement learning, in: Robotics: Science and Systems, RSS, Daegu, Republic of Korea, 2023.

[44] K.E. Adolph, J.M. Franchak, The development of motor behavior, Wiley Interdiscip. Rev. Cogn. Sci. 8 (1-2) (2017) e1430.

[45] A. Ruina, Efficient, robust, and nimble open-source legged robot in progress, 2019, http://ruina.tam.cornell.edu/research/topics/locomotion_and_robotics/Tik-Tok/index.html. (Accessed 16 August 2023).
