A Comprehensive Survey of Deep Reinforcement Learning Techniques for Soft Mobile Robots

Mohammad Hadi Ghodsi , Hamed Shahbazi , Keivan Torabi

Drones Auton. Veh. 2026, Vol. 3, Issue (1): 10022. DOI: 10.70322/dav.2025.10022

Research Article

Abstract

Soft robotics has emerged as a promising direction for enabling safe, adaptive, and energy-efficient interaction with unstructured environments, owing to the inherent compliance of soft bodies. Recently, Deep Reinforcement Learning (DRL) has become a powerful tool for autonomous behavior generation in soft robots, overcoming the limitations of classical model-based control. However, despite the rapid growth of publications in this domain, there is still a lack of systematic comparative surveys that clarify how different DRL approaches have been applied to soft mobile robots, what types of tasks they address, and what performance evaluation criteria have been used. In this article, we review and classify existing work on DRL-enabled soft robotics, focusing particularly on soft mobile systems, and present a structured synthesis of contributions, algorithms, training strategies, and real-world applications. Unlike previous reviews that discuss soft robotics or DRL separately, this paper explicitly cross-compares DRL paradigms against soft robot tasks, enabling researchers to identify suitable DRL approaches for different soft mobile robotic behaviors. Finally, major challenges and promising future directions are proposed to advance this interdisciplinary research area.

Keywords

Deep reinforcement learning / Soft robotics

Cite this article

Download citation ▾
Mohammad Hadi Ghodsi, Hamed Shahbazi, Keivan Torabi. A Comprehensive Survey of Deep Reinforcement Learning Techniques for Soft Mobile Robots. Drones Auton. Veh., 2026, 3(1): 10022. DOI: 10.70322/dav.2025.10022


Author Contributions

Conceptualization: M.H.G. and H.S.; Methodology: M.H.G.; Software: H.S.; Validation: M.H.G., H.S. and K.T.; Formal Analysis: H.S.; Investigation: H.S.; Resources: H.S.; Data Curation: M.H.G.; Writing—Original Draft Preparation: K.T.

Ethics Statement

Not applicable. This study did not involve humans or animals.

Informed Consent Statement

Not applicable. This study did not involve humans.

Data Availability Statement

Not applicable. No datasets were generated or analyzed during the current study.

Funding

This research received no external funding.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

