SoftGrasp: Adaptive grasping for dexterous hand based on multimodal imitation learning

Yihong Li, Ce Guo, Junkai Ren, Bailiang Chen, Chuang Cheng, Hui Zhang, Huimin Lu

Biomimetic Intelligence and Robotics ›› 2025, Vol. 5 ›› Issue (2) : 100217 DOI: 10.1016/j.birob.2025.100217
Research Article


Abstract

Biomimetic grasping is crucial for robots to interact with the environment and perform complex tasks, making it a key focus in robotics and embodied intelligence. However, achieving human-level finger coordination and force control remains challenging due to the need for multimodal perception, including visual, kinesthetic, and tactile feedback. Although some recent approaches have demonstrated remarkable performance in grasping diverse objects, they often rely on expensive tactile sensors or are restricted to rigid objects. To address these challenges, we introduce SoftGrasp, a novel multimodal imitation learning approach for adaptive, multi-stage grasping of objects with varying sizes, shapes, and hardness. First, we develop an immersive demonstration platform with force feedback to collect rich, human-like grasping datasets. Inspired by human proprioceptive manipulation, this platform gathers multimodal signals, including visual images, robot finger joint angles, and joint torques, during demonstrations. Next, we utilize a multi-head attention mechanism to align and integrate multimodal features, dynamically allocating attention to ensure comprehensive learning. On this basis, we design a behavior cloning method based on an angle-torque loss function, enabling multimodal imitation learning. Finally, we validate SoftGrasp in extensive experiments across various scenarios, demonstrating its ability to adaptively adjust joint forces and finger angles based on real-time inputs. These capabilities result in a 98% success rate in real-world experiments, achieving dexterous and stable grasping. Source code and demonstration videos are available at https://github.com/nubot-nudt/SoftGrasp.
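The abstract names two technical components: multi-head attention that aligns and fuses visual, joint-angle, and joint-torque features, and a behavior-cloning objective defined over joint angles and torques. The following minimal PyTorch sketch illustrates one way these pieces could fit together. All module structures, feature dimensions, and loss weights here are illustrative assumptions, not the authors' implementation; the released source code is at https://github.com/nubot-nudt/SoftGrasp.

```python
# Illustrative sketch only: encoders, dimensions, and loss weights are
# hypothetical stand-ins for the components described in the abstract.
import torch
import torch.nn as nn

class MultimodalGraspPolicy(nn.Module):
    """Fuses image, joint-angle, and joint-torque features with
    multi-head attention, then predicts target angles and torques."""

    def __init__(self, d_model=128, n_heads=4, n_joints=12):
        super().__init__()
        self.n_joints = n_joints
        # Per-modality encoders projecting raw inputs to a shared width.
        self.img_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(d_model))
        self.angle_enc = nn.Linear(n_joints, d_model)
        self.torque_enc = nn.Linear(n_joints, d_model)
        # Multi-head self-attention over the three modality tokens
        # dynamically re-weights each modality's contribution.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, 2 * n_joints)  # angles + torques

    def forward(self, image, angles, torques):
        tokens = torch.stack(
            [self.img_enc(image), self.angle_enc(angles),
             self.torque_enc(torques)], dim=1)        # (B, 3, d_model)
        fused, _ = self.attn(tokens, tokens, tokens)  # align modalities
        out = self.head(fused.mean(dim=1))            # pool, then decode
        return out[:, :self.n_joints], out[:, self.n_joints:]

def angle_torque_loss(pred_a, pred_t, demo_a, demo_t, w_a=1.0, w_t=0.5):
    """Behavior-cloning loss: weighted sum of angle and torque errors
    against the human demonstration (MSE terms assumed here)."""
    return (w_a * nn.functional.mse_loss(pred_a, demo_a)
            + w_t * nn.functional.mse_loss(pred_t, demo_t))

# Example forward/backward pass on random stand-in data.
policy = MultimodalGraspPolicy()
img = torch.randn(8, 3, 64, 64)
ang, tor = torch.randn(8, 12), torch.randn(8, 12)
pred_a, pred_t = policy(img, ang, tor)
angle_torque_loss(pred_a, pred_t, ang, tor).backward()
```

In the full pipeline, the image features would come from a visual backbone and the fused representation would be decoded into per-timestep joint commands; this sketch shows only the fusion and loss structure.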

Keywords

Adaptive grasping / Dexterous hand / Multimodal fusion / Imitation learning

Cite this article

Yihong Li, Ce Guo, Junkai Ren, Bailiang Chen, Chuang Cheng, Hui Zhang, Huimin Lu. SoftGrasp: Adaptive grasping for dexterous hand based on multimodal imitation learning. Biomimetic Intelligence and Robotics, 2025, 5(2): 100217. DOI: 10.1016/j.birob.2025.100217


CRediT authorship contribution statement

Yihong Li: Writing - review & editing, Writing - original draft, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Ce Guo: Writing - review & editing, Investigation. Junkai Ren: Writing - review & editing, Supervision, Conceptualization. Bailiang Chen: Writing - original draft, Formal analysis, Data curation. Chuang Cheng: Resources, Validation. Hui Zhang: Supervision, Resources. Huimin Lu: Supervision, Resources, Project administration, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the Innovation Science Foundation of National University of Defense Technology, China (24-ZZCX-GZZ-11), and the National Natural Science Foundation of China (62373201).

Appendix A. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.birob.2025.100217.

