Humanoid dexterous hands from structure to gesture semantics for enhanced human-robot interaction: A review

Xin Li, Wenfu Xu, Zaiqiao Ye, Han Yuan

Biomimetic Intelligence and Robotics, 2025, Vol. 5, Issue 4: 100258
DOI: 10.1016/j.birob.2025.100258
Review

Abstract

As human–robot interaction (HRI) technology advances, dexterous robotic hands are taking on a dual role: they serve both as tools for manipulation and as channels for non-verbal communication. While much of the existing research emphasizes grasping performance and structural dexterity, the semantic dimension of gestures and its impact on user experience have been relatively overlooked. Studies from HRI and cognitive psychology consistently show that the naturalness and cognitive empathy of gestures significantly influence user trust, satisfaction, and engagement. This reflects a broader transition from mechanically driven designs toward cognitively empathic interaction, that is, a robot’s ability to infer human affect, intent, and social context in order to generate appropriate non-verbal responses. In this paper, we argue that large language models (LLMs) enable a paradigm shift in gesture control, from rule-based execution to semantics-driven, context-aware generation. By leveraging LLMs and vision-language models, robots can interpret environmental and social cues, map emotions dynamically, and generate gestures that align with human communication norms. We provide a comprehensive review of research on dexterous hand mechanics, gesture semantics, and user experience evaluation, integrating insights from linguistics and cognitive science. Furthermore, we propose a closed-loop “perception–cognition–generation–assessment” framework that guides gesture design through iterative, multimodal feedback. This framework lays the conceptual foundation for building universal, adaptive, and emotionally intelligent gesture systems for future human–robot interaction.
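The proposed “perception–cognition–generation–assessment” loop can be made concrete with a short sketch. The Python snippet below is purely illustrative: every name (Context, perceive, cognize, generate, assess) is a hypothetical placeholder of ours, and the LLM query is faked with a lookup table, since the paper proposes the framework conceptually rather than prescribing an API.

```python
from dataclasses import dataclass

# Illustrative sketch of the "perception-cognition-generation-assessment"
# closed loop. All names here are hypothetical; the review does not
# prescribe a concrete implementation.

@dataclass
class Context:
    utterance: str   # what the human said
    scene: str       # caption produced by a vision-language model
    affect: str      # estimated emotional state of the user

def perceive(speech: str, scene_caption: str) -> Context:
    """Perception: fuse multimodal cues into a symbolic context."""
    affect = "negative" if "not" in speech.lower() else "neutral"  # toy heuristic
    return Context(utterance=speech, scene=scene_caption, affect=affect)

def cognize(ctx: Context) -> str:
    """Cognition: infer a gesture intent from the context.
    A real system would prompt an LLM here; we use a lookup table."""
    return {"negative": "placating open palm",
            "neutral": "welcoming wave"}[ctx.affect]

def generate(intent: str) -> list:
    """Generation: map the gesture intent to hand joint targets."""
    poses = {
        "placating open palm": [0.0, 0.0, 0.0, 0.0, 0.0],  # fingers extended
        "welcoming wave": [0.2, 0.2, 0.2, 0.2, 0.6],
    }
    return poses[intent]

def assess(user_feedback: float) -> bool:
    """Assessment: did the gesture meet the user's expectations?"""
    return user_feedback >= 0.5

# One pass around the loop; the assessment signal would update the
# cognition prompt or the intent-to-pose mapping on the next iteration.
ctx = perceive("I did not ask for that", "user frowning at the table")
intent = cognize(ctx)
joints = generate(intent)
print(intent, joints, "ok" if assess(user_feedback=0.3) else "revise next pass")
```

In a deployed system the cognition stage would query an LLM or vision-language model and the assessment stage would collect multimodal user feedback, closing the loop that the framework describes.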

Keywords

Human–robot interaction (HRI) / Dexterous hand / Large language models / Gesture / Communication

Cite this article

Xin Li, Wenfu Xu, Zaiqiao Ye, Han Yuan. Humanoid dexterous hands from structure to gesture semantics for enhanced human-robot interaction: A review. Biomimetic Intelligence and Robotics, 2025, 5(4): 100258. DOI: 10.1016/j.birob.2025.100258


