Large language model-based task planning for service robots: A review

Shaohan Bian, Ying Zhang, Guohui Tian, Zhiqiang Miao, Edmond Q. Wu, Simon X. Yang, Changchun Hua

Biomimetic Intelligence and Robotics, 2026, 6(1): 100274. DOI: 10.1016/j.birob.2026.100274

Review

Abstract

With the rapid advancement of large language models (LLMs) and robotics, service robots are increasingly becoming an integral part of daily life, offering a wide range of services in complex environments. To deliver these services intelligently and efficiently, robust and accurate task planning capabilities are essential. This paper presents a comprehensive overview of the integration of LLMs into service robotics, with a particular focus on their role in enhancing robotic task planning. First, the development and foundational techniques of LLMs, including pre-training, fine-tuning, retrieval-augmented generation (RAG), and prompt engineering, are reviewed. We then explore the application of LLMs as the cognitive core, or "brain," of service robots, discussing how they contribute to improved autonomy and decision-making. Furthermore, recent advances in LLM-driven task planning across various input modalities, including textual, visual, auditory, and multimodal inputs, are analyzed. Finally, we summarize key challenges and limitations in current research and propose future directions for advancing the task planning capabilities of service robots in complex, unstructured domestic environments. This review aims to serve as a valuable reference for researchers and practitioners in the fields of artificial intelligence and robotics.
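
For readers unfamiliar with how prompt engineering is typically applied to robot task planning, the following minimal Python sketch illustrates the general pattern surveyed in this review: a natural-language instruction is embedded in a structured prompt that constrains the LLM to a fixed set of robot primitives, and the model's numbered-step reply is parsed into an executable action sequence. The primitive set, the prompt template, and the call_llm placeholder are illustrative assumptions for exposition, not a method proposed by the authors; call_llm stands in for any chat-completion endpoint.

    # Minimal sketch of prompt-based task planning for a service robot.
    # The primitive set, prompt template, and call_llm stub are illustrative
    # assumptions; wire call_llm to any chat-completion API in practice.
    import re

    PRIMITIVES = ["navigate_to", "pick", "place", "open", "close"]

    def build_prompt(instruction: str) -> str:
        # Constrain the LLM to known robot skills so the plan stays executable.
        return (
            "You are a household service robot planner.\n"
            f"Allowed actions: {', '.join(PRIMITIVES)}.\n"
            "Reply only with numbered steps, one action(argument) per line.\n"
            f"Task: {instruction}\n"
        )

    def parse_plan(reply: str) -> list[tuple[str, str]]:
        # Turn lines like "1. pick(cup)" into ("pick", "cup") pairs,
        # discarding any step whose action is outside PRIMITIVES.
        plan = []
        for line in reply.splitlines():
            m = re.match(r"\s*\d+\.\s*(\w+)\((.*?)\)", line)
            if m and m.group(1) in PRIMITIVES:
                plan.append((m.group(1), m.group(2)))
        return plan

    def call_llm(prompt: str) -> str:
        # Placeholder returning a canned reply so the sketch runs offline.
        return "1. navigate_to(kitchen)\n2. pick(cup)\n3. navigate_to(table)\n4. place(cup)"

    if __name__ == "__main__":
        reply = call_llm(build_prompt("Bring the cup to the table."))
        print(parse_plan(reply))
        # [('navigate_to', 'kitchen'), ('pick', 'cup'),
        #  ('navigate_to', 'table'), ('place', 'cup')]

Validating each parsed step against the primitive set before execution is one simple way such systems keep free-form LLM output grounded in skills the robot actually has.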

Keywords

Large language model / Service robot / Task planning / Review

Cite this article

Shaohan Bian, Ying Zhang, Guohui Tian, Zhiqiang Miao, Edmond Q. Wu, Simon X. Yang, Changchun Hua. Large language model-based task planning for service robots: A review. Biomimetic Intelligence and Robotics, 2026, 6(1): 100274. DOI: 10.1016/j.birob.2026.100274


CRediT authorship contribution statement

Shaohan Bian: Writing – review & editing, Writing – original draft, Investigation, Formal analysis. Ying Zhang: Writing – review & editing, Writing – original draft, Supervision, Investigation. Guohui Tian: Writing – review & editing, Investigation. Zhiqiang Miao: Writing – review & editing, Writing – original draft, Investigation. Edmond Q. Wu: Writing – review & editing, Investigation. Simon X. Yang: Writing – review & editing, Investigation. Changchun Hua: Writing – review & editing, Supervision, Investigation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (62203378 and 62203377), in part by the Hebei Natural Science Foundation (F2024203036), in part by the Beijing-Tianjin-Hebei Basic Research Cooperation Project of Hebei Natural Science Foundation (F2024203115), in part by the Science Research Project of Hebei Education Department (BJK2024195), and in part by the S&T Program of Hebei (236Z2002G and 236Z1603G).

