ROSGPT: Natural Language Control of Mobile Robots Navigation via Large Language Model

Jiacui HUANG, Mingbo ZHAO, Hongtao ZHANG

Journal of Donghua University (English Edition), 2025, Vol. 42, Issue 3: 315-329.

DOI: 10.19884/j.1672-5220.202405007
Information Technology and Artificial Intelligence
research-article

Abstract

This work presents ROSGPT, a system that integrates large language models (LLMs) with the robot operating system (ROS) to enable natural language voice control of mobile robots. The integration aims to bridge the gap between human-robot interaction (HRI) and artificial intelligence (AI). ROSGPT combines several subsystems, including speech recognition, prompt engineering, an LLM and ROS, so that robots can be controlled through spoken or typed commands. The LLM component is derived from the open-source Llama2 model and optimized through fine-tuning and quantization. Extensive experiments in both real-world and virtual environments show that ROSGPT meets user requirements and provides a user-friendly interactive experience. The system comprehends diverse user commands and executes the corresponding tasks precisely and reliably, demonstrating its versatility, adaptability and potential for practical applications in robotics and AI. The demonstration video can be viewed at https://iklxo6z9yv.feishu.cn/docx/Lux3dmTDxoZ5YnxWJTZcxUCWnTh.
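The subsystem chain named in the abstract (command text, prompt engineering, LLM, robot command) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the prompt template, the JSON schema with `action` and `target` keys, the `NavigationCommand` type and the rule-based `stub_llm` that stands in for the fine-tuned, quantized Llama2 model. None of these are taken from the paper, and the speech recognition and ROS stages are omitted.

```python
import json
from dataclasses import dataclass

# Hypothetical prompt template; the actual ROSGPT prompt is not given in this section.
PROMPT_TEMPLATE = (
    "You control a mobile robot. Convert the user's command into JSON with "
    'keys "action" and "target".\n'
    "Command: {command}\nJSON:"
)


@dataclass
class NavigationCommand:
    """Structured command that a downstream robot node could act on (assumed schema)."""
    action: str
    target: str


def build_prompt(command: str) -> str:
    """Prompt engineering step: wrap the recognized text in a fixed template."""
    return PROMPT_TEMPLATE.format(command=command.strip())


def stub_llm(prompt: str) -> str:
    """Rule-based stand-in for the LLM so the sketch runs without a model."""
    command = prompt.split("Command:", 1)[1].split("\n", 1)[0].strip().lower()
    if " to the " in command:
        target = command.rsplit(" to the ", 1)[-1]
    else:
        target = "unknown"
    return json.dumps({"action": "navigate", "target": target})


def parse_response(response: str) -> NavigationCommand:
    """Validate the model output and convert it into a typed command."""
    data = json.loads(response)
    return NavigationCommand(action=data["action"], target=data["target"])


def handle_command(command: str, llm=stub_llm) -> NavigationCommand:
    """Run the text -> prompt -> LLM -> structured command chain."""
    return parse_response(llm(build_prompt(command)))


if __name__ == "__main__":
    print(handle_command("Go to the kitchen"))
```

In a deployed system, the `llm` argument would presumably wrap the real model, and the resulting structured command would be handed to a ROS node that turns it into a navigation goal. Keeping the LLM behind a plain callable lets the parsing and validation logic be tested without a model or a robot.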

Keywords

Llama2 model / large language model (LLM) / automatic speech recognition (ASR) / human-robot interaction (HRI) / robot operating system (ROS) / Habitat simulator

Cite this article

Jiacui HUANG, Mingbo ZHAO, Hongtao ZHANG. ROSGPT: Natural Language Control of Mobile Robots Navigation via Large Language Model. Journal of Donghua University (English Edition), 2025, 42(3): 315-329. DOI: 10.19884/j.1672-5220.202405007

Funding

National Natural Science Foundation of China (61601112)
