Vision-language model-based human-robot collaboration for smart manufacturing: A state-of-the-art survey
Junming FAN, Yue YIN, Tian WANG, Wenhang DONG, Pai ZHENG, Lihui WANG
Front. Eng, 2025, Vol. 12, Issue (1): 177-200.
Human–robot collaboration (HRC) is set to transform the manufacturing paradigm by combining human flexibility with robot precision. Recent breakthroughs in Large Language Models (LLMs) and Vision-Language Models (VLMs) have motivated preliminary exploration and adoption of these models in smart manufacturing. However, despite considerable effort, existing research has mainly focused on individual components and lacks a comprehensive perspective on the full potential of VLMs, especially for HRC in smart manufacturing scenarios. To fill this gap, this work offers a systematic review of the latest advancements and applications of VLMs in HRC for smart manufacturing. It covers the fundamental architectures and pretraining methodologies of LLMs and VLMs, their applications in robotic task planning, navigation, and manipulation, and their role in enhancing human–robot skill transfer through multimodal data integration. Lastly, the paper discusses current limitations and future research directions in VLM-based HRC, highlighting the trend toward fully realizing the potential of these technologies for smart manufacturing.
vision-language models / large language models / human–robot collaboration / smart manufacturing
The Author(s). This article is published with open access at link.springer.com and journal.hep.com.cn