A review on multimodal communications for human-robot collaboration in 5G: from visual to tactile

Zhuorui Wang, Mingkai Chen, Qian Liu

Intelligence & Robotics, 2025, Vol. 5, Issue (3): 579-606.

Review


Abstract

With concurrent advances in wireless communication, artificial intelligence, and sensor technologies, robotic systems are evolving from single-function actuators into intelligent task-processing platforms. In complex, dynamic environments, the limitations of conventional unimodal perception have become increasingly apparent: it struggles to meet the precision required for recognizing object attributes and interacting with the environment. Deeply integrated multimodal perception is therefore expected to become a predominant trend, and cross-modal communication between vision and tactile sensing is a critical direction for enhancing robots' cognition of their environment. However, research on multimodal visual-tactile communication remains scarce, so this paper presents a comprehensive survey of this emerging field. First, it systematically summarizes mature video and tactile communication frameworks. Next, it analyzes current implementations of single-modal streaming transmission for visual and tactile data and, on that basis, examines the state of the art in multimodal visual-tactile communication. Finally, it briefly discusses the prospects of visual-tactile communication technology, highlighting its potential to enable context-aware robotic manipulation and adaptive human-robot collaboration.

Keywords

AI / video communication / multimodal communication / Tactile Internet / human-robot collaboration

Cite this article

Zhuorui Wang, Mingkai Chen, Qian Liu. A review on multimodal communications for human-robot collaboration in 5G: from visual to tactile. Intelligence & Robotics, 2025, 5(3): 579-606. DOI: 10.20517/ir.2025.30

