Multimodal semantic communication system based on graph neural networks

Xinran Ba, Xinguang Zhang, Shufeng Li, Jin Yuan, Jun Hu

Intelligence & Robotics, 2025, Vol. 5, Issue 3: 805-26.

DOI: 10.20517/ir.2025.41
Research Article

Abstract

Current semantic communication systems primarily use single-modal data and face challenges such as intermodal information loss and insufficient fusion, limiting their ability to meet personalized demands in complex scenarios. To address these limitations, this study proposes a novel multimodal semantic communication system based on graph neural networks. The system integrates graph convolutional networks and graph attention networks to collaboratively process multimodal data and leverages knowledge graphs to enhance semantic associations between image and text modalities. A multilayer bidirectional cross-attention mechanism is introduced to mine fine-grained semantic relationships across modalities. Shapley-value-based dynamic weight allocation optimizes intermodal feature contributions. In addition, a long short-term memory-based semantic correction network is designed to mitigate distortion caused by physical and semantic noise. Experiments performed using multimodal tasks (emotion analysis and visual question answering) demonstrate the superior performance of the system. Under low signal-to-noise ratio conditions, the proposed BERT-ResNet and GCN–GAT enhanced deep semantic communication (BR-GG-DeepSC) model achieves higher accuracy than conventional methods, while reducing the total number of transmitted symbols to approximately 33% of that in conventional approaches. These results validate the robustness, efficiency, and potential of the proposed system for practical deployment in resource-constrained environments.
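
The abstract does not specify how the Shapley-value-based weight allocation is computed. As a rough illustration of the general idea only, the sketch below computes exact Shapley values for two modalities and normalizes them into fusion weights. The function names, the `ACCURACY` table, and all numbers in it are hypothetical and are not taken from the paper.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal gain over all player orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


def fusion_weights(phi):
    """Clip negative contributions to zero, then normalize so weights sum to 1."""
    clipped = {p: max(v, 0.0) for p, v in phi.items()}
    total = sum(clipped.values())
    if total == 0.0:
        # No modality contributes: fall back to uniform weights.
        return {p: 1.0 / len(phi) for p in phi}
    return {p: v / total for p, v in clipped.items()}


# Hypothetical task accuracy for each subset of modalities (made-up numbers).
ACCURACY = {
    frozenset(): 0.50,
    frozenset({"text"}): 0.70,
    frozenset({"image"}): 0.60,
    frozenset({"text", "image"}): 0.80,
}

phi = shapley_values(["text", "image"], lambda s: ACCURACY[frozenset(s)])
weights = fusion_weights(phi)
print(phi, weights)
```

With these made-up numbers, the text modality receives roughly twice the weight of the image modality, and the Shapley values sum to the gain of the full coalition over the empty one. Exact enumeration is only practical for a handful of modalities, since the number of orderings grows factorially.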

Keywords

Semantic communication / graph neural networks / multimodal fusion

Cite this article

Xinran Ba, Xinguang Zhang, Shufeng Li, Jin Yuan, Jun Hu. Multimodal semantic communication system based on graph neural networks. Intelligence & Robotics, 2025, 5(3): 805-26. DOI: 10.20517/ir.2025.41
