DualCL: dual-level contrastive learning model for multi-modal knowledge graph completion

Jie LI; Simin YANG; Linmei HU; Yuqiu DENG

doi:10.1007/s11704-025-50184-8

Front. Comput. Sci., 2027, Vol. 21, Issue 1: 2101307. DOI: 10.1007/s11704-025-50184-8

Artificial Intelligence

RESEARCH ARTICLE

Abstract

Knowledge graph completion aims to predict missing factual triples in knowledge graphs, thereby enhancing their completeness. Recent studies have significantly improved knowledge graph completion performance by integrating multi-modal information into knowledge graph representation learning. However, two major challenges remain: first, how to effectively align and integrate embeddings from the structural, visual, and textual modalities to improve the quality of entity representations; second, how to strengthen the connections among head entities, relations, and tail entities in correct triples so that correct and incorrect triples can be distinguished more clearly. To address these challenges, we propose a Dual-level Contrastive Learning model (DualCL) for multi-modal knowledge graph completion. Specifically, our model consists of two levels of contrastive learning. (1) At the entity level, we employ a multi-modal contrastive representation method to align the structural, visual, and textual information of the same entity in a shared embedding space, ensuring semantic consistency across modalities for more effective multi-modal information integration. (2) At the triple level, we enhance the semantic associations among head entities, relations, and tail entities in correct triples through contrastive learning, while optimizing the model’s ability to distinguish between different “entity-relation-entity” combinations. Experimental results demonstrate that our method outperforms recent strong baseline models on multiple link prediction datasets, validating its effectiveness for knowledge graph completion.
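
As a rough illustration of the two levels described in the abstract, the sketch below applies a generic InfoNCE objective with in-batch negatives. The abstract does not state the actual loss functions, so everything here is an assumption made for illustration: the function names, the temperature value, the pairwise alignment of the three modalities, and the TransE-style `head + relation` query used at the triple level. It should not be read as the authors' implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss; row i of `queries` is the positive for row i of `keys`.

    All other rows in the batch act as negatives (an assumption, not a
    detail taken from the paper).
    """
    q = l2_normalize(queries)
    k = l2_normalize(keys)
    logits = q @ k.T / temperature
    # Subtract the row-wise maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def entity_level_loss(structural, visual, textual, temperature=0.1):
    """Hypothetical entity-level term: align each pair of modalities."""
    pairs = [(structural, visual), (structural, textual), (visual, textual)]
    total = 0.0
    for a, b in pairs:
        # Symmetric loss: each modality serves as query and as key.
        total += info_nce(a, b, temperature) + info_nce(b, a, temperature)
    return total / (2 * len(pairs))


def triple_level_loss(head, relation, tail, temperature=0.1):
    """Hypothetical triple-level term: pull (head, relation) toward its tail.

    The additive composition `head + relation` is a TransE-style assumption.
    """
    return info_nce(head + relation, tail, temperature)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.normal(size=(8, 16))  # stand-in structural embeddings
    v = s + 0.1 * rng.normal(size=(8, 16))  # stand-in visual embeddings
    t = s + 0.1 * rng.normal(size=(8, 16))  # stand-in textual embeddings
    r = rng.normal(size=(8, 16))  # stand-in relation embeddings
    print("entity-level loss:", entity_level_loss(s, v, t))
    print("triple-level loss:", triple_level_loss(s, r, s + r))
```

Under these assumptions, both terms shrink as matching rows become more similar than mismatched ones, which is the behaviour the abstract attributes to the entity-level and triple-level objectives. How the two terms are weighted and combined with a link prediction loss is not specified in the abstract and is left open here.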

Keywords

multi-modal / knowledge graph completion / representation learning / link prediction / contrastive learning

Cite this article

Jie LI, Simin YANG, Linmei HU, Yuqiu DENG. DualCL: dual-level contrastive learning model for multi-modal knowledge graph completion. Front. Comput. Sci., 2027, 21(1): 2101307. DOI: 10.1007/s11704-025-50184-8
