DualCL: dual-level contrastive learning model for multi-modal knowledge graph completion
Jie LI , Simin YANG , Linmei HU , Yuqiu DENG
Front. Comput. Sci., 2027, Vol. 21, Issue (1): 2101307
Knowledge graph completion aims to predict missing factual triples in knowledge graphs, thereby enhancing their completeness. Recent studies have significantly improved knowledge graph completion performance by integrating multi-modal information into knowledge graph representation learning. However, two major challenges remain: first, how to effectively align and integrate embeddings from the structural, visual, and textual modalities to improve the quality of entity representations; second, how to strengthen the connections among head entities, relations, and tail entities in correct triples so that their associations become more cohesive and correct triples are more clearly distinguished from incorrect ones. To address these challenges, we propose a Dual-level Contrastive Learning model (DualCL) for multi-modal knowledge graph completion. Specifically, our model consists of two levels of contrastive learning. (1) At the entity level, we employ a multi-modal contrastive representation method to align the structural, visual, and textual information of the same entity in a shared embedding space, ensuring semantic consistency across modalities for more effective multi-modal information integration. (2) At the triple level, we enhance the semantic associations among head entities, relations, and tail entities in correct triples through contrastive learning, while improving the model’s ability to distinguish between different “entity-relation-entity” combinations. Experimental results demonstrate that our method outperforms recent strong baseline models on multiple link prediction datasets, validating its effectiveness for knowledge graph completion.
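The two contrastive levels described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the loss, the encoders, or how head and relation are combined, so the sketch assumes an InfoNCE loss with in-batch negatives, a fixed temperature, and a TransE-style additive composition (head + relation) for the triple level. All function names (`info_nce`, `entity_level_loss`, `triple_level_loss`) and the random toy embeddings are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE with in-batch negatives (an assumed loss, not confirmed by the source).

    Row i of `anchors` is paired with row i of `positives`; every other row
    in the batch serves as a negative.
    """
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def entity_level_loss(structural, visual, textual, temperature=0.1):
    """Entity level: pull the three modality views of the same entity together."""
    pairs = [(structural, visual), (structural, textual), (visual, textual)]
    return sum(info_nce(x, y, temperature) for x, y in pairs) / len(pairs)


def triple_level_loss(head, relation, tail, temperature=0.1):
    """Triple level: match a (head, relation) query to its correct tail.

    The additive query `head + relation` is an illustrative assumption.
    """
    return info_nce(head + relation, tail, temperature)


# Toy data: 8 entities with 16-dimensional embeddings per modality.
rng = np.random.default_rng(0)
n, d = 8, 16
structural = rng.normal(size=(n, d))
visual = structural + 0.01 * rng.normal(size=(n, d))
textual = structural + 0.01 * rng.normal(size=(n, d))

# Nearly identical modality views should give a lower loss than unrelated ones.
aligned_loss = entity_level_loss(structural, visual, textual)
random_loss = entity_level_loss(
    structural, rng.normal(size=(n, d)), rng.normal(size=(n, d))
)

# Triples whose tail equals head + relation are easy positives in this toy setup.
head = rng.normal(size=(n, d))
relation = rng.normal(size=(n, d))
tail = head + relation
triple_loss = triple_level_loss(head, relation, tail)
```

In this toy setup, the entity-level loss falls as the modality views of the same entity move closer together, and the triple-level loss falls as each (head, relation) query becomes more similar to its own tail than to the other tails in the batch.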
multi-modal / knowledge graph completion / representation learning / link prediction / contrastive learning
Higher Education Press