Missing data recovery for heterogeneous graphs with incremental multi-source data fusion

Yang LIU, Xiaoxia JIANG, Yuanning CUI, Yu WANG, Wei HU

Front. Comput. Sci., 2025, 19(12): 1912614
DOI: 10.1007/s11704-025-41420-2

Information Systems
RESEARCH ARTICLE

Abstract

Heterogeneous graphs organize data as nodes and edges and are widely used in graph-centric applications. Some data are often omitted during manual construction, leading to incomplete graphs and degraded performance on downstream tasks. Existing methods recover the missing data using only the data already present in a single graph, overlooking the fact that graphs from different sources share common nodes because their scopes overlap. In this paper, we focus on missing data recovery for multi-source heterogeneous graphs in an incremental scenario and design a novel framework that recovers the missing data by fusing complementary data from previously seen graphs. Our model, SIKE, builds on a pre-trained language model equipped with graph-specific adapters. To exploit the complementary data of multi-source graphs, we propose an embedding-based data fusion method that aggregates data across graphs. To evaluate the proposed model, we build two new datasets consisting of multi-source heterogeneous graphs. Experimental results show that SIKE achieves significant improvements over competitive baseline models, demonstrating its effectiveness and shedding light on multi-source data fusion for data governance.
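The abstract only names the two main ingredients, graph-specific adapters on a pre-trained language model and embedding-based fusion across source graphs, without giving details. The minimal PyTorch sketch below illustrates what these components could look like; all class names, dimensions, and the particular fusion rule are assumptions made for illustration, not the paper's actual implementation.

```python
# Minimal sketch (PyTorch) of two ideas named in the abstract, under assumed
# shapes and names: (1) a graph-specific bottleneck adapter placed on top of a
# frozen pre-trained language model, and (2) an embedding-based fusion of the
# representations of one shared node coming from several source graphs.
import torch
import torch.nn as nn


class GraphAdapter(nn.Module):
    """Bottleneck adapter trained per graph while the PLM stays frozen."""

    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen PLM representation intact.
        return h + self.up(self.act(self.down(h)))


def fuse_embeddings(views: list[torch.Tensor]) -> torch.Tensor:
    """Fuse one node's embeddings coming from multiple source graphs.

    `views` holds one [hidden_dim] vector per source graph containing the
    node; weights are a softmax over similarities to the mean view
    (an assumed, simplified fusion rule).
    """
    stacked = torch.stack(views)          # [num_graphs, hidden_dim]
    anchor = stacked.mean(dim=0)          # consensus representation
    scores = stacked @ anchor             # [num_graphs]
    weights = torch.softmax(scores, dim=0)
    return (weights.unsqueeze(-1) * stacked).sum(dim=0)


if __name__ == "__main__":
    adapter = GraphAdapter()
    plm_states = torch.randn(4, 768)      # stand-in for frozen PLM outputs
    adapted = adapter(plm_states)
    fused = fuse_embeddings([adapted[0], torch.randn(768)])
    print(adapted.shape, fused.shape)
```

In a setup like this, each incoming graph would typically receive its own adapter while the language model parameters stay shared and frozen, which is one common way to add graphs incrementally without catastrophic forgetting; the fusion rule actually used in SIKE may differ.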

Keywords

data governance / missing data recovery / heterogeneous graph / language model

Cite this article

Yang LIU, Xiaoxia JIANG, Yuanning CUI, Yu WANG, Wei HU. Missing data recovery for heterogeneous graphs with incremental multi-source data fusion. Front. Comput. Sci., 2025, 19(12): 1912614. DOI: 10.1007/s11704-025-41420-2

RIGHTS & PERMISSIONS

Higher Education Press
