Missing data recovery for heterogeneous graphs with incremental multi-source data fusion
Yang LIU, Xiaoxia JIANG, Yuanning CUI, Yu WANG, Wei HU
Front. Comput. Sci., 2025, Vol. 19, Issue 12: 1912614
Heterogeneous graphs organize data as nodes and edges and are widely used in graph-centric applications. Some data are often omitted during manual construction, which leaves the graph incomplete and degrades performance on downstream tasks. Existing methods recover the missing data from what is already present within a single graph, neglecting the fact that graphs from different sources share common nodes because their scopes overlap. In this paper, we focus on missing data recovery for multi-source heterogeneous graphs in the incremental scenario, and design a novel framework that recovers missing data by fusing complementary multi-source data from previously seen graphs. Our model, named SIKE, combines a pre-trained language model with graph-specific adapters. To take advantage of the complementary data in multi-source graphs, we propose an embedding-based data fusion method that gathers data across graphs. To evaluate the proposed model, we build two new datasets consisting of multi-source heterogeneous graphs. The experimental results show that SIKE achieves significant improvements over competitive baseline models, demonstrating its effectiveness and shedding light on multi-source data fusion for data governance.
data governance / missing data recovery / heterogeneous graph / language model
Higher Education Press