A subgraph matching algorithm based on subgraph index for knowledge graph
Yunhao SUN, Guanyu LI, Jingjing DU, Bo NING, Heng CHEN
A subgraph matching algorithm based on subgraph index for knowledge graph
The problem of subgraph matching is one fundamental issue in graph search, which is NP-Complete problem. Recently, subgraph matching has become a popular research topic in the field of knowledge graph analysis, which has a wide range of applications including question answering and semantic search. In this paper, we study the problem of subgraph matching on knowledge graph. Specifically, given a query graph and a data graph , the problem of subgraph matching is to conduct all possible subgraph isomorphic mappings of on . Knowledge graph is formed as a directed labeled multi-graph having multiple edges between a pair of vertices and it has more dense semantic and structural features than general graph. To accelerate subgraph matching on knowledge graph, we propose a novel subgraph matching algorithm based on subgraph index for knowledge graph, called as -Match. The subgraph matching algorithm consists of two key designs. One design is a subgraph index of matching-driven flow graph ( ), which reduces redundant calculations in advance. Another design is a multi-label weight matrix, which evaluates a near-optimal matching tree for minimizing the intermediate candidates. With the aid of these two key designs, all subgraph isomorphic mappings are quickly conducted only by traversing . Extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms.
knowledge graph / subgraph matching / subgraph index / matching tree
[1] |
Hu S , Zou L , Yu J X , Wang H , Zhao D . Answering natural language questions by subgraph matching over knowledge graphs. IEEE Transactions on Knowledge and Data Engineering, 2018, 30( 5): 824– 837
|
[2] |
Xu Q, Wang X, Li J, Gan Y, Chai L, Wang J. StarMR: an efficient star-decomposition based query processor for SPARQL basic graph patterns using MapReduce. In: proceedings of Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data. 2018, 415-430
|
[3] |
Cai T , Li J , Mian A S , Sellis T , Yu J X . Target-aware holistic influence maximization in spatial social networks. IEEE Transactions on Knowledge and Data Engineering, 2020,
|
[4] |
Shekhar S, Xiong H, Zhou X. Encyclopedia of GIS: Resource Description Framework(RDF). 1st ed. Cham: Springer International Publishing, 2017
|
[5] |
Garey M R, Johnson D S. Computers and Intractability: A Guide to the Theory of NP-Completeness. 1st ed. New York: W. H. Freeman, 1979
|
[6] |
Kim J , Shin H , Han W H , Hong S , Chafi H . Taming subgraph isomorphism for RDF query processing. Proceedings of the VLDB Endowment, 2015, 8( 11): 1238– 1249
|
[7] |
Ingalalli V, Ienco D, Poncelet P, Villata S. Querying RDF data using a multigraph-based approach. In: Proceedings of the 19th International Conference on Extending Database Technology. 2016, 245-256
|
[8] |
Ma H , Langouri M A , Wu Y , Chiang F , Pi J . Ontology-based entity matching in attributed graphs. Proceedings of the VLDB Endowment, 2019, 12( 10): 1195– 1207
|
[9] |
Cordella L P , Foggia P , Sansone C , Vento M . A (sub)graph isomorphism algorithm for matching large graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 26( 10): 1367– 1372
|
[10] |
He H, Singh A K. Graphs-at-a-time: query language and accessmethods for graph databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2008, 405-418
|
[11] |
Zhao P , Han J . On graph query optimization in large networks. Proceedings of the VLDB Endowment, 2010, 3( 1): 340– 351
|
[12] |
Han W, Lee J, Lee J H. Turboiso: towards ultrafast and robust subgraph isomorphism search in large graph databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2013, 337–348
|
[13] |
Bi F, Chang L, Lin X, Qin L, Zhang W. Efficient subgraph matching by postponing cartesian products. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2016, 1199−1214
|
[14] |
Shang H , Zhang Y , Lin X , Yu J X . Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. Proceedings of the VLDB Endowment, 2008, 1( 1): 364– 375
|
[15] |
Kim K, Seo I, Han W S, Hong S, Chafi H, Shin H, Jeong G. Turboflux: A fast continuous subgraph matching system for streaming graph data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2018, 411-426
|
[16] |
Ullmann J R . An algorithm for subgraph isomorphism. Journal of the ACM, 1976, 23( 1): 31– 42
|
[17] |
Jin X, Lai L. MPMatch: A Multi-core Parallel Subgraph Matching Algorithm. In: Proceedings of IEEE 35th International Conference on Data Engineering Workshops. 2019, 241−248
|
[18] |
Bhattarai B, Liu H, Huang H. CECI: Compact Embedding Cluster Index for Scalable Subgraph Matching. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2019, 1447−1462
|
[19] |
Peng P , Zou L , Du Z , Zhao D . Using partial evaluation in holistic subgraph search. Frontiers of Computer Science, 2017, 12( 5): 966– 983
|
[20] |
Ma Y , Yuan Y , Liu M , Wang G , Wang Y . Graph simulation on large scale temporal graphs. GeoInformatica, 2020, 24( 1): 199– 220
|
[21] |
Lin P , Song Q , Wu Y . Fact checking in knowledge graphs with ontological subgraph patterns. Data Science and Engineering, 2018, 3
|
[22] |
Xu Y, Tong Y, Shi Y, Tao Q, Xu Ke, Li W. An Efficient Insertion Operator in Dynamic RideSharing Services. In: Proceedings of IEEE 35th International Conference on Data Engineering. 2019, 1022−1033
|
[23] |
Zou L , Özsu M T , Chen L , Shen X , Huang R , Zhao D . gStore: a graph-based SPARQL query engine. The VLDB Journal, 2014, 23( 4): 565– 590
|
[24] |
Zeng L , Zou L . Redesign of the gStore system. Frontiers of Computer science, 2018, 12( 4): 1– 19
|
[25] |
Wang X , Chai Le , Xu Q , Yang Y , Li J , Wang J , Chai Y . Efficient subgraph matching on large RDF graphs using MapReduce. Data Science and Engineering, 2019, 4
|
[26] |
Xu Q , Wang X , Li J , Zhang Q , Chai L . Distributed subgraph matching on big knowledge graphs using pregel. IEEE Access, 2019, 7
|
[27] |
Malewicz G, Austern M H, Bik, A J C, Dehnert J C. Pregel: A system for large-scale graph processing. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2010, 135−146
|
[28] |
Li J , Cai T , Deng K , Wang X , Sellis T , Xia F . Community-diversified influence maximization in social networks. Information Systems, 2020, 92
|
[29] |
Ma Y , Yuan Y , Wang G , Bi X , Wang Z , Wang Y . Rising star evaluation based on extreme learning machine in geo-social networks. Cognitive Computation, 2020, 12( 1): 296– 308
|
[30] |
Wang Y, Tong Y, Long C, Xu P, Xu K, Lv W. Adaptive dynamic bipartite graph matching: a reinforcement learning approach. In: Proceedings of IEEE 35th International Conference on Data Engineering, 2019, 1478−1489
|
[31] |
Zheng W , Zou L , Peng W , Yan X , Song S , Zhao D . Semantic SPARQL similarity search over RDF knowledge graphs. Proceedings of the VLDB Endowment, 2016, 9( 11): 840– 851
|
/
〈 | 〉 |