Please wait a minute...

Frontiers of Computer Science

Front. Comput. Sci.    2018, Vol. 12 Issue (5) : 966-983     https://doi.org/10.1007/s11704-016-5522-6
RESEARCH ARTICLE |
Using partial evaluation in holistic subgraph search
Peng PENG1, Lei ZOU2(), Zhenqin DU2, Dongyan ZHAO2
1. Big Data Provincial Key Laboratory, Hunan University, Changsha 410082, China
2. Institute of Computer Science and Technology, Peking University, Beijing 100871, China
Download: PDF(798 KB)  
Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks
Abstract

Because of its wide application, the subgraph matching problem has been studied extensively during the past decade. However, most existing solutions assume that a data graph is a vertex/edge-labeled graph (i.e., each vertex/ edge has a simple label). These solutions build structural indices by considering the vertex labels. However, some real graphs contain rich-content vertices such as user profiles in social networks and HTML pages on the World Wide Web. In this study, we consider the problem of subgraph matching using a more general scenario. We build a structural index that does not depend on any vertex content. Based on the index, we design a holistic subgraph matching algorithm that considers the query graph as a whole and finds one match at a time. In order to further improve efficiency, we propose a “partial evaluation and assembly” framework to find subgraph matches over large graphs. Last but not least, our index has light maintenance overhead. Therefore, our method can work well on dynamic graphs. Extensive experiments on real graphs show that our method outperforms the state-of-the-art algorithms.

Keywords subgraph search      holistic approach      partial evaluation and assembly     
Corresponding Authors: Lei ZOU   
Just Accepted Date: 07 December 2016   Online First Date: 27 November 2017    Issue Date: 21 September 2018
 Cite this article:   
Peng PENG,Lei ZOU,Zhenqin DU, et al. Using partial evaluation in holistic subgraph search[J]. Front. Comput. Sci., 2018, 12(5): 966-983.
 URL:  
http://journal.hep.com.cn/fcs/EN/10.1007/s11704-016-5522-6
http://journal.hep.com.cn/fcs/EN/Y2018/V12/I5/966
Service
E-mail this article
E-mail Alert
RSS
Articles by authors
Peng PENG
Lei ZOU
Zhenqin DU
Dongyan ZHAO
1 Zhang S J, Li S R, Yang J. GADDI: distance index based subgraph matching in biological networks. In: Proceedings of the 12th International Conference on Extending Database Technology. 2009, 192–203
https://doi.org/10.1145/1516360.1516384
2 Watts D J, Dodds P S, Newman M E J. Identity and search in social networks. Science, 2002, 296(5571): 1302–1305
https://doi.org/10.1126/science.1070120
3 Stocker M, Seaborne A, Bernstein A, Kiefer C, Reynolds D. SPARQL basic graph pattern optimization using selectivity estimation. In: Proceedings of the 17th International Conference on World Wide Web. 2008, 595–604
https://doi.org/10.1145/1367497.1367578
4 Cohen E, Halperin E, Kaplan H, Zwick U. Reachability and distance queries via 2-hop labels. SIAM Journal on Computing, 2003, 32(5): 1338–1355
https://doi.org/10.1137/S0097539702403098
5 Chan E P F, Lim H. Optimization and evaluation of shortest path queries. The VLDB Journal, 2007, 16(3): 343–369
https://doi.org/10.1007/s00778-005-0177-1
6 Jing N, Huang Y W, Rundensteiner E A. Hierarchical encoded path views for path query processing: an optimal model and its performance evaluation. IEEE Transactions on Knowledge and Data Engineering, 1998, 10(3): 409–432
https://doi.org/10.1109/69.687976
7 Cheng J F, Yu J X. On-line exact shortest distance query processing. In: Proceedings of the 12th International Conference on Extending Database Technology. 2009, 481–492
https://doi.org/10.1145/1516360.1516417
8 Wang H X, He H, Yang J, Yu P S, Yu J X. Dual labeling: answering graph reachability queries in constant time. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 75
9 Trißl S, Leser U. Fast and practical indexing and querying of very large graphs. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2007, 845–856
https://doi.org/10.1145/1247480.1247573
10 Chen Y J, Chen Y B. An efficient algorithm for answering graph reachability queries. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 893–902
https://doi.org/10.1109/ICDE.2008.4497498
11 Shasha D, Wang J T L, Giugno R. Algorithmics and applications of tree and graph searching. In: Proceedings of the 21st ACM SIGACTSIGMOD- SIGART Symposium on Principles of Database Systems. 2002, 39–52
https://doi.org/10.1145/543613.543620
12 Yan X F, Yu P S, Han J W. Graph indexing: a frequent structure-based approach. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2004, 335–346
https://doi.org/10.1145/1007568.1007607
13 He H H, Singh A K. Graphs-at-a-time: query language and access methods for graph databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2008, 405–418
https://doi.org/10.1145/1376616.1376660
14 Zhang S J, Hu M, Yang J. TreePi: a novel graph indexing method. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 966–975
https://doi.org/10.1109/ICDE.2007.368955
15 Zhao P X, Yu J X, Yu P S. Graph indexing: tree+ delta>= graph. In: Proceedings of the 33rd International Conference on Very Large Data Bases. 2007, 938–949
16 He H H, Singh A K. Closure-tree: an index structure for graph queries. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 38
17 Tian Y Y, Patel J M. TALE: a tool for approximate large graph matching. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 963–972
https://doi.org/10.1109/ICDE.2008.4497505
18 Zhao P X, Han J W. On graph query optimization in large networks. Proceedings of the VLDB Endowment, 2010, 3(1-2): 340–351
https://doi.org/10.14778/1920841.1920887
19 Peng P, Zou L, Chen L, Lin X M, Zhao D Y. Subgraph search over massive disk resident graphs. In: Proceedings of the 23rd International Conference on Scientific and Statistical Database Management. 2011, 312–321
https://doi.org/10.1007/978-3-642-22351-8_19
20 Sakr S, Elnikety S, He Y X. G-SPARQL: a hybrid engine for querying large attributed graphs. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012, 335–344
https://doi.org/10.1145/2396761.2396806
21 Sun Z, Wang H Z, Wang H X, Shao B, Li J Z. Efficient subgraph matching on billion node graphs. Proceedings of the VLDB Endowment, 2012, 5(9): 788–799
https://doi.org/10.14778/2311906.2311907
22 Han W S, Lee J, Lee J H. Turboiso: towards ultrafast and robust subgraph isomorphism search in large graph databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2013, 337–348
https://doi.org/10.1145/2463676.2465300
23 Zhu K, Zhang Y, Lin X M, Zhu G P, Wang W. NOVA: a novel and efficient framework for finding subgraph isomorphism mappings in large graphs. In: Proceedings of the 15th International Conference on Database Systems for Advanced Applications. 2010, 140–154
https://doi.org/10.1007/978-3-642-12026-8_13
24 Zou L, Chen L, Özsu M T. DistanceJoin: pattern match query in a large graph database. Proceedings of the VLDB Endowment, 2009, 2(1): 886–897
https://doi.org/10.14778/1687627.1687727
25 Yu J X, Zeng X G, Cheng J F. Top-k graph pattern matching over large graphs. In: Proceedings of IEEE International Conference on Data Engineering. 2013, 1033–1044
26 Bruno N, Koudas N, Srivastava D. Holistic twig joins: optimal XML pattern matching. In: Proceedings of ACM SIGMOD International Conference on Management of Data. 2002, 310–321
https://doi.org/10.1145/564691.564727
27 Jiang H F, Wang W, Lu H J, Yu J X. Holistic twig joins on indexed XML documents. In: Proceedings of the 29th International Conference on Very Large Data Bases. 2003, 273–284
https://doi.org/10.1016/B978-012722442-8/50032-X
28 Karypis G, Kumar V. Analysis of multilevel graph partitioning. In: Proceedings of ACM/IEEE Conference on Supercomputing. 1995, 29
https://doi.org/10.1145/224170.224229
29 Wang L, Xiao Y H, Shao B, Wang H X. How to partition a billion-node graph. In: Proceedings of the 30th International Conference on Data Engineering. 2014, 568–579
https://doi.org/10.1109/ICDE.2014.6816682
30 Tretyakov K, Armas-Cervantes A, García-Ba nuelos L, Vilo J, Dumas M. Fast fully dynamic landmark-based estimation of shortest path distances in very large graphs. In: Proceedings of the 20th ACM Conference on Information and Knowledge Management. 2011, 1785–1794
31 Hoffart J, Suchanek F M, Berberich K, Weikum G. YAGO2: a spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence, 2013, 194: 28–61
https://doi.org/10.1016/j.artint.2012.06.001
32 Kim J, Shin H, Han W S, Hong S, Chafi H. Taming subgraph isomorphism for RDF query processing. Proceedings of the VLDB Endowment, 2015, 8(11): 1238–1249
https://doi.org/10.14778/2809974.2809985
33 Ullmann J R. An algorithm for subgraph isomorphism. Journal of the ACM, 1976, 23(1): 31–42
https://doi.org/10.1145/321921.321925
34 Cordella L P, Foggia P, Sansone C, Vento M. A (sub) graph isomorphism algorithm for matching large graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(10): 1367–1372
https://doi.org/10.1109/TPAMI.2004.75
35 Peng P, Zou L, Chen L, Lin X M, Zhao D Y. Answering subgraph queries over massive disk resident graphs. World Wide Web: Internet and Web Information Systems, 2016, 19(3): 417–448
36 Lee J, Han W S, Kasperovics R, Lee J H. An in-depth comparison of subgraph isomorphism algorithms in graph databases. Proceedings of the VLDB Endowment, 2012, 6(2): 133–144
https://doi.org/10.14778/2535568.2448946
37 Ren X G, Wang J H. Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. Proceedings of the VLDB Endowment, 2015, 8(5): 617–628
https://doi.org/10.14778/2735479.2735493
38 Afrati F N, Fotakis D, Ullman J D. Enumerating subgraph instances using Map-Reduce. In: Proceedings of the 29th International Conference on Data Engineering. 2013, 62–73
https://doi.org/10.1109/ICDE.2013.6544814
39 Fan W F, Wang X, Wu Y H, Deng D. Distributed graph simulation: impossibility and possibility. Proceedings of the VLDB Endowment, 2014, 7(12): 1083–1094
https://doi.org/10.14778/2732977.2732983
40 Fan W F, Li J Z, Ma S, Tang N, Wu Y H, Wu Y P. Graph pattern matching: from intractable to polynomial time. Proceedings of the VLDB Endowment, 2010, 3(1): 264–275
https://doi.org/10.14778/1920841.1920878
41 Fan W F, Wang X, Wu Y H. Diversified top-k graph pattern matching. Proceedings of the VLDB Endowment, 2013, 6(13): 1510–1521
https://doi.org/10.14778/2536258.2536263
42 Ma S, Cao Y, Huai J P, Wo T Y. Distributed graph pattern matching. In: Proceedings of the 21st World Wide Web Conference. 2012, 949–958
https://doi.org/10.1145/2187836.2187963
43 Khan A, Li N, Yan X F, Guan Z Y, Chakraborty S, Tao S. Neighborhood based fast graph search in large networks. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2011, 901–912
https://doi.org/10.1145/1989323.1989418
44 Buneman P, Cong G, Fan W F, Kementsietsidis A. Using partial evaluation in distributed query evaluation. In: Proceedings of the 32nd International Conference on Very Large Data Bases. 2006, 211–222
45 Cong G, Fan W F, Kementsietsidis A. Distributed query evaluation with performance guarantees. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2007, 509–520
https://doi.org/10.1145/1247480.1247537
46 Cong G, Fan W F, Kementsietsidis A, Li J Z, Liu X M. Partial evaluation for distributed XPath query processing and beyond. ACM Transactions on Database Systems, 2012, 37(4): 32
https://doi.org/10.1145/2389241.2389251
47 Fan W F, Wang X, Wu Y H. Performance guarantees for distributed reachability queries. Proceedings of the VLDB Endowment, 2012, 5(11): 1304–1315
https://doi.org/10.14778/2350229.2350248
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed