Most similar maximal clique query on large graphs

Yun PENG, Yitong XU, Huawei ZHAO, Zhizheng ZHOU, Huimin HAN

Front. Comput. Sci. ›› 2020, Vol. 14 ›› Issue (3) : 143601.

PDF(1249 KB)
PDF(1249 KB)
Front. Comput. Sci. ›› 2020, Vol. 14 ›› Issue (3) : 143601. DOI: 10.1007/s11704-019-7235-0
RESEARCH ARTICLE

Most similar maximal clique query on large graphs

Author information +
History +

Abstract

This paper studies the most similar maximal clique query (MSMCQ). Given a graphG and a set of nodes Q,MSMCQ is to find the maximal clique of G having the largest similarity with Q. MSMCQ has many real applications including advertising industry, public security, task crowdsourcing and social network, etc. MSMCQ can be studied as a special case of the general set similarity query (SSQ). However, the MCs of G has several specialties from the general sets. Based on the specialties of MCs, we propose a novel index, namely MCIndex. MCIndex outperforms the state-of-the-art SSQ method significantly in terms of the number of candidates and the query time. Specifically, we first construct an inverted index I for all the MCs of G. Since the MCs in a posting list often have a lot of overlaps, MCIndex selects some pivots to cluster the MCs with a small radius. Given a query Q, we compute the distance from the pivots to Q. The clusters of the pivots assured not answer can be pruned by our distance based pruning rule. Since it is NP-hard to construct a minimum MCIndex, we propose to construct a minimal MCIndex on I(v) with an approximation ratio 1+ ln |I(v)|. Since the MCs have properties that are inherent of graph structure, we further propose a SIndex within each cluster of a MCIndex and a structure based pruning rule. SIndex can significantly reduce the number of candidates. Since the sizes of intersections between Q and many MCs need to be computed during the query evaluation, we also propose a binary representation of MCs to improve the efficiency of the intersection size computation. Our extensive experiments confirm the effectiveness and efficiency of our proposed techniques on several real-world datasets.

Keywords

most similar maximal clique / similarity query / graph data

Cite this article

Download citation ▾
Yun PENG, Yitong XU, Huawei ZHAO, Zhizheng ZHOU, Huimin HAN. Most similar maximal clique query on large graphs. Front. Comput. Sci., 2020, 14(3): 143601 https://doi.org/10.1007/s11704-019-7235-0

References

[1]
Hamann M, Röhrs E, Wagner D. Local community detection based on small cliques. Algorithms, 2017, 10(3): 1–22
[2]
Cui W, Xiao Y, Wang H, Wang W. Local search of communities in large graphs. In: Proceedings of the 2014 ACMSIGMOD International Conference on Management of Data. 2014, 991–1002
[3]
Uno T. An efficient algorithm for solving pseudo clique enumeration problem. Algorithmica, 2010, 56(1): 3–16
[4]
Wu Y, Jin R, Li J, Zhang X. Robust local community detection: on free rider effect and its elimination. Proceedings of the VLDB Endowment, 2015, 8(7): 798–809
[5]
Wang M, Wang C, Yu J X, Zhang J. Community detection in social networks: an in-depth benchmarking study with a procedure-oriented framework. Proceedings of the VLDB Endowment, 2015, 8(10): 998–1009
[6]
Cai H, Zheng V W, Zhu F, Chang K C C, Huang Z. From community detection to community profiling. Proceedings of the VLDB Endowment, 2017, 10(7): 817–828
[7]
Palsetia D, Patwary M M A, Hendrix W, Agrawal A, Choudhary A. Clique guided community detection. In: Proceedings of the 2014 IEEE International Conference on Big Data. 2014, 500–509
[8]
Boginski V, Butenko S, Pardalos P M. Mining market data: a network approach. Computers & Operations Research, 2006, 33(11): 3171–3184
[9]
Berry N, Ko T, Moy T, Smrcka J, Turnley J, Wu B. Emergent clique formation in terrorist recruitment. In: Proceedings of the Workshop on Agent Organizations: Theory and Practice of AAAI ’04. 2004, 1–8
[10]
Kose F, Weckwerth W, Linke T, Fiehn O. Visualizing plant metabolomic correlation network using clique-metabolite matrices. Bioinformatics, 2001, 17: 1198–1208
[11]
Cheng J, Zhu L, Ke Y, Chu S. Fast algorithms for maximal clique enumeration with limited memory. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2012, 1240–1248
[12]
Makino K, Uno T. New algorithms for enumerating all maximal cliques. In: Proceedings of Scandinavian Workshop on Algorithm Theory. 2004, 260–272
[13]
Östergård P R. A fast algorithm for the maximum clique problem. Discrete Applied Mathematics, 2002, 120(1–3): 197–207
[14]
Liang X, Lu R, Lin X, Shen X. Security and Privacy in Mobile Social Networks. Springer-Verlag New York, 2013
[15]
Sarvari H, Abozinadah E, Mbaziira A, Mccoy D. Constructing and analyzing criminal networks. In: Proceedings of the 2014 IEEE Security and Privacy Workshops. 2014, 84–91
[16]
Schall D. Service-Oriented Crowdsourcing. Springer-Verlag New York, 2012
[17]
Bacon K, Dewan P. Mixed-initiative friend-list creation. In: Proceedings of the 12th European Conference on Computer Supported Cooperative Work. 2011, 293–312
[18]
Cui W, Xiao Y, Wang H, Lu Y, Wang W. Online search of overlapping communities. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. 2013, 277–288
[19]
Matsunaga T, Yonemori C, Tomita E, Muramatsu M. Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics, 2009, 10(1): 205
[20]
Sarawagi S, Kirpal A. Efficient set joins on similarity predicates. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. 2004, 743–754
[21]
Hadjieleftheriou M, Chandel A, Koudas N, Srivastava D. Fast indexes and algorithms for set similarity selection queries. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 267–276
[22]
Hadjieleftheriou M, Srivastava D. Weighted set-based string similarity. IEEE Data Engineering Bulletin, 2010, 33(1): 25–36
[23]
Culpepper J S, Moffat A. Efficient set intersection for inverted indexing. ACM Transactions on Information System, 2010, 29(1): 1–24
[24]
Wu H, Li G, Zhou L. Ginix: generalized inverted index for keyword search. Tsinghua Science and Technology, 2013, 18(1): 77–87
[25]
Deng D, Li G, Wen H, Feng J. An efficient partition based method for exact set similarity joins. Proceedings of the VLDB Endowment, 2015, 9(4): 360–371
[26]
Yuan L, Qin L, Lin X, Chang L, Zhang W. Diversified top-k clique search. In: Proceedings of the 31st IEEE International Conference on Data Engineering. 2015, 387–398
[27]
Wang J, Cheng J, Fu A W C. Redundancy-aware maximal cliques. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013, 122–130
[28]
Li C, Lu J, Lu Y. Efficient merging and filtering algorithms for approximate string searches. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 257–266
[29]
Bayardo R J, Ma Y, Srikant R. Scaling up all pairs similarity search. In: Proceedings of the 16th International WorldWideWeb Conference. 2007, 131–140
[30]
Xiao C, Wang W, Lin X, Yu J X. Efficient similarity joins for near duplicate detection. In: Proceedings of the 17th International WorldWide Web Conference. 2008, 131–140
[31]
Wang J, Li G, Feng J. Can we beat the prefix filtering?: An adaptive framework for similarity join and search. In: Proceedings of the 2012 ACM International Conference on Management of Data. 2012, 85–96
[32]
Xiao C, Wang W, Lin X, Shang H. Top-k set similarity joins. In: Proceedings of the 25th IEEE International Conference on Data Engineering. 2009, 916–927
[33]
Deng D, Li G, Feng J. A pivotal prefix based filtering algorithm for string similarity search. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 2014, 673–684
[34]
Ao N, Zhang F, Wu D, Stones D S, Wang G, Liu X, Liu J, Lin S. Efficient parallel lists intersection and index compression algorithms using graphics processing units. Proceedings of the VLDB Endowment, 2011, 4(8): 470–481
[35]
Inoue H, Ohara M, Taura K. Faster set intersection with simd instructions by reducing branch mispredictions. Proceedings of the VLDB Endowment, 2014, 8(3): 293–304
[36]
Vernica R, Carey M J, Li C. Efficient parallel set-similarity joins using mapreduce. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 495–506
[37]
Bolin Ding A C K. Fast set intersection in memory. In: Proceedings of the 37th International Conference on Very Large Databases. 2011, 255–266
[38]
Fan Z, Peng Y, Choi B, Xu J, Bhowmick S S. Towards efficient authenticated subgraph query service in outsourced graph databases. IEEE Transactions on Services Computing, 2014, 7(4): 696–713
[39]
Chvatal V. A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 1979, 4(3): 233–235
[40]
Eppstein D, Löffler M, Strash D. Listing all maximal cliques in sparse graphs in near-optimal time. Algorithms and Computation, 2010, 403–414

RIGHTS & PERMISSIONS

2019 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature
AI Summary AI Mindmap
PDF(1249 KB)

Accesses

Citations

Detail

Sections
Recommended

/