Index-free triangle-based graph local clustering
Zhe YUAN, Zhewei WEI, Fangrui LV, Ji-Rong WEN
Index-free triangle-based graph local clustering
Motif-based graph local clustering (MGLC) is a popular method for graph mining tasks due to its various applications. However, the traditional two-phase approach of precomputing motif weights before performing local clustering loses locality and is impractical for large graphs. While some attempts have been made to address the efficiency bottleneck, there is still no applicable algorithm for large scale graphs with billions of edges. In this paper, we propose a purely local and index-free method called Index-free Triangle-based Graph Local Clustering (TGLC*) to solve the MGLC problem w.r.t. a triangle. TGLC* directly estimates the Personalized PageRank (PPR) vector using random walks with the desired triangle-weighted distribution and proposes the clustering result using a standard sweep procedure. We demonstrate TGLC*’s scalability through theoretical analysis and its practical benefits through a novel visualization layout. TGLC* is the first algorithm to solve the MGLC problem without precomputing the motif weight. Extensive experiments on seven real-world large-scale datasets show that TGLC* is applicable and scalable for large graphs.
graph local clustering / triangle motif / index-free / sampling method / visualization
Zhe Yuan received the PhD degree in computer science from Renmin University of China, China. He is currently an assistant professor with Beijing Foreign Studies University, China. His research interests include graph algorithms, knowledge graph, and interpretable machine learning
Zhewei Wei received the PhD degree in computer science and engineering from The Hong Kong University of Science and Technology, China. He is currently a professor with Renmin University of China, China. His research interests include algorithms for massive data, streaming algorithms, and graph algorithms
Fangrui Lv received the BS and MS degrees from Renmin University of China, China. He is currently an employee at the China Development Bank, China. His research interests include graph algorithm, and data mining
Ji-Rong Wen is a professor, the dean of School of Information and the executive dean of Gaoling School of Artificial Intelligence at Renmin University of China, China. His main research interests include information retrieval, data mining and machine learning. He once was a senior researcher and group manager of the Web Search and Mining Group at Microsoft Research Asia (MSRA)
[1] |
Spielman D A, Teng S H. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing. 2004, 81−90
|
[2] |
Andersen R, Chung F, Lang K. Local graph partitioning using pagerank vectors. In: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science. 2006, 475−486
|
[3] |
Liao C S, Lu K, Baym M, Singh R, Berger B . IsoRankN: spectral methods for global alignment of multiple protein networks. Bioinformatics, 2009, 25( 12): i253–i258
|
[4] |
Li S, Gentile C, Karatzoglou A. Graph clustering bandits for recommendation. 2016, arXiv preprint arXiv: 1605.00596
|
[5] |
Feng Y, Yu S, Zhang K, Li X, Ning Z . COMICS: a community property-based triangle motif clustering scheme. PeerJ Computer Science, 2019, 5: e180
|
[6] |
Shi J, Malik J . Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22( 8): 888–905
|
[7] |
Newman M E J . Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America, 2006, 103( 23): 8577–8582
|
[8] |
Vinh N X, Epps J, Bailey J . Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. The Journal of Machine Learning Research, 2010, 11: 2837–2854
|
[9] |
Kobourov S G, Pupyrev S, Simonetto P. Visualizing graphs as maps with contiguous regions. In: Proceedings of the 16th Eurographics Conference on Visualization. 2014, 31−35
|
[10] |
Yang J, Leskovec J . Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 2015, 42( 1): 181–213
|
[11] |
Cheeger J. A lower bound for the smallest eigenvalue of the Laplacian. In: Gunning R C, ed. Problems in Analysis. Princeton: Princeton University Press, 1971, 195−200
|
[12] |
Cox I J, Rao S B, Zhong Y. “Ratio regions”: a technique for image segmentation. In: Proceedings of the 13th International Conference on Pattern Recognition. 1996, 557−564
|
[13] |
Benson A R, Gleich D F, Leskovec J . Higher-order organization of complex networks. Science, 2016, 353( 6295): 163–166
|
[14] |
Tsourakakis C E, Pachocki J, Mitzenmacher M. Scalable motif-aware graph clustering. In: Proceedings of the 26th International Conference on World Wide Web. 2017, 1451−1460
|
[15] |
Watts D J, Strogatz S H . Collective dynamics of ‘small-world’ networks. Nature, 1998, 393( 6684): 440–442
|
[16] |
Krämer N C, Sauer V, Ellison N. The strength of weak ties revisited: further evidence of the role of strong ties in the provision of online social support. Social Media + Society, 2021, 7(2): 20563051211024958
|
[17] |
Yao S. Application of data mining technology in financial fraud identification. In: Proceedings of the 4th International Conference on Information Systems and Computer Aided Education. 2021, 2919−2922
|
[18] |
Zhou S, Yang X, Chang Q. Spatial clustering analysis of green economy based on knowledge graph. Journal of Intelligent & Fuzzy Systems, 2021,
|
[19] |
Foysal K H, Chang H J, Bruess F, Chong J W . SmartFit: smartphone application for garment fit detection. Electronics, 2021, 10( 1): 97
|
[20] |
Zhu D, Shen G, Chen J, Zhou W, Kong X . A higher-order motif-based spatiotemporal graph imputation approach for transportation networks. Wireless Communications and Mobile Computing, 2022, 2022: 1702170
CrossRef
Google scholar
|
[21] |
Yin H, Benson A R, Leskovec J, Gleich D F. Local higher-order graph clustering. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2017, 555−564
|
[22] |
Leskovec J, Sosič R . SNAP: a general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology, 2017, 8( 1): 1
|
[23] |
Ma W, Cai L, He T, Chen L, Cao Z, Li R . Local expansion and optimization for higher-order graph clustering. IEEE Internet of Things Journal, 2019, 6( 5): 8702–8713
|
[24] |
Huang S, Li Y, Bao Z, Li Z. Towards efficient motif-based graph partitioning: an adaptive sampling approach. In: Proceedings of the 37th International Conference on Data Engineering. 2021, 528−539
|
[25] |
Miller R B. Response time in man-computer conversational transactions. In: Proceedings of the December 9-11, 1968, Fall Joint Computer Conference, Part I. 1968, 267−277
|
[26] |
Liu Z, Heer J . The effects of interactive latency on exploratory visual analysis. IEEE Transactions on visualization and Computer Graphics, 2014, 20( 12): 2122–2131
|
[27] |
Andersen R, Chung F. Detecting sharp drops in PageRank and a simplified local partitioning algorithm. In: Proceedings of the 4th International Conference on Theory and Applications of Models of Computation. 2007, 1−12
|
[28] |
Chung F . The heat kernel as the pagerank of a graph. Proceedings of the National Academy of Sciences of the United States of America, 2007, 104( 50): 19735–19740
|
[29] |
Kloster K, Gleich D F. Heat kernel based community detection. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2014, 1386−1395
|
[30] |
Li P, Chien I, Milenkovic O. Optimizing generalized pagerank methods for seed-expansion community detection. Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 1050
|
[31] |
Wang H, He M, Wei Z, Wang S, Yuan Y, Du X, Wen J R. Approximate graph propagation. In: Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2021, 1686−1696
|
[32] |
Zhou D, Zhang S, Yildirim M Y, Alcorn S, Tong H, Davulcu H, He J . High-order structure exploration on massive graphs: a local graph clustering perspective. ACM Transactions on Knowledge Discovery from Data, 2021, 15( 2): 1–26
|
[33] |
Casella G, Robert C P, Wells M T. Generalized accept-reject sampling schemes. In: DasGupta A, ed. A Festschrift for Herman Rubin. Institute of Mathematical Statistics, 2004, 342−348
|
[34] |
Paramonov K, Shemetov D, Sharpnack J. Estimating graphlet statistics via lifting. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019, 587−595
|
[35] |
Guo W, Li Y, Sha M, He B, Xiao X, Tan K L. GPU-accelerated subgraph enumeration on partitioned graphs. In: Proceedings of 2020 ACM SIGMOD International Conference on Management of Data. 2020, 1067−1082
|
/
〈 | 〉 |