An effective framework for asynchronous incremental graph processing
Xinqiao LV, Wei XIAO, Yu ZHANG, Xiaofei LIAO, Hai JIN, Qiangsheng HUA
An effective framework for asynchronous incremental graph processing
Although many graph processing systems have been proposed, graphs in the real-world are often dynamic. It is important to keep the results of graph computation up-todate. Incremental computation is demonstrated to be an efficient solution to update calculated results. Recently, many incremental graph processing systems have been proposed to handle dynamic graphs in an asynchronous way and are able to achieve better performance than those processed in a synchronous way. However, these solutions still suffer from suboptimal convergence speed due to their slow propagation of important vertex state (important to convergence speed) and poor locality. In order to solve these problems, we propose a novel graph processing framework. It introduces a dynamic partition method to gather the important vertices for high locality, and then uses a priority-based scheduling algorithm to assign them with a higher priority for an effective processing order. By such means, it is able to reduce the number of updates and increase the locality, thereby reducing the convergence time. Experimental results show that our method reduces the number of updates by 30%, and reduces the total execution time by 35%, compared with state-of-the-art systems.
incremental computation / graph processing / iterative computation / asynchronous / convergence
[1] |
Baluja S, Seth R, Sivakumar D, Jing Y S, Yagnik J, Kumar S, Ravichandran D, Aly M. Video suggestion and discovery for youtube: taking random walks through the view graph. In: Proceedings of the 17th International Conference on World Wide Web. 2008, 895–904
CrossRef
Google scholar
|
[2] |
Wang P, Xu B W, Wu Y R, Zhou X Y. Link prediction in social networks: the state-of-the-art. Science China Information Sciences, 2015, 58(1): 1–38
CrossRef
Google scholar
|
[3] |
Shang X Q, Wang Y, Chen B L. Identifying essential proteins based on dynamic protein-protein interaction networks and RNA-seq datasets. Science China Information Sciences, 2016, 59(7): 1–11
CrossRef
Google scholar
|
[4] |
Bhatotia P, Wieder A, Rodrigues R, Acar U A, Pasquin R. Incoop: mapreduce for incremental computations. In: Proceedings of the 2nd ACM Symposium on Cloud Computing. 2011, 1–14
CrossRef
Google scholar
|
[5] |
Zhang Y F, Gao Q X, Gao L X, Wang C R. iMapreduce: a distributed computing framework for iterative computation. In: Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing. 2011, 1112–1121
CrossRef
Google scholar
|
[6] |
Zhang Y F, Chen S M, Wang Q, Yu G. i2Mapreduce: incremental mapreduce for mining evolving big data. In: Proceedings of the 32nd IEEE International Conference on Data Engineering. 2016, 1482–1483
CrossRef
Google scholar
|
[7] |
Yin J T, Gao L X. Asynchronous distributed incremental computation on evolving graphs. In: Proceedings of the 2016 Machine Learning and Knowledge Discovery in Databases. 2016, 722–738
CrossRef
Google scholar
|
[8] |
Zhang Y F, Gao Q X, Gao L X, Wang C R. Maiter: an asynchronous graph processing framework for delta-based accumulative iterative computation. IEEE Transactions on Parallel & Distributed Systems, 2014, 25(8): 2091–2100
CrossRef
Google scholar
|
[9] |
Mihaylov S R, Ives Z G, Guha S. Rex: recursive, delta-based datacentric computation. Proceedings of the VLDB Endowment, 2012, 5(11): 1280–1291
CrossRef
Google scholar
|
[10] |
Popa L, Budiu M, Yu Y, Isard M. Dryadinc: reusing work in large-scale computations. In: Proceedings of the 2009 Conference on Hot Topics in Cloud Computing. 2009, 1–5
|
[11] |
Cheng R, Hong J, Kyrola A, Miao Y S, Weng X T, Wu M, Yang F, Zhou L D, Zhao F, Chen E H. Kineograph: taking the pulse of a fastchanging and connected world. In: Proceedings of the 7th ACM European Conference on Computer Systems. 2012, 85–98
CrossRef
Google scholar
|
[12] |
Murray D G, Mcsherry F, Isaacs R, Isard M, Barham P. Naiad: a timely dataflow system. In: Proceedings of the 24th ACM SIGOPS Symposium on Operating Systems Principles. 2013, 439–455
CrossRef
Google scholar
|
[13] |
Gonzalez J E, Low Y C, Gu H J, Bickson D, Guestrin C. Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 2012, 17–30
|
[14] |
Verma S, Leslie L M, Shin Y, Gupta I. An experimental comparison of partitioning strategies in distributed graph processing. Proceedings of the VLDB Endowment, 2017, 10(5): 493–504
CrossRef
Google scholar
|
[15] |
Karypis G, Kumar V. Multilevel graph partitioning schemes. In: Proceedings of the 1995 International Conference on Parallel Processing. 1995, 113–122
|
[16] |
Kwak H, Lee C, Park H, Moon S.What is twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web. 2010, 591–600
CrossRef
Google scholar
|
[17] |
Zaharia M, Chowdhury M, Franklin M J, Shenker S, Stoica I. Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing. 2010, 1–10
|
[18] |
Low Y C, Gonzalez J E, Kyrola A, Bickson D, Guestrin C E, Hellerstein J. Graphlab: a new framework for parallel machine learning. In: Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence. 2010, 1–10
|
[19] |
Power R, Li J Y. Piccolo: building fast, distributed programs with partitioned tables. In: Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation. 2010, 1–14
|
[20] |
Bu Y Y, Howe B, Balazinska M, Ernst M D. Haloop: efficient iterative data processing on large clusters. Proceedings of the VLDB Endowment, 2010, 3(1): 285–296
CrossRef
Google scholar
|
[21] |
Ekanayake J, Li H, Zhang B J, Gunarathne T, Bae S H, Qiu J, Fox G. Twister: a runtime for iterative mapreduce. In: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. 2010, 810–818
CrossRef
Google scholar
|
[22] |
Malewicz G, Austern M H, Bik A J, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 135–146
CrossRef
Google scholar
|
[23] |
Roy A, Mihailovic I, Zwaenepoel W. X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 472–488
CrossRef
Google scholar
|
[24] |
Tian Y Y, Balmin A, Corsten S A, Tatikonda S, McPherson J. From “think like a vertex” to “think like a graph”. Proceedings of the VLDB Endowment, 2013, 7(3): 193–204
CrossRef
Google scholar
|
[25] |
Yan D, Cheng J, Lu Y, Ng W. Effective techniques for message reduction and load balancing in distributed graph computation. In: Proceedings of the 24th International Conference on World Wide Web. 2015, 1307–1317
CrossRef
Google scholar
|
[26] |
Salihoglu S, Widom J. GPS: a graph processing system. In: Proceedings of the 2013 Conference on Scientific and Statistical Database Management. 2013, 1–12
CrossRef
Google scholar
|
[27] |
Kyrola A, Blelloch G, Guestrin C. Graphchi: large-scale graph computation on just a pc. In: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 2012, 31–46
|
[28] |
Xie W L, Wang G Z, Bindel D, Demers A, Gehrke J. Fast iterative graph computation with block updates. Proceedings of the VLDB Endowment, 2013, 6(14): 2014–2025
CrossRef
Google scholar
|
[29] |
Yuan P P, Zhang W Y, Xie C F, Jin H, Liu L, Lee K. Fast iterative graph computation: a path centric approach. In: Proceedings of the 2014 International Conference for High Performance Computing, Networking, Storage and Analysis. 2015, 401–412
|
[30] |
Xie C N, Chen R, Guan H B, Zang B Y, Chen H B. SYNC or ASYNC: time to fuse for distributed graph-parallel computation. In: Proceedings of the 20th ACM Sigplan Symposium on Principles and Practice of Parallel Programming. 2015, 194–204
CrossRef
Google scholar
|
[31] |
Tsourakakis C, Gkantsidis C, Radunovic B, Vojnovic M. Fennel: streaming graph partitioning for massive scale graphs. In: Proceedings of the 7th ACM International Conference onWeb Search and Data Mining. 2014, 333–342
CrossRef
Google scholar
|
[32] |
Nishimura J, Ugander J. Restreaming graph partitioning: simple versatile algorithms for advanced balancing. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013, 1106–1114
CrossRef
Google scholar
|
[33] |
Abdolrashidi A, Ramaswamy L. Continual and cost-effective partitioning of dynamic graphs for optimizing big graph processing systems. In: Proceedings of the 2016 IEEE International Congress on Big Data. 2016, 18–25
CrossRef
Google scholar
|
/
〈 | 〉 |