A communication-reduced and computation-balanced framework for fast graph computation

Yongli CHENG, Fang WANG, Hong JIANG, Yu HUA, Dan FENG, Lingling ZHANG, Jun ZHOU

PDF(1064 KB)
PDF(1064 KB)
Front. Comput. Sci. ›› 2018, Vol. 12 ›› Issue (5) : 887-907. DOI: 10.1007/s11704-018-6400-1
RESEARCH ARTICLE

A communication-reduced and computation-balanced framework for fast graph computation

Author information +
History +

Abstract

The bulk synchronous parallel (BSP) model is very user friendly for coding and debugging parallel graph algorithms. However, existing BSP-based distributed graphprocessing frameworks, such as Pregel, GPS and Giraph, routinely suffer from high communication costs. These high communication costs mainly stem from the fine-grained message-passing communication model. In order to address this problem, we propose a new computation model with low communication costs, called LCC-BSP. We use this model to design and implement a high-performance distributed graphprocessing framework called LCC-Graph. This framework eliminates high communication costs in existing distributed graph-processing frameworks. Moreover, LCC-Graph also balances the computation workloads among all compute nodes by optimizing graph partitioning, significantly reducing the computation time for each superstep. Evaluation of LCC-Graph on a 32-node cluster, driven by real-world graph datasets, shows that it significantly outperforms existing distributed graph-processing frameworks in terms of runtime, particularly when the system is supported by a highbandwidth network. For example, LCC-Graph achieves an order of magnitude performance improvement over GPS and GraphLab.

Keywords

graph computation / communication decrease / computation balance

Cite this article

Download citation ▾
Yongli CHENG, Fang WANG, Hong JIANG, Yu HUA, Dan FENG, Lingling ZHANG, Jun ZHOU. A communication-reduced and computation-balanced framework for fast graph computation. Front. Comput. Sci., 2018, 12(5): 887‒907 https://doi.org/10.1007/s11704-018-6400-1

References

[1]
Valiant L G. A bridging model for parallel computation. Communications of the ACM, 1990, 33(8): 103–111
CrossRef Google scholar
[2]
Malewicz G, Austern M H, Bik A J C. Pregel: a system for large-scale graph processing. In: Proceedings of ACM International Conference on Management of Data. 2010, 135–146
CrossRef Google scholar
[3]
Salihoglu S, Widom J. GPS: a graph processing system. In: Proceedings of ACM International Conference on Scientific and Statistical Database Management. 2013, 22–32
CrossRef Google scholar
[4]
Ching A, Edunov S, Kabiljo K, Logothetis D, Muthukrishnan S. One trillion edges: graph processing at facebook-scale. Proceedings of the VIDB Endowment, 2015, 8(12): 1804–1815
CrossRef Google scholar
[5]
Wang G, Xie W, Demers A J, Gehrke J. Asynchronous large-scale graph processing made easy. in: Proceedings of International Conference on Innovation Database Research. 2013, 135–146
[6]
Simmhan Y, Kumbhare A, Wickramaarachchi C, Nagarkar S, Ravi S, Raghavendra C, Prasanna V. Goffish: a sub-graph centric framework for large-scale graph analytics. In: Proceedings of European Conference on Parallel Processing. 2014, 451–462
CrossRef Google scholar
[7]
Kyrola A, Blelloch G E, Guestrin G. Graphchi: large-scale graph computation on just a PC. In: Proceedings of Usenix Conference on Operating Systems Design and Implementation. 2012, 31–46
[8]
Cheng Y L, Wang F, Jiang H, Hua Y, Feng D, Wang X N. DD-graph: a highly cost-effective distributed disk-based graph-processing framework. In: Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing. 2016, 259–262
CrossRef Google scholar
[9]
Cheng Y L, Wang F, Jiang H, Hua Y, Feng D, Wang X N. LCC-graph: a high-performance graph-processing framework with low communication costs. In: Proceedings of IEEE/ACM International Symposium on Quality of Service. 2016, 91–100
[10]
Cheng Y L, Jiang H, Wang F, Hua Y, Feng D. BlitzG: exploiting highbandwidth networks for fast graph processing. In: Proceedings of IEEE International Conference on Computer Communications. 2017, 2340–2348
[11]
Page L. The pagerank citation ranking : bringing order to the web. Stanford Digital Libraries Working Paper, 1998, 9(1): 1–14
[12]
Stefano L D, Bulgarelli A. A simple and efficient connected components labeling algorithm. In: Proceedings of International Conference on Image Analysis and Processing. 1999, 322–327
CrossRef Google scholar
[13]
Fortunato S. Community detection in graphs. Physics Reports, 2010, 486(3): 75–174
CrossRef Google scholar
[14]
Kothari R, Jain V. Learning from labeled and unlabeled data. In: Proceedings of IEEE International Joint Conference on Neural Networks. 2002, 2803–2808
CrossRef Google scholar
[15]
Lee M, Kim E J, Yousif M. Security enhancement in infiniband architecture. In: Proceedings of IEEE International Parallel and Distributed Processing Symposium. 2005, 105–114
[16]
Karypis G, Kumar V. A fast and high quality multilevel scheme for partitioning irregular graphs. Journal of Scientific Computing, 1998, 20(1): 359–392
CrossRef Google scholar
[17]
Kernighan B W, Lin S. An efficient heuristic procedure for partitioning graphs. Journal of Bell System Technical, 1970, 49(2): 291–307
CrossRef Google scholar
[18]
Bui T N, Moon B R. Genetic algorithm and graph partitioning. IEEE Transactions on Computers, 1996, 45(7): 841–855
CrossRef Google scholar
[19]
Roy A, Mihailovic I, Zwaenepoel W. X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 472–488
CrossRef Google scholar
[20]
Nishimura J, Ugander J. Restreaming graph partitioning: simple versatile algorithms for advanced balancing. In: Proceedings of ACM International Conference on Knowledge Discovery and Data Mining. 2013, 1106–1114
CrossRef Google scholar
[21]
Low Y, Bickson D, Gonzalez J E, Guestrin C, Kyrola A. Distributed graphlab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 2012, 5(8): 716–727
CrossRef Google scholar
[22]
Gonzalez J E, Low Y, Haijie G, Danny B, Carlos G. Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of Usenix Conference on Operating Systems Design and Implementation. 2012, 17–30
[23]
Backstrom L, Huttenlocher D, Kleinberg J, Lan X. Group formation in large social networks: membership, growth, and evolution. In: Proceedings of the 12th ACM International Conference on Knowledge Discovery and Data Mining. 2006, 44–54
CrossRef Google scholar
[24]
Yan D, Cheng J, Xing K, Lu Y, Ng W, Bu Y G. Pregel algorithms for graph connectivity problems with performance guarantees. Proceedings of the VLDB Endowment, 2014, 7(14): 1821–1832
CrossRef Google scholar
[25]
Han M, Daudjee K. Giraph unchained: barrierless asynchronous parallel execution in pregel-like graph processing systems. Proceedings of the VLDB Endowment, 2015, 8(9): 950–961
CrossRef Google scholar
[26]
Yang S, Yan X, Zong B, Khan A. Towards effective partition management for large graphs. In: Proceedings of ACM Conference on Management of Data. 2012: 517–528
CrossRef Google scholar
[27]
Xin R S, Crankshaw D, Dave A, Gonzalez J E, Franklin M J, Stoica I. Graphx: unifying data-parallel and graph-parallel analytics, Computer Science, 2014, 8(3): 125–137
[28]
Da Y, Yingyi B, Yuanyuan T, Deshpande A. Big graph analytics platforms. Foundations and Trends in Databases, 2017, 7(2): 180–195
[29]
Lu Y, Cheng J, Yan D, Wu H. Large-scale distributed graph computing systems: an experimental evaluation. Proceedings of the VLDB Endowment, 2014, 8(3): 281–292
CrossRef Google scholar
[30]
Zhou C, Gao J, Sun B, Yu J X. Mocgraph: scalable distributed graph processing using message online computing. Proceedings of the VLDB Endowment, 2014, 8(4): 377–388
CrossRef Google scholar
[31]
Yan D, Cheng J, Lu Y, Ng Y. Blogel: a block-centric framework for distributed computation on real-world graphs. Proceedings of the VLDB Endowment, 2014, 7(14): 1981–1992
CrossRef Google scholar
[32]
Yan D, Cheng J, Ozsu M T, Lu Y. A general-purpose query-centric framework for querying big graphs. Proceedings of the VLDB Endowment, 2016, 9(7): 564–575
CrossRef Google scholar
[33]
Devine K D, Boman E G, Heaphy R T. New challenges in dynamic load balancing. Applied Numerical Mathematics, 2005, 52(2): 133–152
CrossRef Google scholar
[34]
Cheng J, Liu Q, Li Z, Fan W, Lui J C S. VENUS: vertex-centric streamlined graph computation on a single PC. In: Proceedings of IEEE International Conference on Data Engineering. 2015, 1131–1142
CrossRef Google scholar
[35]
Malicevic J, Roy A, Zwaenepoel W. Scale-up graph processing in the cloud:challenges and solutions. In: Proceedings of International Workshop on Cloud Data and Platforms. 2014, 1–6
CrossRef Google scholar
[36]
Pearce R, Gokhale M, Amato N M. Scaling techniques for massive scale-free graphs in distributed (external) memory. In: Proceedings of International Symposium on Parallel and Distributed Processing. 2013, 825–836
CrossRef Google scholar
[37]
Roy A, Bindschaedler L, Malicevic J, Zwaenepoel W. Chaos: scale-out graph processing from secondary storage. In: Proceedings of Symposium on Operating Systems Principles. 2015, 410–424
CrossRef Google scholar
[38]
Zhu X, Han W, Chen W. GridGraph: large-scale graph processing on a single machine using 2-level hierarchical partitioning. In: Proceedings of Usenix Conference on Usenix Technical Conference. 2015, 375–386
[39]
Xu N, Chen L, Cui B. LogGP: a log-based dynamic graph partitioning method. Proceedings of the VLDB Endowment, 2014, 7(14): 1917–1928
CrossRef Google scholar
[40]
Ming W, Fan Y, Jilong X, Xiao W, Miao Y. GRAM: scaling graph computation to the trillions. In: Proceedings of ACM Symposium on Cloud Computing. 2015, 408–421

RIGHTS & PERMISSIONS

2018 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature
AI Summary AI Mindmap
PDF(1064 KB)

Accesses

Citations

Detail

Sections
Recommended

/