Please wait a minute...

Frontiers of Computer Science

Front. Comput. Sci.    2018, Vol. 12 Issue (6) : 1076-1089
IncPregel: an incremental graph parallel computation model
Qiang LIU1, Xiaoshe DONG1, Heng CHEN1(), Yinfeng WANG2
1. School of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
2. Shenzhen Institute of Information Technology, Shenzhen 518172, China
Download: PDF(781 KB)  
Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks

Large-scale graph computation is often required in a variety of emerging applications such as social network computation and Web services. Such graphs are typically large and frequently updated with minor changes. However, re-computing an entire graphwhen a fewvertices or edges are updated is often prohibitively expensive. To reduce the cost of such updates, this study proposes an incremental graph computation model called IncPregel, which leverages the nonafter- effect property of the first-order Markov chain and provides incremental programming abstractions to avoid redundant computation and message communication. This is accomplished by employing an efficient and fine-grained reuse mechanism. We implemented this model on Hama, a popular open source framework based on Pregel, to construct an incremental graph processing system called IncHama. IncHama automatically detects changes in input in order to recognize “changed vertices” and to exchange reusable data by means of shuffling. The evaluation results on large-scale graphs show that, compared with Hama, IncHama is 1.1–2.7 times faster and can reduce communication messages by more than 50% when the incremental edges increase in number from 0.1 to 100k.

Keywords graph computation      Pregel      cloud computing     
Corresponding Authors: Heng CHEN   
Just Accepted Date: 08 November 2016   Online First Date: 20 December 2017    Issue Date: 04 December 2018
 Cite this article:   
Qiang LIU,Xiaoshe DONG,Heng CHEN, et al. IncPregel: an incremental graph parallel computation model[J]. Front. Comput. Sci., 2018, 12(6): 1076-1089.
E-mail this article
E-mail Alert
Articles by authors
Qiang LIU
Xiaoshe DONG
Yinfeng WANG
1 Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Communications of the ACM, 2008, 51(1): 107–113
2 Malewicz G, Austern M H, Bik A J C, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel: a system for large-scale graph processing. In: Proceedings of ACM SIGMOD International Conference on Management of Data. 2010, 135–146
3 Low Y C, Gonzalez J, Kyrola A, Bickson D, Guestrin C E, Hellerstein J M. GraphLab: a new framework for parallel machine learning. In: Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence. 2010, 340–349
4 Low Y C, Bickson D, Gonzalez J, Guestrin C E, Kyrola A, Hellerstein J M. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the Very Large Data Base Endowment, 2012, 5(8): 716–727
5 Power R, Li J Y. Piccolo: building fast, distributed programs with partitioned tables. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation. 2010, 1–14
6 Roy A, Mihailovic I, Zwaenepoel W. X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 472–488
7 Wilson C, Sala A, Puttaswamy K P N, Zhao B Y. Beyond social graphs: user interactions in online social networks and their implications. ACM Transactions on the Web, 2012, 6(4): 17
8 Fan W F, Wang X, Wu Y H. Incremental graph pattern matching. ACM Transactions on Database Systems, 2013, 38(3): 18
9 Logothetis D, Olston C, Reed B, Webb K C, Yocum K. Stateful bulk processing for incremental analytics. In: Proceedings of the 1st ACM Symposium on Cloud Computing. 2010, 51–62
10 Bhatotia P, Wieder A, Rodrigues R, Acar U A, Pasquin R. Incoop: MapReduce for incremental computations. In: Proceedings of the 2nd ACM Symposium on Cloud Computing. 2011
11 Sagharichian M, Naderi H, Haghjoo M. ExPregel: a new computational model for large-scale graph processing. Concurrency and Computation: Practice and Experience, 2015, 27(17): 4954–4969
12 Brin S, Page L. Reprint of: the anatomy of a large-scale hypertextual Web search engine. Computer Networks, 2012, 56(18): 3825–3833
13 Gyöngyi Z, Garcia-Molina H, Pedersen J. Combating Web spam with trustrank. In: Proceedings of the 30th International Conference on Very Large Data Base. 2004, 576–587
14 Bu Y Y, Howe B, Balazinska M, Ernst M D. HaLoop: efficient iterative data processing on large clusters. Proceedings of the Very Large Data Base Endowment, 2010, 3(1–2): 285–296
15 Kang U, Tsourakakis C E, Faloutsos C. Pegasus: a peta-scale graph mining system implementation and observations. In: Proceedings of the 9th IEEE International Conference on Data Mining. 2009, 229–238
16 Kang U, Tsourakakis C E, Appel A P, Faloutsos C, Leskovec J. Hadi: mining radii of large graphs. ACM Transactions on Knowledge Discovery from Data, 2011, 5(2): 8
17 Valiant L G. A bridging model for parallel computation. Communications of the ACM, 1990, 33(8): 103–111
18 Prabhakaran V, Wu M, Weng X T, McSherry F, Zhou L D, Haridasan M. Managing large graphs on multi-cores with graph awareness. In: Proceedings of USENIX Annual Technical Conference. 2012, 41–52
19 Gonzalez J E, Xin R S, Dave A, Crankshaw D, Franklin M J, Stoica I. GraphX: graph processing in a distributed dataflow framework. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation. 2014, 599–613
20 Zaharia M, Chowdhury M, Franklin M J, Shenker S, Stoica I. Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing. 2010
21 Gonzalez J E, Low Y C, Gu H J, Bickson D, Guestrin C. PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation. 2012, 17–30
22 Chen R, Ding X, Wang P, Chen H B, Zang B Y, Guan H B. Computation and communication efficient graph processing with distributed immutable view. In: Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed computing. 2014, 215–226
23 Desikan P, Pathak N, Srivastava J, Kumar V. Incremental page rank computation on evolving graphs. In: Proceedings of Special Interest Tracks and Posters of the 14th International Conference onWorldWide Web. 2005, 1094–1095
24 Chien S, Dwork C, Kumar R, Simon D R, Sivakumar D. Link evolution: analysis and algorithms. Internet Mathematics, 2004, 1(3): 277–304
25 Popa L, Budiu M, Yu Y, Isard M. DryadInc: reusing work in large-scale computations. In: Proceedings of the 2009 Conference on Hot Topics in Cloud Computing. 2009
26 Peng D, Dabek F. Large-scale incremental processing using distributed transactions and notifications. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation. 2010, 1–15
27 Cheng R, Hong J, Kyrola A, Miao Y S, Weng X T, Wu M, Yang F, Zhou L D, Zhao F, Chen E H. Kineograph: taking the pulse of a fastchanging and connected world. In: Proceedings of the 7th ACM European Conference on Computer Systems. 2012, 85–98
28 Lovász L. Random walks on graphs: a survey. Combinatorics, Paul Erdos is Eighty, 1993, 2(1): 1–46
29 Puterman M L. Markov Decision Processes: Discrete Dynamic Stochastic Programming. New York: John Wiley & Sons, 1994
30 Shao Y X, Cui B, Ma L. PAGE: a partition aware engine for parallel graph computation. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(2): 518–530
Related articles from Frontiers Journals
[1] Yongli CHENG, Fang WANG, Hong JIANG, Yu HUA, Dan FENG, Lingling ZHANG, Jun ZHOU. A communication-reduced and computation-balanced framework for fast graph computation[J]. Front. Comput. Sci., 2018, 12(5): 887-907.
[2] Fei TIAN, Tao QIN, Tie-Yan LIU. Computational pricing in Internet era[J]. Front. Comput. Sci., 2018, 12(1): 40-54.
[3] Xiong FU, Juzhou CHEN, Song DENG, Junchang WANG, Lin ZHANG. Layered virtual machine migration algorithm for network resource balancing in cloud computing[J]. Front. Comput. Sci., 2018, 12(1): 75-85.
[4] Najme MANSOURI. Adaptive data replication strategy in cloud computing for performance improvement[J]. Front. Comput. Sci., 2016, 10(5): 925-935.
[5] Haibao CHEN,Song WU,Hai JIN,Wenguang CHEN,Jidong ZHAI,Yingwei LUO,Xiaolin WANG. A survey of cloud resource management for complex engineering applications[J]. Front. Comput. Sci., 2016, 10(3): 447-461.
[6] Zhaoning ZHANG,Dongsheng LI,Kui WU. Large-scale virtual machines provisioning in clouds:challenges and approaches[J]. Front. Comput. Sci., 2016, 10(1): 2-18.
[7] Bing YU,Yanni HAN,Hanning YUAN,Xu ZHOU,Zhen XU. A cost-effective scheme supporting adaptive service migration in cloud data center[J]. Front. Comput. Sci., 2015, 9(6): 875-886.
[8] Xiong FU,Chen ZHOU. Virtual machine selection and placement for dynamic consolidation in Cloud computing environment[J]. Front. Comput. Sci., 2015, 9(2): 322-330.
[9] Solomon Guadie WORKU,Chunxiang XU,Jining ZHAO. Cloud data auditing with designated verifier[J]. Front. Comput. Sci., 2014, 8(3): 503-512.
[10] Heng WU, Wenbo ZHANG, Jianhua ZHANG, Jun WEI, Tao HUANG. A benefit-aware on-demand provisioning approach for multi-tier applications in cloud computing[J]. Front Comput Sci, 2013, 7(4): 459-474.
[11] Haibo MI, Huaimin WANG, Yangfan ZHOU, Michael Rung-Tsong LYU, Hua CAI, Gang YIN. An online service-oriented performance profiling tool for cloud computing systems[J]. Front Comput Sci, 2013, 7(3): 431-445.
[12] Ling LIU. Computing infrastructure for big data processing[J]. Front Comput Sci, 2013, 7(2): 165-170.
[13] Jian LIN, Li ZHA, Zhiwei XU. Consolidated cluster systems for data centers in the cloud age: a survey and analysis[J]. Front. Comput. Sci., 2013, 7(1): 1-19.
[14] Jianzhong LI , Wei ZHANG , . Cluster based parallel database management system for data intensive computing[J]. Front. Comput. Sci., 2009, 3(3): 302-314.
Full text