Consolidated cluster systems for data centers in the cloud age: a survey and analysis

Jian LIN, Li ZHA, Zhiwei XU

Front. Comput. Sci. ›› DOI: 10.1007/s11704-012-2086-y
REVIEW ARTICLE


Abstract

In the cloud age, heterogeneous application modes on large-scale infrastructures pose challenges for resource utilization and manageability in data centers. Many resource and runtime management systems have been developed or have evolved to address these challenges and related problems from different perspectives. This paper attempts to identify the main motivations, key concerns, common features, and representative solutions of such systems through a survey and analysis. A typical kind of these systems is generalized as the consolidated cluster system, whose design goal is identified as reducing overall costs under the premise that quality of service is maintained. A survey of this kind of system is given, and the critical issues that such systems address are summarized as resource consolidation and runtime coordination. These two issues are analyzed and classified according to the design styles and external characteristics abstracted from the surveyed work. Five representative consolidated cluster systems from both academia and industry are illustrated and compared in detail based on the analysis and classifications. We hope this survey and analysis will be conducive to both the design and implementation and the technology selection of this kind of system, in response to the constantly emerging challenges of infrastructure and application management in data centers.

Keywords

data center / cloud computing / distributed resource management / consolidated cluster system / resource consolidation / runtime coordination

Cite this article

Jian LIN, Li ZHA, Zhiwei XU. Consolidated cluster systems for data centers in the cloud age: a survey and analysis. Front. Comput. Sci., https://doi.org/10.1007/s11704-012-2086-y

RIGHTS & PERMISSIONS

2014 Higher Education Press and Springer-Verlag Berlin Heidelberg