A new fragment re-allocation strategy for NoSQL database systems

Zhikun CHEN; Shuqiang YANG; Shuang TAN; Li HE; Hong YIN; Ge ZHANG

doi:10.1007/s11704-014-3480-4

PDF(736 KB)

Front. Comput. Sci. ›› 2015, Vol. 9 ›› Issue (1) : 111-127. DOI: 10.1007/s11704-014-3480-4

RESEARCH ARTICLE

A new fragment re-allocation strategy for NoSQL database systems

Author information +

History +

Abstract

NoSQL databases are famed for the characteristics of high scalability, high availability, and high faulttolerance. So NoSQL databases are used in a lot of applications. The data partitioning strategy and fragment allocation strategy directly affect NoSQL database systems’ performance. The data partition strategy of large, global databases is performed by horizontally, vertically partitioning or combination of both. In the general way the system scatters the related fragments as possible to improve operations’ parallel degree. But the operations are usually not very complicated in some applications, and an operation may access to more than one fragment. At the same time, those fragments which have to be accessed by an operation may interact with each other. The general allocation strategies will increase system’s communication cost during operations execution over sites. In order to improve those applications’ performance and enable NoSQL database systems to work efficiently, these applications’ fragments have to be allocated in a reasonable way that can reduce the communication cost i.e., to minimize the total volume of data transmitted during operations execution over sites. A strategy of clustering fragments based on hypergraph is proposed, which can cluster fragments which were accessed together in most operations to the same cluster. The method uses a weighted hypergraph to represent the fragments’ access pattern of operations. A hypergraph partitioning algorithm is used to cluster fragments in our strategy. This method can reduce the amount of sites that an operation has to span. So it can reduce the communication cost over sites. Experimental results confirm that the proposed technique will effectively contribute in solving fragments re-allocation problem in a specific application environment of NoSQL database system.

Keywords

fragment allocation / NoSQL database / hypergraph partition / clustering fragments / fragment correlation

Cite this article

EndNote

Ris (Procite)

Bibtex

Download citation ▾

Zhikun CHEN, Shuqiang YANG, Shuang TAN, Li HE, Hong YIN, Ge ZHANG. A new fragment re-allocation strategy for NoSQL database systems. Front. Comput. Sci., 2015, 9(1): 111‒127 https://doi.org/10.1007/s11704-014-3480-4

This is a preview of subscription content, contact us for subscripton.

References

Publishing order | Descend order by publishing year | Descend order by cited within

[1]	Bryant R, Katz R H, Lazowska E D. Big-data computing: creating revolutionary breakthroughs in commerce, science and society. Computing Community Consortium, 2008: 1−15

[2]	Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Byers A H. Big data: The Next Frontier For Innovation, Competition, and Productivity. MacKinsey Global Institute. 2011

[3]	Lohr S. The age of big data (2012). http://www.nytimes.com/

[4]	Noguchi Y. Following digital breadcrumbs to big data gold (2011)

[5]	Noguchi Y. The search for analysts to make sense of big data (2011)

[6]	Kumar K A, Deshpande A, Khuller S. Data placement and replica selection for improving co-location in distributed environments. http://arxiv.org/abs/1302.4168

[7]	Li X. Research of data allocation strategy in distributed database. Dissertation for the Master Degree. Dalian: Dalian University of Technology, 2009

[8]	Navathe S B, Ra M. Vertical partitioning for database design: a graphical algorithm. ACM SIGMOD Record, 1989, 18(2): 440−450 CrossRef Google scholar

[9]	Andrew B. Mlpart. http://vlsicad.ucsd.edu/GSRC/bookshelf/Slots/Partitioning/MLPart/.

[10]	Caldwell A E, Kahng A B, Markov I L. Design and implementation of move-based heuristics for VLSI hypergraph partitioning. Journal of Experimental Algorithmics, 2000, 5: 5 CrossRef Google scholar

[11]	Karypis G, Aggarwal R, Kumar V, Shekhar S. Multilevel hypergraph partitioning: applications in VLSI domain. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1999, 7(1): 69−79

[12]	Alpert C J, Kahng A B. Recent directions in netlist partitioning: a survey, Integration, the VLSI Journal, 1995, 19(1): 1−81

[13]	Karypis G, Kumar V, Multilevel k-way hypergraph partitioning. VLSI Design, 2000, 11(3): 285−300 CrossRef Google scholar

[14]	Selvakkumaran N, Karypis G. Multiobjective hypergraph partitioning algorithms for cut and maximum subdomain-degree minimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2006, 25(3): 504−517 CrossRef Google scholar

[15]	Liu D R, Shekhar S. Partitioning similarity graphs: a framework for declustering problems. Information Systems, 1996, 21(6): 475−496 CrossRef Google scholar

[16]	Yu P L, Lee Y R, Stam A. Multiple-criteria Decision Making: Concepts, Techniques, and Extensions. Plenum Press New York, 1985 CrossRef Google scholar

[17]	Cooley R, Mobasher B, Srivastava J. Web mining: information and pattern discovery on the world wide web. In: Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence. 1997, 558−567 CrossRef Google scholar

[18]	Karypis G, Han E H, Kumar V. Chameleon: hierarchical clustering using dynamic modeling. Computer, 1999, 32(8): 68−75 CrossRef Google scholar

[19]	Lakshman A, Malik P. Cassandra: structured storage system on a P2P network. In: Proceedings of the 28th ACM Symposium on Principles of Distributed Computing. 2009 CrossRef Google scholar

[20]	Lakshman A, Malik P. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 2010, 44(2): 35−40 CrossRef Google scholar

[21]	Sarathy R, Shetty B, Sen A. A constrained nonlinear 0−1 program for data allocation. European Journal of Operational Research, 1997, 102(3): 626−647 CrossRef Google scholar

[22]	Menon S. Allocating fragments in distributed databases. IEEE Transactions on Parallel and Distributed Systems, 2005, 16(7): 577−585 CrossRef Google scholar

[23]	Ram S, Narasimhan S. Database allocation in a distributed environment: incorporating a concurrency control mechanism and queuing costs. Management Science, 1994, 40(8): 969−983 CrossRef Google scholar

[24]	Karlapalem K, Pun N M. Query-driven data allocation algorithms for distributed database systems. In: Proceeding of the Database and Expert Systems Applications. 1997, 347−356 CrossRef Google scholar

[25]	Chaturvedi A R, Choubey A K, Roan J. Scheduling the allocation of data fragments in a distributed database environment: a machine learning approach. IEEE Transactions on Engineering Management, 1994, 41 (2): 194−207 CrossRef Google scholar

[26]	Corcoran A L, Hale J. A genetic algorithm for fragment allocation in a distributed database system. In: Proceedings of the 1994 ACM Symposium on Applied Computing. 1994, 247−250 CrossRef Google scholar

[27]	March S T, Rho S. Allocating data and operations to nodes in distributed database design. IEEE Transactions on Knowledge and Data Engineering, 1995, 7(2): 305−317 CrossRef Google scholar

[28]	Apers P M. Data allocation in distributed database systems. ACM Transactions on Database Systems (TODS), 1988, 13(3): 263−304 CrossRef Google scholar

[29]	Ulus T, Uysal M. Heuristic approach to dynamic data allocation in distributed database systems. Pakistan Journal of Information and Technology, 2003, 2(3): 231−239 CrossRef Google scholar

[30]	Chin A G. Incremental data allocation and reallocation in distributed database systems. In: Proceedings of the Data Warehousing and Web Engineering. 2002, 137−160 CrossRef Google scholar

[31]	Abdalla H I. An efficient approach for data placement in distributed systems. In: Proceedings of the 5th IEEE FTRA International Conference on Multimedia and Ubiquitous Engineering. 2011, 297−301

[32]	Wang T, Lin Z, Yang B, Gao J, Huang A, Yang D, Zhang Q, Tang S, Niu J. Mba: A market-based approach to data allocation and dynamic migration for cloud database. Science China Information Sciences, 2012, 55(9): 1935−1948 CrossRef Google scholar

[33]	Du J, Barker K, Alhajj R. Attraction-a global affinity measure for database vertical partitioning. In: Proceedings of International Conference WWW/Internet. 2003, 538−548

[34]	Curino C, Jones E, Zhang Y, Madden S. Schism: a workload-driven approach to database replication and partitioning. Proceedings of the VLDB Endowment, 2010, 3(1−2): 48−57 CrossRef Google scholar

[35]	Quamar A, Kumar K A, Deshpande A. Sword: scalable workloadaware data placement for transactional workloads. In: Proceedings of the 16th International Conference on Extending Database Technology. 2013, 430−441