A new fragment re-allocation strategy for NoSQL database systems
Zhikun CHEN, Shuqiang YANG, Shuang TAN, Li HE, Hong YIN, Ge ZHANG
A new fragment re-allocation strategy for NoSQL database systems
NoSQL databases are famed for the characteristics of high scalability, high availability, and high faulttolerance. So NoSQL databases are used in a lot of applications. The data partitioning strategy and fragment allocation strategy directly affect NoSQL database systems’ performance. The data partition strategy of large, global databases is performed by horizontally, vertically partitioning or combination of both. In the general way the system scatters the related fragments as possible to improve operations’ parallel degree. But the operations are usually not very complicated in some applications, and an operation may access to more than one fragment. At the same time, those fragments which have to be accessed by an operation may interact with each other. The general allocation strategies will increase system’s communication cost during operations execution over sites. In order to improve those applications’ performance and enable NoSQL database systems to work efficiently, these applications’ fragments have to be allocated in a reasonable way that can reduce the communication cost i.e., to minimize the total volume of data transmitted during operations execution over sites. A strategy of clustering fragments based on hypergraph is proposed, which can cluster fragments which were accessed together in most operations to the same cluster. The method uses a weighted hypergraph to represent the fragments’ access pattern of operations. A hypergraph partitioning algorithm is used to cluster fragments in our strategy. This method can reduce the amount of sites that an operation has to span. So it can reduce the communication cost over sites. Experimental results confirm that the proposed technique will effectively contribute in solving fragments re-allocation problem in a specific application environment of NoSQL database system.
fragment allocation / NoSQL database / hypergraph partition / clustering fragments / fragment correlation
[1] |
Bryant R, Katz R H, Lazowska E D. Big-data computing: creating revolutionary breakthroughs in commerce, science and society. Computing Community Consortium, 2008: 1−15
|
[2] |
Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Byers A H. Big data: The Next Frontier For Innovation, Competition, and Productivity. MacKinsey Global Institute. 2011
|
[3] |
Lohr S. The age of big data (2012).
|
[4] |
Noguchi Y. Following digital breadcrumbs to big data gold (2011)
|
[5] |
Noguchi Y. The search for analysts to make sense of big data (2011)
|
[6] |
Kumar K A, Deshpande A, Khuller S. Data placement and replica selection for improving co-location in distributed environments.
|
[7] |
Li X. Research of data allocation strategy in distributed database. Dissertation for the Master Degree. Dalian: Dalian University of Technology, 2009
|
[8] |
Navathe S B, Ra M. Vertical partitioning for database design: a graphical algorithm. ACM SIGMOD Record, 1989, 18(2): 440−450
CrossRef
Google scholar
|
[9] |
Andrew B. Mlpart.
|
[10] |
Caldwell A E, Kahng A B, Markov I L. Design and implementation of move-based heuristics for VLSI hypergraph partitioning. Journal of Experimental Algorithmics, 2000, 5: 5
CrossRef
Google scholar
|
[11] |
Karypis G, Aggarwal R, Kumar V, Shekhar S. Multilevel hypergraph partitioning: applications in VLSI domain. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1999, 7(1): 69−79
|
[12] |
Alpert C J, Kahng A B. Recent directions in netlist partitioning: a survey, Integration, the VLSI Journal, 1995, 19(1): 1−81
|
[13] |
Karypis G, Kumar V, Multilevel k-way hypergraph partitioning. VLSI Design, 2000, 11(3): 285−300
CrossRef
Google scholar
|
[14] |
Selvakkumaran N, Karypis G. Multiobjective hypergraph partitioning algorithms for cut and maximum subdomain-degree minimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2006, 25(3): 504−517
CrossRef
Google scholar
|
[15] |
Liu D R, Shekhar S. Partitioning similarity graphs: a framework for declustering problems. Information Systems, 1996, 21(6): 475−496
CrossRef
Google scholar
|
[16] |
Yu P L, Lee Y R, Stam A. Multiple-criteria Decision Making: Concepts, Techniques, and Extensions. Plenum Press New York, 1985
CrossRef
Google scholar
|
[17] |
Cooley R, Mobasher B, Srivastava J. Web mining: information and pattern discovery on the world wide web. In: Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence. 1997, 558−567
CrossRef
Google scholar
|
[18] |
Karypis G, Han E H, Kumar V. Chameleon: hierarchical clustering using dynamic modeling. Computer, 1999, 32(8): 68−75
CrossRef
Google scholar
|
[19] |
Lakshman A, Malik P. Cassandra: structured storage system on a P2P network. In: Proceedings of the 28th ACM Symposium on Principles of Distributed Computing. 2009
CrossRef
Google scholar
|
[20] |
Lakshman A, Malik P. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 2010, 44(2): 35−40
CrossRef
Google scholar
|
[21] |
Sarathy R, Shetty B, Sen A. A constrained nonlinear 0−1 program for data allocation. European Journal of Operational Research, 1997, 102(3): 626−647
CrossRef
Google scholar
|
[22] |
Menon S. Allocating fragments in distributed databases. IEEE Transactions on Parallel and Distributed Systems, 2005, 16(7): 577−585
CrossRef
Google scholar
|
[23] |
Ram S, Narasimhan S. Database allocation in a distributed environment: incorporating a concurrency control mechanism and queuing costs. Management Science, 1994, 40(8): 969−983
CrossRef
Google scholar
|
[24] |
Karlapalem K, Pun N M. Query-driven data allocation algorithms for distributed database systems. In: Proceeding of the Database and Expert Systems Applications. 1997, 347−356
CrossRef
Google scholar
|
[25] |
Chaturvedi A R, Choubey A K, Roan J. Scheduling the allocation of data fragments in a distributed database environment: a machine learning approach. IEEE Transactions on Engineering Management, 1994, 41 (2): 194−207
CrossRef
Google scholar
|
[26] |
Corcoran A L, Hale J. A genetic algorithm for fragment allocation in a distributed database system. In: Proceedings of the 1994 ACM Symposium on Applied Computing. 1994, 247−250
CrossRef
Google scholar
|
[27] |
March S T, Rho S. Allocating data and operations to nodes in distributed database design. IEEE Transactions on Knowledge and Data Engineering, 1995, 7(2): 305−317
CrossRef
Google scholar
|
[28] |
Apers P M. Data allocation in distributed database systems. ACM Transactions on Database Systems (TODS), 1988, 13(3): 263−304
CrossRef
Google scholar
|
[29] |
Ulus T, Uysal M. Heuristic approach to dynamic data allocation in distributed database systems. Pakistan Journal of Information and Technology, 2003, 2(3): 231−239
CrossRef
Google scholar
|
[30] |
Chin A G. Incremental data allocation and reallocation in distributed database systems. In: Proceedings of the Data Warehousing and Web Engineering. 2002, 137−160
CrossRef
Google scholar
|
[31] |
Abdalla H I. An efficient approach for data placement in distributed systems. In: Proceedings of the 5th IEEE FTRA International Conference on Multimedia and Ubiquitous Engineering. 2011, 297−301
|
[32] |
Wang T, Lin Z, Yang B, Gao J, Huang A, Yang D, Zhang Q, Tang S, Niu J. Mba: A market-based approach to data allocation and dynamic migration for cloud database. Science China Information Sciences, 2012, 55(9): 1935−1948
CrossRef
Google scholar
|
[33] |
Du J, Barker K, Alhajj R. Attraction-a global affinity measure for database vertical partitioning. In: Proceedings of International Conference WWW/Internet. 2003, 538−548
|
[34] |
Curino C, Jones E, Zhang Y, Madden S. Schism: a workload-driven approach to database replication and partitioning. Proceedings of the VLDB Endowment, 2010, 3(1−2): 48−57
CrossRef
Google scholar
|
[35] |
Quamar A, Kumar K A, Deshpande A. Sword: scalable workloadaware data placement for transactional workloads. In: Proceedings of the 16th International Conference on Extending Database Technology. 2013, 430−441
|
[36] |
Quamar A. Scaling Transactional Workloads on the Cloud. Technical report MD. 2013
|
/
〈 | 〉 |