Network and data location aware approach for simultaneous job scheduling and data replication in large-scale data grid environments

Najme MANSOURI

Front. Comput. Sci., 2014, Vol. 8, Issue 3: 391-408. DOI: 10.1007/s11704-014-3146-2
RESEARCH ARTICLE

Abstract

A Data Grid integrates geographically distributed resources to solve data-intensive scientific applications. Effective scheduling in a Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems. However, because of the special issues and goals of the Grid, traditional approaches are no longer effective in this environment, and methods specialized for this kind of parallel and distributed system are needed. Another solution is to use a data replication strategy that creates multiple copies of files and stores them in convenient locations to shorten file access times. To exploit both concepts, in this paper we develop a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drives at the data sources. Moreover, because storage capacity is limited, a good replica replacement algorithm is needed. We present a novel replacement strategy that deletes files in two steps when free space is insufficient for the new replica. First, it deletes the files with the minimum transfer time. Second, if space is still insufficient, it considers the last time the replica was requested, the number of accesses, the size of the replica, and the file transfer time. The simulation results show that our proposed algorithm performs better than other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.
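The two-step replacement idea described above can be illustrated with a minimal Python sketch. The abstract does not give the exact rules, so several parts are assumptions made only for illustration: the `Replica` fields, the `cheap_transfer_s` threshold that decides which files count as having "minimum" transfer time in the first step, and the `retention_score` formula that combines the four factors named for the second step. It should not be read as the ADHRS algorithm itself.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    size_mb: float           # must be positive
    transfer_time_s: float   # estimated time to fetch this file again
    last_access_s: float     # timestamp of the most recent request
    access_count: int


def retention_score(r: Replica, now_s: float) -> float:
    """Hypothetical score (an assumption): higher means more worth keeping."""
    idle_s = max(now_s - r.last_access_s, 1.0)
    return r.access_count * r.transfer_time_s / (idle_s * r.size_mb)


def choose_victims(stored, needed_mb, free_mb, now_s, cheap_transfer_s=1.0):
    """Return names of replicas to delete, or None if space cannot be freed."""
    victims = []
    # Step 1: delete replicas that are cheap to fetch again, cheapest first.
    # The threshold is an assumed stand-in for "minimum transfer time".
    cheap = sorted((r for r in stored if r.transfer_time_s <= cheap_transfer_s),
                   key=lambda r: r.transfer_time_s)
    for r in cheap:
        if free_mb >= needed_mb:
            break
        victims.append(r.name)
        free_mb += r.size_mb
    # Step 2: if still short, delete the remaining replicas by lowest score.
    if free_mb < needed_mb:
        chosen = set(victims)
        rest = sorted((r for r in stored if r.name not in chosen),
                      key=lambda r: retention_score(r, now_s))
        for r in rest:
            if free_mb >= needed_mb:
                break
            victims.append(r.name)
            free_mb += r.size_mb
    return victims if free_mb >= needed_mb else None
```

In this sketch a replica that is large, rarely requested, long idle, and quick to fetch again receives a low score and is deleted first in the second step; any other weighting of the same four factors would fit the description equally well.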

Keywords

data replication / data grid / OptorSim / job scheduling / simulation

Cite this article

Najme MANSOURI. Network and data location aware approach for simultaneous job scheduling and data replication in large-scale data grid environments. Front. Comput. Sci., 2014, 8(3): 391‒408 https://doi.org/10.1007/s11704-014-3146-2


RIGHTS & PERMISSIONS

2014 Higher Education Press and Springer-Verlag Berlin Heidelberg