PDF
Abstract
Service computing is an emerging and distributed computing mode in cloud service systems, and has become an interesting research direction for both academia and industry. Note that the cloud service systems always display new characteristics, such as stochasticity, large scale, loose coupling, concurrency, non-homogeneity and heterogeneity, thus their load balancing investigation has been more interesting, difficult and challenging until now. By using resource management and job scheduling, this paper proposes an integrated, real-time and dynamic control mechanism for large-scale cloud service systems and their load balancing through combining supermarket models with not only work stealing models but also scheduling of public reserved resource. To this end, this paper provides a novel stochastic model with weak interactions by means of nonlinear Markov processes. To overcome theoretical difficulties growing out of the state explosion in high-dimensional stochastic systems, this paper applies the mean-field theory to develop a macro computational technique in terms of an infinite-dimensional system of mean-field equations. Furthermore, this paper proves the asymptotic independence of the large-scale cloud service system, and show how to compute the fixed point by virtue of an infinite-dimensional system of nonlinear equations. Based on the fixed point, this paper provides effective numerical computation for performance analysis of this system under a high approximate precision. Therefore, we hope that the methodology and results given in this paper can be applicable to the study of more general large-scale cloud service systems.
Keywords
Large-scale cloud service system
/
resource management
/
job scheduling
/
supermarket model
/
work stealing model
/
scheduling of public reserved resource
Cite this article
Download citation ▾
Feifei Yang, Yanping Jiang, Quanlin Li.
Mean-field Macro Computation in Large-scale Cloud Service Systems with Resource Management and Job Scheduling.
Journal of Systems Science and Systems Engineering, 2019, 28(2): 238-261 DOI:10.1007/s11518-018-5399-z
| [1] |
Anselmi J, Gaujal B. Performance evaluation ofwork stealing for streaming applications. 13th International Conference on Principles of Distributed Systems, 2009
|
| [2] |
Berenbrink P, Friedetzky T, Goldberg L A. The natural work-stealing algorithm is stable. SIAM Journal on Computing, 2003, 32(5): 1260-1279.
|
| [3] |
Blumofe R D, Leiserson C E. Scheduling multi-threaded computations by work stealing. Journal of the ACM, 1999, 46(5): 720-748.
|
| [4] |
Bramson M, Lu Y, Prabhakar B. Randomized load balancing with general service time distributions. ACM SIGMETRICS Performance Evaluation Review, 2010, 38(1): 275-286.
|
| [5] |
Bramson M, Lu Y, Prabhakar B. Asymptotic independence of queues under randomized load balancing. Queueing Systems, 2012, 71(3): 247-292.
|
| [6] |
Bramson M, Lu Y, Prabhakar B. Decay of tails at equilibrium for FIFO join the shortest queue networks. The Annals of Applied Probability, 2013, 23(5): 1841-1878.
|
| [7] |
Calheiros R N, Ranjan R, Beloglazov A, De Rose C A, Buyya R. CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Software: Practice and experience, 2011, 41(1): 23-50.
|
| [8] |
Ethier S N, Kurtz T G. Markov Processes: Characterization and Convergence, 2009, Hoboken, New Jersey: John Wiley & Sons, Inc..
|
| [9] |
Gast N, Gaujal B. A mean field model of work stealing in large-scale systems. ACM SIGMETRICS Performance Evaluation Review, 2010, 38(1): 13-24.
|
| [10] |
Graham C. Chaoticity on path space for a queueing network with selection of the shortest queue among several. Journal of Applied Probability, 2000, 37(1): 198-211.
|
| [11] |
Graham C. Functional central limit theorems for a large network in which customers join the shortest of several queues. Probability Theory and Related Fields, 2005, 131(1): 97-120.
|
| [12] |
Harchol-Balter M, Li C, Osogami T, Scheller-Wolf A, Squillante M S. Analysis of task assignment with cycle stealing under central queue. Proceedings of the 23rd International Conference on Distributed Computing Systems, 2003
|
| [13] |
Hendler D, Shavit N. Non-blocking steal-half work queues. Proceedings of the 21st Annual Symposium on Principles of Distributed Computing, 2002
|
| [14] |
Iosup A, Ostermann S, Yigitbasi M N, Prodan R, Fahringer T, Epema D. Performance analysis of cloud computing services for many-tasks scientific computing. IEEE Transactions on Parallel and Distributed Systems, 2011, 22(6): 931-945.
|
| [15] |
Jennings B, Stadler R. Resource management in clouds: Survey and research challenges. Journal of Network and Systems Management, 2015, 23(3): 567-619.
|
| [16] |
Li Q L. Tail probabilities in queueing processes. Asia-Pacific Journal of Operational Research, 2014, 31(2): 1-31.
|
| [17] |
Li Q L, Dai G, Lui J C S, Wang Y. The mean-field computation in a supermarket model with server multiple vacations. Discrete Event Dynamic Systems, 2014, 24(4): 473-522.
|
| [18] |
Li Q L, Du Y, Dai G, Wang M. On a doubly dynamically controlled supermarket model with impatient customers. Computers & Operations Research, 2015, 55(1): 76-87.
|
| [19] |
Li Q L, Lui J C S. Block-structured supermarket models. Discrete Event Dynamic Systems, 2016, 26(2): 147-182.
|
| [20] |
Li Q L, Yang F. Mean-field analysis for heterogeneous work stealing models. 14th International Conference on Information Technologies and Mathematical Modelling, 2015
|
| [21] |
Lin C, Tian Y, Yao M. Green network and green evaluation: Mechanism, modeling and evaluation. Chinese Journal of Computers, 2012, 34(4): 593-612.
|
| [22] |
Lu Y, Xie Q, Kliot G, Geller A, Larus J R, Greenberg A. Join-idle-queue: A novel load balancing algorithm for dynamically scalable web services. Performance Evaluation, 2011, 68(11): 1056-1071.
|
| [23] |
Luczak M, McDiarmid C. Asymptotic distributions and chaos for the supermarket model. Electronic Journal of Probability, 2007, 12(1): 75-99.
|
| [24] |
Manvi S S, Shyam G K. Resource management for infrastructure as a service (iaas) in cloud computing: a survey. Journal of Network & Computer Applications, 2014, 41(1): 424-440.
|
| [25] |
Minnebo W, Van Houdt B. Pull versus push mechanism in large distributed networks: Closed formresults. Proceedings of the 24th International Teletraffic Congress, 2012
|
| [26] |
Minnebo W, Van Houdt B. Improved rate-based pull and push strategies in large distributed networks. In the IEEE 21st International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems San Francisco, 2013
|
| [27] |
Mitzenmacher M D. The power of two choices in randomized load balancing, 1996, Berkeley, USA: University of California
|
| [28] |
Mitzenmacher M D. Analyses of load stealing models based on families of differential equations. Theory of Computing Systems, 2000, 34(1): 77-98.
|
| [29] |
Moreno I S, Garraghan P, Townend P, Xu J. Analysis, modeling and simulation of workload patterns in a large-scale utility cloud. IEEE Transactions on Cloud Computing, 2014, 2(2): 208-221.
|
| [30] |
Osogami T, Harchol-Balter M, Scheller-Wolf A. Analysis of cycle stealing with switching cost. Journal of the ACM, 2003, 31(1): 184-195.
|
| [31] |
Sotiriadis S, Bessis N, Antonopoulos N, Anjum A. SimIC: Designing a new inter-cloud simulation platform for integrating large-scale resource management. In the IEEE 27th International Conference on Advanced Information Networking and Applications, 2013
|
| [32] |
Squillante M S. Stochastic analysis of multiserver systems. ACM SIGMETRICS Performance Evaluation Review, 2007, 34(4): 44-51.
|
| [33] |
Squillante M S, Nelson R D. Analysis of task migration in shared-memory multiprocessor scheduling. ACM SIGMETRICS Performance Evaluation Review, 1991, 19(1): 143-155.
|
| [34] |
Stolyar A L. Pull-based load distribution in large-scale heterogeneous service systems. Queueing Systems, 2015, 80(4): 341-361.
|
| [35] |
Turner S R. The effect of increasing routing choice on resource pooling. Probability in the Engineering and Informational Sciences, 1998, 12(1): 109-124.
|
| [36] |
van der Boor M, Borst S C, van Leeuwaarden J S, Mukherjee D. Scalable load balancing in networked systems: A survey of recent advances, 2018 1-69.
|
| [37] |
Van Houdt B. Performance comparison of aggressive push and traditional pull strategies in large distributed systems. In the 8th International Conference on Quantitative Evaluation of Systems, 2011
|
| [38] |
Vvedenskaya N D, Dobrushin R L, Karpelevich F I. Queueing system with selection of the shortest of two queues: An asymptotic approach. Problems of Information Transmission, 1996, 32(1): 20-34.
|
| [39] |
Vvedenskaya N D, Suhov Y M. Dobrushin’s meanfield approximation for a queue with dynamic routing. Markov Processes and Related Fields, 1997, 13(1): 493-526.
|
| [40] |
Wuhib F, Yanggratoke R, Stadler R. Allocating compute and network resources under management objectives in large-scale clouds. Journal of Network and Systems Management, 2015, 23(1): 111-136.
|