Green scheduling for LLM workloads with model and data reuse across geo-distributed data centers

Hao Liu, Xiaonyu Hu, Ran Wang, Jie Hao, Qiang Wu, Hongke Zhang

2026, Vol. 12, Issue (2): 236-251. DOI: 10.1016/j.dcan.2025.11.006
Regular Papers | Research Article

Abstract

The explosive proliferation of Large Language Models (LLMs) imposes significant energy and operational burdens on Geographically Distributed Data Centers (GDDCs), demanding an efficient mechanism for LLM task scheduling. While prior geo-distributed scheduling methods reduce cost and carbon emissions by exploiting regional heterogeneity, they largely overlook model and data reuse opportunities and the uncertainty of LLM execution times. In this paper, we introduce GCOS, to the best of our knowledge the first green scheduling framework that incorporates a dual-cache system for both data and models while jointly optimizing task assignment and cache migration. We first propose a dual-cache mechanism that decouples model and data caching to enable fine-grained reuse and minimize redundant transmissions. We then propose the Multi-Agent Cache-aware Cooperative Scheduling (MACCS) algorithm, which leverages reinforcement learning to optimize task placement with a focus on minimizing both carbon emissions and cost. Additionally, we design a lightweight execution time predictor, DiPTree, to address the high variability in task execution times. Extensive experiments on real-world datasets demonstrate that GCOS reduces overall cost by up to 92.6% and carbon emissions by 90.3%, significantly outperforming existing baselines.
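The cache-aware placement idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's MACCS algorithm (which uses multi-agent reinforcement learning); it is a hypothetical greedy baseline showing how a scheduler might trade off carbon intensity, electricity price, and model/data cache misses. All names (`DataCenter`, `placement_score`, `transfer_penalty`, the `alpha` weight) are illustrative assumptions, not APIs from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class DataCenter:
    name: str
    carbon_intensity: float            # gCO2 per unit of work (assumed units)
    price: float                       # electricity cost per unit of work
    model_cache: set = field(default_factory=set)
    data_cache: set = field(default_factory=set)

def placement_score(dc, task, transfer_penalty=5.0, alpha=0.5):
    """Weighted carbon+cost score; each cache miss adds a transfer penalty,
    capturing the redundant model/data transmissions the dual cache avoids."""
    score = alpha * dc.carbon_intensity + (1 - alpha) * dc.price
    if task["model"] not in dc.model_cache:
        score += transfer_penalty      # would have to fetch model weights
    if task["dataset"] not in dc.data_cache:
        score += transfer_penalty      # would have to fetch input data
    return score

def schedule(task, dcs, **kw):
    """Greedily place a task on the lowest-score data center and warm both
    caches there so subsequent tasks can reuse the model and data."""
    best = min(dcs, key=lambda dc: placement_score(dc, task, **kw))
    best.model_cache.add(task["model"])
    best.data_cache.add(task["dataset"])
    return best.name
```

With a dirty-grid site holding warm caches and a clean-grid site with cold caches, the clean site still wins once its grid advantage outweighs the transfer penalties, which is the kind of trade-off a learned scheduler would resolve per task.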

Keywords

Large language model / Geographically distributed data center / Green communication / Task scheduling / Multi-agent reinforcement learning

Cite this article

Hao Liu, Xiaonyu Hu, Ran Wang, Jie Hao, Qiang Wu, Hongke Zhang. Green scheduling for LLM workloads with model and data reuse across geo-distributed data centers. Digit. Commun. Netw., 2026, 12(2): 236-251. DOI: 10.1016/j.dcan.2025.11.006


CRediT authorship contribution statement

Hao Liu: Writing - original draft, Methodology; Xiaonyu Hu: Investigation, Funding acquisition; Ran Wang: Writing - review & editing, Supervision; Jie Hao: Writing - review & editing, Project administration; Qiang Wu: Project administration, Formal analysis; Hongke Zhang: Supervision, Resources.

Declaration of competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported in part by the 2024 National Society Project for Supporting National Strategies, under the program titled "Key Technology Roadmap for AI-Oriented Computing Power Networks".

