Green scheduling for LLM workloads with model and data reuse across geo-distributed data centers

Hao Liu, Xiaonyu Hu, Ran Wang, Jie Hao, Qiang Wu, Hongke Zhang

2026, Vol. 12, Issue (2): 236-251. DOI: 10.1016/j.dcan.2025.11.006
Regular Papers | Research Article

Abstract

The explosive proliferation of Large Language Models (LLMs) imposes significant energy and operational burdens on Geographically Distributed Data Centers (GDDCs), demanding an efficient mechanism for LLM task scheduling. While prior geo-distributed scheduling methods reduce cost and carbon emissions by exploiting regional heterogeneity, they largely overlook model and data reuse opportunities and the uncertainty of LLM execution times. In this paper, we introduce GCOS, to the best of our knowledge the first green scheduling framework that incorporates a dual-cache system for both data and models while jointly optimizing task assignment and cache migration. We first propose a dual-cache mechanism that decouples model and data caching to enable fine-grained reuse and minimize redundant transmissions. We then propose the Multi-Agent Cache-aware Cooperative Scheduling (MACCS) algorithm, which leverages reinforcement learning to optimize task placement with a focus on minimizing both carbon emissions and cost. Additionally, we design a lightweight execution time predictor, DiPTree, to address the high variability in task execution times. Extensive experiments on real-world datasets demonstrate that GCOS reduces overall cost by up to 92.6% and carbon emissions by 90.3%, significantly outperforming existing baselines.
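The cache-aware placement idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's MACCS algorithm (which uses multi-agent reinforcement learning); it is a hypothetical greedy baseline showing how a scheduler might trade off carbon intensity, electricity price, and model/data cache misses. All names (`DataCenter`, `placement_score`, `transfer_penalty`, the `alpha` weight) are illustrative assumptions, not APIs from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class DataCenter:
    name: str
    carbon_intensity: float            # gCO2 per unit of work (assumed units)
    price: float                       # electricity cost per unit of work
    model_cache: set = field(default_factory=set)
    data_cache: set = field(default_factory=set)

def placement_score(dc, task, transfer_penalty=5.0, alpha=0.5):
    """Weighted carbon+cost score; each cache miss adds a transfer penalty,
    capturing the redundant model/data transmissions the dual cache avoids."""
    score = alpha * dc.carbon_intensity + (1 - alpha) * dc.price
    if task["model"] not in dc.model_cache:
        score += transfer_penalty      # would have to fetch model weights
    if task["dataset"] not in dc.data_cache:
        score += transfer_penalty      # would have to fetch input data
    return score

def schedule(task, dcs, **kw):
    """Greedily place a task on the lowest-score data center and warm both
    caches there so subsequent tasks can reuse the model and data."""
    best = min(dcs, key=lambda dc: placement_score(dc, task, **kw))
    best.model_cache.add(task["model"])
    best.data_cache.add(task["dataset"])
    return best.name
```

With a dirty-grid site holding warm caches and a clean-grid site with cold caches, the clean site still wins once its grid advantage outweighs the transfer penalties, which is the kind of trade-off a learned scheduler would resolve per task.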

Keywords

Large language model / Geographically distributed data center / Green communication / Task scheduling / Multi-agent reinforcement learning

Cite this article

Hao Liu, Xiaonyu Hu, Ran Wang, Jie Hao, Qiang Wu, Hongke Zhang. Green scheduling for LLM workloads with model and data reuse across geo-distributed data centers. Digit. Commun. Netw., 2026, 12(2): 236-251. DOI: 10.1016/j.dcan.2025.11.006


CRediT authorship contribution statement

Hao Liu: Writing - original draft, Methodology; Xiaonyu Hu: Investigation, Funding acquisition; Ran Wang: Writing - review & editing, Supervision; Jie Hao: Writing - review & editing, Project administration; Qiang Wu: Project administration, Formal analysis; Hongke Zhang: Supervision, Resources.

Declaration of competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported in part by the 2024 National Society Project for Supporting National Strategies, under the program titled "Key Technology Roadmap for AI-Oriented Computing Power Networks".

