Workload Balancing in Hospitals using Dynamic Programming with a Multi-Agent C-QMIX Algorithm

Shijin Cai; Yanrong Li; Lai Wei; Wei Jiang

doi:10.1007/s11518-026-5742-8

Journal of Systems Science and Systems Engineering, 2026, Vol. 35, Issue 3: 368-384. DOI: 10.1007/s11518-026-5742-8

Abstract

Excessive workloads degrade healthcare quality, yet hospitalizations are unevenly distributed across the week, peaking on Mondays and falling on weekends. As a result, patients admitted at peak times may not receive the same level of medical service as those admitted at non-peak times. We therefore propose a dynamic programming model that minimizes the variance in daily service amounts by optimizing the number of admissions, smoothing hospitalization demand and preventing overloads. To ensure system efficiency, the model also limits the maximum waiting size and the service capacity. Because the state space grows exponentially when lengths of patient stay vary, the model is challenging to solve with traditional dynamic programming methods. We therefore introduce a multi-agent constrained QMIX (C-QMIX) reinforcement learning algorithm to handle the complex states and obtain a stable solution. The algorithm is tested on 81 weeks of data from a tertiary hospital in western China. The results indicate a maximum variance reduction of 14% while average waiting time and overall service quantity remain at reasonable levels; the excessive occupancy rate is also reduced, which mitigates medical risks and enhances healthcare quality.
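The smoothing objective described in the abstract can be illustrated with a toy calculation. The Python sketch below is not the authors' dynamic programming model or the C-QMIX algorithm: it uses a simple greedy admission rule, a fixed length of stay, and made-up arrival numbers, all of which are assumptions for illustration only. The names `smooth_admissions`, `daily_census`, `daily_cap`, and `max_waiting` are hypothetical.

```python
from statistics import pvariance


def smooth_admissions(arrivals, daily_cap, max_waiting):
    """Admit patients from a waiting list under a hypothetical greedy rule.

    Each day, admit up to `daily_cap` waiting patients; if that would leave
    more than `max_waiting` patients waiting, admit the excess as well
    (so the waiting-list limit takes precedence over the cap).
    Returns (admitted per day, waiting-list size at the end of each day).
    """
    admitted, waiting = [], []
    queue = 0
    for new_patients in arrivals:
        queue += new_patients
        today = max(min(queue, daily_cap), queue - max_waiting)
        queue -= today
        admitted.append(today)
        waiting.append(queue)
    return admitted, waiting


def daily_census(admissions, length_of_stay):
    """Patients in service each day if every patient stays exactly
    `length_of_stay` days (a simplification; real stays vary)."""
    return [
        sum(admissions[max(0, day - length_of_stay + 1): day + 1])
        for day in range(len(admissions))
    ]


# Made-up arrivals for two weeks (Mon..Sun), peaking on Monday.
arrivals = [30, 20, 18, 16, 14, 6, 8] * 2
admitted, waiting = smooth_admissions(arrivals, daily_cap=17, max_waiting=20)

raw_census = daily_census(arrivals, length_of_stay=3)
smoothed_census = daily_census(admitted, length_of_stay=3)

# Compare the second week only, so the empty-hospital start-up is excluded.
print("variance without smoothing:", pvariance(raw_census[7:]))
print("variance with smoothing:   ", pvariance(smoothed_census[7:]))
print("largest waiting list:      ", max(waiting))
```

In this toy setting the smoothed schedule serves the same total number of patients with a lower second-week census variance, at the cost of a bounded waiting list. Varying lengths of stay, which the abstract identifies as the source of the state-space growth, are deliberately left out here.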

Keywords

Dynamic programming / multi-agent reinforcement learning / medical management / data-driven / decision control

Cite this article

Shijin Cai, Yanrong Li, Lai Wei, Wei Jiang. Workload Balancing in Hospitals using Dynamic Programming with a Multi-Agent C-QMIX Algorithm. Journal of Systems Science and Systems Engineering, 2026, 35(3): 368-384. DOI: 10.1007/s11518-026-5742-8
