Cooperative multi-agent game based on reinforcement learning

Hongbo Liu

High-Confidence Computing ›› 2024, Vol. 4 ›› Issue (1) : 100205 DOI: 10.1016/j.hcc.2024.100205

Research Article

Abstract

Multi-agent reinforcement learning holds tremendous potential for revolutionizing intelligent systems across diverse domains. However, it also brings a set of formidable challenges, including the effective allocation of credit to each agent, real-time collaboration among heterogeneous agents, and the design of an appropriate reward function to guide agent behavior. To handle these issues, we propose an innovative solution named the Graph Attention Counterfactual Multiagent Actor-Critic algorithm (GACMAC). The algorithm comprises several key components: First, it employs a multi-agent actor-critic framework with counterfactual baselines to assess the individual contribution of each agent's actions. Second, it integrates a graph attention network to enhance real-time collaboration, enabling heterogeneous agents to share information effectively while handling tasks. Third, it incorporates prior human knowledge through potential-based reward shaping, thereby improving the convergence speed and stability of the algorithm. We tested the algorithm on the StarCraft Multi-Agent Challenge (SMAC), a recognized platform for evaluating multi-agent algorithms, where it achieved a win rate of over 95%, comparable to current state-of-the-art multi-agent controllers.
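To make the three components concrete, here is a minimal PyTorch sketch of the standard formulations they build on: a COMA-style counterfactual baseline, a single attention step of the kind a graph attention network applies over agent features, and potential-based reward shaping. This is not the paper's implementation; the function names, tensor shapes, and the default gamma are illustrative assumptions.

import torch
import torch.nn.functional as F

def counterfactual_advantage(q_values, policy_logits, actions):
    # q_values:      (batch, n_agents, n_actions) -- critic's Q for each of agent a's
    #                candidate actions, with the other agents' actions held fixed.
    # policy_logits: (batch, n_agents, n_actions) -- each agent's policy logits.
    # actions:       (batch, n_agents) -- the actions actually taken.
    pi = F.softmax(policy_logits, dim=-1)
    # Counterfactual baseline: expectation of Q over agent a's own actions.
    baseline = (pi * q_values).sum(dim=-1)
    q_taken = q_values.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    return q_taken - baseline  # per-agent counterfactual advantage

def attention_mix(features, adjacency=None):
    # Single-head scaled dot-product attention over agent feature vectors
    # (batch, n_agents, d); each agent aggregates messages from the others.
    d = features.size(-1)
    scores = features @ features.transpose(-2, -1) / d ** 0.5
    if adjacency is not None:  # optional boolean communication-graph mask
        scores = scores.masked_fill(~adjacency, float("-inf"))
    return torch.softmax(scores, dim=-1) @ features

def shaped_reward(reward, phi_s, phi_s_next, gamma=0.99):
    # Potential-based shaping r' = r + gamma * Phi(s') - Phi(s); this form
    # leaves the optimal policy unchanged while injecting prior knowledge via Phi.
    return reward + gamma * phi_s_next - phi_s

In a GACMAC-like agent, the counterfactual advantage would drive each actor's policy gradient, the attention step would mix agent embeddings for inter-agent communication, and the shaped reward would replace the raw environment reward during training; the sketch only fixes those interfaces.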

Keywords

Collaborative multi-agent / Reinforcement learning / Credit distribution / Multi-agent communication / Reward shaping

Cite this article

Hongbo Liu. Cooperative multi-agent game based on reinforcement learning. High-Confidence Computing, 2024, 4(1): 100205. DOI: 10.1016/j.hcc.2024.100205

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
