Communication-robust multi-agent learning by adaptable auxiliary multi-agent adversary generation

Lei YUAN, Feng CHEN, Zongzhang ZHANG, Yang YU

Front. Comput. Sci., 2024, Vol. 18, Issue 6: 186331. DOI: 10.1007/s11704-023-2733-5
RESEARCH ARTICLE



Abstract

Communication can promote coordination in cooperative Multi-Agent Reinforcement Learning (MARL). However, existing works mainly focus on improving the communication efficiency of agents, neglecting that real-world communication is much more challenging, as it may be subject to noise or potential attackers. The robustness of communication-based policies thus becomes an urgent and severe issue that needs more exploration. In this paper, we posit that an ego system trained with auxiliary adversaries can address this limitation, and we propose an adaptable method of Multi-Agent Auxiliary Adversaries Generation for robust Communication, dubbed MA3C, to obtain a robust communication-based policy. Specifically, we introduce a novel message-attacking approach that models the learning of the auxiliary attacker as a cooperative problem under the shared goal of minimizing the coordination ability of the ego system, under which every information channel may suffer a distinct message attack. Furthermore, as naive adversarial training may impede the generalization ability of the ego system, we design an attacker-population generation approach based on evolutionary learning. Finally, the ego system is paired with the attacker population and alternately trained against the continuously evolving attackers to improve its robustness, meaning that both the ego system and the attackers are adaptable. Extensive experiments on multiple benchmarks indicate that our proposed MA3C achieves comparable or better robustness and generalization ability than other baselines.
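To make the training scheme described above concrete, the following is a minimal Python sketch of the alternating loop outlined in the abstract: message attackers are trained and evolved as a population to minimize the ego team's return, and the ego system is trained against attackers sampled from that population. All names (EgoSystem, MessageAttacker, run_episode, mutate) are hypothetical placeholders and the learning updates are stubbed out; this illustrates the loop structure under a plain reading of the abstract, not the authors' implementation.

```python
# A minimal, illustrative sketch of the alternating training scheme described
# in the abstract. Names are hypothetical placeholders; updates are stubbed out.

import random
from typing import List


class EgoSystem:
    """Cooperative communicating agents (e.g., a value-factorization learner)."""

    def update(self, attacker: "MessageAttacker") -> None:
        # Collect trajectories whose messages are perturbed by `attacker`,
        # then apply a standard cooperative MARL update (omitted).
        ...


class MessageAttacker:
    """A team of message perturbers; each channel may receive a distinct attack."""

    def update(self, ego: EgoSystem) -> None:
        # Trained as a cooperative problem: all attack heads share the single
        # goal of minimizing the ego team's return (update omitted).
        ...

    def mutate(self) -> "MessageAttacker":
        # Return a perturbed copy, used for evolutionary population generation.
        return MessageAttacker()


def run_episode(ego: EgoSystem, attacker: MessageAttacker) -> float:
    """Roll out the ego policy with attacked communication; return the episodic reward."""
    return random.random()  # stand-in for a real environment rollout


def train_alternating(n_iterations: int = 100, population_size: int = 4) -> EgoSystem:
    ego = EgoSystem()
    population: List[MessageAttacker] = [MessageAttacker() for _ in range(population_size)]

    for _ in range(n_iterations):
        # (1) Attacker phase: each attacker improves against the frozen ego policy.
        for attacker in population:
            attacker.update(ego)

        # (2) Evolution: spawn mutated offspring, then keep the attackers that
        #     degrade the ego team's return the most (lowest return first).
        candidates = population + [random.choice(population).mutate()
                                   for _ in range(population_size)]
        candidates.sort(key=lambda a: run_episode(ego, a))
        population = candidates[:population_size]

        # (3) Ego phase: train the ego system against attackers sampled from the
        #     population, so it cannot overfit to one fixed adversary.
        for _ in range(population_size):
            ego.update(random.choice(population))

    return ego
```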

Keywords

multi-agent communication / adversarial training / robustness validation / reinforcement learning

Cite this article

Lei YUAN, Feng CHEN, Zongzhang ZHANG, Yang YU. Communication-robust multi-agent learning by adaptable auxiliary multi-agent adversary generation. Front. Comput. Sci., 2024, 18(6): 186331 https://doi.org/10.1007/s11704-023-2733-5

Lei Yuan received the BSc degree from the Department of Electronic Engineering, Tsinghua University, China in 2016 and the MSc degree from the Chinese Aeronautical Establishment, China in 2019. He is currently pursuing the PhD degree with the Department of Computer Science and Technology, Nanjing University, China. His current research interests mainly include machine learning, reinforcement learning, and multi-agent reinforcement learning.

Feng Chen received his BSc degree in Automation from the School of Artificial Intelligence, Nanjing University, China in 2022. He is currently pursuing the MSc degree with the School of Artificial Intelligence, Nanjing University, China. His research interests include multi-agent reinforcement learning and multi-agent systems.

Zongzhang Zhang received his PhD degree in computer science from the University of Science and Technology of China, China in 2012. He was a research fellow at the School of Computing, National University of Singapore, Singapore from 2012 to 2014, and a visiting scholar at the Department of Aeronautics and Astronautics, Stanford University, USA from 2018 to 2019. He is currently an associate professor at the National Key Laboratory for Novel Software Technology, Nanjing University, China. He has co-authored more than 50 research papers. His research interests include reinforcement learning, intelligent planning, and multi-agent learning.

Yang Yu received the PhD degree in computer science from Nanjing University, China in 2011, and is currently a professor at the School of Artificial Intelligence, Nanjing University, China. His research interests include machine learning, mainly reinforcement learning and derivative-free optimization for learning. Prof. Yu was granted the CCF-IEEE CS Young Scientist Award in 2020, was recognized as one of AI's 10 to Watch by IEEE Intelligent Systems, and received the PAKDD Early Career Award in 2018. His team won the 2018 OpenAI Retro Contest on transfer reinforcement learning and the 2021 ICAPS Learning to Run a Power Network Challenge with Trust. He has served as an Area Chair for NeurIPS, ICML, IJCAI, AAAI, etc.

Acknowledgements

This work was supported by the National Key R&D Program of China (2020AAA0107200), the National Natural Science Foundation of China (Grant Nos. 61921006, 61876119, 62276126), the Natural Science Foundation of Jiangsu (BK20221442), and the Program B for Outstanding PhD Candidate of Nanjing University. We thank Ziqian Zhang and Lihe Li for their useful suggestions.

Competing interests

The authors declare that they have no competing interests or financial conflicts to disclose.

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

RIGHTS & PERMISSIONS

© 2023 The Author(s). This article is published with open access at link.springer.com and journal.hep.com.cn.