Open and real-world human-AI coordination by heterogeneous training with communication

Cong GUAN, Ke XUE, Chunpeng FAN, Feng CHEN, Lichao ZHANG, Lei YUAN, Chao QIAN, Yang YU

Front. Comput. Sci., 2025, Vol. 19, Issue 4: 194314. DOI: 10.1007/s11704-024-3797-6

Artificial Intelligence
RESEARCH ARTICLE


Abstract

Human-AI coordination aims to develop AI agents capable of effectively coordinating with human partners, making it a crucial aspect of cooperative multi-agent reinforcement learning (MARL). Achieving satisfactory performance from AI agents has been a long-standing challenge. Recently, ad hoc teamwork and zero-shot coordination have made promising advances in open-world settings, where agents must coordinate efficiently with a range of unseen human partners. However, these methods usually rely on the overly idealistic assumption that the agent and its partner are homogeneous, which deviates from real-world conditions. To facilitate the practical deployment of human-AI coordination in open and real-world environments, we propose ORCBench, the first benchmark for open and real-world human-AI coordination (ORC). ORCBench comprises widely used human-AI coordination environments and, notably, models the heterogeneity between AI agents and their partners, encompassing variations in both capabilities and observations, which aligns more closely with real-world applications. Furthermore, we introduce Heterogeneous training with Communication (HeteC), a framework for ORC. HeteC builds upon a heterogeneous training framework and enhances partner population diversity through mixed partner training and frozen historical partners. In addition, HeteC incorporates a communication module that enables human partners to communicate with AI agents, mitigating the adverse effects of partial observability. Through a series of experiments, we demonstrate the effectiveness of HeteC in improving coordination performance. Our contribution serves as an initial but important step towards addressing the challenges of ORC.
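The following is a minimal, self-contained sketch of the training scheme the abstract outlines, assuming a population-based setup; it is an illustration, not the authors' HeteC implementation, and every name (Policy, run_episode) and hyperparameter in it is a hypothetical stand-in. It shows the two diversity mechanisms the abstract names, mixed partner training and frozen historical partners, together with the point where a partner-to-agent message would enter.

```python
import copy
import random

class Policy:
    """Stand-in for a trainable policy network."""
    def update(self, trajectory):
        pass  # a real implementation would take a gradient step here

def run_episode(ego, partner, communicate):
    # Stand-in for one environment rollout. With communicate=True the partner
    # would append a message (e.g., its private observation) to the ego
    # agent's input at every step, easing partial observability.
    return {"reward": random.random()}

ego = Policy()
live_partners = [Policy() for _ in range(4)]   # partners trained alongside ego
frozen_partners = []                           # frozen historical snapshots

for step in range(1, 1001):
    # Mixed partner training: sample from live and frozen partners alike,
    # so the ego agent keeps facing both current and past behaviors.
    partner = random.choice(live_partners + frozen_partners)
    trajectory = run_episode(ego, partner, communicate=True)
    ego.update(trajectory)
    if partner in live_partners:               # frozen partners never change
        partner.update(trajectory)
    if step % 200 == 0:
        # Periodically freeze copies of the live partners, preserving their
        # current behaviors and growing the population's diversity over time.
        frozen_partners.extend(copy.deepcopy(p) for p in live_partners)
```

The key design choice in this reading is that snapshots are frozen rather than retrained: early and intermediate behaviors stay in the sampling pool, so the ego agent keeps encountering partner styles that further training would otherwise erase.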


Keywords

human-AI coordination / multi-agent reinforcement learning / communication / open-environment coordination / real-world coordination

Cite this article

Cong GUAN, Ke XUE, Chunpeng FAN, Feng CHEN, Lichao ZHANG, Lei YUAN, Chao QIAN, Yang YU. Open and real-world human-AI coordination by heterogeneous training with communication. Front. Comput. Sci., 2025, 19(4): 194314. DOI: 10.1007/s11704-024-3797-6



RIGHTS & PERMISSIONS

Higher Education Press


Supplementary files

FCS-23797-OF-CG_suppl_1
