Self organizing optimization and phase transition in reinforcement learning minority game system

Si-Ping Zhang, Jia-Qi Dong, Hui-Yu Zhang, Yi-Xuan Lü, Jue Wang, Zi-Gang Huang

Front. Phys. ›› 2024, Vol. 19 ›› Issue (4) : 40201. DOI: 10.1007/s11467-023-1378-z
RESEARCH ARTICLE

Abstract

Whether a complex game system composed of a large number of artificial intelligence (AI) agents empowered with reinforcement learning can produce highly favorable collective behaviors purely through agent self-exploration is a matter of practical importance. In this paper, we address this question by combining a typical theoretical model of resource allocation systems, the minority game model, with reinforcement learning. Each individual participating in the game is endowed with a certain degree of intelligence based on a reinforcement learning algorithm. In particular, we demonstrate that as the AI agents gradually become familiar with the unknown environment and try to choose optimal actions to maximize payoff, the whole system continues to approach the optimal state under certain parameter combinations: herding is effectively suppressed by an oscillating collective behavior, which is a self-organizing pattern arising without any external interference. Interestingly, numerical results reveal a first-order phase transition in our multi-agent system with reinforcement learning. To further understand the dynamic behavior of agent learning, we define and analyze the conversion path of belief modes, and find that self-organizing condensation of belief modes appears for the given trial-and-error rates in the AI system. Finally, we provide a detection method for the emergence of the period-two oscillation collective pattern based on the Kullback-Leibler divergence and give the parameter position where the period-two oscillation appears.
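To make the setup concrete, the following is a minimal sketch of a two-resource minority game played by tabular Q-learning agents, followed by one plausible Kullback-Leibler check for period-two oscillation. It is not the authors' implementation: the state definition (the resource an agent used last), the 0/1 reward, the parameter values, the even-versus-odd-step histogram comparison, and the names `simulate` and `even_odd_kl` are all illustrative assumptions.

```python
import math
import random


def simulate(n_agents=101, steps=2000, alpha=0.9, gamma=0.9, eps=0.02, seed=0):
    """Q-learning minority game; returns the attendance of resource 1 per step.

    Assumed setup: state = resource used in the previous step, action = resource
    chosen now, reward = 1 for being in the minority and 0 otherwise.
    """
    rng = random.Random(seed)
    # q[i][s][a]: value for agent i of taking action a when in state s
    q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n_agents)]
    state = [rng.randrange(2) for _ in range(n_agents)]
    attendance = []
    for _ in range(steps):
        actions = []
        for i in range(n_agents):
            if rng.random() < eps:
                a = rng.randrange(2)  # trial-and-error exploration
            else:
                q0, q1 = q[i][state[i]]
                a = rng.randrange(2) if q0 == q1 else int(q1 > q0)
            actions.append(a)
        n1 = sum(actions)
        # Resource 1 wins only if strictly less crowded (ties go to resource 0)
        minority = 1 if n1 < n_agents - n1 else 0
        for i, a in enumerate(actions):
            reward = 1.0 if a == minority else 0.0
            s = state[i]
            # Standard Q-learning update; the next state is the action just taken
            q[i][s][a] += alpha * (reward + gamma * max(q[i][a]) - q[i][s][a])
            state[i] = a
        attendance.append(n1)
    return attendance


def even_odd_kl(attendance, n_agents, bins=10, smooth=1e-6):
    """KL divergence between attendance histograms at even and odd time steps.

    A value near zero suggests no period-two structure; a large value suggests
    that the system alternates between two distinct attendance levels.
    """
    def hist(values):
        counts = [smooth] * bins  # smoothing keeps every bin strictly positive
        for v in values:
            counts[min(v * bins // (n_agents + 1), bins - 1)] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p, q = hist(attendance[0::2]), hist(attendance[1::2])
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    series = simulate()
    print("even/odd KL divergence:", even_odd_kl(series[500:], 101))
```

Scanning `eps` (the trial-and-error rate) or `alpha` and plotting `even_odd_kl` against the parameter is one way such a detector could be used to locate where a period-two pattern sets in. The thresholds for "large" would need to be calibrated for a given system size.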

Keywords

oscillatory evolution / collective behaviors / phase transition / reinforcement learning / minority game

Cite this article

Si-Ping Zhang, Jia-Qi Dong, Hui-Yu Zhang, Yi-Xuan Lü, Jue Wang, Zi-Gang Huang. Self organizing optimization and phase transition in reinforcement learning minority game system. Front. Phys., 2024, 19(4): 40201 https://doi.org/10.1007/s11467-023-1378-z

Declarations

The authors declare that they have no competing interests and there are no conflicts.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 12105213), China Postdoctoral Science Foundation (No. 2020M673363), and the Natural Science Basic Research Program of Shaanxi (No. 2021JQ-007).

RIGHTS & PERMISSIONS

2024 Higher Education Press