Multi-agent reinforcement learning behavioral control for nonlinear second-order systems
Zhenyi ZHANG, Jie HUANG, Congjie PAN
Reinforcement learning behavioral control (RLBC) is limited to a single agent and cannot handle swarm missions, because it models behavior priority learning as a single-agent Markov decision process. In this paper, a novel multi-agent reinforcement learning behavioral control (MARLBC) method is proposed to overcome this limitation through joint learning. Specifically, a multi-agent reinforcement learning mission supervisor (MARLMS) is designed for a group of nonlinear second-order systems to assign behavior priorities at the decision layer. By modeling behavior priority switching as a cooperative Markov game, the MARLMS learns an optimal joint behavior priority, reducing dependence on human intelligence and high-performance computing hardware. At the control layer, a group of second-order reinforcement learning controllers is designed to learn optimal control policies that track position and velocity signals simultaneously. In particular, input saturation constraints are strictly enforced by designing a group of adaptive compensators. Numerical simulation results show that the proposed MARLBC achieves a lower switching frequency and control cost than finite-time and fixed-time behavioral control and RLBC methods.
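To make the control-layer setting concrete, the following is a minimal toy sketch (not the paper's learned controllers): a single second-order (double-integrator) agent tracking constant position and velocity references under a hard input saturation constraint. The PD-style gains `kp`, `kd`, the saturation limit `u_max`, and the simulation horizon are illustrative assumptions, not values from the paper.

```python
import numpy as np

def simulate(T=2000, dt=0.01, kp=4.0, kd=3.0, u_max=1.5):
    # Illustrative second-order agent: state (position x, velocity v).
    # Gains and saturation limit are assumed for this sketch only.
    x, v = 0.0, 0.0          # initial position and velocity
    xr, vr = 1.0, 0.0        # constant reference: position 1, velocity 0
    for _ in range(T):
        u = kp * (xr - x) + kd * (vr - v)  # PD-style tracking law
        u = np.clip(u, -u_max, u_max)      # strict input saturation
        v += u * dt                        # second-order (double-integrator) dynamics
        x += v * dt
    return abs(xr - x), abs(vr - v)

pos_err, vel_err = simulate()
```

In the paper, this fixed tracking law is replaced by learned policies, and the clipping step is handled by adaptive compensators rather than a bare `clip`; the sketch only illustrates the tracking-under-saturation problem the controllers must solve.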
Reinforcement learning / Behavioral control / Second-order systems / Mission supervisor