Opponent modeling with trajectory representation clustering

Yongliang Lv , Yan Zheng , Jianye Hao

Intelligence & Robotics ›› 2022, Vol. 2 ›› Issue (2): 168-179. DOI: 10.20517/ir.2022.09

Research Article

Abstract

For a non-stationary opponent in a multi-agent environment, traditional methods model the opponent from its complex information and learn one or more optimal response policies. However, when the opponent's policy changes non-stationarily, response policies learned earlier are prone to catastrophic forgetting because of data imbalance in the online-updated replay buffer. This paper focuses on how to learn new response policies, without forgetting previously learned ones, while the opponent's policy keeps changing. We extract representations of opponent policies with a contrastive-learning autoencoder and cluster them into explicitly distinguished groups. Following the idea of a balanced replay buffer, we keep training on the trajectory data of every opponent policy that has appeared, which avoids policy forgetting. Finally, we demonstrate the effectiveness of the method in a classical opponent-modeling environment (soccer) and show the clustering effect for different opponent policies.
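The balanced-replay idea described above can be sketched as follows: trajectories are stored in one bucket per opponent-policy cluster, and batches are drawn uniformly across clusters so that data from policies seen long ago is not crowded out by the current opponent. This is a minimal illustrative sketch, not the paper's exact implementation; the class name, per-cluster capacity, and the uniform-across-clusters sampling scheme are all assumptions.

```python
import random
from collections import defaultdict

class BalancedReplayBuffer:
    """Keeps a separate bounded bucket of trajectories per opponent-policy
    cluster and samples uniformly across clusters, so every policy that
    has appeared keeps contributing to training (avoiding forgetting)."""

    def __init__(self, capacity_per_cluster=1000, seed=0):
        self.capacity = capacity_per_cluster
        self.buckets = defaultdict(list)   # cluster id -> list of trajectories
        self.rng = random.Random(seed)

    def add(self, cluster_id, trajectory):
        bucket = self.buckets[cluster_id]
        if len(bucket) >= self.capacity:
            bucket.pop(0)                  # evict the oldest trajectory
        bucket.append(trajectory)

    def sample(self, batch_size):
        # Pick a cluster uniformly for each sample, then a trajectory
        # uniformly within that cluster's bucket; small, old clusters are
        # sampled as often as the large, current one.
        clusters = list(self.buckets)
        return [self.rng.choice(self.buckets[self.rng.choice(clusters)])
                for _ in range(batch_size)]
```

In contrast, a single shared FIFO buffer would be dominated by trajectories from the opponent's most recent policy, which is exactly the imbalance that causes the earlier response policies to be forgotten.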

Keywords

Non-stationary / opponent modeling / contrastive learning / trajectory representation / data balance

Cite this article

Yongliang Lv, Yan Zheng, Jianye Hao. Opponent modeling with trajectory representation clustering. Intelligence & Robotics, 2022, 2(2): 168-179. DOI: 10.20517/ir.2022.09
