MAML2: meta reinforcement learning via meta-learning for task categories
Qiming FU, Zhechao WANG, Nengwei FANG, Bin XING, Xiao ZHANG, Jianping CHEN
Front. Comput. Sci., 2023, 17(4): 174325
Meta-learning has been widely applied to few-shot reinforcement learning problems, where the goal is to obtain an agent that can learn quickly in a new task. However, existing algorithms often sacrifice isolated tasks in pursuit of average performance, which may result in negative adaptation on those tasks, and they usually require sufficient learning under a stationary task distribution. In this paper, we propose a hierarchical framework of double meta-learning, consisting of three processes: classification, meta-learning, and re-adaptation. First, in the classification process, we classify tasks into several subsets, regarded as task categories, according to the learned parameters of each task; this allows isolated tasks to be separated out. Second, in the meta-learning process, we learn a category parameter for each subset via meta-learning. Simultaneously, based on the gradient of each category parameter in each subset, we apply meta-learning again to learn a meta-parameter for the whole task set, which serves as the initial parameter for a new task. Finally, in the re-adaptation process, we adapt the parameter of the new task in two steps, using the meta-parameter and the appropriate category parameter successively. Experimentally, we demonstrate that our algorithm prevents the agent from negative adaptation without losing average performance on the whole task set. Additionally, our algorithm yields faster adaptation during re-adaptation, and it performs well with fewer samples when the agent is exposed to an online meta-learning setting.
meta-learning / reinforcement learning / few-shot learning / negative adaptation
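To make the three-stage pipeline described in the abstract concrete, here is a minimal, self-contained Python sketch of its structure on a toy quadratic objective (a stand-in for the RL losses the paper actually optimizes). Everything in it is an assumption for illustration, not the paper's algorithm: the quadratic tasks, the k-means clustering, the first-order MAML-style outer loop, and the way the two re-adaptation steps are chained are all simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an RL objective: each "task" is a target point the
# parameter vector should reach (the paper uses policy-gradient losses).
def grad(theta, task):
    return theta - task

def adapt(theta, task, lr=0.1, steps=5):
    """Inner-loop adaptation of theta to a single task."""
    for _ in range(steps):
        theta = theta - lr * grad(theta, task)
    return theta

def maml_outer(theta, subset, inner_lr=0.1, outer_lr=0.05, iters=50):
    """First-order MAML-style outer loop over a subset of tasks (assumption:
    the paper's exact meta-update is not reproduced here)."""
    for _ in range(iters):
        g = np.mean([grad(adapt(theta, t, inner_lr), t) for t in subset], axis=0)
        theta = theta - outer_lr * g
    return theta

def kmeans(points, k, iters=20):
    """Plain k-means; the abstract does not specify the clustering method."""
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        labels = ((points[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(0)
    return labels

# Sampled tasks: two dense clusters plus one isolated task.
tasks = [rng.normal(c, 0.2, size=2)
         for c in ([0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [5.0, 5.0], [-5.0, 5.0])]
theta0 = np.zeros(2)

# 1) Classification: cluster tasks by their individually adapted parameters.
adapted = np.stack([adapt(theta0, t) for t in tasks])
labels = kmeans(adapted, k=3)
subsets = [[t for t, l in zip(tasks, labels) if l == j] for j in range(3)]

# 2) First meta-learning: one category parameter per non-empty subset.
#    Second meta-learning: a global meta-parameter driven by the category
#    parameters' gradients (a rough stand-in for the paper's second update).
categories = [maml_outer(theta0.copy(), s) for s in subsets if s]
meta = maml_outer(theta0.copy(), categories)

# 3) Two-step re-adaptation on a new task: a few steps from the global
#    meta-parameter, then a warm start from the nearest category parameter.
new_task = rng.normal([5.0, 5.0], 0.2, size=2)
step1 = adapt(meta, new_task, steps=2)
nearest = min(categories, key=lambda c: np.sum((c - step1) ** 2))
step2 = adapt(nearest, new_task, steps=3)
print("distance to task optimum after re-adaptation:",
      np.linalg.norm(step2 - new_task))
```

In this toy setting, the isolated task near (-5, 5) receives its own category parameter instead of being averaged into the single meta-parameter, which is the mechanism the abstract credits with avoiding negative adaptation; how the meta-parameter step and the category-parameter step are chained in re-adaptation is a guess here.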