MAML2: meta reinforcement learning via meta-learning for task categories

Qiming FU, Zhechao WANG, Nengwei FANG, Bin XING, Xiao ZHANG, Jianping CHEN

Front. Comput. Sci. ›› 2023, Vol. 17 ›› Issue (4) : 174325 DOI: 10.1007/s11704-022-2037-1
Artificial Intelligence
RESEARCH ARTICLE

Abstract

Meta-learning has been widely applied to few-shot reinforcement learning problems, where the goal is to obtain an agent that can learn quickly in a new task. However, existing algorithms often sacrifice isolated tasks in pursuit of average performance, which may result in negative adaptation on those isolated tasks, and they usually require sufficient learning within a stationary task distribution. In this paper, we present a hierarchical framework of double meta-learning that consists of three stages: classification, meta-learning, and re-adaptation. First, in the classification stage, we group tasks into several subsets, treated as task categories, according to the learned parameters of each task; this allows isolated tasks to be separated out. Second, in the meta-learning stage, we learn a category parameter for each subset via meta-learning. Simultaneously, based on the gradients of the category parameters in all subsets, we apply meta-learning again to learn a new meta-parameter over the whole task set, which serves as the initial parameter for a new task. Finally, in the re-adaptation stage, we adapt the parameter of the new task in two steps, using the meta-parameter and then the appropriate category parameter. Experimentally, we demonstrate that our algorithm prevents the agent from negative adaptation without losing average performance over the whole task set. In addition, our algorithm adapts more rapidly during re-adaptation. Moreover, it performs well with fewer samples when the agent is placed in an online meta-learning setting.
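
To make the three-stage structure concrete, the following is a minimal, illustrative sketch in Python. It is not the paper's implementation: the toy quadratic loss (standing in for an RL objective), the k-means clustering of per-task parameters, the Reptile-style first-order meta-updates, and all names (loss_grad, adapt, kmeans_1d, theta_meta) are assumptions introduced only to show how classification, double meta-learning, and two-step re-adaptation fit together.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(theta, task):
    # Gradient of the toy quadratic loss 0.5 * (theta - task)^2,
    # where each task is identified with its optimal parameter.
    return theta - task

def adapt(theta, task, steps=5, lr=0.1):
    # Inner-loop adaptation: a few gradient steps on a single task.
    for _ in range(steps):
        theta = theta - lr * loss_grad(theta, task)
    return theta

# Stage 1: classification. Learn a parameter for each task individually,
# then cluster the learned parameters (k-means is an assumed choice here)
# so that isolated tasks end up in their own subset.
tasks = np.concatenate([
    rng.normal(-3.0, 0.3, 20),   # one dense category of tasks
    rng.normal(+3.0, 0.3, 20),   # another dense category
    rng.normal(+9.0, 0.3, 3),    # a few isolated tasks
])
per_task_params = np.array([adapt(0.0, t, steps=50) for t in tasks])

def kmeans_1d(x, k=3, iters=50):
    centers = rng.choice(x, size=k, replace=False)
    for _ in range(iters):
        assign = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        centers = np.array([x[assign == j].mean() if np.any(assign == j) else centers[j]
                            for j in range(k)])
    return assign

assign = kmeans_1d(per_task_params, k=3)

# Stage 2a: meta-learn one category parameter per task subset
# (a Reptile-style first-order outer update stands in for the meta-update).
category_params = []
for j in range(3):
    subset = tasks[assign == j]
    if len(subset) == 0:
        continue
    phi = 0.0
    for _ in range(200):
        t = rng.choice(subset)
        phi += 0.1 * (adapt(phi, t) - phi)
    category_params.append(phi)

# Stage 2b: meta-learn a global meta-parameter from the category parameters,
# to be used as the initial parameter for any new task.
theta_meta = 0.0
for _ in range(200):
    phi = category_params[rng.integers(len(category_params))]
    theta_meta += 0.1 * (adapt(theta_meta, phi) - theta_meta)

# Stage 3: two-step re-adaptation on a new task -- first from the
# meta-parameter, then from the closest (most appropriate) category parameter.
new_task = rng.normal(+3.0, 0.3)
step1 = adapt(theta_meta, new_task, steps=2)
best_category = min(category_params, key=lambda p: abs(p - step1))
step2 = adapt(best_category, new_task, steps=5)
print(f"new task optimum: {new_task:.2f}, after two-step re-adaptation: {step2:.2f}")
```

In this toy setting, the few isolated tasks around +9 receive their own category parameter, so a new task drawn from that region would be re-adapted from a nearby initialization rather than being pulled toward the average of the dense categories; this is the intuition behind avoiding negative adaptation while preserving average performance.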

Keywords

meta-learning / reinforcement learning / few-shot learning / negative adaptation

Cite this article

Qiming FU, Zhechao WANG, Nengwei FANG, Bin XING, Xiao ZHANG, Jianping CHEN. MAML2: meta reinforcement learning via meta-learning for task categories. Front. Comput. Sci., 2023, 17(4): 174325. DOI: 10.1007/s11704-022-2037-1

RIGHTS & PERMISSIONS

Higher Education Press

Supplementary files

FCS-22037-OF-QF_suppl_1
