MAML2: meta reinforcement learning via meta-learning for task categories
Qiming FU, Zhechao WANG, Nengwei FANG, Bin XING, Xiao ZHANG, Jianping CHEN
Meta-learning has been widely applied to few-shot reinforcement learning problems, where the goal is to obtain an agent that can learn quickly in a new task. However, existing algorithms often sacrifice isolated tasks in pursuit of average performance, which may lead to negative adaptation on those tasks, and they usually require sufficient learning on a stationary task distribution. In this paper, we present a hierarchical double meta-learning framework consisting of three stages: classification, meta-learning, and re-adaptation. First, in the classification stage, we group tasks into several subsets, regarded as task categories, according to the learned parameters of each task, which separates out the isolated tasks. Second, in the meta-learning stage, we learn a category parameter for each subset via meta-learning; simultaneously, based on the gradients of the category parameters of all subsets, we apply meta-learning again to learn a meta-parameter for the whole task set, which serves as the initial parameter for a new task. Finally, in the re-adaptation stage, we adapt the parameters of a new task in two steps, using the meta-parameter and then the appropriate category parameter. Experiments demonstrate that our algorithm prevents negative adaptation without sacrificing average performance over the whole task set, adapts more rapidly during re-adaptation, and performs well with fewer samples when the agent is exposed to an online meta-learning setting.
meta-learning / reinforcement learning / few-shot learning / negative adaptation
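The three stages described in the abstract (classification, double meta-learning, and two-step re-adaptation) can be illustrated with a minimal, self-contained sketch. It uses toy quadratic task losses and first-order gradient updates in NumPy; the k-means clustering step, the nearest-category selection rule, and all function names (e.g., classify_tasks, meta_learn, two_step_adapt) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tasks: task i is defined by an optimum opt_i, with loss_i(theta) = ||theta - opt_i||^2.
def loss_grad(theta, task_opt):
    return 2.0 * (theta - task_opt)  # analytic gradient of the toy loss

def inner_adapt(theta, task_opt, lr=0.1, steps=5):
    """Adapt a parameter to one task by plain gradient descent (MAML-style inner loop)."""
    for _ in range(steps):
        theta = theta - lr * loss_grad(theta, task_opt)
    return theta

# Stage 1 (classification): adapt every task from a common start and cluster the
# adapted parameters into categories; k-means is an assumed choice of clustering.
def classify_tasks(task_opts, theta0, n_cat=3, iters=20):
    adapted = np.stack([inner_adapt(theta0, t) for t in task_opts])
    centers = adapted[rng.choice(len(adapted), n_cat, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((adapted[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(n_cat):
            if np.any(labels == k):
                centers[k] = adapted[labels == k].mean(axis=0)
    return labels

# Stage 2 (double meta-learning): a first-order meta-update per category, and a second
# meta-update that averages the category gradients into one global meta-parameter.
def meta_learn(task_opts, labels, n_cat, dim, outer_lr=0.05, epochs=100):
    theta_meta = np.zeros(dim)
    theta_cat = np.zeros((n_cat, dim))
    for _ in range(epochs):
        cat_grads = np.zeros_like(theta_cat)
        for k in range(n_cat):
            subset = task_opts[labels == k]
            if len(subset) == 0:
                continue
            grads = [loss_grad(inner_adapt(theta_cat[k], t), t) for t in subset]
            cat_grads[k] = np.mean(grads, axis=0)
            theta_cat[k] -= outer_lr * cat_grads[k]      # meta-update of the category parameter
        theta_meta -= outer_lr * cat_grads.mean(axis=0)  # second meta-update over all categories
    return theta_meta, theta_cat

# Stage 3 (re-adaptation): start the new task from the meta-parameter, pick the nearest
# category parameter after one adaptation step, then adapt again from that category.
def two_step_adapt(new_task_opt, theta_meta, theta_cat):
    step1 = inner_adapt(theta_meta, new_task_opt, steps=1)
    k = int(np.argmin(((theta_cat - step1) ** 2).sum(axis=1)))
    return inner_adapt(theta_cat[k], new_task_opt)

if __name__ == "__main__":
    dim, n_cat = 2, 3
    task_opts = np.concatenate([rng.normal(c, 0.3, size=(10, dim))
                                for c in ([0.0, 0.0], [5.0, 5.0], [-5.0, 5.0])])
    labels = classify_tasks(task_opts, np.zeros(dim), n_cat)
    theta_meta, theta_cat = meta_learn(task_opts, labels, n_cat, dim)
    print(two_step_adapt(np.array([4.8, 5.2]), theta_meta, theta_cat))
```

In this sketch the re-adaptation step chooses the category whose parameter is closest to the first adapted parameter; this is only one plausible selection rule, and the paper's actual criterion may differ.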
Qiming Fu received his PhD degree from the School of Computer Science and Technology, Soochow University, China, in 2014. He is an associate professor at the School of Electronics and Information Engineering, Suzhou University of Science and Technology, China. His research interests include reinforcement learning, deep learning, and intelligent information processing
Zhechao Wang is currently working towards his master’s degree at the School of Electronics and Information Engineering, Suzhou University of Science and Technology, China. His research interests lie in reinforcement learning, including meta-learning and transfer learning
Nengwei Fang graduated from the Department of Mechanical Engineering, Tsinghua University, China, and received a master's degree in Materials Science and Engineering. His research interests include applications of industrial big data. He is currently the Chairman of the Board of Directors of Chongqing Innovation Center of Industrial Big-Data Co., Ltd
Bin Xing is the chief scientist of the National Engineering Laboratory of Industrial Big-Data Application Technology. From 1998 to 2017, he worked at Dassault Group and Atos Origin in France as a senior data consultant and data analysis scientist, and he is a visiting professor at the University of Electronic Science and Technology of China. He received his PhD in data analysis and processing from École Centrale de Paris, France, in 1998. His main research interests include industrial big data analysis, key technologies of intelligent manufacturing, industrial mechanism models, and AI applications
Xiao Zhang received his PhD degree from the Department of Computer Science and Electrical Engineering of the University of Wisconsin-Milwaukee, USA. He is a professor and the dean of the School of Medical Informatics, Xuzhou Medical University, China. His research interests include precision medicine, and artificial intelligence and machine learning in the healthcare and medical sectors
Jianping Chen received his bachelor's degree from Southeast University, China, his master's degree from the New Jersey Institute of Technology, USA, his MBA from Washington University in St. Louis, USA, and his doctoral degree from the University of Nice Sophia Antipolis, France. He is currently a professor at Suzhou University of Science and Technology, a director of the Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, and a director of the Suzhou City Key Laboratory of Mobile Networking and Applied Technology. His research interests include big data and analytics, building energy efficiency, and machine learning