Improving meta-learning model via meta-contrastive loss
Pinzhuo TIAN, Yang GAO
Recently, addressing the few-shot learning problem within the meta-learning framework has achieved great success. Regularization is a powerful technique widely used to improve machine learning algorithms, yet little research has focused on designing appropriate meta-regularizations that further improve the generalization of meta-learning models in few-shot learning. In this paper, we propose a novel meta-contrastive loss that can be regarded as such a regularization. Our motivation is that the limited data available in few-shot learning is only a small sample drawn from the whole data distribution, and different sampled subsets can yield differently biased representations of that distribution. Consequently, the models trained on the few training data (support set) and on the test data (query set) may be misaligned in model space, so a model learned on the support set may not generalize well to the query data. The proposed meta-contrastive loss aligns the models of the support and query sets to overcome this problem, thereby improving the performance of meta-learning models in few-shot learning. Extensive experiments demonstrate that our method improves the performance of different gradient-based meta-learning models on various learning problems, e.g., few-shot regression and classification.
meta-learning / few-shot learning / meta-regularization / deep learning
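
The exact formulation of the loss is given in the body of the paper rather than in this abstract, but the core idea, aligning support-set and query-set models with a contrastive objective, can be illustrated with a short sketch. Below is a minimal PyTorch sketch assuming an InfoNCE-style formulation over per-task embeddings; the function name, the choice of embedding, and the temperature value are illustrative assumptions, not the authors' exact loss.

# Minimal sketch (assumptions noted above): an InfoNCE-style term that
# aligns per-task representations derived from the support set with those
# derived from the query set. All names are illustrative, not the paper's API.
import torch
import torch.nn.functional as F

def meta_contrastive_loss(z_support: torch.Tensor,
                          z_query: torch.Tensor,
                          temperature: float = 0.5) -> torch.Tensor:
    # z_support, z_query: (num_tasks, dim) tensors, one embedding per task,
    # e.g., pooled features or flattened adapted parameters of each model.
    # The support/query embeddings of the same task form the positive pair;
    # embeddings from the other tasks in the batch act as negatives.
    z_s = F.normalize(z_support, dim=1)
    z_q = F.normalize(z_query, dim=1)
    logits = z_s @ z_q.t() / temperature                        # (T, T) cosine similarities
    targets = torch.arange(z_s.size(0), device=logits.device)   # diagonal entries are positives
    return F.cross_entropy(logits, targets)

In a gradient-based meta-learning loop (e.g., MAML), such a term would be added to the usual query loss with a weighting coefficient, so that minimizing it pulls each task's support-derived and query-derived models together in model space while keeping different tasks apart.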