BiaMix Contrastive Learning and Memory Similarity Distillation in Class-Incremental Learning

Mang Ye, Wenke Huang, Zekun Shi, Zhiwei Ye, Bo Du

CAAI Transactions on Intelligence Technology, 2025, Vol. 10, Issue 6: 1745-1758. DOI: 10.1049/cit2.70064

ORIGINAL RESEARCH

Abstract

Class-incremental learning studies the problem of continually learning new classes from data streams. However, networks suffer from catastrophic forgetting, losing past knowledge while acquiring new knowledge. Among existing approaches, replay methods have shown exceptional promise for this challenge, but their performance still suffers from two aspects: (i) imbalanced data distribution and (ii) semantic inconsistency between networks. First, because the memory buffer is limited, there is an imbalance between old and new classes. Direct optimisation skews the feature space towards the new classes, degrading performance on the old classes. Second, existing methods normally leverage the previous network to regularise the present one. However, the previous network was never trained on the new classes, so the two networks are semantically inconsistent and the resulting guidance can be misleading. To address these two problems, we propose BCSD (BiaMix contrastive learning and memory similarity distillation). For the imbalanced distribution, we design Biased MixUp, in which mixed samples assign a high weight to old-class inputs and a low weight to new-class inputs, so the network learns to push decision boundaries towards the new classes. We further leverage label information to construct contrastive learning and ensure discriminability. Meanwhile, for the semantic inconsistency, we distill knowledge from the previous network by capturing the similarity of the new classes in the current task to the old classes in the memory buffer and transferring that knowledge to the present network. Empirical results on various datasets demonstrate the effectiveness and efficiency of our method.
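
To make the two components concrete, the sketch below illustrates how we read the abstract: Biased MixUp interpolates a memory-buffer (old-class) sample with a new-class sample while always giving the old-class sample the larger weight, and memory similarity distillation uses the previous network's new-sample-to-memory similarity distribution as a soft target for the present network. The function names, the Beta mixing scheme, the cosine-similarity/temperature formulation, and the KL objective below are illustrative assumptions rather than the paper's exact formulation.

import torch
import torch.nn.functional as F

def biased_mixup(x_old, y_old, x_new, y_new, alpha=1.0):
    # Hypothetical sketch of Biased MixUp: the old-class sample drawn from
    # the memory buffer always receives the larger mixing weight.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    lam = max(lam, 1.0 - lam)  # assumed bias: old-class weight >= 0.5
    x_mix = lam * x_old + (1.0 - lam) * x_new
    # the classification loss would weight y_old by lam and y_new by (1 - lam)
    return x_mix, y_old, y_new, lam

def memory_similarity_distillation(f_new_cur, f_new_prev, f_mem_cur, f_mem_prev, tau=0.1):
    # Hypothetical sketch of memory similarity distillation: align the
    # current network's new-to-memory similarity distribution with that of
    # the previous (frozen) network.
    sim_prev = F.normalize(f_new_prev, dim=1) @ F.normalize(f_mem_prev, dim=1).t()
    sim_cur = F.normalize(f_new_cur, dim=1) @ F.normalize(f_mem_cur, dim=1).t()
    target = F.softmax(sim_prev / tau, dim=1)        # previous network: fixed soft target
    log_pred = F.log_softmax(sim_cur / tau, dim=1)   # present network: prediction to align
    return F.kl_div(log_pred, target, reduction='batchmean')

In training, such a distillation term would typically be added to the classification and label-based contrastive losses with a weighting hyper-parameter.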

Keywords

artificial intelligence / catastrophic forgetting / continual learning / deep learning

Cite this article

Mang Ye, Wenke Huang, Zekun Shi, Zhiwei Ye, Bo Du. BiaMix Contrastive Learning and Memory Similarity Distillation in Class-Incremental Learning. CAAI Transactions on Intelligence Technology, 2025, 10(6): 1745-1758. DOI: 10.1049/cit2.70064

Funding

National Natural Science Foundation of China (62176188)
