Decoupled knowledge distillation method based on meta-learning

Wenqing Du, Liting Geng, Jianxiong Liu, Zhigang Zhao, Chunxiao Wang, Jidong Huo

High-Confidence Computing ›› 2024, Vol. 4 ›› Issue (1): 100164 DOI: 10.1016/j.hcc.2023.100164

Research Article

Abstract

With the advancement of deep learning techniques, the number of model parameters has kept increasing, leading to significant memory consumption and limiting the deployment of such models in real-time applications. To reduce the number of model parameters and enhance the generalization capability of neural networks, we propose Decoupled MetaDistil, a decoupled meta-distillation method. The method uses meta-learning to guide the teacher model, dynamically adjusting the knowledge transfer strategy based on feedback from the student model and thereby improving the student's generalization ability. Furthermore, we introduce a decoupled loss that explicitly transfers positive-sample knowledge while exploring the potential of negative-sample knowledge. Extensive experiments demonstrate the effectiveness of our method.
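
To make the two ideas concrete, the sketches below illustrate them in PyTorch. They are illustrative assumptions, not the paper's implementation: the decoupled loss is shown as a DKD-style split of the softened teacher/student KL divergence into a target-class (positive-sample) term and a non-target-class (negative-sample) term, and the function name, weights alpha/beta, and temperature T are placeholders.

```python
import torch
import torch.nn.functional as F

def decoupled_kd_loss(logits_student, logits_teacher, target,
                      alpha=1.0, beta=8.0, T=4.0):
    """Decoupled KD sketch: weight the target-class and non-target-class parts
    of the softened teacher/student KL divergence separately."""
    num_classes = logits_student.size(1)
    gt_mask = F.one_hot(target, num_classes).bool()   # (B, C), True at the label

    p_s = F.softmax(logits_student / T, dim=1)
    p_t = F.softmax(logits_teacher / T, dim=1)

    # Target-class term: binary distribution {target class, all other classes}.
    p_s_bin = torch.stack([(p_s * gt_mask).sum(1), (p_s * ~gt_mask).sum(1)], dim=1)
    p_t_bin = torch.stack([(p_t * gt_mask).sum(1), (p_t * ~gt_mask).sum(1)], dim=1)
    tckd = F.kl_div(p_s_bin.clamp_min(1e-8).log(), p_t_bin, reduction="batchmean")

    # Non-target-class term: suppress the ground-truth logit so the softmax
    # only compares the distributions over the remaining (negative) classes.
    log_p_s_nt = F.log_softmax(logits_student / T - 1000.0 * gt_mask, dim=1)
    p_t_nt = F.softmax(logits_teacher / T - 1000.0 * gt_mask, dim=1)
    nckd = F.kl_div(log_p_s_nt, p_t_nt, reduction="batchmean")

    return (alpha * tckd + beta * nckd) * (T ** 2)
```

The meta-learning feedback loop can likewise be sketched as a bi-level update: the student takes one simulated gradient step on the distillation loss, the updated student is scored on a held-out quiz batch, and that quiz loss is backpropagated through the simulated step to the teacher. This sketch assumes PyTorch >= 2.0 (torch.func.functional_call) and reuses decoupled_kd_loss from above; the inner learning rate and batch names are hypothetical.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def meta_update_teacher(teacher, student, teacher_opt,
                        distill_batch, quiz_batch, inner_lr=0.01):
    """One meta step: simulate a student update, score it on a quiz batch,
    and backpropagate that quiz loss to the teacher."""
    x_d, y_d = distill_batch
    x_q, y_q = quiz_batch

    # Differentiable snapshot of the student's parameters.
    params = {n: p.detach().clone().requires_grad_(True)
              for n, p in student.named_parameters()}

    # Inner step: one simulated student update on the distillation loss;
    # create_graph=True keeps the dependence on the teacher's parameters.
    s_logits = functional_call(student, params, (x_d,))
    t_logits = teacher(x_d)   # not detached: the teacher must receive gradients
    inner_loss = decoupled_kd_loss(s_logits, t_logits, y_d)
    grads = torch.autograd.grad(inner_loss, tuple(params.values()), create_graph=True)
    updated = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}

    # Outer step: the updated student's quiz loss drives the teacher update.
    quiz_logits = functional_call(student, updated, (x_q,))
    quiz_loss = F.cross_entropy(quiz_logits, y_q)
    teacher_opt.zero_grad()
    quiz_loss.backward()
    teacher_opt.step()
    return quiz_loss.item()
```

In a full training loop, the real student would then be updated with the distillation loss produced by the freshly adjusted teacher.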

Keywords

Model compression / Knowledge distillation / Meta-learning / Decoupled loss

Cite this article

Wenqing Du, Liting Geng, Jianxiong Liu, Zhigang Zhao, Chunxiao Wang, Jidong Huo. Decoupled knowledge distillation method based on meta-learning. High-Confidence Computing, 2024, 4(1): 100164. DOI: 10.1016/j.hcc.2023.100164


Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Key R&D Program of Shandong Province, China (2022CXGC20106), the Pilot Project for Integrated Innovation of Science, Education, and Industry of Qilu University of Technology (Shandong Academy of Sciences) (2022JBZ01-01), the Joint Fund of the Shandong Natural Science Foundation (ZR2022LZH010), and the Shandong Provincial Natural Science Foundation (ZR2021LZH008).

