De-biased knowledge distillation framework based on knowledge infusion and label de-biasing techniques

Yan Li, Tai-Kang Tian, Meng-Yu Zhuang, Yu-Ting Sun

Journal of Electronic Science and Technology, 2024, Vol. 22, Issue 3: 100278. DOI: 10.1016/j.jnlest.2024.100278


Abstract

Knowledge distillation is a pivotal technique in model compression and has been widely applied across many domains. However, inherent biases in the teacher model still limit the performance of the student model during distillation. To address these biases, we propose a de-biased knowledge distillation framework tailored for binary classification tasks. For a pre-trained teacher model, biases in its soft labels are mitigated through knowledge infusion and label de-biasing techniques. On this basis, we introduce a de-biased distillation loss in which the de-biased labels replace the soft labels as the fitting target for the student model. This lets the student model learn from corrected model information, so that lightweight student models can be deployed with high performance. Experiments on multiple real-world datasets show that deep learning models compressed under the de-biased knowledge distillation framework significantly outperform traditional response-based and feature-based knowledge distillation models across various evaluation metrics, demonstrating the effectiveness and superiority of the framework for model compression.
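The abstract does not say how knowledge infusion or label de-biasing is carried out. The following is therefore only a minimal sketch of where a de-biased label can plug into a response-based distillation loss for binary classification. The `debias_label` function is a hypothetical placeholder: it mixes the teacher's soft label with the ground truth, which is not the authors' technique. The names `distillation_loss`, `temperature`, `lam`, and `alpha` are also illustrative assumptions rather than the paper's notation. Scaling the soft term by `temperature ** 2` follows the common convention from Hinton et al.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(target: float, pred: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a (possibly soft) target and a probability."""
    pred = min(max(pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def debias_label(teacher_prob: float, true_label: int, alpha: float = 0.5) -> float:
    """HYPOTHETICAL placeholder for label de-biasing (not the paper's method).

    Linearly mixes the teacher's soft label with the ground-truth label, so a
    confidently wrong teacher is pulled back toward the correct class.
    """
    return alpha * true_label + (1.0 - alpha) * teacher_prob


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    labels: Sequence[int],
    temperature: float = 2.0,
    lam: float = 0.5,
    alpha: float = 0.5,
    use_debiased: bool = True,
) -> float:
    """Mean response-based KD loss for binary classification.

    With use_debiased=True, the de-biased label replaces the teacher's soft
    label as the student's fitting target. With use_debiased=False, this is
    plain soft-label distillation.
    """
    total = 0.0
    for z_s, z_t, y in zip(student_logits, teacher_logits, labels):
        p_teacher = sigmoid(z_t / temperature)  # softened teacher label
        target = debias_label(p_teacher, y, alpha) if use_debiased else p_teacher
        soft = bce(target, sigmoid(z_s / temperature)) * temperature ** 2
        hard = bce(y, sigmoid(z_s))  # ordinary supervised term on the true label
        total += lam * soft + (1.0 - lam) * hard
    return total / len(labels)


if __name__ == "__main__":
    # One sample where the teacher is confidently wrong (label 1, teacher logit -4)
    # and the student already predicts the correct class (student logit 4).
    raw = distillation_loss([4.0], [-4.0], [1], use_debiased=False)
    fixed = distillation_loss([4.0], [-4.0], [1], use_debiased=True)
    print(raw, fixed)  # fixed < raw: the correct student is penalized less
```

In this toy case, the standard soft-label target pushes the correct student toward the teacher's mistake. The de-biased target weakens that pull, which is the intuition behind letting corrected labels replace the raw soft labels.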

Keywords

De-biasing / Deep learning / Knowledge distillation / Model compression

Cite this article

Yan Li, Tai-Kang Tian, Meng-Yu Zhuang, Yu-Ting Sun. De-biased knowledge distillation framework based on knowledge infusion and label de-biasing techniques. Journal of Electronic Science and Technology, 2024, 22(3): 100278 DOI:10.1016/j.jnlest.2024.100278


Data availability

All datasets used in the experiments are available from the Kaggle community (https://www.kaggle.com/).

Funding

This work was supported by the National Natural Science Foundation of China under Grant No. 62172056 and by the Young Elite Scientists Sponsorship Program by CAST under Grant No. 2022QNRC001.

Declaration of competing interest

The authors declare that there are no conflicts of interest regarding the publication of this manuscript. This work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

