A teacher-student based attention network for fine-grained image recognition

Li Ang, Zhang Xueyi, Li Peilin, Kang Bin

2025, Vol. 11, Issue (1): 52-59. DOI: 10.1016/j.dcan.2023.02.004
Original article

Abstract

The Fine-grained Image Recognition (FGIR) task is dedicated to distinguishing similar sub-categories that belong to the same super-category, such as bird species and car types. To highlight visual differences, existing FGIR works often follow two steps: discriminative sub-region localization and local feature representation. However, these works pay less attention to global context information. They neglect the fact that subtle visual differences in challenging scenarios can be highlighted by exploiting the spatial relationships among different sub-regions from a global viewpoint. Therefore, in this paper, we consider both global and local information for FGIR, and propose a collaborative teacher-student strategy to reinforce and unify the two types of information. Our framework is implemented mainly with convolutional neural networks and is referred to as the Teacher-Student Based Attention Convolutional Neural Network (T-S-ACNN). For fine-grained local information, we choose the classic Multi-Attention Network (MA-Net) as our baseline, and propose a boundary constraint to further reduce background noise in the local attention maps. In this way, the discriminative sub-regions tend to appear in the area occupied by fine-grained objects, leading to more accurate sub-region localization. For fine-grained global information, we design a graph-convolution-based Global Attention Network (GA-Net), which combines the local attention maps extracted by MA-Net with non-local techniques to explore the spatial relationships among sub-regions. Finally, we develop a collaborative teacher-student strategy that adaptively determines each network's role and optimization mode, so as to enhance the cooperative reinforcement of MA-Net and GA-Net. Extensive experiments on the CUB-200-2011, Stanford Cars and FGVC Aircraft datasets illustrate the promising performance of our framework.
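
The abstract does not give the exact rule by which the teacher and student roles are assigned, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that, for each sample, the branch with the lower cross-entropy acts as the teacher and the other branch is additionally pulled toward the teacher's prediction through a KL-divergence term. The function names, the `alpha` weight and the role-selection rule are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def collaborative_losses(local_logits, global_logits, label, alpha=1.0):
    """Assign roles per sample and return (teacher, local_loss, global_loss).

    ASSUMPTION: the branch with the lower cross-entropy is the teacher.
    The teacher is trained with cross-entropy only; the student adds a
    KL term toward the teacher's prediction, weighted by `alpha`.
    """
    p_local = softmax(local_logits)    # stands in for the MA-Net output
    p_global = softmax(global_logits)  # stands in for the GA-Net output
    ce_local = cross_entropy(p_local, label)
    ce_global = cross_entropy(p_global, label)

    if ce_local <= ce_global:
        teacher = "local"
        loss_local = ce_local
        loss_global = ce_global + alpha * kl_divergence(p_local, p_global)
    else:
        teacher = "global"
        loss_global = ce_global
        loss_local = ce_local + alpha * kl_divergence(p_global, p_local)
    return teacher, loss_local, loss_global


if __name__ == "__main__":
    # The local branch is more confident about class 0, so it teaches.
    print(collaborative_losses([2.0, 0.0, 0.0], [0.5, 0.4, 0.0], label=0))
```

In this sketch the roles can swap from sample to sample, which is one way to read "adaptively determine"; a real implementation would operate on batched tensors and would typically stop gradients from flowing through the teacher's prediction.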

Keywords

Fine-grained image recognition / Collaborative teacher-student strategy / Multi-attention / Global attention

Cite this article

Li Ang, Zhang Xueyi, Li Peilin, Kang Bin. A teacher-student based attention network for fine-grained image recognition. 2025, 11(1): 52-59. DOI: 10.1016/j.dcan.2023.02.004

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by the National Natural Science Foundation of China, China (Grant No. 62171232), and the Priority Academic Program Development of Jiangsu Higher Education Institutions, China.
