Distilling base-and-meta network with contrastive learning for few-shot semantic segmentation

Xinyue Chen, Yueyi Wang, Yingyue Xu, Miaojing Shi

Autonomous Intelligent Systems, 2023, Vol. 3, Issue 1: 11. DOI: 10.1007/s43684-023-00058-2
Original Article

Abstract

Current studies in few-shot semantic segmentation mostly use meta-learning frameworks to obtain models that generalize to new categories. However, because these models are trained on base classes with abundant annotated samples, they are biased towards those base classes, which causes semantic confusion and ambiguity between base classes and new classes. One strategy is to use an additional base learner to recognize objects of the base classes and then refine the predictions output by the meta learner. In such a design, both the interaction between the two learners and the way their results are combined are important. This paper proposes a new model, the Distilling Base and Meta (DBAM) network, which uses a self-attention mechanism and contrastive learning to improve few-shot segmentation performance. First, a self-attention-based ensemble module (SEM) is proposed to produce a more accurate adjustment factor for fusing the predictions of the two learners. Second, a prototype feature optimization module (PFOM) is proposed to provide interaction between the two learners; by introducing a contrastive learning loss, it enhances the ability to distinguish the base classes from the target class. Extensive experiments demonstrate that our method improves performance on PASCAL-5^i under both the 1-shot and 5-shot settings.
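
To make the idea of a contrastive loss over prototypes concrete, the following is a minimal sketch and not the PFOM described in the paper. It assumes a generic InfoNCE-style loss in which a target-class prototype is pulled towards a positive prototype and pushed away from base-class prototypes. The function names (`cosine`, `info_nce`), the temperature value, and the toy vectors are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's exact loss).

    loss = -log( exp(s_pos / t) / (exp(s_pos / t) + sum_k exp(s_neg_k / t)) )
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


# Hypothetical toy prototypes: one target class and two base classes.
target_proto = [1.0, 0.0, 0.0]
positive_proto = [0.9, 0.1, 0.0]

# Base-class prototypes that are well separated from the target prototype.
separated_bases = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Base-class prototypes that nearly coincide with the target prototype.
confused_bases = [[1.0, 0.05, 0.0], [0.95, 0.0, 0.1]]

loss_separated = info_nce(target_proto, positive_proto, separated_bases)
loss_confused = info_nce(target_proto, positive_proto, confused_bases)
print(loss_separated, loss_confused)
```

In this toy setting the loss is small when the base-class prototypes are far from the target prototype and larger when they nearly coincide with it, which is the kind of pressure a contrastive term applies to separate base classes from the target class.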

Keywords

Semantic segmentation / Few-shot learning / Meta learning / Contrastive learning / Self-attention

Cite this article

Xinyue Chen, Yueyi Wang, Yingyue Xu, Miaojing Shi. Distilling base-and-meta network with contrastive learning for few-shot semantic segmentation. Autonomous Intelligent Systems, 2023, 3(1): 11 https://doi.org/10.1007/s43684-023-00058-2
