Balanced ID-OOD tradeoff transfer makes query based detectors good few shot learners

Yuantao Yin, Ping Yin, Xue Xiao, Liang Yan, Siqing Sun, Xiaobo An

High-Confidence Computing ›› 2025, Vol. 5 ›› Issue (1): 100237 · DOI: 10.1016/j.hcc.2024.100237

Research Article

Abstract

Fine-tuning is a popular approach to the few-shot object detection (FSOD) problem. In this paper, we introduce a new perspective on it: we formulate few-shot novel tasks as distributions shifted from their ground-truth distributions. We introduce the concept of imaginary placeholder masks to show that this distribution shift is essentially a composite of in-distribution (ID) and out-of-distribution (OOD) shifts. Our empirical results show that it is crucial to balance the trade-off between adapting to the available few-shot distribution and preserving the distribution-shift robustness of the pre-trained model. We explore improvements to few-shot fine-tuning transfer in FSOD settings from three aspects. First, we explore the LinearProbe-Finetuning (LP-FT) technique, which balances this trade-off to mitigate the feature-distortion problem. Second, we examine the effectiveness of a protective freezing strategy that preserves the OOD robustness of query-based object detectors. Third, we employ ensembling methods to circumvent feature distortion. All of these techniques are integrated into a single method called BIOT (Balanced ID-OOD Transfer). Evaluation results show that our method is simple, effective, and general in tapping the FSOD potential of query-based object detectors. It outperforms the current state-of-the-art method in many FSOD settings and shows promising scaling capability.
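As a rough illustration of the LP-FT schedule the abstract refers to (this is a generic sketch of the LP-FT idea from the transfer-learning literature, not the authors' released code), one first trains only the prediction head with the backbone frozen, then unfreezes everything for full fine-tuning at a much smaller learning rate. All names here (`detector`, its `head` attribute, `few_shot_loader`, `loss_fn`, and the learning rates) are hypothetical placeholders.

```python
import torch

def lp_ft(detector, few_shot_loader, loss_fn,
          lp_epochs=10, ft_epochs=10, lp_lr=1e-3, ft_lr=1e-5):
    """Hedged LP-FT sketch: linear-probe the head first, then
    fine-tune all weights, which is intended to limit feature
    distortion of the pre-trained representation."""
    # Stage 1: linear probing -- freeze everything except the head.
    for p in detector.parameters():
        p.requires_grad = False
    for p in detector.head.parameters():  # `head` is an assumed attribute
        p.requires_grad = True
    opt = torch.optim.AdamW(detector.head.parameters(), lr=lp_lr)
    for _ in range(lp_epochs):
        for images, targets in few_shot_loader:
            opt.zero_grad()
            loss = loss_fn(detector(images), targets)
            loss.backward()
            opt.step()

    # Stage 2: full fine-tuning at a much smaller learning rate,
    # starting from the probed head rather than a random one.
    for p in detector.parameters():
        p.requires_grad = True
    opt = torch.optim.AdamW(detector.parameters(), lr=ft_lr)
    for _ in range(ft_epochs):
        for images, targets in few_shot_loader:
            opt.zero_grad()
            loss = loss_fn(detector(images), targets)
            loss.backward()
            opt.step()
```

The design intuition is that probing first gives the head a reasonable solution near the pre-trained features, so subsequent full fine-tuning moves the backbone less and retains more OOD robustness.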

Keywords

Few-shot learning / Object detection / Transfer learning

Cite this article

Yuantao Yin, Ping Yin, Xue Xiao, Liang Yan, Siqing Sun, Xiaobo An. Balanced ID-OOD tradeoff transfer makes query based detectors good few shot learners. High-Confidence Computing, 2025, 5(1): 100237 DOI:10.1016/j.hcc.2024.100237


CRediT authorship contribution statement

Yuantao Yin: Conceptualization, Methodology, Software. Ping Yin: Formal analysis, Data curation, Writing - Original draft preparation. Xue Xiao: Visualization, Investigation. Liang Yan: Supervision, Project administration. Siqing Sun: Software, Validation. Xiaobo An: Data curation, Writing - Reviewing and Editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.hcc.2024.100237. For further details and performance results of our method, please refer to the appendix file of this paper, which has six sections. Appendix A offers implementation details, and additional performance comparison results are provided in Appendix B. Appendix C shows the effectiveness of the weight-space ensembling method in improving OOD robustness. We discuss limitations and future work in Appendix D. Qualitative and quantitative results are presented in Appendices E and F, respectively.
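The weight-space ensembling mentioned for Appendix C can be sketched as a linear interpolation between the pre-trained and fine-tuned checkpoints, in the style of robust fine-tuning (WiSE-FT). This is a hedged sketch under the assumption that both detectors share an identical architecture; `alpha` and the function name are illustrative, not the paper's API.

```python
import copy
import torch

def weight_space_ensemble(pretrained, finetuned, alpha=0.5):
    """Hedged sketch of weight-space ensembling: interpolate the
    state dicts of the pre-trained and fine-tuned detectors.
    Assumes the two models share an identical architecture."""
    sd_pre = pretrained.state_dict()
    sd_ft = finetuned.state_dict()
    sd_mix = {}
    for k in sd_ft:
        if sd_ft[k].dtype.is_floating_point:
            # Interpolate learnable weights and float buffers.
            sd_mix[k] = alpha * sd_ft[k] + (1 - alpha) * sd_pre[k]
        else:
            # Integer buffers (e.g. batch counters) are not interpolable.
            sd_mix[k] = sd_ft[k]
    merged = copy.deepcopy(finetuned)
    merged.load_state_dict(sd_mix)
    return merged
```

Sweeping `alpha` from 0 to 1 traces the trade-off the abstract describes: the fine-tuned end adapts to the few-shot (ID) distribution, while the pre-trained end retains distribution-shift (OOD) robustness.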

