Enhancing object detection through global collaborative learning

Weidong Zhao, Jian Chen, Xianhui Liu, Jiahuan Liu

Autonomous Intelligent Systems, 2025, Vol. 5, Issue (1): 29

DOI: 10.1007/s43684-025-00114-z
Original Article

Abstract

Object detection is a challenging yet crucial task in computer vision. Despite significant advances, modern detectors still struggle to align the localization and classification tasks. This paper introduces Global Collaborative Learning (GCL) to address this misalignment from two often-overlooked perspectives. First, GCL is reflected in the detector's label assignment: the loss function is adjusted so that samples with strong localization but weak classification become high-quality samples for both tasks, which provides more effective training signals and enables the model to capture key consistent features. Second, GCL is embodied in the head design: enabling global feature interaction within the decoupled head makes the final predictions more comprehensive and robust, and prevents the two independent branches from converging to solutions that are suboptimal for their respective tasks. Extensive experiments on the challenging MS COCO and CrowdHuman datasets demonstrate that the proposed GCL method substantially enhances performance and generalization capabilities.

Keywords

Object detection / Global collaborative learning / Task alignment / Label assignment / Feature interaction

Cite this article

Weidong Zhao, Jian Chen, Xianhui Liu, Jiahuan Liu. Enhancing object detection through global collaborative learning. Autonomous Intelligent Systems, 2025, 5(1): 29. DOI: 10.1007/s43684-025-00114-z


Funding

Key Technologies Research and Development Program (No. 2022YFB3305700)

RIGHTS & PERMISSIONS

The Author(s)
