Enhancing object detection through global collaborative learning
Weidong Zhao, Jian Chen, Xianhui Liu, Jiahuan Liu
Autonomous Intelligent Systems, 2025, Vol. 5, Issue 1: 29
Object detection is a challenging yet crucial task in computer vision. Despite significant advances, modern detectors still struggle with task alignment between localization and classification. In this paper, Global Collaborative Learning (GCL) is introduced to address these challenges from often-overlooked perspectives. First, GCL is reflected in the detector's label assignment: adjusting the loss function so that samples with strong localization but weak classification become high-quality samples for both tasks provides more effective training signals and enables the model to capture key consistent features. Second, GCL is embodied in the head design: by enabling global feature interaction within the decoupled head, the approach makes the final predictions more comprehensive and robust, preventing the two independent branches from converging to suboptimal solutions for their respective tasks. Extensive experiments on the challenging MS COCO and CrowdHuman datasets demonstrate that the proposed GCL method substantially improves performance and generalization.
Object detection / Global collaborative learning / Task alignment / Label assignment / Feature interaction
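
The abstract describes the two mechanisms only at a high level. As a rough illustration, the PyTorch-style sketch below shows (a) a soft classification target that jointly weights the predicted score and the IoU, so well-localized but low-scoring samples are still treated as high-quality positives, and (b) a decoupled head whose classification and regression branches exchange a pooled global context. The function and class names, the alpha/beta exponents, and the gating design are assumptions made for illustration and are not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

def alignment_weighted_targets(cls_scores, ious, alpha=1.0, beta=6.0):
    # Joint classification-localization quality for positive samples: anchors
    # that localize well but score poorly still receive a meaningful soft
    # target, pushing both branches toward consistent predictions.
    # alpha/beta are illustrative hyper-parameters, not the paper's values.
    quality = cls_scores.pow(alpha) * ious.pow(beta)
    return quality / quality.max().clamp(min=1e-6) * ious.max()

class GlobalInteractionHead(nn.Module):
    # Decoupled cls/reg branches that exchange a pooled global context vector,
    # so neither branch converges on features useful for only one task.
    def __init__(self, in_channels=256, num_classes=80):
        super().__init__()
        self.cls_conv = nn.Conv2d(in_channels, in_channels, 3, padding=1)
        self.reg_conv = nn.Conv2d(in_channels, in_channels, 3, padding=1)
        self.gate = nn.Sequential(
            nn.Linear(2 * in_channels, in_channels), nn.ReLU(inplace=True),
            nn.Linear(in_channels, 2 * in_channels), nn.Sigmoid())
        self.cls_pred = nn.Conv2d(in_channels, num_classes, 3, padding=1)
        self.reg_pred = nn.Conv2d(in_channels, 4, 3, padding=1)

    def forward(self, feat):
        cls_feat = F.relu(self.cls_conv(feat))
        reg_feat = F.relu(self.reg_conv(feat))
        # Pool both branches into one global descriptor and derive per-branch
        # gates from it, so each branch is modulated by the other's context.
        ctx = torch.cat([cls_feat.mean(dim=(2, 3)),
                         reg_feat.mean(dim=(2, 3))], dim=1)
        g_cls, g_reg = self.gate(ctx)[..., None, None].chunk(2, dim=1)
        return (self.cls_pred(cls_feat * g_reg),
                self.reg_pred(reg_feat * g_cls))

In such a setup, the soft target would replace the one-hot classification label for positive samples inside the detector's loss, and the head would act as a drop-in replacement for a standard decoupled head; the paper's actual formulation may differ.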