In object detection, it is difficult to balance the complexity of the neck against accuracy. To address this issue, we propose an efficient, decoupled neck module called the shared feature pyramid network (Shared-FPN). Shared-FPN does not increase the number of model parameters or floating-point operations (FLOPs), and it can be easily ported to any detection model. It improves on the path aggregation feature pyramid network (PAFPN) in three ways: it uses transposed convolution with a large kernel as the upsampling module, introduces spatial pyramidal pooling-fast downsampling (SPPFD) based on shared pooling, and uses shared convolution as the right-part module. To evaluate Shared-FPN on object detection tasks, we conducted experiments on object detection datasets, including VOC 2012 and COCO. The results show that Shared-FPN achieves excellent performance across objects of all sizes. On the VOC 2012 dataset, with you only look once extended-s (YOLOX-s) as the detector, Shared-FPN improved mean average precision (mAP) by 11.2% over the feature pyramid network (FPN). On the COCO dataset, with faster region-based convolutional neural network (Faster R-CNN) as the detector, Shared-FPN improved mAP by 7.8% over FPN and by 7.1% over PAFPN. Shared-FPN can be easily inserted into any detector to improve performance in a range of scenarios, including small, medium, and large objects.
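To illustrate how a transposed convolution can act as an upsampling module, the following minimal sketch computes the standard transposed-convolution output size and applies a one-dimensional transposed convolution in pure Python. It is not the Shared-FPN implementation: the abstract does not state the kernel size, stride, or padding, so the values used here (kernel sizes 4 and 6, stride 2) and the function names are assumptions chosen only to show that a kernel wider than the stride can still give exact 2x upsampling.

```python
def transposed_conv_output_size(size_in, kernel, stride, padding=0, output_padding=0):
    """Standard output-length formula for a transposed convolution (dilation 1)."""
    return (size_in - 1) * stride - 2 * padding + kernel + output_padding


def transposed_conv1d(signal, kernel, stride, padding=0):
    """Naive 1D transposed convolution: scatter each input value, scaled by the
    kernel, into the output at steps of `stride`, then crop `padding` per side."""
    full_len = (len(signal) - 1) * stride + len(kernel)
    full = [0.0] * full_len
    for i, x in enumerate(signal):
        for j, w in enumerate(kernel):
            full[i * stride + j] += x * w
    return full[padding:full_len - padding]


# Hypothetical settings: with an even kernel size k and padding (k - stride) / 2,
# the output is exactly `stride` times longer than the input.
size_k6 = transposed_conv_output_size(20, kernel=6, stride=2, padding=2)

# A fixed 4-tap kernel that reproduces linear interpolation away from the borders;
# in a trained network these weights would be learned instead.
upsampled = transposed_conv1d([1.0, 2.0, 3.0], [0.25, 0.75, 0.75, 0.25],
                              stride=2, padding=1)
```

Here `size_k6` shows a 20-element feature row becoming 40 elements, and `upsampled` has twice the length of its three-element input. A larger kernel lets each output position draw on more input positions than a stride-sized kernel would, which is one plausible motivation for a large-kernel upsampler.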
RIGHTS & PERMISSIONS
Tianjin University of Technology