High accuracy object detection via bounding box regression network

Lipeng SUN, Shihua ZHAO, Gang LI, Binbing LIU

Front. Optoelectron. ›› 2019, Vol. 12 ›› Issue (3) : 324-331.

Front. Optoelectron. ›› 2019, Vol. 12 ›› Issue (3) : 324-331. DOI: 10.1007/s12200-019-0853-1




As one of the primary problems in computer vision, object detection aims to find and locate semantic objects in digital images. Unlike object classification, which only assigns an object to a class, object detection must also extract the accurate location of each object. In state-of-the-art object detection algorithms, bounding box regression plays a critical role in achieving high localization accuracy. Almost all popular deep-learning-based object detection algorithms use bounding box regression to fine-tune object locations. However, although bounding box regression is widely used, few studies have focused on its underlying rationale, performance dependencies, and performance evaluation. In this paper, we propose a dedicated deep neural network for bounding box regression and present several methods to improve its performance. We conduct several ad hoc experiments to demonstrate the effectiveness of the network. We also apply the network as an auxiliary module to the Faster R-CNN algorithm and test the combined system on real-world images. Experimental results show an improvement in detection accuracy in terms of mean IOU.
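For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of how intersection over union (IOU) and mean IOU are commonly computed for axis-aligned boxes. The box layout `(x_min, y_min, x_max, y_max)`, the function names `iou` and `mean_iou`, and the sample coordinates are assumptions made for illustration; they are not taken from the paper and do not reproduce the authors' network or results.

```python
from typing import Sequence, Tuple

# Assumed box layout (not from the paper): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlap rectangle, if any
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give an empty intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], truths: Sequence[Box]) -> float:
    """Average IOU over matched prediction / ground-truth pairs."""
    if len(preds) != len(truths):
        raise ValueError("preds and truths must have the same length")
    if not preds:
        return 0.0
    return sum(iou(p, t) for p, t in zip(preds, truths)) / len(preds)


# Hypothetical boxes: a ground truth, a coarse detection, and a
# detection after some refinement step (values invented for the demo).
truth = (10.0, 10.0, 50.0, 50.0)
coarse = (20.0, 20.0, 60.0, 60.0)
refined = (12.0, 12.0, 52.0, 52.0)

if __name__ == "__main__":
    print("coarse IOU: ", iou(coarse, truth))
    print("refined IOU:", iou(refined, truth))
```

In this toy case the refined box overlaps the ground truth more tightly than the coarse one, so its IOU is higher; a regression stage is typically judged by whether it shifts this quantity upward on average across a test set.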


deep learning / object detection / bounding box regression / IOU distribution

Cite this article

Lipeng SUN, Shihua ZHAO, Gang LI, Binbing LIU. High accuracy object detection via bounding box regression network. Front. Optoelectron., 2019, 12(3): 324‒331 https://doi.org/10.1007/s12200-019-0853-1




© 2019 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature




