Object detection in seriously degraded images with unbalanced training samples
Sheng Liu, Jiayu Shen, Shengyue Huang
Optoelectronics Letters, 2021, Vol. 17, Issue 9: 564-571.
Uncertain environments, especially those with uneven lighting and shadows, can degrade an image and severely harm object detection. Moreover, unbalanced training samples can cause overfitting. Because data collected at night is much scarcer than data collected in the daytime, nighttime detection performance tends to be relatively poor. In this paper, we propose a novel data augmentation method named Mask Augmentation, which reduces the brightness and contrast of objects and weakens their edges to simulate a degraded scene. In addition, we propose a new architecture that adds a classification loss branch and a feature extraction module named Multi-Feature Attention Module, which combines an attention mechanism with feature fusion on the basis of Darknet-53. This architecture makes the features extracted from daytime and nighttime images distinguishable. We also increase the loss weight of nighttime images during training. Our method achieves 78.68% mAP on nighttime detection and 73.14% mAP on daytime detection. Compared with other models, it greatly improves the accuracy of nighttime detection while still performing satisfactorily on daytime detection. We deployed our model on an intelligent garbage collection robot for real-time detection, where it performs automatic picking at night and assists cleaning staff during the day.
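The abstract describes Mask Augmentation and the nighttime loss weighting only in words, so the following is a minimal illustrative sketch, not the authors' implementation. It assumes a binary object mask is available for each image, uses a simple mean filter to weaken edges, and normalizes the weighted loss by the sum of weights. The function names (`box_blur`, `mask_augment`, `weighted_batch_loss`) and every parameter value (`brightness`, `contrast`, `blur_size`, `night_weight`) are hypothetical choices made for illustration.

```python
import numpy as np


def box_blur(img, k=5):
    """Blur an H x W x C float image with a k x k mean filter (edge-padded)."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def mask_augment(image, mask, brightness=0.4, contrast=0.5, blur_size=5):
    """Darken, flatten, and soften the masked object in a uint8 H x W x 3 image.

    All factors are assumed values; the paper's actual settings are not
    given in the abstract.
    """
    img = image.astype(np.float32)
    m = mask.astype(np.float32)[..., None]  # H x W x 1

    # Reduce contrast by pulling pixels toward the object's mean colour,
    # then reduce brightness by scaling the result.
    area = max(float(m.sum()), 1.0)
    mean = (img * m).sum(axis=(0, 1)) / area
    degraded = (mean + contrast * (img - mean)) * brightness

    # Weaken edges: blur the degraded pixels and feather the mask so the
    # object boundary fades into the untouched background.
    degraded = box_blur(degraded, blur_size)
    soft = box_blur(m, blur_size)
    out = soft * degraded + (1.0 - soft) * img
    return np.clip(out, 0, 255).astype(np.uint8)


def weighted_batch_loss(per_image_loss, is_night, night_weight=2.0):
    """Average per-image losses, counting nighttime images more heavily."""
    losses = np.asarray(per_image_loss, dtype=np.float64)
    weights = np.where(np.asarray(is_night, dtype=bool), night_weight, 1.0)
    return float((weights * losses).sum() / weights.sum())
```

Applied to a daytime image, `mask_augment` leaves pixels far from the object unchanged and darkens the object region, which approximates the degraded appearance the abstract attributes to nighttime scenes. `weighted_batch_loss` shows one simple way to make nighttime samples contribute more to the training signal.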