A similarity-guided segmentation model for garbage detection under road scene
Caiyun Zheng, Danhua Cao, Cheng Hu
The development of computer vision technology offers a possible path toward intelligent control of road sweepers, which could reduce energy waste in urban street cleaning. We introduce an efficient deep-learning-based method for segmenting garbage of seven categories in road scenes. Our model follows a lightweight structure, with a feature pyramid attention (FPA) module employed in the decoder to enhance multi-level feature integration. In addition, a similarity guidance (SG) module is added to the decoder branches. It calculates the cosine distance between learned prototypes and feature maps to guide the segmentation results from a metric-learning perspective. Our model has fewer than 3 M parameters and runs at over 65 FPS on an RTX 2070 GPU. Experimental results demonstrate that our method achieves a competitive speed/accuracy trade-off, with overall mean intersection-over-union (mIoU) reaching 0.87 and 0.67 on the two garbage data sets we built, respectively. Moreover, with the SG module, our model can perform acceptable category-balanced segmentation from fewer than 20 annotated images per category.
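The abstract describes the SG module only at a high level: it compares learned prototypes with feature maps by cosine distance. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes one prototype vector per class and a small H x W grid of C-dimensional pixel embeddings. The function names `cosine_similarity`, `similarity_maps` and `guided_labels` are hypothetical. Cosine distance is 1 minus cosine similarity, so ranking prototypes by highest similarity is equivalent to ranking them by lowest distance. The final nearest-prototype labelling is one simple way to turn the similarity maps into a segmentation. How the actual SG module feeds these maps back into the decoder is not specified here.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two equal-length vectors (cosine distance = 1 - this)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps guards against division by zero for all-zero vectors
    return dot / max(norm_a * norm_b, eps)


def similarity_maps(features: List[List[Vector]], prototypes: List[Vector]) -> List[List[List[float]]]:
    """Return K maps of shape H x W: similarity of every pixel embedding to each prototype.

    features: H x W grid of C-dimensional pixel embeddings (assumed layout).
    prototypes: K prototype vectors of dimension C, one per class (hypothetical).
    """
    return [
        [[cosine_similarity(pixel, proto) for pixel in row] for row in features]
        for proto in prototypes
    ]


def guided_labels(features: List[List[Vector]], prototypes: List[Vector]) -> List[List[int]]:
    """Label each pixel with the index of its most similar (least cosine-distant) prototype."""
    maps = similarity_maps(features, prototypes)
    height, width = len(features), len(features[0])
    return [
        [max(range(len(prototypes)), key=lambda k: maps[k][i][j]) for j in range(width)]
        for i in range(height)
    ]


if __name__ == "__main__":
    # Toy 2 x 2 feature map with 2-D embeddings and two class prototypes
    feats = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.8]]]
    protos = [[1.0, 0.0], [0.0, 1.0]]
    print(guided_labels(feats, protos))  # [[0, 0], [1, 1]]
```

Because cosine similarity ignores vector magnitude, this kind of metric-based comparison depends only on the direction of the embeddings. That is one intuition for why prototype matching can remain usable when only a few annotated images per category are available.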
Machine vision / Semantic segmentation / Garbage segmentation