Vehicle color recognition based on smooth modulation neural network with multi-scale feature fusion
Mingdi HU, Long BAI, Jiulun FAN, Sirui ZHAO, Enhong CHEN
Vehicle Color Recognition (VCR) plays a vital role in intelligent traffic management and criminal investigation assistance. However, existing vehicle color datasets cover only 13 classes, which cannot meet current practical demands. Moreover, although considerable effort has been devoted to VCR, existing methods suffer from class imbalance in the datasets. To address these challenges, this paper proposes a novel VCR method based on a Smooth Modulation Neural Network with Multi-Scale Feature Fusion (SMNN-MSFF). Specifically, to establish a benchmark for model training and evaluation, we first present a new VCR dataset with 24 vehicle color classes, Vehicle Color-24, consisting of 10,091 vehicle images extracted from 100 hours of urban road surveillance video. Then, to tackle the long-tail distribution problem and improve recognition performance, we propose the SMNN-MSFF model with multi-scale feature fusion and smooth modulation. The former extracts feature information from local to global scales, and the latter increases the loss contribution of tail-class images during training under class imbalance. Finally, comprehensive experiments on Vehicle Color-24 and three previous representative datasets demonstrate that the proposed SMNN-MSFF outperforms state-of-the-art VCR methods. Extensive ablation studies also show that each module of the method is effective; in particular, the smooth modulation effectively helps feature learning for minority (tail) classes. Vehicle Color-24 and the SMNN-MSFF code are publicly available and can be obtained by contacting the authors.
vehicle color recognition / benchmark dataset / multi-scale feature fusion / long-tail distribution / improved smooth L1 loss
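The abstract describes smooth modulation only at a high level (increasing the loss contribution of tail-class images under class imbalance). The following PyTorch sketch is a minimal, hypothetical illustration of that general idea, not the authors' exact formulation: the function name smooth_modulated_ce, the beta parameter, and the inverse-frequency weighting are assumptions made for illustration only.

```python
# Illustrative sketch only: not the SMNN-MSFF loss as defined in the paper.
# It shows one way to modulate per-class loss weights smoothly so that
# tail (minority) classes contribute more to training under class imbalance.
import torch
import torch.nn.functional as F

def smooth_modulated_ce(logits, targets, class_counts, beta=1.0):
    """Cross-entropy scaled by a smooth per-class weight (hypothetical).
    logits: (N, C) raw scores; targets: (N,) class indices;
    class_counts: (C,) number of training images per class."""
    freq = class_counts.float() / class_counts.sum()   # class frequencies
    raw_w = 1.0 / (freq + 1e-12)                        # inverse-frequency weights
    raw_w = raw_w / raw_w.mean()                        # normalize around 1
    # Smooth-L1-style modulation: quadratic near 1, linear for large weights,
    # avoiding abrupt jumps in loss scale between head and tail classes.
    diff = raw_w - 1.0
    smooth_w = torch.where(diff.abs() < beta,
                           1.0 + 0.5 * diff ** 2 / beta,
                           1.0 + diff.abs() - 0.5 * beta)
    ce = F.cross_entropy(logits, targets, reduction="none")  # per-sample CE
    return (smooth_w[targets] * ce).mean()

# Toy usage with 24 classes (as in Vehicle Color-24) and a long-tailed count vector.
if __name__ == "__main__":
    counts = torch.tensor([2000, 1500, 1200] + [50] * 21)
    logits = torch.randn(8, 24)
    targets = torch.randint(0, 24, (8,))
    print(smooth_modulated_ce(logits, targets, counts).item())
```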
Mingdi Hu, Ph.D., associate professor. She obtained her Doctor of Science degree from the School of Mathematics and Statistics at Shaanxi Normal University, China. Her research interests include image recognition, target retrieval and classification, data enhancement, machine learning, artificial intelligence, and fuzzy information processing
Long Bai is a master's student at Xi'an University of Posts and Telecommunications, China. He received his BS degree in communication engineering from Xi'an University of Science and Technology, China. His research interests include machine learning, deep neural networks, image target recognition, and artificial intelligence
Jiulun Fan, Ph.D., professor. He graduated from Xidian University, China, majoring in signal and information processing, and received his doctoral degree in engineering. His research interests include pattern recognition and image processing, fuzzy information processing theory and applications, and image security technology
Sirui Zhao is a doctoral student at the University of Science and Technology of China. His research interests include human-computer interaction, affective computing, computer vision, and knowledge representation. He has published several papers in refereed conferences and journals, such as ACM MM 2021 and Neural Networks
Enhong Chen, Ph.D., professor at the University of Science and Technology of China. He is a CCF Fellow and IEEE Senior Member. His research interests include data mining and machine learning, especially social network analysis and recommender systems. He has published more than 200 papers in refereed conferences and journals, such as TKDE, KDD, ICDM, and NIPS