A novel method for subway wheelset tread defect detection with improved self-attention and loss function
Jindong Wang, Chengsheng Xie, Tao Liu, Haonan Zhou, Ao Li
Railway Engineering Science: 1–15.
Wheelsets are crucial components of subway locomotives, and defects on their tread surfaces pose significant safety risks. This study presents an enhanced defect detection algorithm based on the YOLOv5 model, specifically designed to identify tread defects in subway wheelsets and meet the demands of intelligent maintenance. To improve the detection of small targets, we incorporate a multi-head self-attention module, which enhances the model's ability to capture long-range dependencies within global feature maps. Additionally, a weighted bidirectional feature pyramid network is adopted to achieve balanced multi-scale feature fusion, enabling efficient cross-scale integration. To overcome limited labeled data and annotation inaccuracies, we propose a novel loss function (W-MPDIoU) that accelerates model convergence. Experimental results demonstrate that the enhanced model achieves 99.1% average precision, a 4.29% improvement over the original YOLOv5. With fewer parameters and a detection speed of 15 ms per image, the proposed solution enables real-time tread defect detection in subway systems, significantly improving safety and operational efficiency.
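The abstract does not specify how W-MPDIoU weights its terms, so the following is only a minimal sketch of the underlying MPDIoU regression loss from the cited preprint (arXiv:2307.07662): standard IoU penalized by the squared distances between the predicted and ground-truth top-left and bottom-right corners, normalized by the image size. All function and variable names here are illustrative assumptions, not the authors' implementation.

```python
def mpdiou_loss(box_pred, box_gt, img_w, img_h):
    """MPDIoU-style box regression loss (sketch, not the paper's W-MPDIoU).

    Boxes are (x1, y1, x2, y2). MPDIoU = IoU - d1^2/(w^2+h^2) - d2^2/(w^2+h^2),
    where d1, d2 are corner distances and (w, h) is the image size;
    the loss is 1 - MPDIoU.
    """
    x1p, y1p, x2p, y2p = box_pred
    x1g, y1g, x2g, y2g = box_gt

    # Intersection area of the two boxes
    ix1, iy1 = max(x1p, x1g), max(y1p, y1g)
    ix2, iy2 = min(x2p, x2g), min(y2p, y2g)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Union area and plain IoU
    area_p = (x2p - x1p) * (y2p - y1p)
    area_g = (x2g - x1g) * (y2g - y1g)
    union = area_p + area_g - inter
    iou = inter / union if union > 0 else 0.0

    # Squared corner distances, normalized by the squared image diagonal
    norm = img_w ** 2 + img_h ** 2
    d1 = (x1p - x1g) ** 2 + (y1p - y1g) ** 2  # top-left corners
    d2 = (x2p - x2g) ** 2 + (y2p - y2g) ** 2  # bottom-right corners

    mpdiou = iou - d1 / norm - d2 / norm
    return 1.0 - mpdiou


# Perfectly overlapping boxes: IoU = 1, zero corner distance, loss = 0
print(mpdiou_loss((10, 10, 50, 50), (10, 10, 50, 50), 640, 640))  # 0.0
```

Unlike plain IoU loss, the corner-distance terms remain informative even when the boxes do not overlap, which is what makes this family of losses converge faster on hard or mislabeled samples.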
YOLOv5 / Defect detection / Self-attention mechanism / Feature pyramid / Loss function