A novel method for subway wheelset tread defect detection with improved self-attention and loss function

Jindong Wang, Chengsheng Xie, Tao Liu, Haonan Zhou, Ao Li

Railway Engineering Science: 1-15. DOI: 10.1007/s40534-025-00415-2

Abstract

Wheelsets are crucial components of subway locomotives, and defects on their tread surfaces pose significant safety risks. This study presents an enhanced defect detection algorithm based on the YOLOv5 model, specifically designed to identify tread defects in subway wheelsets to meet the demands of intelligent maintenance. To improve the detection of small targets, we incorporate a multi-head self-attention module, which enhances the model’s ability to capture long-range dependencies within global feature maps. Additionally, a weighted bidirectional feature pyramid network is adopted to achieve balanced multi-scale feature fusion, enabling efficient cross-scale integration. To overcome limited labeled data and annotation inaccuracies, we propose a novel loss function (W-MPDIoU) to accelerate model convergence. Experimental results demonstrate that our enhanced model achieves 99.1% average precision—a 4.29% improvement over the original YOLOv5. With reduced parameters and a detection speed of 15 ms per image, the proposed solution enables real-time tread defect detection in subway systems, significantly improving safety and operational efficiency.
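Two of the building blocks named in the abstract have well-known published forms, and a minimal sketch may help make them concrete. The abstract does not define the proposed W-MPDIoU loss, so the code below shows only the baseline MPDIoU loss that the name appears to build on, together with the fast normalized fusion rule commonly used in weighted bidirectional feature pyramid networks. The function names, the box layout `(x1, y1, x2, y2)`, and the `eps` default are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Assumed box layout: (x1, y1, x2, y2) = top-left and bottom-right corners.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Plain intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mpdiou_loss(pred: Box, target: Box, img_w: float, img_h: float) -> float:
    """Baseline MPDIoU loss (not the paper's W-MPDIoU variant).

    loss = 1 - (IoU - d1^2 / (w^2 + h^2) - d2^2 / (w^2 + h^2)),
    where d1 and d2 are the distances between the two top-left corners
    and the two bottom-right corners, and w, h are the image size.
    """
    norm = img_w ** 2 + img_h ** 2
    d1_sq = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
    d2_sq = (pred[2] - target[2]) ** 2 + (pred[3] - target[3]) ** 2
    return 1.0 - (iou(pred, target) - d1_sq / norm - d2_sq / norm)


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """BiFPN-style fusion: out = sum(w_i * f_i) / (eps + sum(w_j)).

    Weights are clipped to be non-negative (a ReLU), so each input
    feature contributes in proportion to its learned importance.
    Features are flat lists of equal length here for simplicity.
    """
    w = [max(0.0, x) for x in weights]
    total = eps + sum(w)
    size = len(features[0])
    return [
        sum(wi * f[k] for wi, f in zip(w, features)) / total
        for k in range(size)
    ]
```

A perfectly matching prediction gives a loss of zero, while a box that does not overlap the target is still penalized by the corner distances, which is what lets the loss provide a gradient even when the IoU is zero. In a real detector both functions would operate on batched tensors rather than Python lists.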

Keywords

YOLOv5 / Defect detection / Self-attention mechanism / Feature pyramid / Loss function

Cite this article

Jindong Wang, Chengsheng Xie, Tao Liu, Haonan Zhou, Ao Li. A novel method for subway wheelset tread defect detection with improved self-attention and loss function. Railway Engineering Science: 1-15. DOI: 10.1007/s40534-025-00415-2

RIGHTS & PERMISSIONS

The Author(s)
