YOLO-SDLUWD: YOLOv7-based small target detection network for infrared images in complex backgrounds

Zhu Jinxiu , Qin Chao , Choi Dongmin

2025, Vol. 11, Issue (2): 269-279. DOI: 10.1016/j.dcan.2023.11.001
Original article

Abstract

Infrared small-target detection has important applications in many fields owing to its high penetration capability and long detection distance. This study introduces a detector called “YOLO-SDLUWD”, based on the YOLOv7 network, for small-target detection in complex infrared backgrounds. “SDLUWD” refers to the combination of three components: a Spatial Depth layer followed by a Convolutional layer (SD-Conv), a Linear Up-sampling fusion Path Aggregation Feature Pyramid Network (LU-PAFPN), and a training strategy based on the normalized Gaussian Wasserstein Distance loss (WD-loss) function. “YOLO-SDLUWD” aims to reduce the loss of detection accuracy that occurs when the max-pooling downsampling layer in the backbone network discards important feature information, to support the interaction and fusion of high-dimensional and low-dimensional feature information, and to overcome the false-alarm predictions induced by noise in small-target images. The detector achieved a mAP@0.5 of 90.4% and a mAP@0.5:0.95 of 48.5% on IRIS-AG, an increase of 9%-11% over YOLOv7-tiny, outperforming other state-of-the-art target detectors in terms of accuracy and speed.
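Two of the ideas named in the abstract can be illustrated with a short sketch. It is not the authors' implementation. It assumes that the Spatial Depth layer is a space-to-depth rearrangement (downsampling by moving pixels into channels instead of pooling them away) and that the WD-loss takes the common form 1 - NWD, where each box (cx, cy, w, h) is modeled as a 2D Gaussian. The function names and the constant `c` are hypothetical placeholders; the abstract does not state the value used in the paper.

```python
import math


def space_to_depth(fmap, scale=2):
    """Rearrange a C x H x W feature map (nested lists) into
    (C * scale**2) x (H // scale) x (W // scale).

    Every input value is kept; spatial resolution is traded for channels,
    unlike max pooling, which keeps one value per window.
    """
    height, width = len(fmap[0]), len(fmap[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    out = []
    for dy in range(scale):          # row offset inside each scale x scale cell
        for dx in range(scale):      # column offset inside each cell
            for channel in fmap:
                out.append([row[dx::scale] for row in channel[dy::scale]])
    return out


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein distance between two boxes.

    Boxes are (cx, cy, w, h). `c` is a dataset-dependent normalizer;
    12.8 is only a placeholder assumption here.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Second-order Wasserstein distance between the two box Gaussians.
    w2 = math.sqrt(
        (ax - bx) ** 2
        + (ay - by) ** 2
        + ((aw - bw) / 2) ** 2
        + ((ah - bh) / 2) ** 2
    )
    return math.exp(-w2 / c)


def wd_loss(pred_box, target_box, c=12.8):
    """Loss that decreases smoothly as the predicted box approaches the target."""
    return 1.0 - nwd(pred_box, target_box, c)


if __name__ == "__main__":
    fmap = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]
    print(space_to_depth(fmap))
    print(wd_loss((10, 10, 4, 4), (13, 14, 4, 4)))
```

Because the exponential is always positive, NWD stays non-zero even when two small boxes do not overlap at all, a case in which IoU is exactly zero. In a network, the rearranged feature map would then pass through a non-strided convolution, which is omitted here.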

Keywords

Small infrared target detection / YOLOv7 / SD-Conv / LU-PAFPN / WD-loss

Cite this article

Zhu Jinxiu, Qin Chao, Choi Dongmin. YOLO-SDLUWD: YOLOv7-based small target detection network for infrared images in complex backgrounds. 2025, 11(2): 269-279. DOI: 10.1016/j.dcan.2023.11.001

CRediT authorship contribution statement

Jinxiu Zhu: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing. Chao Qin: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing. Dongmin Choi: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing - original draft.

Declaration of Competing Interest

All authors disclosed no relevant relationships.

Acknowledgement

This work was supported by the National Key R&D Program “Development and Application Verification of Underwater Intelligent Defect Detection Robot System for Large Hydropower Station Dams” (Project No. 2022YFB4703400), sub-topic 4, “Research on Intelligent Identification and Diagnosis of Dam Defects and Fine Inspection Equipment and Technology of Hydropower Stations” (Project No. 2022YFB4703404).

This work was supported in part by the National Natural Science Foundation of China under Grant 62371181, and in part by the Changzhou Science and Technology International Cooperation Program under Grant CZ20230029.
