Fusion network for small target detection based on YOLO and attention mechanism

Caie Xu; Zhe Dong; Shengyun Zhong; Yijiang Chen; Sishun Pan; Mingyang Wu

doi:10.1007/s11801-024-3177-3

Optoelectronics Letters, 2024, Vol. 20, Issue 6: 372-378.

Abstract

Target detection is an important task in computer vision research, and anomaly detection and small target detection in particular have attracted growing attention. However, problems remain in this line of research: small target detection in complex environments is susceptible to background interference, which leads to poor detection results. To address these issues, this study proposes a method that introduces an attention mechanism into the you only look once (YOLO) network. In addition, a self-made mask dataset was created and used for the experiments. The results show that the proposed method achieves much better detection performance.

Cite this article

Caie Xu, Zhe Dong, Shengyun Zhong, Yijiang Chen, Sishun Pan, Mingyang Wu. Fusion network for small target detection based on YOLO and attention mechanism. Optoelectronics Letters, 2024, 20(6): 372‒378. https://doi.org/10.1007/s11801-024-3177-3
