To address the challenges of pedestrian detection in dense scenes, including high crowd density, severe occlusion, and overlapping individuals, an improved You Only Look Once (YOLO)-based algorithm is proposed. First, deformable convolutions replace standard convolutions, improving the model’s adaptability to variations in shape and appearance under occlusion. Second, a multi-dimensional attention module is designed to emphasize critical local regions and extract more precise feature information. Lastly, a diagonal difference intersection-over-union (IoU) loss function is introduced, which incorporates a measure of the Euclidean distance difference between the main diagonal points of the predicted and ground-truth bounding boxes, thereby improving detection accuracy and regression performance. Experimental results show that the enhanced algorithm achieves a mean average precision at IoU=0.5 (mAP50) of 75.1% on the public dense pedestrian dataset WiderPerson, an improvement of 1.8% over the original YOLOv5 model.
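The abstract does not give the exact form of the diagonal difference IoU loss, so the following is only a minimal sketch of one plausible reading. It assumes boxes in `(x1, y1, x2, y2)` format, takes the "main diagonal points" to be the top-left and bottom-right corners, and normalizes the squared corner distances by the squared diagonal of the smallest enclosing box. The function names, the normalization, and the equal weighting of the two corners are all assumptions made for illustration, not the authors' formulation.

```python
def iou(box_a, box_b):
    """Plain IoU for two boxes given as (x1, y1, x2, y2) with x1 < x2, y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero when the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def diagonal_iou_loss(pred, target):
    """Hypothetical diagonal-distance IoU loss (assumed form, not from the paper).

    loss = 1 - IoU + (d_tl^2 + d_br^2) / (2 * c^2)
    where d_tl and d_br are the Euclidean distances between the top-left and
    bottom-right corners of the two boxes, and c is the diagonal length of the
    smallest box enclosing both.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    # Squared distances between matching main-diagonal corners.
    d_tl_sq = (px1 - tx1) ** 2 + (py1 - ty1) ** 2
    d_br_sq = (px2 - tx2) ** 2 + (py2 - ty2) ** 2
    # Squared diagonal of the smallest enclosing box.
    c_sq = (max(px2, tx2) - min(px1, tx1)) ** 2 + (max(py2, ty2) - min(py1, ty1)) ** 2
    penalty = (d_tl_sq + d_br_sq) / (2.0 * c_sq) if c_sq > 0 else 0.0
    return 1.0 - iou(pred, target) + penalty
```

Under these assumptions the penalty term stays in [0, 1] and vanishes only when both corner pairs coincide, so the loss still provides a regression signal when the boxes do not overlap and IoU alone is zero.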
RIGHTS & PERMISSIONS
Tianjin University of Technology