Obstacle detection: improved YOLOX-S based on swin transformer-tiny

Hongying Zhang, Chengjian Lu, Enyao Chen

Optoelectronics Letters, 2023, Vol. 19, Issue 11: 698-704. DOI: 10.1007/s11801-023-3018-9

Abstract

Aiming at the accuracy challenge in obstacle detection for autonomous driving, we propose an improved you only look once X-S (YOLOX-S) model based on swin transformer-tiny (ST-YOLOX-S) for obstacle detection, which can detect multiple target classes, including people, cars, bicycles, motorcycles, and buses. Our method comprises two main aspects. First, to improve local feature extraction and thereby obtain more accurate obstacle detection under real-world vehicle conditions, the existing backbone of YOLOX-S is replaced with the swin transformer-tiny backbone. Second, we reduce the channels passed from the swin transformer to the path aggregation-feature pyramid network (PA-FPN) from [96, 192, 384, 768] to [192, 384, 768], which decreases the computational cost and makes the swin transformer-tiny backbone more compatible with the PA-FPN. On the popular COCO dataset, the proposed ST-YOLOX-S improves detection mean average precision (mAP) by 6.1% compared with YOLOX-S. Among the five types of obstacles appearing in simulated real vehicle conditions, ST-YOLOX-S also outperforms YOLOX-S. Furthermore, our method achieves a significant improvement over YOLOv3 on obstacle detection, which demonstrates the effectiveness of the proposed algorithm.
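To make the channel-reduction idea concrete, the sketch below shows the wiring the abstract describes: a backbone emitting four feature maps with swin transformer-tiny channel widths [96, 192, 384, 768], of which only the last three ([192, 384, 768]) are fed to the neck. This is a minimal illustration, not the authors' code: `StageStub`, `SwinTinyStub`, and `Neck` are hypothetical stand-ins (plain convolutions in place of the real shifted-window attention stages and the full PA-FPN), chosen only to reproduce the stage strides and channel counts.

```python
# Minimal sketch (not the authors' implementation): feed only the last three
# Swin-Tiny stage outputs, with channels [192, 384, 768], into a PA-FPN-style
# neck, dropping the 96-channel first stage as described in the abstract.
import torch
import torch.nn as nn


class StageStub(nn.Module):
    """Hypothetical stand-in for one swin stage: downsample x2, widen channels."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.proj = nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        return self.proj(x)


class SwinTinyStub(nn.Module):
    """Stand-in backbone emitting four maps with swin-tiny channel widths."""
    def __init__(self):
        super().__init__()
        chans = [96, 192, 384, 768]
        self.stem = nn.Conv2d(3, chans[0], kernel_size=4, stride=4)  # patch embed, stride 4
        self.stages = nn.ModuleList(
            StageStub(chans[i], chans[i + 1]) for i in range(3)
        )

    def forward(self, x):
        feats = [self.stem(x)]
        for stage in self.stages:
            feats.append(stage(feats[-1]))
        return feats  # strides 4/8/16/32, channels 96/192/384/768


class Neck(nn.Module):
    """Toy PA-FPN-style neck: project the three kept levels to a common width."""
    def __init__(self, in_chans=(192, 384, 768), width=256):
        super().__init__()
        self.reduce = nn.ModuleList(nn.Conv2d(c, width, 1) for c in in_chans)

    def forward(self, feats):
        return [conv(f) for conv, f in zip(self.reduce, feats)]


backbone, neck = SwinTinyStub(), Neck()
feats = backbone(torch.randn(1, 3, 640, 640))
outs = neck(feats[1:])  # drop the 96-channel stage; pass [192, 384, 768]
print([tuple(o.shape) for o in outs])
# [(1, 256, 80, 80), (1, 256, 40, 40), (1, 256, 20, 20)]
```

Discarding the highest-resolution, 96-channel map leaves exactly three pyramid levels, matching the three inputs a YOLOX-style PA-FPN expects while avoiding the cost of processing the largest feature map.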

Cite this article

Hongying Zhang, Chengjian Lu, Enyao Chen. Obstacle detection: improved YOLOX-S based on swin transformer-tiny. Optoelectronics Letters, 2023, 19(11): 698-704. DOI: 10.1007/s11801-023-3018-9


