Bi2F-YOLO: a novel framework for underwater object detection based on YOLOv7
Xiaopeng Liu, Keke Zhao, Cong Liu, Long Chen
Intelligent Marine Technology and Systems, 2025, Vol. 3, Issue 1: 9
Underwater object detection faces significant challenges, notably object ambiguity and occlusion, which greatly undermine the accuracy of traditional algorithms. To address these issues, we propose Bi2F-YOLO, an algorithm designed specifically for underwater environments. Bi2F-YOLO integrates the BiFormer module into the YOLOv7 backbone, using bi-level routing attention (BRA) to focus on key features such as object edges and textures, which addresses the problem of object ambiguity. In the detection head, we replace the conventional ELAN component with the FasterNet module. This change improves detection efficiency and accuracy through partial convolution (PConv), which redistributes the convolution kernel weights according to the sparsity of the input feature map and thereby prevents critical underwater object features from being diluted by interference from irrelevant data. This resolves the occlusion problem in underwater target detection while also reducing model parameters and computational cost. The experimental results show that Bi2F-YOLO achieves 87.3%
FasterNet / BiFormer / YOLO / Loss function / Underwater object detection / Information and Computing Sciences / Artificial Intelligence and Image Processing
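The abstract's claim that PConv lowers parameter count and computational cost can be illustrated with a back-of-the-envelope count. The sketch below is not the authors' implementation. It assumes the PConv formulation from the FasterNet paper, in which a regular k x k convolution is applied to only a fraction of the channels and the remaining channels pass through untouched. The function names, the feature-map size, and the channel ratio of 1/4 are illustrative assumptions, not values taken from this article.

```python
def conv_params(k, c_in, c_out):
    """Weight count of a dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def conv_flops(h, w, k, c_in, c_out):
    """Multiply-accumulate count of a dense k x k convolution
    on an h x w output map (stride 1, bias ignored)."""
    return h * w * conv_params(k, c_in, c_out)


def pconv_params(k, c, ratio=0.25):
    """Weight count when only c_p = ratio * c channels are convolved.
    The ratio of 1/4 is an assumed setting, not one stated in the abstract."""
    c_p = int(c * ratio)
    return conv_params(k, c_p, c_p)


def pconv_flops(h, w, k, c, ratio=0.25):
    """Multiply-accumulate count of the partial convolution.
    The untouched channels are copied through and cost no multiplications."""
    c_p = int(c * ratio)
    return conv_flops(h, w, k, c_p, c_p)


if __name__ == "__main__":
    # Hypothetical layer: 40 x 40 feature map, 3 x 3 kernel, 256 channels.
    h, w, k, c = 40, 40, 3, 256
    dense = conv_flops(h, w, k, c, c)
    partial = pconv_flops(h, w, k, c)
    print(f"dense conv MACs:   {dense}")
    print(f"partial conv MACs: {partial}")
    print(f"reduction factor:  {dense // partial}")
```

With a channel ratio of 1/4, both the weights and the multiply-accumulates of the layer scale with the square of that ratio, so the convolved part costs 1/16 of the dense layer. The savings reported for the full Bi2F-YOLO model depend on the rest of the architecture and cannot be derived from this single-layer count.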