A Comparative Analysis of Deep Learning Approaches for Visual Perception Benchmarks in Ship Navigation

Ruolan Zhang, Xingchen Ji, Jinichi Koue, Katsutoshi Hirayama

Journal of Marine Science and Application: 1-17. DOI: 10.1007/s11804-025-00703-7
Research Article

Abstract

Establishing a reliable benchmark for evaluating model performance is critical for advancing deep learning (DL), including its application to recognizing the ship navigation environment. Although object detection models have progressed steadily across various tasks, maritime navigation presents unique challenges, such as long distances, miscellaneous objects, wide perception scales, and the local conditions and features of water areas. Improving DL approaches for this domain therefore remains a significant challenge. Using a widely applicable offshore image dataset captured from the ship bridge, we evaluated the performance of the state-of-the-art object detection model from three perspectives: average precision, multiscale feature calculation, and intersection-over-union design. We also explored the factors that may affect the model performance evaluation benchmark in terms of data quality, scale calculation, feature quantification, and object association. Our experiments demonstrate that object detection in complex water surface traffic scenes requires comprehensive model performance evaluation benchmarks that incorporate multiple dimensions of the model.

Keywords

Long-range perception / Visual navigation / Dataset / Multiscale detection / Vision benchmark

Cite this article

Ruolan Zhang, Xingchen Ji, Jinichi Koue, Katsutoshi Hirayama. A Comparative Analysis of Deep Learning Approaches for Visual Perception Benchmarks in Ship Navigation. Journal of Marine Science and Application: 1-17. DOI: 10.1007/s11804-025-00703-7



RIGHTS & PERMISSIONS

Harbin Engineering University and Springer-Verlag GmbH Germany, part of Springer Nature
