UTNet: event-RGB multimodal fusion model for underwater transparent organism detection

Fengyue Guo, Peng Ren, Cai Luo

Intelligent Marine Technology and Systems, 2025, Vol. 3, Issue 1: 18

DOI: 10.1007/s44295-025-00065-4
Research Paper


Abstract

In underwater environments, transparent organisms have low visibility and few visual features; lacking distinctive shadows or silhouettes, they blend seamlessly into their surroundings. Existing deep learning methods for detecting such organisms perform unsatisfactorily. This study proposes UTNet, a multimodal fusion network that combines event-based and red-green-blue (RGB) frame-based vision for detecting transparent camouflaged organisms underwater. UTNet introduces a two-stage enhanced representation aggregation module, comprising a multi-feature aggregation component (MFAC) and a deep fusion component (DFC), to facilitate the synergy between frame-based and event-based vision. First, MFAC aggregates the high dynamic range features from events with the static details from RGB images. Then, edge information from the edge clue search module guides the fusion process, reducing background interference. Next, DFC further extracts depth information from the MFAC output using five parallel branches. Additionally, a ResNet50 backbone modified with submanifold sparse convolution extracts features from event frames, preserving event sparsity and improving computational efficiency. Extensive experiments on our custom underwater transparent organism dataset, captured using the DAVIS346 event camera, demonstrate the effectiveness of UTNet. The results show that UTNet achieves 75.2% accuracy at 37.8 frames per second, providing the best trade-off between speed and accuracy among the compared detectors.
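
The abstract credits submanifold sparse convolution with preserving event sparsity. As a rough illustration of that property only, the sketch below emulates a single-channel submanifold convolution in plain Python: an output is computed only at sites that were already active in the input, and inactive neighbours contribute nothing, so the set of active sites does not grow. This is not the authors' implementation. The function name `submanifold_conv2d`, the single-channel layout, and the toy 4x4 event frame are all assumptions made for the example.

```python
def submanifold_conv2d(frame, active, kernel):
    """Dense emulation of a single-channel submanifold sparse convolution.

    frame  : 2D list of floats (hypothetical event-frame values)
    active : 2D list of bools, True where the site carries event data
    kernel : square 2D list of weights with odd side length
    """
    h, w = len(frame), len(frame[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not active[y][x]:
                continue  # inactive sites stay inactive: sparsity is preserved
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    # only active, in-bounds neighbours contribute
                    if 0 <= yy < h and 0 <= xx < w and active[yy][xx]:
                        acc += kernel[dy + r][dx + r] * frame[yy][xx]
            out[y][x] = acc
    return out


# Toy event frame (assumed data): two neighbouring active pixels.
frame = [[0.0] * 4 for _ in range(4)]
frame[1][1] = 2.0
frame[1][2] = 1.0
active = [[v != 0.0 for v in row] for row in frame]
kernel = [[1.0] * 3 for _ in range(3)]  # all-ones 3x3 kernel

out = submanifold_conv2d(frame, active, kernel)
nonzero_sites = [(y, x) for y in range(4) for x in range(4) if out[y][x] != 0.0]
```

With an ordinary dense 3x3 convolution the two events would spread to twelve output pixels; here only the two originally active sites carry values, which is the behaviour that keeps deeper layers sparse. A practical backbone would rely on a dedicated sparse convolution library rather than nested loops.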

Keywords

Deep learning / Underwater transparent organism detection / Event camera / Multimodal fusion

Cite this article

Fengyue Guo, Peng Ren, Cai Luo. UTNet: event-RGB multimodal fusion model for underwater transparent organism detection. Intelligent Marine Technology and Systems, 2025, 3(1): 18. DOI: 10.1007/s44295-025-00065-4



Funding

Fundamental Research Funds for the Central Universities (22CX01004A)

RIGHTS & PERMISSIONS

The Author(s)
