MAU-Depth: a multi-attention-based underwater lightweight self-supervised monocular depth estimation method

Peng Yao , Yalu Wang , Dongdong Yang , Qiming Liu , Jiatao Yu

Intelligent Marine Technology and Systems ›› 2025, Vol. 3 ›› Issue (1) : 30

Research Paper
Abstract

Accurate depth estimation is essential for unmanned underwater vehicles to perceive their environment effectively during target tracking tasks. We therefore propose a self-supervised monocular depth estimation framework tailored to underwater scenes that incorporates multi-attention mechanisms and the distinctive optical characteristics of underwater imagery. To address the color distortion of underwater images, caused primarily by light attenuation, we design an adaptive underwater light attenuation loss function that improves the model's adaptability and generalization across diverse underwater scenes. The inherent blurriness of underwater images poses considerable challenges for feature extraction and semantic interpretation; we therefore combine dilated convolutions with linear spatial reduction attention (CDC Joint Linear SRA) to capture local and global features of underwater images, which are then integrated through feature map fusion. A multi-attention feature enhancement module further enriches the spatial and semantic information of the extracted features. To mitigate fusion interference arising from semantic discrepancies between feature maps, we introduce a progressive fusion module that balances cross-module features through a two-step feature refinement strategy. Comparative, ablation, and generalization experiments on the FLSea dataset verify the superiority of the proposed model.
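
The linear spatial reduction attention used for global feature capture can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single head, identity (unlearned) query/key/value projections, and a fixed 7×7 average-pooled key/value grid as in PVT v2, so the attention cost grows linearly with image size rather than quadratically.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool2d(x, out_hw):
    # Mean-pool an (H, W, C) map down to (out_hw, out_hw, C) by block averaging.
    H, W, C = x.shape
    hb, wb = H // out_hw, W // out_hw
    x = x[:hb * out_hw, :wb * out_hw].reshape(out_hw, hb, out_hw, wb, C)
    return x.mean(axis=(1, 3))

def linear_sra(feat, pool=7):
    """Single-head linear spatial-reduction attention (illustrative sketch).

    feat: (H, W, C) feature map. Keys and values are average-pooled to a
    fixed pool x pool grid, so attention costs O(HW * pool^2 * C) instead
    of the O((HW)^2 * C) of full self-attention.
    """
    H, W, C = feat.shape
    q = feat.reshape(H * W, C)                           # one query per pixel
    kv = avg_pool2d(feat, pool).reshape(pool * pool, C)  # reduced keys/values
    attn = softmax(q @ kv.T / np.sqrt(C))                # (HW, pool^2)
    out = attn @ kv                                      # (HW, C)
    return out.reshape(H, W, C)

rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28, 32))
y = linear_sra(x)
print(y.shape)  # (28, 28, 32)
```

In a real encoder the pooled tokens would pass through learned linear projections and layer normalization before attention; the sketch keeps only the spatial-reduction step that makes the mechanism lightweight.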

Keywords

Monocular depth estimation / Underwater depth estimation / Self-supervised learning / Underwater vision

Cite this article

Download citation ▾
Peng Yao, Yalu Wang, Dongdong Yang, Qiming Liu, Jiatao Yu. MAU-Depth: a multi-attention-based underwater lightweight self-supervised monocular depth estimation method. Intelligent Marine Technology and Systems, 2025, 3(1): 30 DOI:10.1007/s44295-025-00079-y



Funding

National Key R&D Program of China (2024YFF0507102)

Natural Science Foundation of Shandong Province, China (ZR2023ME009)

National Natural Science Foundation of China (51909252)

RIGHTS & PERMISSIONS

The Author(s)
