Reinforcement Learning Applications in Unmanned Underwater Vehicles: A Review

Zinuo Tian, Sai Deng, Zhengxing Wu, Chao Zhou

Journal of Marine Science and Application: 1-18.

DOI: 10.1007/s11804-025-00723-3
Review

Abstract

The ocean, which is a key component of Earth’s ecosystem, requires advanced technologies for deep and highly comprehensive exploration. Unmanned underwater vehicles (UUVs) play an important role in this task, but their development encounters great challenges due to the complex and dynamic underwater environment. Reinforcement learning (RL) has recently emerged as a promising method to improve the capabilities of UUVs. This study comprehensively reviews the implementations of RL in UUVs, with a focus on key tasks such as motion planning, navigation and control, and multiagent coordination. We investigate current difficulties and emerging trends, as illustrated by a case study. This review aims to provide a foundation for RL-based control and decision-making in UUVs and offer actionable insights for advancing studies in this rapidly evolving domain.

Keywords

Unmanned underwater vehicles / Reinforcement learning / Motion planning / Navigation control / Multiagent collaboration

Cite this article

Zinuo Tian, Sai Deng, Zhengxing Wu, Chao Zhou. Reinforcement Learning Applications in Unmanned Underwater Vehicles: A Review. Journal of Marine Science and Application: 1-18. DOI: 10.1007/s11804-025-00723-3


RIGHTS & PERMISSIONS

Harbin Engineering University and Springer-Verlag GmbH Germany, part of Springer Nature
