Abstract
Remotely operated vehicles (ROVs) play an indispensable role in the exploration and utilization of ocean resources, offering both flexibility and efficiency. Deep reinforcement learning (DRL) algorithms have been widely used to enhance ROV autonomy, reduce operator workload, and minimize human error during operations. However, traditional DRL methods rely on a well-crafted, task-specific reward function, which is often difficult to design precisely. Learning from demonstration offers an alternative, as it enables agents to imitate expert trajectories and refine their policies without relying on a reward function. Most existing studies, however, assume that detailed action or control information is available from the expert demonstrations, whereas such data are typically hard to obtain in practice. To overcome this limitation, we propose and implement an imitation learning from observation method for ROV path tracking, in which the policy is learned solely from observed expert state trajectories, without explicit action data. We evaluated our method on both straight-line and sinusoidal tracking tasks and compared the results with those of proximal policy optimization (PPO), a traditional DRL algorithm trained with a predefined reward. The experimental results demonstrate that our approach achieves performance comparable to that of PPO while learning faster and adapting more readily to different tasks.
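The abstract does not specify the algorithm, but learning from observation is commonly instantiated in the generative-adversarial style: a discriminator is trained to tell expert state transitions apart from the agent's, and its output replaces the hand-designed reward. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch; the state dimension, network sizes, and all names are assumptions for illustration, not the authors' implementation.

```python
# Minimal, illustrative sketch of imitation learning from observation:
# a discriminator distinguishes expert state transitions (s, s') from the
# agent's, and its output serves as a surrogate reward for a standard
# policy-gradient learner such as PPO. All dimensions, architectures, and
# names are hypothetical placeholders, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM = 6  # hypothetical ROV state: e.g., cross-track error, heading, velocities

class TransitionDiscriminator(nn.Module):
    """Scores a state transition (s, s'); higher logits mean more expert-like."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, 64), nn.Tanh(),
            nn.Linear(64, 1),
        )

    def forward(self, s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, s_next], dim=-1))

def surrogate_reward(disc: TransitionDiscriminator, s, s_next) -> torch.Tensor:
    # -log(1 - D(s, s')): the usual GAN-style reward given to the policy,
    # computed stably via log-sigmoid on the raw logits.
    return -F.logsigmoid(-disc(s, s_next))

def discriminator_loss(disc, expert_s, expert_s_next, agent_s, agent_s_next):
    # Binary cross-entropy: expert transitions labeled 1, agent transitions 0.
    expert_logits = disc(expert_s, expert_s_next)
    agent_logits = disc(agent_s, agent_s_next)
    return (F.binary_cross_entropy_with_logits(expert_logits, torch.ones_like(expert_logits))
            + F.binary_cross_entropy_with_logits(agent_logits, torch.zeros_like(agent_logits)))

# Example: score a batch of random (placeholder) transitions.
disc = TransitionDiscriminator(STATE_DIM)
s, s_next = torch.randn(32, STATE_DIM), torch.randn(32, STATE_DIM)
rewards = surrogate_reward(disc, s, s_next)  # shape (32, 1); fed to the policy update
```

The key point is that the discriminator conditions only on consecutive states, so no expert actions are ever required; any on-policy learner can then consume the surrogate reward in place of a hand-designed one.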
Keywords
Imitation learning / Deep reinforcement learning / Learning from observation / Path tracking / Underwater robot
Cite this article
Jun Wang, Song Xiang, Tian Shen, Zheng Fang, Shilong Niu, Xingwei Pan, Guangliang Li.
Imitation learning from observation for ROV path tracking.
Intelligent Marine Technology and Systems, 2025, 3(1): 20. DOI: 10.1007/s44295-025-00069-0
Funding
Natural Science Foundation of China (51809246)
Qingdao Natural Science Foundation (23-2-1-153-zyyd-jch)
Young Taishan Scholars Program (tsqn202408072)
Rights and permissions
The Author(s)