Transfer Learning for Deep Reinforcement Learning-Based Path Following of Autonomous Surface Vessels

Aniket Malviya; Suresh Rajendran; Xueqian Zhou

Journal of Marine Science and Application, 2026, Vol. 25, Issue 3: 728-744. DOI: 10.1007/s11804-026-00820-x

Research Article

Abstract

Deep Reinforcement Learning (DRL) offers a model-free, data-driven approach to the navigation and control of Autonomous Surface Vessels (ASVs). Its main drawback is the extensive training an agent needs to converge to an effective policy in a complex simulation, which incurs significant computational overhead. This paper presents a multi-stage training framework that uses transfer learning to pass knowledge between different simulation models, yielding a robust DRL controller for ASVs. The framework uses the Deep Deterministic Policy Gradient (DDPG) algorithm to develop the data-driven controller. First, a foundational policy is learned efficiently on simplified first-order and second-order Nomoto models, which capture the fundamental vessel dynamics. This pre-trained policy is then transferred to a complex, nonlinear Manoeuvring Modelling Group (MMG) model, significantly accelerating training convergence. Subsequently, the agent is fine-tuned within the MMG simulation under environmental disturbances. The trained models are tested on a variety of trajectories to verify robust performance. Controller accuracy is assessed through the heading error (e_ψ) and the cross-track error (y_e). A conventional Proportional-Integral-Derivative (PID) controller is implemented as a benchmark, highlighting the relative advantages and limitations of each approach.
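To make the quantities named in the abstract concrete, the following is a minimal Python sketch of a first-order Nomoto yaw model together with the two error measures, e_ψ and y_e. It is an illustration under stated assumptions, not the authors' implementation: the forward-Euler integration, the sign conventions for both errors, the straight-line path between two waypoints, and the numerical values of the gain `K` and time constant `T` are all hypothetical choices made here.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def nomoto_step(psi, r, delta, K, T, dt):
    """One forward-Euler step of the first-order Nomoto model.

    Model: T * r_dot + r = K * delta, psi_dot = r, where psi is the heading,
    r the yaw rate and delta the rudder angle. K and T are assumed inputs.
    """
    r_dot = (K * delta - r) / T
    psi_next = wrap_angle(psi + dt * r)
    r_next = r + dt * r_dot
    return psi_next, r_next


def heading_error(psi_desired, psi):
    """Heading error e_psi, taken here (by assumption) as desired minus actual."""
    return wrap_angle(psi_desired - psi)


def cross_track_error(x, y, wp_a, wp_b):
    """Signed lateral offset y_e of (x, y) from the line through wp_a and wp_b."""
    gamma = math.atan2(wp_b[1] - wp_a[1], wp_b[0] - wp_a[0])  # path angle
    return -(x - wp_a[0]) * math.sin(gamma) + (y - wp_a[1]) * math.cos(gamma)


# Hypothetical parameters: hold a constant rudder angle for 200 s.
K, T, dt = 0.5, 10.0, 0.1
psi, r = 0.0, 0.0
for _ in range(2000):
    psi, r = nomoto_step(psi, r, delta=0.1, K=K, T=T, dt=dt)
```

With a constant rudder angle the yaw rate in this sketch settles towards `K * delta`, which is the steady-turning behaviour a first-order Nomoto model is meant to capture. A simulation this cheap is what makes the first training stage inexpensive compared with training directly on the MMG model.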

Keywords

Deep reinforcement learning (DRL) / Autonomous surface vessels (ASVs) / Deep deterministic policy gradient (DDPG) / Transfer learning / Proportional-integral-derivative (PID) controller / Line of sight (LOS) guidance algorithm

Cite this article

Aniket Malviya, Suresh Rajendran, Xueqian Zhou. Transfer Learning for Deep Reinforcement Learning-Based Path Following of Autonomous Surface Vessels. Journal of Marine Science and Application, 2026, 25(3): 728-744. DOI: 10.1007/s11804-026-00820-x
