CA-PPO: a cross-attention PPO-based task allocation algorithm for single-USV debris collection

Binjie Li; Xiaozhu Wu; Xin Chen; Shuailong Zhang

doi:10.1007/s44295-026-00102-w

Intelligent Marine Technology and Systems, 2026, Vol. 4, Issue 1: 13

Research Paper


Abstract

As water pollution grows more severe, intelligent cleaning technologies based on unmanned systems have attracted widespread attention. Floating debris on water surfaces is dynamic: it moves randomly and easily flows out. When applied to practical scheduling tasks under these conditions, existing reinforcement learning methods face several challenges, including inadequate prediction of future states, weak prioritization of critical targets, and insufficient consideration of resource constraints. This paper proposes cross-attention proximal policy optimization (CA-PPO), a task allocation algorithm for a single unmanned surface vehicle (USV) based on the proximal policy optimization (PPO) algorithm. The approach constructs a cleaning task environment that incorporates a flow prediction mechanism for floating debris and accounts for the battery and load limitations of the USV. A cross-attention module enhances the policy network’s perception of key debris targets. Experimental results show that CA-PPO outperforms traditional heuristic approaches and other reinforcement learning algorithms in debris collection rate, outflow rate, energy efficiency, and movement distance efficiency.
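The abstract does not specify the network architecture, so the following is only a minimal sketch of the two generic building blocks it names: single-query scaled dot-product cross-attention and the standard PPO clipped surrogate. It assumes, purely for illustration, that the USV state acts as the query and each debris item supplies a key and a value. The feature layout and the function names `cross_attention` and `ppo_clip_objective` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Scaled dot-product attention with a single query vector.

    query:  USV feature vector (assumed layout, not from the paper)
    keys:   one feature vector per debris item, same length as query
    values: one feature vector per debris item
    Returns the context vector and the attention weight per debris item.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample (to be maximized).

    ratio is the new-to-old policy probability ratio for the taken action.
    """
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Toy 2-D features: the USV heads along +x; three debris items around it.
    usv = [1.0, 0.0]
    debris = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
    context, weights = cross_attention(usv, debris, debris)
    print("attention weights:", weights)
    print("clipped surrogate:", ppo_clip_objective(1.5, 2.0))
```

In this toy setup, the debris item whose features align most closely with the query receives the largest weight, which is the sense in which attention can emphasize key targets. A trained model would use learned projections for queries, keys, and values rather than raw features.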

Keywords

Unmanned surface vehicle (USV) / Task allocation / Reinforcement learning / PPO / Cross-attention

Cite this article


Binjie Li, Xiaozhu Wu, Xin Chen, Shuailong Zhang. CA-PPO: a cross-attention PPO-based task allocation algorithm for single-USV debris collection. Intelligent Marine Technology and Systems, 2026, 4(1): 13. DOI: 10.1007/s44295-026-00102-w

