CA-PPO: a cross-attention PPO-based task allocation algorithm for single-USV debris collection
Binjie Li, Xiaozhu Wu, Xin Chen, Shuailong Zhang
Intelligent Marine Technology and Systems, 2026, Vol. 4, Issue (1): 13
With the increasing severity of water pollution, intelligent cleaning technologies based on unmanned systems have attracted widespread attention. Floating debris on water surfaces is dynamic: it moves randomly and can easily flow out of the cleaning area. Because of these characteristics, existing reinforcement learning methods face several challenges in practical scheduling tasks, including inadequate prediction of future states, weak prioritization of critical targets, and insufficient consideration of resource constraints. This paper proposes cross-attention proximal policy optimization (CA-PPO), a task allocation algorithm for a single unmanned surface vehicle (USV) built on the proximal policy optimization (PPO) algorithm. The proposed approach constructs a cleaning task environment that incorporates a flow prediction mechanism for floating debris and accounts for the battery and load limitations of the USV. A cross-attention module strengthens the policy network's perception of key debris targets. Experimental results show that CA-PPO outperforms traditional heuristic approaches and other reinforcement learning algorithms in debris collection rate, outflow rate, energy efficiency, and movement distance efficiency.
Unmanned surface vehicle (USV) / Task allocation / Reinforcement learning / PPO / Cross-attention
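The abstract names three ingredients (debris flow prediction, battery and load constraints, and a cross-attention module inside a PPO policy) without giving the architecture, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' implementation. It assumes: a constant-velocity drift forecast (`predicted_position`); a hypothetical feasibility rule requiring spare load capacity and enough battery to reach the predicted debris position and return to base (`feasible`); and single-head scaled dot-product cross-attention in which the USV state supplies the query and each debris item supplies a key and value, with the masked attention weights read as a pointer-style distribution over targets. All function names, features, and weight matrices are hypothetical stand-ins. The `ppo_clip_objective` function is the standard per-sample PPO clipped surrogate, shown to indicate where those target probabilities would enter training.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Point = Tuple[float, float]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a (d_out x d_in) weight matrix by an input vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def predicted_position(pos: Point, vel: Point, horizon: float) -> Point:
    """Hypothetical constant-velocity forecast of where a debris item will drift."""
    return (pos[0] + vel[0] * horizon, pos[1] + vel[1] * horizon)


def feasible(usv: Point, base: Point, target: Point, battery: float,
             energy_per_m: float, load: int, capacity: int) -> bool:
    """Hypothetical constraint: spare capacity and enough battery to reach target and return."""
    trip = math.dist(usv, target) + math.dist(target, base)
    return load < capacity and trip * energy_per_m <= battery


def masked_softmax(scores: Sequence[float], mask: Sequence[bool]) -> Vector:
    """Softmax over scores; entries with mask=False get probability 0."""
    valid = [s for s, ok in zip(scores, mask) if ok]
    if not valid:
        raise ValueError("no feasible debris target")
    m = max(valid)  # subtract the max feasible score for numerical stability
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_in: Sequence[float], key_in: Sequence[Sequence[float]],
                    w_q: Matrix, w_k: Matrix, w_v: Matrix,
                    mask: Sequence[bool]) -> Tuple[Vector, Vector]:
    """Single-head scaled dot-product cross-attention.

    The USV state supplies the query; each debris item supplies a key and a value.
    Returns the attention weights over debris and the attended context vector.
    """
    q = matvec(w_q, query_in)
    keys = [matvec(w_k, f) for f in key_in]
    vals = [matvec(w_v, f) for f in key_in]
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    weights = masked_softmax(scores, mask)
    context = [sum(a * v[j] for a, v in zip(weights, vals)) for j in range(len(vals[0]))]
    return weights, context


def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)


# Usage with made-up numbers: three debris items given as (position, velocity).
usv, base = (0.0, 0.0), (0.0, 0.0)
debris = [((3.0, 4.0), (0.0, 0.0)), ((30.0, 40.0), (1.0, 0.0)), ((1.0, 1.0), (0.5, 0.5))]
preds = [predicted_position(p, v, horizon=2.0) for p, v in debris]
mask = [feasible(usv, base, t, battery=50.0, energy_per_m=1.0, load=2, capacity=5)
        for t in preds]

usv_state = [1.0, 1.0, 50.0, 2.0]  # hypothetical features: x, y, battery, load
w_q = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # stand-in for learned weights
eye = [[1.0, 0.0], [0.0, 1.0]]
weights, context = cross_attention(usv_state, [list(p) for p in preds], w_q, eye, eye, mask)
```

In this toy case the second debris item has the highest raw attention score, but its predicted position is out of battery range, so the mask assigns it zero probability. Masking before the softmax is one common way to keep a policy from sampling targets that violate resource constraints; whether CA-PPO enforces the constraints this way or through rewards is not stated in the abstract.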
The Author(s)