Underwater single target tracking with self-prompting
Xuelin Liu, Jingjing Xiao, Xinghui Dong
Intelligent Marine Technology and Systems, 2025, Vol. 3, Issue 1: 17
Underwater visual object tracking (UVOT) is of great importance to marine applications, yet it remains understudied in mainstream computer vision research. Existing approaches that leverage prompt information to enhance single object tracking rely primarily on auxiliary modal data; however, inherent semantic misalignment persists across modalities, together with unavoidable feature redundancy and cross-modal noise. To address these issues, we propose a self-prompt single target tracking network, SPTrack, built on intrinsic image cues. The network extracts global features from raw images as scene-aware prompts and couples them with a feature-pruning mechanism that eliminates multiscale feature redundancy, thereby improving the tracker's perception capability in dynamic scenes. Experiments on a recent underwater object tracking data set show that SPTrack achieves an area under the curve (AUC) of 0.545 at a real-time inference speed of 38.5 FPS. Experiments on two open-air object tracking data sets likewise yield remarkable performance. These promising results stem from scene-adaptive feature learning, which specifically addresses challenges of complex underwater scenarios such as occlusion and light scattering.
Underwater object tracking / Single target tracking / Self-prompting
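To make the self-prompting idea concrete, the sketch below illustrates one plausible reading of the two components named in the abstract: a scene-aware prompt pooled from the raw image, and prompt-guided pruning of redundant feature tokens. This is a minimal PyTorch sketch of the general technique, not the authors' implementation; the module names (ScenePrompt, TokenPruner), the global-pooling choice, and the keep ratio are all hypothetical.

```python
# Hypothetical sketch of self-prompting with feature pruning; NOT the released
# SPTrack code. All module names and hyper-parameters are illustrative only.
import torch
import torch.nn as nn


class ScenePrompt(nn.Module):
    """Pools a global descriptor from the raw image and projects it into the
    token space, producing a single scene-aware prompt token."""

    def __init__(self, in_channels: int = 3, embed_dim: int = 768):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # global context of the raw image
        self.proj = nn.Linear(in_channels, embed_dim)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W) -> prompt token: (B, 1, embed_dim)
        g = self.pool(image).flatten(1)        # (B, 3)
        return self.proj(g).unsqueeze(1)


class TokenPruner(nn.Module):
    """Scores feature tokens against the scene prompt and keeps only the
    top-k most relevant ones, discarding redundant tokens."""

    def __init__(self, embed_dim: int = 768, keep_ratio: float = 0.7):
        super().__init__()
        self.keep_ratio = keep_ratio
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, tokens: torch.Tensor, prompt: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D); prompt: (B, 1, D) broadcasts over all N tokens
        rel = self.score(tokens * prompt)              # (B, N, 1) relevance
        k = max(1, int(tokens.shape[1] * self.keep_ratio))
        idx = rel.squeeze(-1).topk(k, dim=1).indices   # retained token indices
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        return tokens.gather(1, idx)                   # (B, k, D)


if __name__ == "__main__":
    prompts = ScenePrompt()(torch.randn(2, 3, 256, 256))     # (2, 1, 768)
    kept = TokenPruner()(torch.randn(2, 196, 768), prompts)  # (2, 137, 768)
    print(prompts.shape, kept.shape)
```

Under these assumptions, the prompt token conditions the pruning step, so tokens weakly related to the current scene (e.g., backscatter clutter) are dropped before matching, which is consistent with the redundancy-elimination role the abstract describes.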