Underwater single target tracking with self-prompting
Xuelin Liu, Jingjing Xiao, Xinghui Dong
Intelligent Marine Technology and Systems, 2025, Vol. 3, Issue 1: 17
Underwater visual object tracking (UVOT) is of great importance to marine applications, yet it remains understudied in mainstream computer vision research. Existing approaches that leverage prompt information to enhance single object tracking rely primarily on auxiliary modal data; however, inherent semantic misalignment persists across modalities, together with unavoidable feature redundancy and cross-modal noise. To address these issues, we propose a self-prompt single target tracking network, SPTrack, built on intrinsic image cues. The network extracts global features from raw images as scene-aware prompts and couples them with a feature-pruning mechanism that eliminates multiscale feature redundancy, thereby improving the tracker's perception capability in dynamic scenes. Experiments on a recent underwater object tracking data set show that SPTrack achieves an area under the curve (AUC) of 0.545 at a real-time inference speed of 38.5 FPS. Experiments on two open-air object tracking data sets likewise yield remarkable performance. These promising results stem from scene-adaptive feature learning, which specifically addresses challenges of complex underwater scenarios such as occlusion and light scattering.
Underwater object tracking / Single target tracking / Self-prompting
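To make the self-prompting idea concrete, the sketch below illustrates one plausible reading of the two components named in the abstract: a scene-aware prompt pooled from the raw image, and prompt-guided pruning of redundant feature tokens. This is a minimal PyTorch sketch of the general technique, not the authors' implementation; the module names (ScenePrompt, TokenPruner), the global-pooling choice, and the keep ratio are all hypothetical.

```python
# Hypothetical sketch of self-prompting with feature pruning; NOT the released
# SPTrack code. All module names and hyper-parameters are illustrative only.
import torch
import torch.nn as nn


class ScenePrompt(nn.Module):
    """Pools a global descriptor from the raw image and projects it into the
    token space, producing a single scene-aware prompt token."""

    def __init__(self, in_channels: int = 3, embed_dim: int = 768):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # global context of the raw image
        self.proj = nn.Linear(in_channels, embed_dim)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W) -> prompt token: (B, 1, embed_dim)
        g = self.pool(image).flatten(1)        # (B, 3)
        return self.proj(g).unsqueeze(1)


class TokenPruner(nn.Module):
    """Scores feature tokens against the scene prompt and keeps only the
    top-k most relevant ones, discarding redundant tokens."""

    def __init__(self, embed_dim: int = 768, keep_ratio: float = 0.7):
        super().__init__()
        self.keep_ratio = keep_ratio
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, tokens: torch.Tensor, prompt: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D); prompt: (B, 1, D) broadcasts over all N tokens
        rel = self.score(tokens * prompt)              # (B, N, 1) relevance
        k = max(1, int(tokens.shape[1] * self.keep_ratio))
        idx = rel.squeeze(-1).topk(k, dim=1).indices   # retained token indices
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        return tokens.gather(1, idx)                   # (B, k, D)


if __name__ == "__main__":
    prompts = ScenePrompt()(torch.randn(2, 3, 256, 256))     # (2, 1, 768)
    kept = TokenPruner()(torch.randn(2, 196, 768), prompts)  # (2, 137, 768)
    print(prompts.shape, kept.shape)
```

Under these assumptions, the prompt token conditions the pruning step, so tokens weakly related to the current scene (e.g., backscatter clutter) are dropped before matching, which is consistent with the redundancy-elimination role the abstract describes.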