Abstract
The Efficient Convolution Operator (ECO) algorithm has achieved impressive performance in visual tracking. However, its feature extraction network is ill-suited to capturing the correlation features of occluded and blurred targets across long-range, complex scene frames. Moreover, its fixed-weight fusion strategy does not exploit the complementary properties of deep and shallow features. In this paper, we propose a new target tracking method, ECO++, which adaptively fuses deep features in complex scenes through two contributions. First, we construct a new temporal convolution mode and use it to replace the underlying convolution layers of the Conformer network, yielding an improved Conformer network. Second, we adaptively fuse the deep features output by the improved Conformer network by combining the Peak-to-Sidelobe Ratio (PSR), frame smoothness scores, and an adaptive adjustment weight. Extensive experiments on the OTB-2013, OTB-2015, UAV123, and VOT2019 benchmarks demonstrate that the proposed approach outperforms state-of-the-art algorithms in tracking accuracy and robustness in complex scenes with occluded, blurred, and fast-moving targets.
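To make the fusion idea concrete, below is a minimal, illustrative Python sketch of PSR-based adaptive fusion of two correlation response maps. The function names (`peak_to_sidelobe_ratio`, `fuse_responses`), the 11x11 sidelobe exclusion window, and the PSR-times-smoothness weighting rule are assumptions made for illustration only; they are not the paper's actual implementation.

```python
import numpy as np

def peak_to_sidelobe_ratio(response, exclude=5):
    """Peak-to-Sidelobe Ratio (PSR) of a correlation response map.

    PSR = (peak - mean(sidelobe)) / std(sidelobe), where the sidelobe is the
    response map with a small window around the peak excluded.
    """
    peak_idx = np.unravel_index(np.argmax(response), response.shape)
    peak = response[peak_idx]
    # Exclude an (2*exclude+1) x (2*exclude+1) window around the peak.
    mask = np.ones_like(response, dtype=bool)
    r0, c0 = max(peak_idx[0] - exclude, 0), max(peak_idx[1] - exclude, 0)
    mask[r0:peak_idx[0] + exclude + 1, c0:peak_idx[1] + exclude + 1] = False
    sidelobe = response[mask]
    return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-12)

def fuse_responses(resp_shallow, resp_deep, smooth_shallow=1.0, smooth_deep=1.0):
    """Fuse two response maps with weights proportional to PSR x smoothness.

    `smooth_*` stand in for per-frame smoothness scores; the actual ECO++
    weighting rule may differ from this hypothetical one.
    """
    score_shallow = peak_to_sidelobe_ratio(resp_shallow) * smooth_shallow
    score_deep = peak_to_sidelobe_ratio(resp_deep) * smooth_deep
    w = score_deep / (score_shallow + score_deep + 1e-12)
    return (1.0 - w) * resp_shallow + w * resp_deep
```

In a tracker, the fused response map would then be used to localize the target, with the fusion weight recomputed from PSR and smoothness at every frame rather than fixed in advance.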
Keywords
Deep features / Adaptive feature fusion / Correlation filtering / Target tracking / Data augmentation
Cite this article
Yuhan Liu, He Yan, Qilie Liu, Wei Zhang, Junbin Huang.
ECO++: Adaptive deep feature fusion target tracking method in complex scene.
Digital Communications and Networks, 2024, 10(5): 1352-1364. DOI: 10.1016/j.dcan.2022.10.020