Robust feature learning for online discriminative tracking without large-scale pre-training

Jun ZHANG, Bineng ZHONG, Pengfei WANG, Cheng WANG, Jixiang DU

Front. Comput. Sci., 2018, Vol. 12, Issue 6: 1160–1172. DOI: 10.1007/s11704-017-6281-8
RESEARCH ARTICLE

Abstract

Owing to the inherent lack of training data in visual tracking, recent work on deep learning-based trackers has focused on learning a generic representation offline from large-scale training data and transferring the pre-trained feature representation to a tracking task. However, offline pre-training is time-consuming, and the learned generic representation may be either less discriminative for tracking specific objects or overfitted to typical tracking datasets. In this paper, we propose an online discriminative tracking method based on robust feature learning without large-scale pre-training. Specifically, we first design a PCA filter bank-based convolutional neural network (CNN) architecture to learn robust features online from a few positive and negative samples in the high-dimensional feature space. Then, we use a simple soft-thresholding method to produce sparse features that are more robust to target appearance variations. Moreover, we increase the reliability of our tracker using edge information generated from edge box proposals during visual tracking. Finally, effective visual tracking results are achieved by systematically combining the tracking information and edge box-based scores in a particle filtering framework. Extensive experimental results on the widely used online tracking benchmark (OTB-50), which contains 50 videos, validate the robustness and effectiveness of the proposed tracker without large-scale pre-training.
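
The following is a minimal sketch of two steps named in the abstract: soft-thresholding a feature vector to make it sparse, and fusing per-particle tracker confidence with edge box scores inside a particle filter. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the fusion rule, so the linear blend and the mixing weight `alpha` are hypothetical, as are all function names and the (x, y, w, h) state layout. The PCA filter bank stage is omitted.

```python
import random


def soft_threshold(features, tau):
    """Shrink each value toward zero by tau; values with |x| <= tau become 0."""
    out = []
    for x in features:
        if x > tau:
            out.append(x - tau)
        elif x < -tau:
            out.append(x + tau)
        else:
            out.append(0.0)
    return out


def fuse_scores(track_scores, edge_scores, alpha=0.5):
    """Blend tracker confidence with edge box scores, one value per particle.

    ASSUMPTION: a linear blend with a hypothetical weight alpha; the
    abstract does not state the actual combination rule.
    """
    fused = [(1.0 - alpha) * t + alpha * e
             for t, e in zip(track_scores, edge_scores)]
    total = sum(fused)
    if total <= 0.0:
        # Degenerate case: fall back to uniform particle weights.
        return [1.0 / len(fused)] * len(fused)
    return [f / total for f in fused]


def estimate_state(particles, weights):
    """Weighted mean of particle states, assumed here to be (x, y, w, h)."""
    dims = len(particles[0])
    return tuple(sum(w * p[d] for p, w in zip(particles, weights))
                 for d in range(dims))


def resample(particles, weights, rng):
    """Multinomial resampling: draw particles in proportion to their weights."""
    return rng.choices(particles, weights=weights, k=len(particles))


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [(10.0, 10.0, 20.0, 20.0),
                 (12.0, 11.0, 20.0, 20.0),
                 (30.0, 5.0, 20.0, 20.0)]
    weights = fuse_scores([0.2, 0.6, 0.2], [0.1, 0.8, 0.1])
    print(soft_threshold([1.0, -0.25, 0.1, -2.0], 0.5))
    print(estimate_state(particles, weights))
    print(resample(particles, weights, rng))
```

With these made-up scores, the second particle receives the largest fused weight because both the tracker and the edge box score favour it. A multiplicative fusion would be an equally plausible choice under the same assumptions.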

Keywords

visual tracking / convolutional neural networks / PCA / Edge Box

Cite this article

Jun ZHANG, Bineng ZHONG, Pengfei WANG, Cheng WANG, Jixiang DU. Robust feature learning for online discriminative tracking without large-scale pre-training. Front. Comput. Sci., 2018, 12(6): 1160‒1172 https://doi.org/10.1007/s11704-017-6281-8

RIGHTS & PERMISSIONS

2018 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature