Robust feature learning for online discriminative tracking without large-scale pre-training
Jun ZHANG, Bineng ZHONG, Pengfei WANG, Cheng WANG, Jixiang DU
Owing to the inherent lack of training data in visual tracking, recent work on deep learning-based trackers has focused on learning a generic representation offline from large-scale training data and transferring the pre-trained feature representation to a tracking task. However, offline pre-training is time-consuming, and the learned generic representation may be either less discriminative for tracking specific objects or overfitted to typical tracking datasets. In this paper, we propose an online discriminative tracking method based on robust feature learning without large-scale pre-training. Specifically, we first design a PCA filter bank-based convolutional neural network (CNN) architecture to learn robust features online from a few positive and negative samples in the high-dimensional feature space. Then, we use a simple soft-thresholding method to produce sparse features that are more robust to target appearance variations. Moreover, we increase the reliability of our tracker using edge information generated from edge box proposals during visual tracking. Finally, effective visual tracking results are achieved by systematically combining the tracking information and edge box-based scores in a particle filtering framework. Extensive experimental results on the widely used online tracking benchmark (OTB-50) with 50 videos validate the robustness and effectiveness of the proposed tracker without large-scale pre-training.
visual tracking / convolutional neural networks / PCA / Edge Box
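The soft-thresholding step mentioned in the abstract can be illustrated with the standard shrinkage operator, sign(x) · max(|x| − t, 0). The sketch below is a minimal illustration under assumptions, not the authors' implementation: the function names, the sample filter responses, and the threshold value 0.25 are all hypothetical, and the abstract does not state which threshold the tracker uses or at which layer it is applied.

```python
from typing import List


def soft_threshold(values: List[float], threshold: float) -> List[float]:
    """Shrink each value toward zero by `threshold`.

    Values whose magnitude is at most `threshold` become exactly 0.0,
    which is what makes the resulting feature vector sparse.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    out = []
    for v in values:
        magnitude = abs(v) - threshold
        if magnitude <= 0:
            out.append(0.0)
        else:
            # Keep the original sign, reduce the magnitude.
            out.append(magnitude if v > 0 else -magnitude)
    return out


def sparsity(values: List[float]) -> float:
    """Fraction of entries that are exactly zero (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(1 for v in values if v == 0.0) / len(values)


# Hypothetical filter responses for one candidate region (assumed values).
responses = [0.9, -0.05, 0.2, -0.7, 0.01, 0.0]
sparse = soft_threshold(responses, 0.25)  # 0.25 is an assumed threshold
```

Weak responses are zeroed while strong ones are kept with reduced magnitude, so small fluctuations in the filter outputs do not change the feature vector. This is one plausible reading of why sparse features tolerate appearance variation better.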