Efficient image representation for object recognition via pivots selection

Bojun XIE, Yi LIU, Hui ZHANG, Jian YU

Front. Comput. Sci. ›› 2015, Vol. 9 ›› Issue (3) : 383-391. DOI: 10.1007/s11704-015-4182-7
RESEARCH ARTICLE



Abstract

Patch-level features are essential for achieving good performance in computer vision tasks. Besides well-known pre-defined patch-level descriptors such as the scale-invariant feature transform (SIFT) and the histogram of oriented gradients (HOG), the kernel descriptor (KD) method [1] offers a new way to “grow” features from a match kernel defined over image patch pairs using kernel principal component analysis (KPCA), and it yields impressive results.

In this paper, we present the efficient kernel descriptor (EKD) and the efficient hierarchical kernel descriptor (EHKD), both built upon incomplete Cholesky decomposition. EKD automatically selects a small number of pivot features for generating patch-level features, which improves computational efficiency. EHKD recursively applies EKD to form image-level features layer by layer. Perhaps owing to this parsimony, we find, surprisingly, that EKD and EHKD achieve results competitive with other state-of-the-art methods on several public datasets while being more efficient than KD.
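The pivot-selection idea can be illustrated with a generic greedy, pivoted incomplete Cholesky decomposition of a kernel matrix. The sketch below is not the authors' implementation: the Gaussian kernel, the function names `gaussian_kernel` and `incomplete_cholesky`, and the parameters `gamma`, `max_rank`, and `tol` are all assumptions chosen for illustration. It shows only how a small set of pivot points can be chosen so that the kernel matrix K is approximated by a low-rank product G Gᵀ, evaluating just the diagonal and the pivot columns of K.

```python
import math


def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors.

    Stand-in kernel for illustration; the paper's match kernels may differ.
    """
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def incomplete_cholesky(points, kernel, max_rank, tol=1e-8):
    """Greedy pivoted incomplete Cholesky of the kernel matrix K.

    Returns (pivots, G), where pivots lists the selected point indices and
    G is an n x r list of rows such that K is approximated by G G^T.
    """
    n = len(points)
    # Residual diagonal of K - G G^T; starts as the diagonal of K.
    diag = [kernel(p, p) for p in points]
    G = [[] for _ in range(n)]
    pivots = []
    for _ in range(min(max_rank, n)):
        # Choose the point that the current approximation explains worst.
        j = max(range(n), key=lambda i: diag[i])
        if diag[j] <= tol:
            break  # every point is already well approximated
        root = math.sqrt(diag[j])
        # New column: residual of kernel column j, scaled by the pivot.
        col = []
        for i in range(n):
            dot = sum(a * b for a, b in zip(G[i], G[j]))
            col.append((kernel(points[i], points[j]) - dot) / root)
        for i in range(n):
            G[i].append(col[i])
            diag[i] -= col[i] ** 2
        pivots.append(j)
    return pivots, G
```

Calling `incomplete_cholesky(points, gaussian_kernel, max_rank=r)` on a list of feature vectors returns at most `r` pivot indices. The loop stops early once the largest residual falls below `tol`, so redundant (for example, duplicated) points are never selected as pivots, which is the source of the parsimony in this kind of scheme.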

Keywords

efficient kernel descriptor / efficient hierarchical kernel descriptor / incomplete Cholesky decomposition / patch-level features / image-level features

Cite this article

Bojun XIE, Yi LIU, Hui ZHANG, Jian YU. Efficient image representation for object recognition via pivots selection. Front. Comput. Sci., 2015, 9(3): 383‒391 https://doi.org/10.1007/s11704-015-4182-7

References

[1]
Bo L F, Ren X F, Fox D. Kernel descriptors for visual recognition. In: Proceedings of the Annual Conference on Neural Information Processing Systems. 2010, 244-252
[2]
Bosch A, Muñoz X, Martí R. Which is the best way to organize/classify images by content? Image and Vision Computing, 2007, 25(6): 778-791
[3]
Lowe D G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60(2): 91-110
[4]
Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2005, 886-893
[5]
Vogel J, Schiele B. Semantic modeling of natural scenes for content-based image retrieval. International Journal of Computer Vision, 2007, 72(2): 133-157
[6]
Li F F, Perona P. A Bayesian hierarchical model for learning natural scene categories. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2005, 524-531
[7]
Lazebnik S, Schmid C, Ponce J. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2006, 2169-2178
[8]
Bo L F, Sminchisescu C. Efficient match kernel between sets of features for visual recognition. In: Proceedings of the Annual Conference on Neural Information Processing Systems. 2009, 135-143
[9]
Schölkopf B, Smola A, Müller K. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 1998, 10(5): 1299-1319
[10]
Xie B J, Liu Y, Zhang H, Yu J. Efficient kernel descriptor for image categorization via pivots selection. In: Proceedings of the IEEE International Conference on Image Processing. 2013, 3479-3483
[11]
Wang P, Wang J D, Zeng G, Xu W W, Zha H B, Li S P. Supervised kernel descriptors for visual recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013, 2858-2865
[12]
LeCun Y, Huang F J, Bottou L. Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2004, 97-104
[13]
Hinton G, Osindero S, Teh Y. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7): 1527-1554
[14]
Hinton G, Salakhutdinov R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786): 504-507
[15]
Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Proceedings of the Annual Conference on Neural Information Processing Systems. 2012, 1106-1114
[16]
Bo L F, Lai K, Ren X F, Fox D. Object recognition with hierarchical kernel descriptors. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2011, 1729-1736
[17]
Fine S, Scheinberg K. Efficient SVM training using low-rank kernel representation. Journal of Machine Learning Research, 2001, 2: 243-264
[18]
Bach F R, Jordan M I. Kernel independent component analysis. Journal of Machine Learning Research, 2002, 3: 1-48
[19]
Oliva A, Torralba A. Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision, 2001, 42(3): 145-175
[20]
Li F F, Fergus R, Perona P. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 2007, 106(1): 59-70
[21]
Li L J, Li F F. What, where and who? Classifying events by scene and object recognition. In: Proceedings of the IEEE International Conference on Computer Vision. 2007, 1-8
[22]
Quattoni A, Torralba A. Recognizing indoor scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2009, 413-420
[23]
Torralba A, Fergus R, Freeman W. 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(11): 1958-1970
[24]
Fan R E, Chang K W, Hsieh C J, Wang X R, Lin C J. LIBLINEAR: a library for large linear classification. Journal of Machine Learning Research, 2008, 9: 1871-1874
[25]
Shabou A, Le Borgne H. Locality-constrained and spatially regularized coding for scene categorization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2012, 3618-3625
[26]
Wang J J, Yang J C, Yu K, Lv F J, Huang T, Gong Y H. Locality-constrained linear coding for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2010, 3360-3367
[27]
Jia Y Q, Huang C, Darrell T. Beyond spatial pyramids: receptive field learning for pooled image features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2012, 3370-3377
[28]
Boiman O, Shechtman E, Irani M. In defense of nearest-neighbor based image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2008, 1-8
[29]
Liu L Q, Wang L, Liu X W. In defense of soft-assignment coding. In: Proceedings of the IEEE International Conference on Computer Vision. 2011, 2486-2493
[30]
Zhu J, Li L J, Li F F, Xing E. Large margin learning of upstream scene understanding models. In: Proceedings of the Annual Conference on Neural Information Processing Systems. 2010, 2586-2594
[31]
Wu J, Rehg J. Centrist: a visual descriptor for scene categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(8): 1489-1501
[32]
Li L J, Su H, Xing E, Li F F. Object bank: a high-level image representation for scene classification & semantic feature sparsification. In: Proceedings of the Annual Conference on Neural Information Processing Systems. 2010, 1378-1386
[33]
Pandey M, Lazebnik S. Scene recognition and weakly supervised object localization with deformable part-based models. In: Proceedings of the IEEE International Conference on Computer Vision. 2011, 1307-1314
[34]
Singh S, Gupta A, Efros A A. Unsupervised discovery of mid-level discriminative patches. In: Proceedings of the European Conference on Computer Vision. 2012, 73-86
[35]
Ranzato M, Hinton G. Modeling pixel means and covariances using factorized third-order Boltzmann machines. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2010, 2551-2558
[36]
Le Q, Ngiam J, Chia Z C, Koh P, Ng A. Tiled convolutional neural networks. In: Proceedings of the Annual Conference on Neural Information Processing Systems. 2010, 1279-1287
[37]
Yu K, Zhang T. Improved local coordinate coding using local tangents. In: Proceedings of International Conference on Machine Learning. 2010, 1215-1222
[38]
Coates A, Lee H, Ng A. An analysis of single-layer networks in unsupervised feature learning. In: Proceedings of International Conference on Artificial Intelligence and Statistics. 2011, 215-223

RIGHTS & PERMISSIONS

2014 Higher Education Press and Springer-Verlag Berlin Heidelberg