Abstract
As one of the most classic fields in computer vision, image categorization has attracted widespread interest. Numerous algorithms have been proposed in the community, and many of them have advanced the state of the art. However, most existing algorithms are designed without considering the computing resources available to them. Therefore, when dealing with resource-constrained tasks, these algorithms will fail to give satisfactory results. In this paper, we provide a comprehensive and in-depth introduction to recent developments in research on image categorization with resource constraints. Although a large portion is based on our own work, we also give a brief description of other elegant algorithms. Furthermore, we investigate recent developments in deep neural networks, with a focus on resource-constrained deep nets.
Keywords
image categorization / resource constraints / large scale classification / deep neural networks
Cite this article
Jian-Hao LUO, Wang ZHOU, Jianxin WU.
Image categorization with resource constraints: introduction, challenges and advances.
Front. Comput. Sci., 2017, 11(1): 13–26. DOI: 10.1007/s11704-016-5514-6
RIGHTS & PERMISSIONS
Higher Education Press and Springer-Verlag Berlin Heidelberg