Beyond bag of latent topics: spatial pyramid matching for scene category recognition
Fu-xiang LU, Jun HUANG
We propose a heterogeneous, mid-level feature based method for recognizing natural scene categories. The proposed feature introduces spatial information among the latent topics by means of a spatial pyramid, while the latent topics themselves are obtained by applying probabilistic latent semantic analysis (pLSA) to the bag-of-words representation. The proposed feature consistently outperforms standard pLSA, whose performance suffers in many cases because it discards spatial information. By combining the various interest point detectors and local region descriptors used in the bag-of-words model, the proposed feature achieves further improvement on diverse scene category recognition tasks. We also propose a two-stage framework for multi-class classification. In the first stage, for each possible detector/descriptor pair, adaptive boosting classifiers are employed to select the most discriminative topics and then compute posterior probabilities for an unknown image from those selected topics. The second stage uses the prod-max rule to combine the information coming from these multiple sources and assigns the unknown image to the scene category with the highest 'final' posterior probability. Experimental results on three benchmark scene datasets show that the proposed method outperforms most state-of-the-art methods.
Scene category recognition / Probabilistic latent semantic analysis / Bag-of-words / Adaptive boosting
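To make the idea of a spatial pyramid over latent topics concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each keypoint already carries a vector of topic weights (for example, a pLSA posterior over topics for its visual word), accumulates those weights per cell at each pyramid level, and concatenates the per-cell histograms. The function name, its parameters, the uniform weighting of levels, and the final L1 normalization are all illustrative assumptions.

```python
import numpy as np


def spatial_pyramid_topic_histogram(xy, topic_probs, width, height, levels=2):
    """Concatenate per-cell topic histograms over a spatial pyramid.

    Hypothetical helper for illustration only.
    xy          : (N, 2) keypoint (x, y) positions in pixels
    topic_probs : (N, Z) topic weights per keypoint (assumed to come from pLSA)
    levels      : pyramid depth; level l splits the image into 2^l x 2^l cells
    """
    xy = np.asarray(xy, dtype=float)
    topic_probs = np.asarray(topic_probs, dtype=float)
    n_topics = topic_probs.shape[1]

    blocks = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Cell index of each keypoint along x and y, clipped to the last cell
        col = np.minimum((xy[:, 0] / width * cells).astype(int), cells - 1)
        row = np.minimum((xy[:, 1] / height * cells).astype(int), cells - 1)
        flat = row * cells + col

        # Sum the topic weights of all keypoints that fall in the same cell
        hist = np.zeros((cells * cells, n_topics))
        np.add.at(hist, flat, topic_probs)
        blocks.append(hist.ravel())

    # Assumption: levels are weighted equally and the result is L1-normalized
    feature = np.concatenate(blocks)
    total = feature.sum()
    return feature / total if total > 0 else feature
```

With `levels=2` and Z topics the resulting vector has (1 + 4 + 16) × Z entries. The level-0 block is the plain image-level topic histogram, so the finer levels are what add the spatial layout that a standard pLSA representation lacks.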