View-invariant human action recognition via robust locally adaptive multi-view learning
Jia-geng FENG, Jun XIAO
Human action recognition is currently one of the most active research areas in computer vision. It is widely used in applications such as intelligent surveillance, perceptual interfaces, and content-based video retrieval. However, extrinsic factors hinder progress in action recognition; for example, in realistic scenes human actions may be observed from arbitrary camera viewpoints. View-invariant analysis has therefore become important for action recognition algorithms, and many researchers have studied this issue. In this paper, we present a multi-view learning approach to recognizing human actions across different views. Because most existing multi-view learning algorithms lack data adaptiveness when constructing their nearest-neighbor graphs, we propose a robust locally adaptive multi-view learning algorithm based on learning multiple local L1-graphs. We also propose an efficient iterative optimization method to solve the resulting objective function. Experiments on three public view-invariant action recognition datasets, ViHASi, IXMAS, and WVU, demonstrate the data adaptiveness, effectiveness, and efficiency of our algorithm. More importantly, when the feature dimension is chosen appropriately (i.e., above 60), the proposed algorithm consistently outperforms state-of-the-art counterparts, improving recognition accuracy by about 6% on the three datasets.
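The abstract names L1-graphs as the building block but does not describe how they are constructed. As a rough illustration only, the sketch below builds a generic single-view L1-graph: each sample is sparsely coded over all other samples, and the magnitudes of the sparse coefficients become edge weights. Unlike a k-nearest-neighbor graph, which fixes k for every sample, the neighbors and their weights here are chosen by the sparse code itself, which is the kind of data adaptiveness the abstract refers to. This is an assumption-laden sketch, not the authors' robust locally adaptive multi-view algorithm: it uses a Lasso-style relaxation solved with ISTA (a swapped-in solver), covers one view only, and the names `l1_graph`, `lasso_ista`, `lam`, and `n_iter` are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(D, y, lam, n_iter=500):
    """Approximately solve min_a 0.5*||y - D a||_2^2 + lam*||a||_1 with ISTA."""
    # Step size 1/L, where L is the largest eigenvalue of D^T D
    # (the Lipschitz constant of the gradient of the quadratic term).
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    if L == 0:
        return a
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)
        a = soft_threshold(a - grad / L, lam / L)
    return a


def l1_graph(X, lam=0.05, n_iter=500):
    """Build a generic L1-graph (hypothetical helper, single view only).

    X has shape (n_samples, n_features). Each row is coded over the other
    rows; |coefficient| becomes the edge weight. Returns a symmetric,
    nonnegative (n, n) matrix with a zero diagonal.
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        D = X[others].T  # dictionary: columns are the other samples
        a = lasso_ista(D, X[i], lam, n_iter)
        W[i, others] = np.abs(a)
    # Symmetrize so the graph is undirected.
    return (W + W.T) / 2.0


# Usage: two clusters lying on different 1-D subspaces.
X = np.array([[1, 0, 0], [2, 0, 0], [3, 0, 0],
              [0, 1, 0], [0, 2, 0], [0, 3, 0]], dtype=float)
W = l1_graph(X)
print(np.round(W, 3))
```

In this toy input, samples from one subspace cannot help reconstruct samples from the other, so the sparse codes place no weight across clusters. A k-NN graph with a poorly chosen k could link the two groups. In a multi-view setting, one would presumably build such a graph per view and couple them, but that coupling is not shown here.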
View-invariant / Action recognition / Multi-view learning / L1-norm / Local learning