Behaviour recognition based on the integration of multigranular motion features in the Internet of Things

Lizong Zhang, Yiming Wang, Ke Yan, Yi Su, Nawaf Alharbe, Shuxin Feng

2024, Vol. 10, Issue 3: 666-675. DOI: 10.1016/j.dcan.2022.10.011

Research article

Abstract

With the adoption of cutting-edge communication technologies such as 5G/6G systems and the large-scale deployment of smart devices, crowdsensing systems in the Internet of Things (IoT) now perform complicated video analysis tasks such as behaviour recognition, which have dramatically increased the diversity of IoT applications. Behaviour recognition in videos usually requires a combined analysis of the spatial information about objects and the temporal information about their dynamic actions. In contrast to image-based computer vision tasks, which focus on understanding spatial information, behaviour recognition relies even more heavily on modelling temporal information that contains both short-range and long-range motions. However, current solutions fail to jointly and comprehensively analyse short-range motions between adjacent frames and long-range temporal aggregations at large scales in videos. In this paper, we propose a novel behaviour recognition method based on the integration of multigranular (IMG) motion features, which can support the deployment of video analysis in multimedia IoT crowdsensing systems. In particular, we achieve reliable motion information modelling by integrating a channel attention-based short-term motion feature enhancement module (CSEM) and a cascaded long-term motion feature integration module (CLIM). We evaluate our model on several action recognition benchmarks, including HMDB51, Something-Something and UCF101. The experimental results demonstrate that our approach outperforms previous state-of-the-art methods, confirming both its effectiveness and its efficiency.
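The abstract does not detail the internals of CSEM or CLIM, but the two ingredients it names, channel attention and short-term motion between adjacent frames, can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration only: frame differences stand in for the short-range motion cue, and a squeeze-and-excitation-style gate supplies the channel attention. It is not the authors' implementation.

import torch
import torch.nn as nn

class ShortTermMotionAttention(nn.Module):
    # Hypothetical module (not the paper's CSEM): enhance per-frame
    # features with channel attention driven by adjacent-frame differences.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (batch, time, channels, height, width)
        b, t, c, h, w = x.shape
        # Short-range motion cue: difference between adjacent frames,
        # zero-padded at the final time step.
        diff = torch.zeros_like(x)
        diff[:, :-1] = x[:, 1:] - x[:, :-1]
        # Squeeze: spatially pool the motion cue per frame and channel.
        squeezed = diff.mean(dim=(3, 4))               # (b, t, c)
        # Excite: per-channel attention gates from the motion statistics.
        gates = self.fc(squeezed).view(b, t, c, 1, 1)
        # Residual enhancement preserves the original appearance features.
        return x + x * gates

feats = torch.randn(2, 8, 64, 14, 14)   # toy clip: 8 frames of 64-channel maps
block = ShortTermMotionAttention(channels=64)
print(block(feats).shape)               # torch.Size([2, 8, 64, 14, 14])

A cascaded long-term counterpart in the spirit of CLIM could then aggregate such enhanced features over progressively larger temporal extents, for example by stacking temporal convolutions with increasing dilation; the abstract does not specify the exact design.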

Keywords

Behaviour recognition / Motion features / Attention mechanism / Internet of Things / Crowdsensing

Cite this article

Lizong Zhang, Yiming Wang, Ke Yan, Yi Su, Nawaf Alharbe, Shuxin Feng. Behaviour recognition based on the integration of multigranular motion features in the Internet of Things. Digital Communications and Networks, 2024, 10(3): 666-675. DOI: 10.1016/j.dcan.2022.10.011

