Three-dimensional dynamic gesture recognition method based on convolutional neural network

Ji Xi, Weiqi Zhang, Zhe Xu, Saide Zhu, Linlin Tang, Li Zhao

High-Confidence Computing, 2025, Vol. 5, Issue (1): 100280
DOI: 10.1016/j.hcc.2024.100280

Research article

Abstract

With the rapid advancement of virtual reality, dynamic gesture recognition has become an indispensable technique for human-computer interaction in virtual environments. Recognizing dynamic gestures is challenging because of their high degree of freedom, individual differences, and changes in gesture space. To address the low recognition accuracy of existing networks, an improved dynamic gesture recognition algorithm based on the ResNeXt architecture is proposed. The algorithm uses three-dimensional convolution to capture the spatiotemporal features intrinsic to dynamic gestures. To sharpen the model’s focus and improve its accuracy in identifying dynamic gestures, a lightweight convolutional attention mechanism is introduced, which also speeds up convergence during training. To further improve performance, a deep attention submodule is added to the convolutional attention module to strengthen the network’s temporal feature extraction. Experiments on the EgoGesture and NvGesture datasets show that the proposed model reaches dynamic gesture recognition accuracies of 95.03% and 86.21%, respectively; in RGB mode, the accuracies are 93.49% and 80.22%, respectively. These results indicate that the proposed algorithm recognizes dynamic gestures with high accuracy and has potential for applications in advanced human-computer interaction systems.
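
As a rough illustration of the three-dimensional convolution idea mentioned in the abstract, the following pure-Python sketch slides a kernel over time as well as space, so a single filter can respond to motion between frames. The toy clip, the temporal-difference kernel, and the `frame_attention` gate are hypothetical constructs chosen for illustration only. They are not the paper's ResNeXt blocks, its convolutional attention module, or its deep attention submodule, whose details are not given on this page.

```python
import math


def conv3d(clip, kernel):
    """Valid (no padding), stride-1 3D cross-correlation on nested lists.

    clip:   T x H x W nested list (single channel)
    kernel: kT x kH x kW nested list
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kT, kH, kW = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kT + 1):
        plane = []
        for y in range(H - kH + 1):
            row = []
            for x in range(W - kW + 1):
                acc = 0.0
                # Sum over the temporal AND spatial extent of the kernel.
                for dt in range(kT):
                    for dy in range(kH):
                        for dx in range(kW):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def frame_attention(features):
    """Hypothetical per-frame gate (an assumption, not the paper's module).

    Each frame gets one sigmoid weight computed from its mean absolute
    response, and the frame's features are scaled by that weight.
    """
    weights = []
    for plane in features:
        values = [abs(v) for row in plane for v in row]
        mean = sum(values) / len(values)
        weights.append(1.0 / (1.0 + math.exp(-mean)))
    gated = [[[v * w for v in row] for row in plane]
             for plane, w in zip(features, weights)]
    return gated, weights


def frame_with_pixel(y, x, size=3):
    """Build a size x size frame that is zero except for one bright pixel."""
    return [[1.0 if (r, c) == (y, x) else 0.0 for c in range(size)]
            for r in range(size)]


# Toy clip: the bright pixel sits at (1, 0) for two frames, then moves to (1, 1).
clip = [frame_with_pixel(1, 0), frame_with_pixel(1, 0),
        frame_with_pixel(1, 1), frame_with_pixel(1, 1)]

# A 2 x 1 x 1 kernel that subtracts the previous frame from the next one.
temporal_diff = [[[-1.0]], [[1.0]]]

motion = conv3d(clip, temporal_diff)      # shape: 3 x 3 x 3
gated, weights = frame_attention(motion)  # one weight per output frame
```

In this toy setup only the middle output frame is non-zero, because that is where the pixel moves, and the gate assigns that frame a larger weight than the two static ones. A 2D convolution applied frame by frame could not produce this response, since it never sees two frames at once.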

Keywords

Dynamic gesture recognition / ResNeXt architecture / Three-dimensional convolution / Lightweight convolution / Attention mechanism module

Cite this article

Ji Xi, Weiqi Zhang, Zhe Xu, Saide Zhu, Linlin Tang, Li Zhao. Three-dimensional dynamic gesture recognition method based on convolutional neural network. High-Confidence Computing, 2025, 5(1): 100280. DOI: 10.1016/j.hcc.2024.100280

CRediT authorship contribution statement

Ji Xi: Writing - review & editing, Writing - original draft, Visualization, Formal analysis, Data curation, Conceptualization. Weiqi Zhang: Writing - review & editing, Writing - original draft, Data curation, Conceptualization. Zhe Xu: Visualization, Funding acquisition. Saide Zhu: Visualization, Writing - original draft, Writing - review & editing. Linlin Tang: Writing - original draft, Writing - review & editing. Li Zhao: Supervision, Visualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
