3D human pose estimation in the sailing simulator based on somatosensory interaction
Mingmei Che, Cui Xie, Xuqi Pan, Yonghang Yang, Junyu Dong, Shan Luo, Xiaofeng Chang, Yiguo Wang
Intelligent Marine Technology and Systems, 2025, Vol. 3, Issue 1: 15
The growing body of research on sailing simulators has advanced land-based sailing training and broadened its adoption. Somatosensory interaction offers ease of deployment, low cost, and natural interactivity. This paper introduces a sailing simulator based on somatosensory interaction, in which a standard camera and 3D human pose estimation (3DHPE) are used to drive the sailing simulation. The aim of the study is to strike a balance between interactivity and deployment complexity. Despite recent advances in 3DHPE, its use in real-time deployment scenarios has not been fully explored: when estimating the poses of dynamic users, balancing real-time efficiency against accuracy is challenging. To address this, we propose STGJMamer, a 3D pose estimation approach that integrates a graph-guided state space model. Building on the lightweight transformer model PoseFormerV2 as a baseline, the proposed method retains good real-time efficiency while integrating spatiotemporal extended graph convolutions and a hierarchical joint enhancement Mamba. As a result, the model can efficiently capture both global and local features, which ultimately improves pose estimation accuracy for dynamic users. The experimental results demonstrate that our approach achieves a frame-wise inference speed of 52 FPS, satisfying real-time constraints. It also achieves an average mean per-joint position error (MPJPE) of 29.5 mm on the MPI-INF-3DHP dataset, outperforming most existing methods. Finally, we deploy STGJMamer in a somatosensory interaction-based sailing simulator system and study its feasibility in real-world applications. The code is available at https://gitee.com/chemingmei/stgjmamer.
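The accuracy figure above is reported as MPJPE, the mean Euclidean distance between predicted and ground-truth 3D joint positions. The following is a minimal, standard-library sketch of that metric, assuming root-relative evaluation (each pose is shifted so that a root joint sits at the origin) and millimetre units. The root index `0`, the function name `mpjpe`, and the toy poses are illustrative assumptions and are not taken from the paper's evaluation code; real joint orderings are dataset-specific.

```python
import math
from typing import Sequence

# A pose is a sequence of joints; each joint is (x, y, z) in millimetres.
Pose = Sequence[Sequence[float]]


def mpjpe(pred: Sequence[Pose], gt: Sequence[Pose], root: int = 0) -> float:
    """Mean per-joint position error over all frames and joints.

    Both poses are made root-relative by subtracting the joint at index
    `root` (assumed here to be the pelvis/hip; the real index is dataset-specific).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and have the same number of frames")
    total, count = 0.0, 0
    for p_frame, g_frame in zip(pred, gt):
        if len(p_frame) != len(g_frame):
            raise ValueError("pred and gt must have the same joints per frame")
        p_root, g_root = p_frame[root], g_frame[root]
        for p, g in zip(p_frame, g_frame):
            # Express each joint relative to its own root, then take the L2 distance.
            p_rel = [a - b for a, b in zip(p, p_root)]
            g_rel = [a - b for a, b in zip(g, g_root)]
            total += math.dist(p_rel, g_rel)
            count += 1
    return total / count


if __name__ == "__main__":
    gt = [[(0, 0, 0), (0, 0, 100)]]
    # Globally shifted by (10, 10, 10); the second joint is also off by (30, 0, 40).
    pred = [[(10, 10, 10), (40, 10, 150)]]
    print(mpjpe(pred, gt))  # 25.0
```

Because both poses are root-aligned, the global shift does not count as error; only the 50 mm offset of the second joint does, averaged over two joints. Some benchmarks additionally report a Procrustes-aligned variant (PA-MPJPE), which this sketch does not compute.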
Keywords: Sailing simulator; 3D human pose estimation; Human-computer interaction
Subjects: Information and Computing Sciences; Artificial Intelligence and Image Processing; Information Systems
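The abstract states that STGJMamer integrates spatiotemporal extended graph convolutions but does not give their form. For readers unfamiliar with graph convolutions on a skeleton, the following is a minimal sketch of a generic spatial graph convolution with Kipf and Welling style symmetric normalization, H' = D^-1/2 (A + I) D^-1/2 H W. It is an assumed, simplified stand-in and not the paper's extended spatiotemporal variant; the function names, the three-joint chain, and the weight values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Plain-Python matrix product of an (n x m) and an (m x p) matrix."""
    m, p = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)] for i in range(len(a))]


def normalized_adjacency(num_joints: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    # Start from the identity so every joint keeps a self-loop.
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0  # bones are undirected
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def graph_conv(features: Matrix, adj_norm: Matrix, weight: Matrix) -> Matrix:
    """One spatial graph convolution H' = A_hat H W (no bias or activation).

    features: (num_joints x in_dim), weight: (in_dim x out_dim).
    Each output row mixes a joint's features with those of its skeletal neighbours.
    """
    return matmul(adj_norm, matmul(features, weight))


if __name__ == "__main__":
    # Hypothetical 3-joint chain (e.g. shoulder-elbow-wrist) with 2D keypoints as input.
    adj = normalized_adjacency(3, [(0, 1), (1, 2)])
    h = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
    w = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # hypothetical learned weights (2 -> 3 channels)
    print(graph_conv(h, adj, w))
```

The normalization keeps feature magnitudes comparable between high-degree joints (such as the elbow in this chain) and end joints. A spatiotemporal variant would additionally connect each joint to itself in neighbouring frames; that extension is not shown here.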