3D human pose estimation in the sailing simulator based on somatosensory interaction

Mingmei Che; Cui Xie; Xuqi Pan; Yonghang Yang; Junyu Dong; Shan Luo; Xiaofeng Chang; Yiguo Wang

doi:10.1007/s44295-025-00062-7

Intelligent Marine Technology and Systems, 2025, Vol. 3, Issue 1: 15

Research Paper

Abstract

Growing research on sailing simulators has advanced land-based sailing training and broadened its adoption. Somatosensory interaction offers ease of deployment, low cost, and natural interactivity. This paper introduces a sailing simulator based on somatosensory interaction, in which a standard camera and 3D human pose estimation (3DHPE) drive the simulation. The aim of the study is to balance interactivity against deployment complexity. Despite recent advances in 3DHPE, its use in real-time deployment scenarios has not been fully explored, and estimating the poses of dynamic (moving) users is especially difficult because real-time efficiency must be traded off against accuracy. To address this, we propose STGJMamer, a 3D pose estimation approach that integrates a graph-guided state space model. Using the lightweight transformer PoseFormerV2 as the baseline, the method retains good real-time efficiency while integrating spatiotemporal extended graph convolutions and a hierarchical joint-enhancement Mamba. The model efficiently captures both global and local features, which ultimately improves pose estimation accuracy for dynamic users. Experimental results show that the approach reaches a frame-wise inference speed of 52 FPS, satisfying real-time constraints, and a mean per-joint position error (MPJPE) of 29.5 mm on the MPI-INF-3DHP dataset, outperforming most existing methods. Finally, we deploy STGJMamer in a somatosensory interaction-based sailing simulator and study its feasibility in real-world applications. The code is available at https://gitee.com/chemingmei/stgjmamer.
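The abstract reports accuracy as mean per-joint position error (MPJPE). As a rough illustration of the metric, the sketch below computes MPJPE as the average Euclidean distance between predicted and ground-truth 3D joints after subtracting a root joint from each pose. The root-relative alignment, the default root index `root=0`, and the function name `mpjpe` are assumptions for illustration only; the exact evaluation protocol used for MPI-INF-3DHP in the paper may differ.

```python
import math
from typing import Sequence

Joint = Sequence[float]   # (x, y, z) position, assumed to be in millimetres
Pose = Sequence[Joint]    # one 3D position per joint for a single frame


def mpjpe(pred: Sequence[Pose], gt: Sequence[Pose], root: int = 0) -> float:
    """Mean per-joint position error over all frames and joints.

    Each pose is made root-relative by subtracting the joint at index
    `root` (an assumed convention, not necessarily the paper's protocol),
    then the Euclidean distances of all joints are averaged.
    """
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and have equal frame counts")
    total, count = 0.0, 0
    for p_frame, g_frame in zip(pred, gt):
        if len(p_frame) != len(g_frame):
            raise ValueError("each frame must have the same number of joints")
        p_root, g_root = p_frame[root], g_frame[root]
        for p, g in zip(p_frame, g_frame):
            # Express both joints relative to their own pose's root joint.
            p_rel = [a - b for a, b in zip(p, p_root)]
            g_rel = [a - b for a, b in zip(g, g_root)]
            total += math.dist(p_rel, g_rel)
            count += 1
    return total / count


if __name__ == "__main__":
    # Hypothetical 3-joint, single-frame example (values in mm).
    gt = [[(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]]
    pred = [[(5.0, 0.0, 0.0), (5.0, 130.0, 0.0), (5.0, 200.0, 40.0)]]
    print(round(mpjpe(pred, gt), 2))  # 23.33
```

Because both poses are root-aligned, the constant 5 mm offset in the prediction does not contribute; only the 30 mm and 40 mm joint errors remain, giving 70/3 mm. For the speed figure, 52 FPS corresponds to a per-frame budget of roughly 1000/52 ≈ 19.2 ms.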

Keywords

Sailing simulator / 3D human pose estimation / Human-computer interaction / Information and Computing Sciences / Artificial Intelligence and Image Processing / Information Systems

Cite this article

Mingmei Che, Cui Xie, Xuqi Pan, Yonghang Yang, Junyu Dong, Shan Luo, Xiaofeng Chang, Yiguo Wang. 3D human pose estimation in the sailing simulator based on somatosensory interaction. Intelligent Marine Technology and Systems, 2025, 3(1): 15. DOI: 10.1007/s44295-025-00062-7
