Robust human motion prediction via integration of spatial and temporal cues

Shaobo Zhang, Sheng Liu, Fei Gao, Yuan Feng

Optoelectronics Letters, 2025, Vol. 21, Issue 8: 499-506. DOI: 10.1007/s11801-025-4119-4


Abstract

Research on human motion prediction has made significant progress owing to its importance in a wide range of artificial intelligence applications. However, effectively capturing spatio-temporal features for smoother and more precise human motion prediction remains a challenge. To address this challenge, a robust human motion prediction method via integration of spatial and temporal cues (RISTC) is proposed. The method captures rich spatio-temporal correlations in the observed sequence of human poses using spatio-temporal mixed feature extractors (MFEs). Within the stacked MFE layers, channel-graph united attention blocks extract augmented spatial features of the human poses along the channel and spatial dimensions. In addition, multi-scale temporal blocks are designed to capture complex and highly dynamic temporal information. Experiments on the Human3.6M and Carnegie Mellon University motion capture (CMU Mocap) datasets show that the proposed network achieves higher prediction accuracy than state-of-the-art methods.
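
For concreteness, the sketch below shows one way a single MFE layer of the kind described above could be organized, assuming a PyTorch implementation. The module names (ChannelGraphAttention, MultiScaleTemporal, MFELayer), the (batch, channel, frame, joint) tensor layout, and all hyper-parameters are hypothetical illustrations, not the authors' code.

```python
# Minimal sketch of one spatio-temporal mixed feature extractor (MFE) layer.
# All names, shapes, and hyper-parameters are assumptions for illustration only.
import torch
import torch.nn as nn


class ChannelGraphAttention(nn.Module):
    """Channel re-weighting followed by attention over the joint (graph) axis."""

    def __init__(self, channels: int, num_joints: int):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                  # squeeze over (frames, joints)
            nn.Conv2d(channels, channels // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1),
            nn.Sigmoid(),
        )
        # Learnable joint-joint affinity, initialised to the identity.
        self.adjacency = nn.Parameter(torch.eye(num_joints))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, T, J)
        x = x * self.channel_gate(x)                      # channel attention
        weights = self.adjacency.softmax(dim=-1)          # normalised graph attention
        return torch.einsum("bctj,jk->bctk", x, weights)


class MultiScaleTemporal(nn.Module):
    """Parallel temporal convolutions with different kernel sizes, then fusion."""

    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=(k, 1), padding=(k // 2, 0))
            for k in kernel_sizes
        )
        self.fuse = nn.Conv2d(channels * len(kernel_sizes), channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, T, J)
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))


class MFELayer(nn.Module):
    """One mixed feature extractor: spatial attention plus a multi-scale temporal block."""

    def __init__(self, channels: int, num_joints: int):
        super().__init__()
        self.spatial = ChannelGraphAttention(channels, num_joints)
        self.temporal = MultiScaleTemporal(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.spatial(x)       # residual connections keep the sketch stable
        return x + self.temporal(x)


if __name__ == "__main__":
    poses = torch.randn(2, 64, 10, 22)    # (batch, channels, observed frames, joints)
    print(MFELayer(64, 22)(poses).shape)  # -> torch.Size([2, 64, 10, 22])
```

Stacking several such layers and regressing future poses from the fused features would mirror the multi-layer MFE design described in the abstract.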

Keywords


Cite this article

Shaobo Zhang, Sheng Liu, Fei Gao, Yuan Feng. Robust human motion prediction via integration of spatial and temporal cues. Optoelectronics Letters, 2025, 21(8): 499-506. DOI: 10.1007/s11801-025-4119-4



Rights & Permissions

Tianjin University of Technology
