Video-based body geometric aware network for 3D human pose estimation

Chaonan Li, Sheng Liu, Lu Yao, Siyu Zou

Optoelectronics Letters, 2022, Vol. 18, Issue (5): 313-320.


Abstract

Three-dimensional human pose estimation (3D HPE) has broad application prospects in trajectory prediction, posture tracking and action analysis. However, frequent self-occlusions and the substantial depth ambiguity of two-dimensional (2D) representations limit further improvements in accuracy. In this paper, we propose a novel video-based human body geometric aware network to mitigate these problems. By capturing spatial and temporal context information from 2D skeleton data, the network becomes implicitly aware of the geometric constraints of the human body. Specifically, a novel skeleton attention (SA) mechanism is proposed to model geometric context dependencies among body joints, thereby improving the spatial feature representation ability of the network. To enhance temporal consistency, a novel multilayer perceptron (MLP)-Mixer based structure is used to learn temporal context information comprehensively from the input sequences. We evaluate the proposed approach on challenging, publicly available datasets. On the Human3.6M dataset, it outperforms the previous best approach by 0.5 mm, and it also achieves significant improvements on the HumanEva-I dataset.
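The abstract only names the two components, so the following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. Plain scaled dot-product self-attention across the joints of each frame stands in for the skeleton attention (SA) mechanism, whose exact formulation is not given here, and a single MLP-Mixer-style token-mixing layer applied along the time axis stands in for the temporal module. The joint count (17), feature sizes, random weights and all function names are hypothetical.

```python
import math
import random

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matvec(w, v):
    # w: rows x cols matrix, v: vector of length cols -> vector of length rows
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def joint_self_attention(frame, wq, wk, wv):
    """Scaled dot-product self-attention across the J joints of one frame.

    frame: J x C joint features. Every joint attends to every joint, so each
    output vector mixes information from the whole skeleton (assumed SA stand-in).
    """
    q = [matvec(wq, x) for x in frame]
    k = [matvec(wk, x) for x in frame]
    v = [matvec(wv, x) for x in frame]
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d)])
    return out

def gelu(x):
    # tanh approximation of the GELU activation used in MLP-Mixer
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def temporal_token_mixing(seq, w1, w2):
    """MLP-Mixer-style token mixing along time, with a residual connection.

    seq: T x D, one feature vector per frame. The same two-layer MLP
    (T -> H -> T) is applied to every feature channel independently, so each
    output frame is a learned combination of all input frames.
    LayerNorm from the original MLP-Mixer block is omitted for brevity.
    """
    t_len, d = len(seq), len(seq[0])
    out = [row[:] for row in seq]
    for c in range(d):
        column = [seq[t][c] for t in range(t_len)]
        hidden = [gelu(h) for h in matvec(w1, column)]
        mixed = matvec(w2, hidden)
        for t in range(t_len):
            out[t][c] += mixed[t]
    return out

# Hypothetical input: 9 frames of 17 joints with 2D (x, y) coordinates.
rng = random.Random(0)
T, J, C, D, H = 9, 17, 2, 8, 16
seq2d = [[[rng.uniform(-1, 1) for _ in range(C)] for _ in range(J)] for _ in range(T)]
wq, wk, wv = (rand_matrix(D, C, rng) for _ in range(3))
spatial = [joint_self_attention(frame, wq, wk, wv) for frame in seq2d]  # T x J x D
flat = [[x for joint in frame for x in joint] for frame in spatial]     # T x (J*D)
w1, w2 = rand_matrix(H, T, rng), rand_matrix(T, H, rng)
mixed = temporal_token_mixing(flat, w1, w2)
print(len(mixed), len(mixed[0]))  # 9 136
```

Attending over joints within each frame and then mixing along time follows the order in which the abstract lists spatial and temporal context; whether the actual network stacks its modules this way, and how it regresses 3D joints from the mixed features, is not stated here and is left out of the sketch.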

Cite this article

Chaonan Li, Sheng Liu, Lu Yao, Siyu Zou. Video-based body geometric aware network for 3D human pose estimation. Optoelectronics Letters, 2022, 18(5): 313-320. DOI: 10.1007/s11801-022-2015-8


