Leveraging auxiliary-tasks for height and weight estimation with pose-disentanglement

Dan HAN , Jie ZHANG , Shiguang SHAN

Front. Comput. Sci., 2026, Vol. 20, Issue (4): 2004704. DOI: 10.1007/s11704-025-50162-0
Image and Graphics
RESEARCH ARTICLE

Abstract

Estimating body height and weight from a single non-frontal face image performs poorly because of large variations in face pose and a lack of labeled data. In this paper, we propose a face-based body height and weight estimation method that leverages auxiliary tasks and pose disentanglement to address these issues. Specifically, motivated by the correlation among gender, age, height, and weight estimation, we employ gender and age estimation as auxiliary tasks to improve the performance of the primary tasks, i.e., height and weight estimation. In addition, we remove pose-relevant features from the input to further improve the performance of both the primary and the auxiliary tasks. Extensive experiments on both small-pose and large-pose datasets demonstrate the superiority of the proposed method.
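The idea of training primary and auxiliary tasks together can be illustrated with a minimal sketch of a weighted multi-task loss. This is not the authors' formulation: the function name `multitask_loss`, the dictionary keys, the choice of absolute error and binary cross-entropy, and the weight `aux_weight` are all assumptions made for illustration, and the pose-disentanglement step is left out entirely.

```python
import math


def multitask_loss(pred, target, aux_weight=0.5):
    """Hypothetical combined loss: primary tasks plus down-weighted auxiliary tasks."""
    # Primary tasks: absolute error on height and weight.
    primary = (abs(pred["height"] - target["height"])
               + abs(pred["weight"] - target["weight"]))

    # Auxiliary task 1: binary cross-entropy on the predicted gender probability.
    # The probability is clamped to avoid log(0).
    p = min(max(pred["gender_prob"], 1e-7), 1.0 - 1e-7)
    gender = -(target["gender"] * math.log(p)
               + (1 - target["gender"]) * math.log(1.0 - p))

    # Auxiliary task 2: absolute error on age.
    age = abs(pred["age"] - target["age"])

    # aux_weight (an assumed hyperparameter) controls how much the
    # auxiliary tasks contribute relative to the primary tasks.
    return primary + aux_weight * (gender + age)


target = {"height": 170.0, "weight": 60.0, "gender": 1, "age": 30.0}
pred = {"height": 172.0, "weight": 57.0, "gender_prob": 0.5, "age": 34.0}
loss = multitask_loss(pred, target)
print(round(loss, 4))
```

Setting `aux_weight=0` in this sketch reduces the loss to the primary tasks alone, which is one simple way to compare training with and without the auxiliary signals.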

Keywords

face-based body height and weight estimation / auxiliary-tasks learning / pose disentanglement

Cite this article

Dan HAN, Jie ZHANG, Shiguang SHAN. Leveraging auxiliary-tasks for height and weight estimation with pose-disentanglement. Front. Comput. Sci., 2026, 20(4): 2004704 DOI:10.1007/s11704-025-50162-0

RIGHTS & PERMISSIONS

Higher Education Press
