Bayesian Analysis of Two-Part Latent Variable Model with Mixed Data

Shuang-Can Xiong, Ye-Mao Xia, Bin Lu

Communications in Mathematics and Statistics, 2025, Vol. 13, Issue 6: 1313-1349. DOI: 10.1007/s40304-023-00359-1

Abstract

The two-part model is a widely used tool for analyzing semi-continuous data. It comprises two components: one characterizes the proportion of zeros, and the other the level of the positive values. Such models focus primarily on how the semi-continuous variables depend on observed covariates, so unobserved heterogeneity is sometimes ignored. In this paper, we extend the conventional two-part regression model to a more general setting in which multiple latent factors account for the latent heterogeneity arising from absent covariates. A structural equation describes the interrelationships among the latent factors. Moreover, a general statistical analysis procedure is developed to accommodate semi-continuous, ordered, and unordered data simultaneously. Parameter estimation and model assessment are carried out within a Bayesian framework. A simulation study and a real example illustrate the proposed methodology.
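To make the two components concrete, the following is a minimal sketch of a generic two-part data-generating process, not the model proposed in the paper. It assumes a logistic model for whether a response is positive and a log-normal model for the positive level, with a single observed covariate and no latent factors. The function name `simulate_two_part` and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_two_part(n, alpha=(-0.5, 1.0), beta=(1.0, 0.5), sigma=0.8, seed=0):
    """Draw n (covariate, response) pairs from a generic two-part model.

    alpha, beta, sigma are illustrative values, not estimates from the paper.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # one observed covariate (assumed standard normal)
        # Part 1: logistic model for the probability that the response is positive.
        p_pos = 1.0 / (1.0 + math.exp(-(alpha[0] + alpha[1] * x)))
        if rng.random() < p_pos:
            # Part 2: log-normal model for the level of a positive response.
            y = math.exp(rng.gauss(beta[0] + beta[1] * x, sigma))
        else:
            y = 0.0  # structural zero
        data.append((x, y))
    return data


sample = simulate_two_part(1000)
zero_share = sum(1 for _, y in sample if y == 0.0) / len(sample)
print(f"share of zeros: {zero_share:.2f}")
```

The resulting responses are a mixture of exact zeros and right-skewed positive values, which is the semi-continuous pattern that the two-part model is meant to describe.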

Keywords

Two-part latent variable model / Gibbs sampler / Model comparison / Household finance / 62F15 / 62H15

Cite this article

Shuang-Can Xiong, Ye-Mao Xia, Bin Lu. Bayesian Analysis of Two-Part Latent Variable Model with Mixed Data. Communications in Mathematics and Statistics, 2025, 13(6): 1313-1349. DOI: 10.1007/s40304-023-00359-1
