
Dirichlet process and its developments: a survey
Yemao XIA, Yingan LIU, Jianwei GOU
Front. Math. China, 2022, Vol. 17, Issue (1): 79-115.
The core of nonparametric and semiparametric Bayesian analysis is to relax specific parametric assumptions on the distributions of interest, treating those distributions as unknown and random and assigning them a prior. Selecting a suitable prior is therefore especially critical in nonparametric Bayesian fitting. As a distribution over distributions, the Dirichlet process (DP) is the most popular nonparametric prior, owing to its attractive theoretical properties, modeling flexibility, and computational feasibility. In this paper, we review and summarize developments of the DP over the past decades. We focus mainly on its theoretical properties, its various extensions, statistical modeling, and applications to latent variable models.
Nonparametric Bayes / Dirichlet process / Pólya urn prediction / Sethuraman representation / stick-breaking procedure / Chinese restaurant rule / mixture of Dirichlet processes / dependent Dirichlet process / Markov chain Monte Carlo / blocked Gibbs sampler / latent variable models