Frontiers of Mathematics in China
Dirichlet process and its developments: a survey
Published date: 15 Feb 2022
The core idea of nonparametric and semiparametric Bayesian analysis is to relax specific parametric assumptions by treating the distributions of interest as unknown and random, and then assigning them a prior. Selecting a suitable prior is therefore especially critical in nonparametric Bayesian fitting. As a distribution over distributions, the Dirichlet process (DP) is the most widely used nonparametric prior, owing to its attractive theoretical properties, modeling flexibility, and computational feasibility. In this paper, we review and summarize developments of the DP over the past decades. We focus mainly on its theoretical properties, its various extensions, statistical modeling, and applications to latent variable models.
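To make "a distribution over distributions" concrete, the following minimal sketch draws one random distribution G from DP(α, G0) using Sethuraman's stick-breaking construction, truncated at a fixed number of atoms. The truncation level, the standard-normal base measure G0, and the function names `sample_dp_stick_breaking` and `draw_from_G` are illustrative assumptions, not choices taken from the survey. Sampling observations from G then shows repeated values, reflecting the fact that a DP draw is almost surely discrete.

```python
import random
from typing import Callable, List, Optional, Tuple


def sample_dp_stick_breaking(
    alpha: float,
    base_sampler: Callable[[random.Random], float],
    truncation: int = 100,
    rng: Optional[random.Random] = None,
) -> List[Tuple[float, float]]:
    """Draw one truncated random distribution G ~ DP(alpha, G0).

    Stick-breaking: v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k} (1 - v_j),
    and atoms theta_k ~ G0 i.i.d. The final break uses v = 1 so the leftover
    stick mass goes to the last atom and the weights sum to one (truncation
    is an approximation, assumed here for illustration).
    """
    if alpha <= 0 or truncation < 1:
        raise ValueError("need alpha > 0 and truncation >= 1")
    if rng is None:
        rng = random.Random()
    pairs = []
    remaining = 1.0  # length of the stick not yet broken off
    for k in range(truncation):
        v = 1.0 if k == truncation - 1 else rng.betavariate(1.0, alpha)
        pairs.append((base_sampler(rng), remaining * v))  # (atom, weight)
        remaining *= 1.0 - v
    return pairs


def draw_from_G(
    pairs: List[Tuple[float, float]], n: int, rng: Optional[random.Random] = None
) -> List[float]:
    """Sample n observations i.i.d. from the discrete distribution G."""
    if rng is None:
        rng = random.Random()
    atoms = [a for a, _ in pairs]
    weights = [w for _, w in pairs]
    return rng.choices(atoms, weights=weights, k=n)


if __name__ == "__main__":
    rng = random.Random(2022)
    # Hypothetical example: concentration alpha = 1, base measure G0 = N(0, 1).
    G = sample_dp_stick_breaking(1.0, lambda r: r.gauss(0.0, 1.0), truncation=50, rng=rng)
    x = draw_from_G(G, 200, rng)
    print(len(set(x)), "distinct values among", len(x), "draws")
```

Smaller values of α put most of the stick on a few atoms (many ties among the draws), while larger values spread the mass over more atoms so that G looks more like G0.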
Yemao XIA, Yingan LIU, Jianwei GOU. Dirichlet process and its developments: a survey[J]. Frontiers of Mathematics in China, 2022, 17(1): 79-115. DOI: 10.1007/s11464-022-1004-3