SURVEY ARTICLE

Dirichlet process and its developments: a survey

  • Yemao XIA 1,
  • Yingan LIU 2,
  • Jianwei GOU 1
  • 1. School of Sciences, Nanjing Forestry University, Nanjing 210037, China
  • 2. College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China

Published date: 15 Feb 2022

Copyright

2022 Higher Education Press

Abstract

The core of nonparametric and semiparametric Bayesian analysis is to relax particular parametric assumptions on the distributions of interest, treating those distributions as unknown and random and assigning them a prior. Selecting a suitable prior is therefore especially critical in nonparametric Bayesian fitting. As a distribution over distributions, the Dirichlet process (DP) is the most popular nonparametric prior, owing to its attractive theoretical properties, modeling flexibility, and computational feasibility. In this paper, we review and summarize developments of the DP over the past decades. We focus on its theoretical properties, various extensions, statistical modeling, and applications to latent variable models.

Cite this article

Yemao XIA, Yingan LIU, Jianwei GOU. Dirichlet process and its developments: a survey[J]. Frontiers of Mathematics in China, 2022, 17(1): 79-115. DOI: 10.1007/s11464-022-1004-3

