Abstract
Model averaging has been regarded as a powerful tool for model-based prediction in recent decades. However, its application to ultra-high dimensional multi-categorical data faces challenges arising from model uncertainty and heterogeneity. In this article, a novel two-step model averaging method is proposed for a multi-categorical response when the number of covariates is ultra-high. First, a class of adaptive multinomial logistic regression candidate models is constructed, in which different covariates are allowed for each category to accommodate heterogeneity. Second, the optimal model weights are chosen using a criterion based on the Kullback–Leibler loss plus a penalty term. We show that the proposed model averaging estimator is asymptotically optimal in the sense that it achieves the minimum Kullback–Leibler loss among all possible averaging estimators. Empirical evidence from simulation studies and a real data example demonstrates that the proposed model averaging method outperforms state-of-the-art approaches.
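As a rough illustration of the second step, the following Python sketch chooses averaging weights on the simplex by minimizing an empirical negative log-likelihood (a sample analogue of the Kullback–Leibler loss) plus a linear penalty. It is not the authors' implementation. The function name `averaging_weights`, the penalty form (a per-model cost weighted by the model weights), the exponentiated-gradient solver, and the toy data are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np


def averaging_weights(probs, y, penalty, n_iter=500, step=0.5):
    """Hypothetical helper: averaging weights for M candidate classifiers.

    probs   : array (M, n, K), class probabilities from each candidate model
    y       : int array (n,), observed class labels in {0, ..., K-1}
    penalty : array (M,), assumed per-model penalty (e.g. growing with model size)
    """
    probs = np.asarray(probs, dtype=float)
    penalty = np.asarray(penalty, dtype=float)
    M, n, _ = probs.shape

    # p_obs[m, i]: probability that candidate m assigns to the observed class of sample i
    p_obs = probs[:, np.arange(n), y]

    # Start from equal weights and run exponentiated-gradient updates,
    # which keep the weights non-negative and summing to one.
    w = np.full(M, 1.0 / M)
    for _ in range(n_iter):
        mix = w @ p_obs  # averaged probability of the observed class, shape (n,)
        # Gradient of: -(1/n) * sum_i log(mix_i) + sum_m w_m * penalty_m
        grad = -(p_obs / mix).mean(axis=1) + penalty
        w = w * np.exp(-step * grad)
        w /= w.sum()
    return w


# Toy data (assumed): one informative candidate and one uniform candidate.
n, K = 120, 3
y = np.arange(n) % K
informative = np.full((n, K), 0.1)
informative[np.arange(n), y] = 0.8  # 0.8 on the observed class, 0.1 elsewhere
uninformative = np.full((n, K), 1.0 / K)
probs = np.stack([informative, uninformative])  # shape (2, n, K)

w = averaging_weights(probs, y, penalty=np.zeros(2))
print(w)
```

With a zero penalty, the weight concentrates on the candidate that assigns higher probability to the observed classes. A nonzero `penalty` entry trades that fit against the assumed cost of the corresponding model.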
Keywords
Asymptotic optimality / Kullback–Leibler loss / Model averaging / Multinomial logistic regression / Ultra-high dimensionality
Cite this article
Jing Lv, Chaohui Guo. Ultra-High Dimensional Model Averaging for Multi-Categorical Response. Communications in Mathematics and Statistics, 1-28. DOI: 10.1007/s40304-023-00379-x
Funding
Natural Science Foundation of Chongqing Grant (CSTB2022NSCQ-MSX0852)
National Natural Science Foundation of China (12201091)
National Statistical Science Research Program (2022LY019)
Natural Science Foundation of Chongqing Grant (cstc2021jcyj-msxmX0502)