Ultra-High Dimensional Model Averaging for Multi-Categorical Response

Jing Lv , Chaohui Guo

Communications in Mathematics and Statistics, 2026, Vol. 14, Issue 2: 285-312. DOI: 10.1007/s40304-023-00379-x

Abstract

Model averaging has been regarded as a powerful tool for model-based prediction over the past decades. However, its application to ultra-high dimensional multi-categorical data faces challenges arising from model uncertainty and heterogeneity. In this article, a novel two-step model averaging method is proposed for multi-categorical responses when the number of covariates is ultra-high. First, a class of adaptive multinomial logistic regression candidate models is constructed, in which different covariates are allowed for each category to accommodate heterogeneity. Second, the optimal model weights are chosen using a criterion formed by the Kullback–Leibler loss plus a penalty term. We show that the proposed model averaging estimator is asymptotically optimal in the sense that it achieves the minimum Kullback–Leibler loss among all possible averaging estimators. Empirical evidence from simulation studies and a real data example demonstrates that the proposed model averaging method outperforms state-of-the-art approaches.
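
The two steps described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the candidate-model construction, the exact form of the penalty, or the optimizer, so the following are all assumptions made for illustration. Candidate models are built from fixed, hypothetical covariate groups (with the same covariates for every category, which is simpler than the category-specific construction in the article), predicted probabilities are averaged, the penalty is taken to be linear in the weights and proportional to each model's parameter count, and the weights are found by exponentiated gradient descent on the simplex. The names `fit_multinomial_logit`, `criterion`, `choose_weights`, `groups`, and `lam` are likewise hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_multinomial_logit(X, y, n_classes, steps=500, lr=0.5):
    """Fit softmax regression (with intercept) by plain gradient descent."""
    Xb = np.column_stack([np.ones(len(X)), X])
    Y = np.eye(n_classes)[y]                # one-hot responses
    B = np.zeros((Xb.shape[1], n_classes))
    for _ in range(steps):
        B -= lr * Xb.T @ (softmax(Xb @ B) - Y) / len(X)
    return B

def criterion(w, p_obs, sizes, lam):
    """Negative log-likelihood of the averaged probabilities plus a linear penalty (assumed form)."""
    return -np.log(w @ p_obs).sum() + lam * (w @ sizes)

def choose_weights(p_obs, sizes, lam, steps=2000, lr=0.1):
    """Minimise the criterion over the weight simplex by exponentiated gradient."""
    M, n = p_obs.shape
    w = np.full(M, 1.0 / M)
    for _ in range(steps):
        grad = -(p_obs / (w @ p_obs)).sum(axis=1) + lam * sizes
        w = w * np.exp(-lr * grad / n)      # multiplicative update keeps w > 0
        w /= w.sum()                        # renormalise onto the simplex
    return w

# Synthetic data: 3 classes, 12 covariates, only the first 3 carry signal.
rng = np.random.default_rng(0)
n, p, K = 300, 12, 3
X = rng.normal(size=(n, p))
B_true = np.zeros((p, K))
B_true[0, 0] = B_true[1, 1] = B_true[2, 2] = 2.0
y = np.array([rng.choice(K, p=pr) for pr in softmax(X @ B_true)])

# Step 1 (simplified): one candidate model per hypothetical covariate group.
groups = [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9, 10, 11]]
p_obs = []
for g in groups:
    B = fit_multinomial_logit(X[:, g], y, K)
    probs = softmax(np.column_stack([np.ones(n), X[:, g]]) @ B)
    p_obs.append(probs[np.arange(n), y])    # probability given to the observed class
p_obs = np.array(p_obs)                     # shape (M, n)
sizes = np.array([(len(g) + 1) * (K - 1) for g in groups], dtype=float)

# Step 2: penalised weight choice (lam = 1.0 is an arbitrary illustrative value).
w = choose_weights(p_obs, sizes, lam=1.0)
print(np.round(w, 3))
```

Because the averaged probability is linear in the weights, the negative log-likelihood term is convex in the weights, which is why a simple first-order method on the simplex is adequate for this sketch. In an ultra-high dimensional setting the covariate groups would come from a screening step rather than being fixed by hand.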

Keywords

Asymptotic optimality / Kullback–Leibler loss / Model averaging / Multinomial logistic regression / Ultra-high dimensionality

Mathematics Subject Classification

62F10 / 62H30

Cite this article

Jing Lv, Chaohui Guo. Ultra-High Dimensional Model Averaging for Multi-Categorical Response. Communications in Mathematics and Statistics, 2026, 14(2): 285-312. DOI: 10.1007/s40304-023-00379-x


Funding

Natural Science Foundation of Chongqing Grant (CSTB2022NSCQ-MSX0852)

National Natural Science Foundation of China (12201091)

National Statistical Science Research Program (2022LY019)

Natural Science Foundation of Chongqing Grant (cstc2021jcyj-msxmX0502)

Rights & Permissions

School of Mathematical Sciences, University of Science and Technology of China and Springer-Verlag GmbH Germany, part of Springer Nature
