Model-Free Feature Screening for Ultrahigh-Dimensional Multiclass Classification via a Likelihood Ratio-Based Measure of Dependence

Fei Ye, Weidong Ma, Jingsong Xiao, Ying Yang

Communications in Mathematics and Statistics: 1-42. DOI: 10.1007/s40304-025-00466-1

Abstract

This article proposes a new likelihood ratio-based index (LR index for short) to measure the dependence between a categorical response variable and a continuous predictor variable. The LR index is nonnegative and equals zero if and only if the two variables are independent. We propose an estimator of the index, develop a novel independence test based on it, and derive its asymptotic null distribution. Building on the LR index, we then develop a feature screening procedure (LR-SIS for short) for multiclass classification with ultrahigh-dimensional predictors. LR-SIS is model-free and robust to heavy-tailed predictor distributions and to outliers. Its sure screening property is established while allowing the number of response classes to diverge. Comprehensive simulation studies demonstrate the finite-sample performance of the LR index in both independence testing and feature screening, and an application of LR-SIS to a real data set is also presented.
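
The abstract does not state the formula for the LR index, so the following is only a minimal sketch of how a marginal screening procedure of this kind could be organized. It assumes a stand-in utility of likelihood-ratio type: a pointwise binomial log-likelihood ratio between the class-conditional and pooled empirical distribution functions, averaged over the sample points. This stand-in is an assumption made for illustration and is not claimed to be the authors' LR index or the LR-SIS procedure itself. The names `lr_type_utility`, `screen`, and the cutoff `d` are hypothetical.

```python
import numpy as np


def _xlogy_ratio(a, b):
    """Return a * log(a / b) elementwise, with the convention 0 * log(0 / b) = 0."""
    out = np.zeros_like(a, dtype=float)
    mask = a > 0
    out[mask] = a[mask] * np.log(a[mask] / b[mask])
    return out


def lr_type_utility(x, y):
    """Stand-in dependence measure between continuous x and categorical y.

    Illustrative assumption, not the paper's definition: for each sample point t,
    compare Bernoulli(F_r(t)) with Bernoulli(F(t)) via the Kullback-Leibler
    divergence, where F_r is the empirical CDF within class r and F is the
    pooled empirical CDF, then average over t and weight by class proportion.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    # ind[i, j] = 1 if x[i] <= x[j]; column j evaluates the ECDFs at x[j].
    ind = (x[:, None] <= x[None, :]).astype(float)
    pooled = ind.mean(axis=0)
    total = 0.0
    for label in np.unique(y):
        in_class = y == label
        p_r = in_class.mean()
        f_r = ind[in_class].mean(axis=0)
        kl = _xlogy_ratio(f_r, pooled) + _xlogy_ratio(1.0 - f_r, 1.0 - pooled)
        total += p_r * kl.mean()
    return total


def screen(X, y, d):
    """Rank predictors by marginal utility and keep the indices of the top d."""
    utilities = np.array([lr_type_utility(X[:, j], y) for j in range(X.shape[1])])
    keep = np.argsort(utilities)[::-1][:d]
    return keep, utilities


# Synthetic demonstration: 3 classes, heavy-tailed noise, 2 active predictors.
rng = np.random.default_rng(0)
n, p = 120, 200
y = rng.integers(0, 3, size=n)
X = rng.standard_t(df=2, size=(n, p))   # heavy-tailed predictors
X[:, 0] += 3.0 * y                      # predictor 0 shifts with the class label
X[:, 1] += 3.0 * (y == 1)               # predictor 1 shifts only in class 1

keep, utilities = screen(X, y, d=5)
print("retained predictor indices:", sorted(keep.tolist()))
```

Because the stand-in utility depends on the predictor only through empirical distribution functions, it is unchanged by monotone transformations of the predictor, which is one common route to the kind of robustness to heavy tails and outliers that the abstract describes. In practice the cutoff `d` would be chosen by the analyst; the value 5 here is arbitrary.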

Keywords

Feature screening / Test of independence / Multiclass classification / Ultrahigh dimensionality / 62H20 / 62H30 / 62F07

Cite this article

Fei Ye, Weidong Ma, Jingsong Xiao, Ying Yang. Model-Free Feature Screening for Ultrahigh-Dimensional Multiclass Classification via a Likelihood Ratio-Based Measure of Dependence. Communications in Mathematics and Statistics, 1-42. DOI: 10.1007/s40304-025-00466-1


RIGHTS & PERMISSIONS

School of Mathematical Sciences, University of Science and Technology of China and Springer-Verlag GmbH Germany, part of Springer Nature
