Directional FDR Control for Sub-Gaussian Sparse GLMs

Chang Cui, Jinzhu Jia, Yijun Xiao, Huiming Zhang

Communications in Mathematics and Statistics, 1–18. DOI: 10.1007/s40304-024-00433-2

Abstract

High-dimensional sparse generalized linear models (GLMs) arise in settings where both the sample size and the number of variables are large, and the dimension may even grow faster than the sample size. After a sparse penalized estimate of a GLM has been obtained, false discovery rate (FDR) control aims to identify the small set of coefficients that are statistically significantly nonzero. Using the CLIME method for precision matrix estimation, we construct the debiased Lasso estimator and prove its asymptotic normality via minimax-rate oracle inequalities for sparse GLMs. In practice, one often needs to judge accurately whether each regression coefficient is positive or negative, i.e., whether the predictor variable is positively or negatively related to the response variable conditionally on the remaining variables. Using the debiased estimator, we establish multiple testing procedures and show, under mild conditions, that the proposed debiased statistics asymptotically control the directional (sign) FDR and the directional false discovery variables (FDV) at a pre-specified significance level; moreover, the procedures approximately achieve a statistical power of 1. We also extend our methods to two-sample problems and propose the corresponding two-sample test statistics, which under suitable conditions asymptotically control the directional FDR and directional FDV at the specified level. Numerical simulations verify the FDR control of the proposed testing procedures, which sometimes outperform the classical knockoff method.
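To make the pipeline concrete, the sketch below implements the two steps the abstract describes, specialized to logistic regression: a one-step bias correction of an ℓ1-penalized estimate, followed by a scan for the smallest threshold whose estimated false discovery proportion stays below the target level, with the signs of the surviving statistics giving the directional decisions. This is an illustrative reconstruction, not the authors' implementation: in particular, the paper estimates the precision matrix with CLIME, which we replace here by a ridge-regularized inverse of the empirical Hessian for brevity, and the function names and tuning constants are our own.

```python
# Minimal sketch of the debiased-Lasso directional FDR pipeline described
# above, specialized to logistic regression. Illustrative only: the paper
# uses CLIME for the precision matrix; a ridge-regularized inverse of the
# empirical Hessian stands in here, and all names and tunings are our own.
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression


def debiased_lasso_logistic(X, y, lam_ridge=0.05):
    """Return one-step debiased estimates and plug-in standard errors."""
    n, p = X.shape
    # Step 1: l1-penalized initial estimate (tune C by cross-validation in practice).
    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=1.0,
                               fit_intercept=False)
    lasso.fit(X, y)
    beta_hat = lasso.coef_.ravel()
    # Step 2: empirical Hessian Sigma_hat = X^T W X / n at beta_hat.
    mu = 1.0 / (1.0 + np.exp(-X @ beta_hat))  # fitted success probabilities
    w = mu * (1.0 - mu)                       # logistic variance weights
    Sigma_hat = (X * w[:, None]).T @ X / n
    # Step 3: precision-matrix estimate (CLIME in the paper; ridge inverse here).
    Theta_hat = np.linalg.inv(Sigma_hat + lam_ridge * np.eye(p))
    # Step 4: one-step bias correction using the score X^T (y - mu) / n.
    b_hat = beta_hat + Theta_hat @ X.T @ (y - mu) / n
    # Sandwich-type standard errors from Theta_hat Sigma_hat Theta_hat^T.
    se = np.sqrt(np.diag(Theta_hat @ Sigma_hat @ Theta_hat.T) / n)
    return b_hat, se


def directional_fdr_select(b_hat, se, alpha=0.1):
    """Smallest threshold whose estimated FDP is <= alpha; signed discoveries."""
    T = b_hat / se
    p = T.size
    t_star = np.inf  # default: reject nothing if no threshold qualifies
    for t in np.sort(np.abs(T))[::-1]:  # scan candidate thresholds downward
        n_rej = np.sum(np.abs(T) >= t)
        fdp_est = 2.0 * p * norm.sf(t) / max(n_rej, 1)  # estimated FDP at t
        if fdp_est <= alpha:
            t_star = t  # remember the smallest admissible threshold
    rejected = np.abs(T) >= t_star
    return rejected, np.sign(T) * rejected  # discoveries and their signs


if __name__ == "__main__":
    # Toy check: 5 true signals among 50 covariates, n = 400.
    rng = np.random.default_rng(0)
    n, p, s = 400, 50, 5
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:s] = [1.5, -1.5, 1.0, -1.0, 0.8]
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))
    b_hat, se = debiased_lasso_logistic(X, y)
    rejected, signs = directional_fdr_select(b_hat, se, alpha=0.1)
    print("discoveries:", np.flatnonzero(rejected), "signs:", signs[rejected])
```

The threshold scan estimates the false discovery proportion by the standard normal tail, 2p(1 − Φ(t))/max(#{|T_j| ≥ t}, 1), a common construction for debiased statistics; the paper's exact statistics and regularity conditions differ. For the two-sample extension mentioned above, a natural analogue (our reading, not a formula quoted from the paper) standardizes the difference of the two debiased estimates, T_j = (b_hat_j^(1) − b_hat_j^(2)) / sqrt(se_{1,j}^2 + se_{2,j}^2), and applies the same threshold scan.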

Keywords

Sub-Gaussian regression models / Debiased Lasso estimator / Multiple hypothesis testing / Directional false discovery rate / Two-sample test

Mathematics Subject Classification: 62J07 / 62J12 / 62F05 / 62H15

Cite this article

Chang Cui, Jinzhu Jia, Yijun Xiao, Huiming Zhang. Directional FDR Control for Sub-Gaussian Sparse GLMs. Communications in Mathematics and Statistics, 1–18. DOI: 10.1007/s40304-024-00433-2



Funding

National Natural Science Foundation of China (12101630)

Rights & Permissions

© School of Mathematical Sciences, University of Science and Technology of China and Springer-Verlag GmbH Germany, part of Springer Nature
