Directional FDR Control for Sub-Gaussian Sparse GLMs
Chang Cui, Jinzhu Jia, Yijun Xiao, Huiming Zhang
Communications in Mathematics and Statistics, pp. 1–18.
High-dimensional sparse generalized linear models (GLMs) arise in settings where both the sample size and the number of variables are large, and the dimension may even grow faster than the sample size. False discovery rate (FDR) control aims to identify a small number of statistically significant nonzero coefficients after obtaining the sparse penalized estimate of a GLM. Using the CLIME method for precision matrix estimation, we construct the debiased Lasso estimator and prove its asymptotic normality via minimax-rate oracle inequalities for sparse GLMs. In practice, it is often necessary to accurately determine the sign of each regression coefficient, which indicates whether the predictor variable is positively or negatively related to the response variable conditionally on the remaining variables. Using the debiased estimator, we establish multiple testing procedures. Under mild conditions, we show that the proposed debiased statistics can asymptotically control the directional (sign) FDR and the directional false discovery variables (FDV) at a pre-specified significance level. Moreover, our multiple testing procedure approximately achieves a statistical power of 1. We also extend our methods to two-sample problems and propose the corresponding two-sample test statistics. Under suitable conditions, these statistics asymptotically achieve directional FDR and directional FDV control at the specified significance level for two-sample problems. Numerical simulations verify the FDR control performance of the proposed testing procedures, which sometimes outperform the classical knockoff method.
Keywords: Sub-Gaussian regression models / Debiased Lasso estimator / Multiple hypothesis testing / Directional false discovery rate / Two-sample test. Mathematics Subject Classification: 62J07 / 62J12 / 62F05 / 62H15
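As a minimal sketch of the two central quantities (standard formulations that may differ in notation and detail from the paper's own definitions): the debiased Lasso estimator corrects the penalized estimate by a Newton-type step built from a CLIME-type precision matrix estimate, and a directional discovery is counted as false whenever its declared sign disagrees with the sign of the true coefficient (so a truly zero coefficient always counts as an error):

\[
\widehat{\beta}^{\,d} \;=\; \widehat{\beta} \;-\; \widehat{\Theta}\,\nabla \ell_n(\widehat{\beta}),
\]
where \(\widehat{\beta}\) is the Lasso estimate, \(\ell_n\) is the empirical GLM loss (negative log-likelihood), and \(\widehat{\Theta}\) is a CLIME-type estimate of the inverse Hessian;

\[
\mathrm{FDR}_{\mathrm{dir}}
\;=\;
\mathbb{E}\!\left[
\frac{\#\{\, j \in \widehat{S} : \operatorname{sign}(\widehat{\beta}^{\,d}_j) \neq \operatorname{sign}(\beta^{*}_j) \,\}}
     {\max\{|\widehat{S}|,\,1\}}
\right],
\qquad
\mathrm{FDV}_{\mathrm{dir}}
\;=\;
\mathbb{E}\,\#\{\, j \in \widehat{S} : \operatorname{sign}(\widehat{\beta}^{\,d}_j) \neq \operatorname{sign}(\beta^{*}_j) \,\},
\]
where \(\widehat{S}\) is the set of discoveries returned by the multiple testing procedure and \(\beta^{*}\) is the true coefficient vector.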