Abstract
Statistical and machine learning theory has developed several conditions ensuring that popular estimators such as the Lasso or the Dantzig selector perform well in high-dimensional sparse regression, including the restricted eigenvalue, compatibility, and $\ell_q$ sensitivity properties. However, some central aspects of these conditions are not well understood. For instance, it is unknown whether these conditions can be checked efficiently on any given dataset. This is problematic, because they are at the core of the theory of sparse regression. Here we provide a rigorous proof that these conditions are NP-hard to check. This shows that the conditions are computationally infeasible to verify in the worst case, and raises questions about their practical applicability. However, by taking an average-case perspective instead of the worst-case view of NP-hardness, we show that a particular condition, $\ell_q$ sensitivity, has certain desirable properties. This condition is weaker and more general than the others. We show that it holds with high probability in models where the parent population is well behaved, and that it is robust to certain data-processing steps. These results are desirable, as they provide guidance about when the condition, and more generally the theory of sparse regression, may be relevant in the analysis of high-dimensional correlated observational data.
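For orientation, two of the conditions mentioned above can be stated concretely. The following is a minimal sketch for a design matrix $X \in \mathbb{R}^{n \times p}$, sparsity level $s$, and cone constant $c_0 > 0$: the restricted eigenvalue constant follows Bickel, Ritov and Tsybakov (2009), while the $\ell_q$ sensitivity display is schematic, after Gautier and Tsybakov (2011); the exact normalization used in the paper may differ.

% Restricted eigenvalue constant RE(s, c_0), after Bickel, Ritov and Tsybakov (2009):
% X must be well conditioned on the cone of approximately s-sparse directions.
\[
  \kappa(s, c_0)
  \;=\;
  \min_{\substack{J \subseteq \{1,\dots,p\} \\ |J| \le s}}
  \;
  \min_{\substack{\delta \neq 0 \\ \|\delta_{J^c}\|_1 \le c_0 \|\delta_J\|_1}}
  \frac{\|X\delta\|_2}{\sqrt{n}\,\|\delta_J\|_2}
  \;>\; 0.
\]
% Schematic l_q sensitivity, after Gautier and Tsybakov (2011): for q >= 1, the
% correlation-type matrix X^T X / n must not collapse cone directions in the
% l_infinity norm; gamma > 0 is the sensitivity constant (the scaling of gamma
% in s is an assumption here and may differ from the paper's normalization).
\[
  \inf\Big\{ \frac{\|\tfrac{1}{n} X^{\top} X \,\delta\|_\infty}{\|\delta\|_q}
  \;:\; \delta \neq 0,\; |J| \le s,\; \|\delta_{J^c}\|_1 \le c_0 \|\delta_J\|_1 \Big\}
  \;\ge\; \gamma \;>\; 0.
\]

Naively verifying either display requires optimizing over all $\binom{p}{s}$ support sets $J$; the paper's NP-hardness result shows that, in the worst case, this combinatorial obstruction cannot be circumvented unless P = NP.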
Keywords
High-dimensional statistics / Sparse regression / Restricted eigenvalue / $\ell_q$ sensitivity / Computational complexity
Cite this article
Edgar Dobriban, Jianqing Fan. Regularity Properties for Sparse Regression. Communications in Mathematics and Statistics, 2016, 4(1): 1-19. DOI: 10.1007/s40304-015-0078-6
Funding
NIH grant R01GM100474-04
NSF grant DMS-1206464