Open access
Author
Date
2017
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
High-dimensional data with a sparse structure occur in many areas of science, industry and entertainment. These diverse applications have motivated the development of efficient statistical methods for analyzing high-dimensional datasets. While methodology for point estimation is well developed and generally well understood, in many applications it is essential to quantify statistical uncertainty by providing a confidence interval, a p-value or a test. In this thesis, we concentrate on developing efficient methods and theory for uncertainty quantification in specific high-dimensional settings that have a sparse structure.
In Chapter 2, we study estimation of high-dimensional inverse covariance matrices. We propose a simple approach for construction of asymptotically normal estimators for entries of the precision matrix based on Lasso-regularized estimators. Two explicit constructions are provided: one based on a global method that maximizes the joint likelihood and one based on a local (nodewise) method that sequentially applies the Lasso. When applied in the context of Gaussian graphical models, the proposed estimators lead to confidence intervals for edge weights or recovery of the edge structure. We evaluate their empirical performance in extensive simulation studies. The theoretical guarantees for the methods are achieved under a sparsity condition relative to the sample size, and mild distributional and regularity conditions. Additionally, we apply the results derived in this chapter to construct confidence intervals for edge weights in directed acyclic graphs.
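The local (nodewise) construction can be sketched as follows. This is a minimal illustration of the generic recipe — nodewise Lasso regressions followed by a de-sparsifying correction — not the exact procedure analysed in the chapter; the tuning parameter and the simple plug-in choices are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_precision(X, lam=0.1):
    """Nodewise-Lasso estimate of the precision matrix Theta.

    Row j is built by Lasso-regressing column j of the (centered)
    data on the remaining columns.  lam is an illustrative fixed
    value; in practice it would be chosen by cross-validation or
    by the theory's prescribed rate.
    """
    n, p = X.shape
    Theta = np.zeros((p, p))
    for j in range(p):
        rest = np.delete(np.arange(p), j)
        fit = Lasso(alpha=lam).fit(X[:, rest], X[:, j])
        resid = X[:, j] - fit.predict(X[:, rest])
        # tau^2: residual variance plus the Lasso penalty term
        tau2 = resid @ resid / n + lam * np.abs(fit.coef_).sum()
        Theta[j, j] = 1.0 / tau2
        Theta[j, rest] = -fit.coef_ / tau2
    return Theta

def desparsify(Theta, X):
    """De-sparsified precision estimate with asymptotically normal entries."""
    n = X.shape[0]
    Sigma_hat = X.T @ X / n
    return Theta + Theta.T - Theta.T @ Sigma_hat @ Theta
```

Each entry of the de-sparsified matrix is then asymptotically normal, so a confidence interval for an edge weight in a Gaussian graphical model follows from a plug-in variance estimate.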
In Chapter 3, we construct confidence intervals for loadings in high-dimensional principal component analysis. The non-convexity of the problem is handled by proposing a computationally efficient two-step procedure which yields a near-oracle estimator of the loadings vector. We derive oracle inequalities for the estimator and propose a de-biasing scheme to obtain an asymptotically normal estimator. We also provide an asymptotically valid confidence interval for the maximum eigenvalue of the underlying covariance matrix. Asymptotic guarantees are derived under a sparsity condition on the vector of loadings and sparsity in the inverse Hessian of the population risk function, under mild distributional and regularity conditions.
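The flavour of a two-step procedure for sparse loadings can be illustrated as below: a screening step that selects coordinates, followed by ordinary PCA on the selected block. This is a generic sketch, not the calibrated estimator of the chapter; the variance threshold and the diagonal-thresholding screening rule are illustrative assumptions.

```python
import numpy as np

def two_step_loadings(X, var_threshold=1.5):
    """Illustrative two-step estimate of a sparse leading loadings vector.

    Step 1 screens coordinates whose sample variance exceeds the
    assumed noise level (diagonal thresholding); step 2 runs plain
    PCA on the selected block.  var_threshold is a hypothetical
    choice for this sketch.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    variances = (Xc ** 2).mean(axis=0)
    support = np.flatnonzero(variances > var_threshold)   # step 1: screening
    S = Xc[:, support].T @ Xc[:, support] / n             # step 2: local PCA
    eigvals, eigvecs = np.linalg.eigh(S)
    v = np.zeros(p)
    v[support] = eigvecs[:, -1]                           # leading eigenvector
    return v, support
```

A de-biasing step applied to such an initial estimator is what yields the asymptotically normal estimator and the confidence intervals described above.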
In Chapter 4, motivated by robust regression, we explore construction of confidence intervals in settings where the loss function may not be differentiable. We show that differentiability of the loss function is not essential and may be replaced by differentiability of the expected loss and an entropy condition measuring the complexity of the considered class of functions. We apply these results to particular estimators which arise in robust regression and show that a de-biased estimator has entry-wise Gaussian limiting distribution. The price we pay for non-differentiability is a stronger sparsity condition on the high-dimensional parameter.
Chapter 5 explores asymptotic efficiency of de-biased estimators in high-dimensional linear regression and Gaussian graphical models. The classical theory on asymptotic lower bounds on variance is not directly applicable in the high-dimensional settings due to the model changing with the sample size. We derive lower bounds on the variance of estimators which are strongly asymptotically unbiased, roughly meaning that their squared bias is of smaller order than variance. For the linear model under Gaussianity, we show that a de-biased estimator based on the Lasso achieves the asymptotic lower bound and is in this sense efficient, under sparsity conditions on both the high-dimensional parameter and the Fisher information matrix. We provide analogous results for Gaussian graphical models. As a by-product of our analysis, we establish oracle inequalities for the ℓ1-error of the Lasso, which hold in expectation.
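The de-biased Lasso at the heart of this efficiency result can be sketched in a low-dimensional toy setting. The one-step correction and the normal confidence interval are the standard de-sparsification recipe; for illustration the surrogate inverse covariance is obtained by direct inversion (with p much smaller than n), whereas the high-dimensional theory would use a nodewise-Lasso surrogate, and the noise-level estimate here is a crude assumption.

```python
import numpy as np
from sklearn.linear_model import Lasso

def debiased_lasso_ci(X, y, j, lam=0.1, z=1.96):
    """De-sparsified Lasso estimate of beta_j with a normal CI.

    b = beta_hat + Theta_hat X'(y - X beta_hat)/n.  Theta_hat inverts
    the sample covariance for this p << n illustration; sigma^2 is a
    crude residual-based noise estimate.
    """
    n, p = X.shape
    beta = Lasso(alpha=lam).fit(X, y).coef_
    resid = y - X @ beta
    Sigma_hat = X.T @ X / n
    Theta = np.linalg.inv(Sigma_hat)
    b = beta + Theta @ (X.T @ resid) / n           # one-step de-biasing
    sigma2 = resid @ resid / n                     # crude noise estimate
    se = np.sqrt(sigma2 * (Theta @ Sigma_hat @ Theta.T)[j, j] / n)
    return b[j], (b[j] - z * se, b[j] + z * se)
```

With the exact inverse sample covariance the correction reproduces least squares, which makes the efficiency claim tangible: the de-biased estimator attains the variance of the unpenalized oracle fit.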
Permanent link
https://doi.org/10.3929/ethz-b-000248167
Publication status
published
External links
Search print copy at ETH Library
Publisher
ETH Zurich
Subject
Lasso; High-dimensional statistical inference; Sparsity; Asymptotic confidence intervals; Graphical model; Asymptotic normality; Covariance matrix estimation; Robust regression; Sparse principal component analysis; Asymptotic efficiency; De-biased Lasso
Organisational unit
02537 - Seminar für Statistik (SfS) / Seminar for Statistics (SfS)
03717 - van de Geer, Sara (emeritus) / van de Geer, Sara (emeritus)
Funding
149145 - Inference in high-dimensional statistics (SNF)