\thesection Introduction
###### Abstract

We propose methodology for estimation of sparse precision matrices and statistical inference for their low-dimensional parameters in a high-dimensional setting where the number of parameters can be much larger than the sample size. We show that the novel estimator achieves minimax rates in supremum norm and the low-dimensional components of the estimator have a Gaussian limiting distribution. These results hold uniformly over the class of precision matrices with row sparsity of small order and spectrum uniformly bounded, under a sub-Gaussian tail assumption on the margins of the true underlying distribution. Consequently, our results lead to uniformly valid confidence regions for low-dimensional parameters of the precision matrix. Thresholding the estimator leads to variable selection without imposing irrepresentability conditions. The performance of the method is demonstrated in a simulation study and on real data.

\keywords

precision matrix and sparsity and inference and asymptotic normality and confidence regions

{subclass}

62J07 and 62F12

## \thesection Introduction

We consider the problem of estimation of the inverse covariance matrix in a high-dimensional setting, where the number of parameters can significantly exceed the sample size . Suppose that we are given an design matrix , where the rows of are -dimensional i.i.d. random vectors from an unknown distribution with mean zero and covariance matrix We denote the precision matrix by , assuming the inverse of exists.
The problem of estimating the precision matrix arises in a wide range of applications. Precision matrix estimation in particular plays an important role in graphical models that have become a popular tool for representing dependencies within large sets of variables. Suppose that we associate the variables with the vertex set of an undirected graph with an edge set . A graphical model represents the conditional dependence relationships between the variables, namely every pair of variables not contained in the edge set is conditionally independent given all remaining variables. If the vector is normally distributed, each edge corresponds to a non-zero entry in the precision matrix (\citetlauritzen). Practical examples of applications of graphical modeling include modeling of brain connectivity based on FMRI brain analysis \citepfmri, genetic networks, financial data processing, social network analysis and climate data analysis.
A lot of work has been done on methodology for point estimation of precision matrices. We discuss some of the approaches below, but a selected list of papers includes for instance \citetbuhlmann,glasso,bickel2,yuan2,cai,sunzhang. A common approach assumes that the precision matrix is sufficiently sparse and employs the -penalty to induce a sparse estimator. The main goal of these works is to show that, under some regularity conditions, the sparse estimator behaves almost as the oracle estimator that has the knowledge of the true sparsity pattern.
Our primary interest in this paper lies not in point estimation, but we aim to quantify uncertainty of estimation by providing interval estimates for the entries of the precision matrix. The challenge of this problem arises since asymptotics of regularized estimators which are the main tool in high-dimensional estimation is not easily tractable \citepknight2000, as opposed to the classical setting when the dimension of the unknown parameter is fixed.

### \thesubsection Overview of related work

Methodology for inference in high-dimensional models has been mostly studied in the context of linear and generalized linear regression models. From the work on linear regression models, we mention the paper by [zhang] where a semi-parametric projection approach using the Lasso methodology \citeplasso was proposed, which was further developed and studied in [vdgeer13]. The approach leads to asymptotically normal estimation of the regression coefficients and an extension of the method to generalized linear models is given in [vdgeer13]. The method requires sparsity of small order in the high-dimensional parameter vector and uses -norm error bound of the Lasso. Further alternative methods for inference in the linear model have been proposed and studied in [stanford1], [belloni1] and bootstrapping approach was suggested in [bootstrap1], [bootstrap2].
Other lines of work on inference for high-dimensional models suggest post-model selection procedures, where in the first step a regularized estimator is used for model selection and in the second step e.g. a maximum likelihood estimator is applied on the selected model. In the linear model, simple post-model selection methods have been proposed e.g. in [gausslasso], [candes2007]. These approaches are however only guaranteed to work under irrepresentability and beta-min conditions (see [hds]). Especially in view of inference, beta-min conditions which assume that the non-zero parameters are sufficiently large in absolute value, should be avoided.
In this paper, we consider estimation of precision matrices, which is a problem related to linear regression, however, it is a non-linear problem and thus it requires a more involved treatment. One approach to precision matrix estimation is based on regularization of the maximum likelihood in terms of the -penalty. This approach is typically referred to as the graphical Lasso, and has been studied in detail in several papers, see e.g. [glasso], [rothman], [ravikumar] and [yuan]. Another common approach to precision matrix estimation is based on projections. This approach reduces the problem to a series of regression problems and estimates each column of the precision matrix using a Lasso estimator or Dantzig selector \citepcandes2007. The idea was first introduced in [buhlmann] as neighbourhood selection for Gaussian graphical models and further studied in [yuan2], [cai] and [sunzhang].
Methodology leading to statistical inference for the precision matrix has been studied only recently. The work [zhou] proposes to use a more involved variation of the regression approach to obtain an estimator which leads to statistical inference. This approach leads to an estimator of the precision matrix which is elementwise asymptotically normal, under row sparsity of order , bounded spectrum of the true precision matrix and Gaussian distribution of the sample. The paper [jvdgeer14] proposes a method for statistical inference based on the graphical Lasso. The work introduces a de-sparsified estimator based on the graphical Lasso, which is also shown to be elementwise asymptotically normal.

### \thesubsection Contributions and outline

We propose methodology leading to honest confidence intervals and testing for low-dimensional parameters of the precision matrix, without requiring irrepresentability conditions or beta-min conditions to hold. Our work is motivated by the semi-parametric approach in [vdgeer13] and is a follow-up of the work [jvdgeer14]. Compared to the previous work on statistical inference for precision matrices, this methodology has several advantages. Firstly, the estimator we propose is a simple modification of the nodewise Lasso estimator proposed in [buhlmann]. Hence the estimator is easy to implement and efficient solutions are available on the computational side. Secondly, the novel estimator enjoys a range of optimality properties and leads to statistical inference under mild conditions. Firstly, the asymptotic distribution of low-dimensional components of the estimator is shown to be Gaussian. This holds uniformly over the class of precision matrices with row sparsity of order , spectrum uniformly bounded in and sub-Gaussian margins of the underlying distribution. This results in honest confidence regions \citepli for low-dimensional parameters. The proposed estimator achieves rate optimality as shown in Section Document. Moreover, the de-sparsified estimator may be thresholded to guarantee variable selection without imposing irrepresentable conditions. The computational cost of the method is order Lasso regressions for estimation of all parameters and two Lasso regressions for a single parameter.

The paper is organized as follows. Section Document introduces the methodology. Section Document contains the main theoretical results for estimation and inference and in Section Document the suggested method is applied to variable selection. Section Document provides a comparison with related work. Section Document illustrates the theoretical results in a simulation study. In Section Document, we analyze two real datasets and apply our method to variable selection. Section Document contains a brief summary of the results. Finally, the proofs were deferred to the Supplementary material.

Notation. For a vector and we use to denote the norm of in the classical sense. We denote For a matrix we use the notations , and . The symbol denotes the vectorized version of a matrix obtained by stacking the rows of on each other. By we denote a -dimensional vector of zeros with one at position . For real sequences , we write if for some independent of and all We write if both and hold. Finally, if Furthermore, for a sequence of random variables we write if is bounded in probability and we write if We write if converges in probability to zero.
Let denote the convergence in distribution and the convergence in probability. Let denote the cumulative distribution function of a standard normal random variable. By and we denote the minimum and maximum eigenvalue of , respectively. Let , denote respectively. We use letters to denotes universal constants. These are used in the proofs repeatedly to denote possibly different constants.

## \thesection De-sparsified nodewise Lasso

Our methodology is a simple modification of the nodewise Lasso estimator proposed in [buhlmann]. The idea is to remove the bias term which arises in the nodewise Lasso estimator due to -penalty regularization. This approach in inspired by literature on semiparametric statistics [bickelbook, vdv]. We note several papers have used this idea in the context of high-dimensional sparse estimation, see [zhang, vdgeer13, vdgeer14, stanford1, jvdgeer14].We first summarize the nodewise Lasso method introduced in [buhlmann] and discuss some of its properties. This method estimates an unknown precision matrix using the idea of projections to approximately invert the sample covariance matrix. For each we define the vector as follows

 γj:=argminγ∈Rp−1E∥Xj−X−jγ∥22/n (\theequation)

and denote and the noise level by We define the column vector Then one may show

 Θ0=(Θ01,…,Θ0p)=(Γ1/τ21,…,Γp/τ2p), (\theequation)

where is the -th column of Hence the precision matrix may be recovered from the partial correlations and from the noise level In our problem, we are only given the design matrix . The idea of nodewise Lasso is to estimate the partial correlations and the noise levels by doing a projection of every column of the design matrix on all the remaining columns. In low-dimensional settings, this procedure would simply recover the sample covariance matrix However, due to the high-dimensionality of our setting, the matrix is not invertible and we can only do approximate projections. If we assume sparsity in the precision matrix (and thus also in the partial correlations), this idea can be effectively carried out using the Lasso. Hence, for each define the estimators of the regression coefficients, , as follows

 ^γj:=argminγ∈Rp−1∥Xj−X−jγ∥22/n+2λj∥γ∥1. (\theequation)

We further define the column vectors

 ^Γj:=(−^γj,1,…,−^γj,j−1,1,−^γj,j+1,…,−^γj,p)T,

and estimators of the noise level

 ^τ2j:=∥Xj−X−j^γj∥22/n+λj∥^γj∥1,

for . Finally, we define the -th column of the nodewise Lasso estimator as

 ^Θj:=^Γj/^τ2j. (\theequation)

The estimator of the precision matrix was studied in several papers (following [buhlmann]) and has been shown to enjoy oracle properties under mild conditions on the model. These conditions include bounded spectrum of the precision matrix, row sparsity of small order and a sub-Gaussian distribution of the rows of (alternatively to sub-Gaussianity, one may assume that the covariates are bounded as in [vdgeer13]). Our approach uses the nodewise Lasso estimator as an initial estimator. The next step involves de-biasing or de-sparsifying, which may be viewed as one step using the Newton-Raphson scheme for numerical optimization. This is equivalent to “inverting” the Karush-Kuhn-Tucker (KKT) conditions by the inverse of the Fisher information as in [vdgeer13]. The challenge then also comes from the need to estimate the Fisher information matrix which is a matrix. We show that the estimator can be used in a certain way to create a surrogate of the inverse Fisher information matrix. Since the estimator can be characterized by its KKT conditions, it is convenient to work with these conditions to derive the new de-sparsified estimator. Consider hence the KKT conditions for the optimization problem (Document)

 −XT−j(Xj−X−j^γj)/n+λj^κj=0, (\theequation)

for , where is the sub-differential of the function at i.e.

where . If we define to be a vector

 ^Zj:=(^κj,1,…,^κj,j−1,0,^κj,j+1,…,^κj,p)/^τ2j,

then the KKT conditions may be equivalently stated as follows

 ^Σ^Θj−ej−λj^Zj=0,forj=1,…,p, (\theequation)

where is the sample covariance matrix. This is shown in Lemma 11 in the Supplementary material. Consequently, the KKT conditions \eqrefmkkt imply a bound for each which will be useful later. Note that the KKT conditions may be equivalently summarized in a matrix form as where the columns of are given by for and is a diagonal matrix with elements .
Multiplying the KKT conditions (\theequation) by , we obtain

 ^ΘTi(^Σ^Θj−ej)−^ΘTiλj^Zj=0.

Then we note that adding to both sides and rearranging we get

 ^Θij−^ΘTiλj^Zj−Θ0ij = ^Θij−^ΘTi(^Σ^Θj−ej)−Θ0ij = −(Θ0i)T(^Σ−Σ0)Θ0j+~Δij,

where is a term which can be shown to be under certain conditions (Lemma Document). Hence we define the de-sparsified nodewise Lasso estimator

 ^T:=^Θ−^ΘT(^Σ^Θ−I)=^Θ+^ΘT−^ΘT^Σ^Θ. (\theequation)

## \thesection Theoretical results

In this part, we inspect the asymptotic behaviour of the de-sparsified nodewise Lasso estimator (\theequation). In particular we consider the limiting distribution of individual entries of and show the convergence to the Gaussian distribution is uniform over the considered model. For construction of confidence intervals, we consider estimators of the asymptotic variance of the proposed estimator, both for Gaussian and sub-Gaussian design. We derive convergence rates of the method in supremum norm and consider application to variable selection.
For completeness, in Lemma \thesection in the Supplementary material, we summarize convergence rates of the nodewise Lasso estimator. The result is essentially the same as Theorem 2.4 in [vdgeer13] so the proof of the common parts is omitted. Recall that and we define the row sparsity by , maximum row sparsity by and the coordinates of non-zero entries of the precision matrix by . For the analysis below, we will need the following conditions.

1. [label=A0,start=1]

2. (Bounded spectrum) The inverse covariance matrix exists and there exists a universal constant such that

 1/L≤Λmin(Θ0)≤Λmax(Θ0)≤L.
3. (Sparsity)

4. (Sub-Gaussianity condition) Suppose that the design matrix has uniformly sub-Gaussian rows , i.e. there exists a universal constant such that

 supα∈Rp:∥α∥2≤1Eexp(|αTXi|2/K2)≤2(i=1,…,n).

The lower bound in Document guarantees that the noise level does not diverge. The upper bound (equivalently lower bound on eigenvalues of ) guarantees that the compatibility condition (see [hds]) is satisfied for the matrix , which is the true covariance matrix without the -th row and -th column. The sub-Gaussianity condition Document is used to obtain concentration results which are crucial to our analysis. Condition Document is also used to ensure that the compatibility condition is satisfied for with high probability (see [hds]). Conditions Document, Document and Document are the same conditions as used in [vdgeer13] to obtain rates of convergence for the nodewise regression estimator. Define the parameter set

 G(s):={Θ∈Rp×p:max1≤i≤p∥Θi∥0≤s,???issatisfied}.

The following lemma shows that the proposed estimator can be decomposed into a pivot term and a term which is of small order with high probability. {lema} Suppose that is the nodewise Lasso estimator with regularization parameters , uniformly in , for some sufficiently large constant . Suppose that Document and Document are satisfied. Then for each it holds

 √n(^Tij−Θ0ij)=−√n(Θ0i)T(^Σ−Σ0)Θ0j+Δij, (\theequation)

where there exists a constant such that

 limn→∞supΘ0∈G(s)P(maxi,j=1,…,p|Δij|≥Cslogp√n)=0.

From Lemma Document it follows that we need to assume stronger sparsity condition than Document for the remainder term to be negligible after normalization by . This is accordance with other literature on the topic, see [vdgeer13], [zhou]. Hence we introduce the following strengthened sparsity condition.

1. [label=A0,start=2]

The next result shows that the elements of are indeed asymptotically normal. To this end, we further define the asymptotic variance

 σ2ij:=var((Θ0i)TX1XT1Θ0j).

In some of the results to follow, we shall assume a universal lower bound on as follows.

1. [label=A0,start=4]

2. There exists a universal constant such that

Assumption Document is satisfied e.g. under Gaussian design and Document. Denote a parameter set

 ~G(s):={Θ∈Rp×p:max1≤i≤p∥Θi∥0≤s,???,???aresatisfied}.
{theorem}

[Asymptotic normality] Suppose that is the nodewise Lasso estimator with regularization parameters uniformly in . Suppose that Document and Document are satisfied. Then for every and it holds

 limn→∞supΘ0∈~G(s)|PΘ0(√n(^Tij−Θ0ij)/σij≤z)−Φ(z)|=0.

To construct confidence intervals, a consistent estimator of the asymptotic variance is required. Consistent estimators of are discussed in Section Document (see Lemmas Document and Document). Hence Theorem Document implies uniformly valid asymptotic confidence intervals , i.e.

 limn→∞supΘ0∈~G(s)|PΘ0(Θ0ij∈Iα)−(1−α)|=0.

The result also enables testing hypotheses about individual elements of the precision matrix. For testing multiple hypothesis simultaneously, we may use the standard procedures such as Bonferroni-Holm procedure (see [vdgeer13]).

### \thesubsection Variance estimation

For the case of Gaussian observations, we may easily calculate the theoretical variance and plug in the estimate in place of the unknown as is displayed in Lemma Document below. {lema} Suppose that assumptions Document and Document are satisfied and assume that the rows of the design matrix are independent -distributed. Let be the nodewise Lasso estimator and let uniformly in for some Then for we have

 supΘ0∈G(s)P(maxi,j=1,…,p|^σ2ij−σ2ij|≥Cτ√slogp/n)≤c1p1−τc2,

for some constants Lemma Document implies that under , we have a rate If Gaussianity is not assumed, we may replace the estimator of the variance with the empirical version, and plug in in place of the unknown Thus we take the following estimator of , where is the nodewise regression estimator

 ^σ2ij:=1nn∑k=1(^ΘTiXkXTk^Θj)2−^Θ2ij. (\theequation)

The following lemma justifies this procedure under Document, Document, Document. {lema} Suppose that the assumptions Document and Document are satisfied and for some , it holds that . Let be the nodewise Lasso estimator and let uniformly in for some Let be the estimator defined in (\theequation). Then for all

 limn→∞supΘ0∈G(s)P(maxi,j=1,…,p|^σ2ij−σ2ij|≥η)=0.

### \thesubsection Rates of convergence

The de-sparsified estimator achieves optimal rates of convergence in supremum norm. Observe first that for the nodewise regression estimator it holds by (\theequation@IDi), Lemma 1 and Lemma 10 in the Supplementary material that

 ^Θij−Θ0ij=^ΘTi(^Σ^Θj−ej)+OP(max{1√n,slogpn}).

By Hölder’s inequality and the KKT conditions it follows

 |^ΘTi(^Σ^Θj−ej)|≤λj∥^Θi∥1/^τ2j.

Consequently, for the rates of convergence of the nodewise Lasso in supremum norm we find

De-sparsifying the estimator as in (\theequation) removes the term involving in the above rates. {theorem}[Rates of convergence] Assume that Document and Document are satisfied. Let and let be the de-sparsified nodewise Lasso estimator with regularization parameters for some sufficiently large constant , uniformly in . Then there exist constants such that

 supΘ0∈G(s)P(|^Tij−Θ0ij|≥Cτmax{1√n,slogpn})≤c1e−c2τ.

and

 supΘ0∈G(s)P(∥^T−Θ0∥∞≥Cτmax{√logpn,slogpn})≤c1p1−c2τ.

We compare the results of Theorem Document with results on optimal rates of convergence derived for Gaussian graphical models in [zhou]. Suppose that the observations are Gaussian, i.e. . For for some and it holds (see [zhou])

 inf^ΘijsupΘ0∈G(s)P(|^Θij−Θ0ij|>max{C11√n,C2slogpn})>c1>0,

and

 inf^ΘsupΘ0∈G(s)P(∥^Θ−Θ0∥∞>max{C′1√logpn,C′2slogpn})>c2>0,

where are positive constants depending on and only. As follows from Theorem Document, the de-sparsified nodewise Lasso attains the lower bound on rates and thus is in this sense optimal (considering the Gaussian setting).

### \thesubsection Other de-sparsified estimators

The de-sparsification may work for other estimators of the precision matrix, provided that certain conditions are satisfied. This is formulated in Lemma Document below. A particular example of interest is the square-root nodewise Lasso estimator, that will be discussed below. This estimator has the advantage that it is self-scaling in the variance, similarly as the square-root Lasso \citepsqrtlasso on which it is based. {lema} Assume that for some estimator it holds

 maxj=1,…,p∥^Ωj−Θ0j∥1=OP(s√logp/n),∥^Σ^Ω−I∥∞=OP(√logp/n). (\theequation)

Then for it holds under Document, Document, Document

 ∥^T−Θ0∥∞=OP(max{slogp/n,√logp/n}).

Moreover, We briefly consider nodewise regression with the square-root Lasso as an example. The square-root Lasso estimators may be defined via

 ^γj:=argminγ∈Rp−1∥Xj−X−jγ∥2/n+2λ0∥γ∥1,

for Define and . The nodewise square-root Lasso is then given by where

 ^Γj:=(−^γj,1,…,−^γj,j−1,1,−^γj,j+1,…,−^γj,p)T.

Note that compared to the nodewise Lasso, the difference lies in estimation of the partial correlations, where we used the square-root Lasso that “removes the square” from the squared loss. The Karush-Kuhn-Tucker conditions similarly as in Lemma 11 in Supplementary material give

 ^Σ^Θj,sqrt−ej−^τj~τ2jλ0~Zj=0,

where and is the sub-differential of the function with respect to evaluated at The paper [sqrtlasso] further shows that the -rates for the square-root Lasso satisfy condition \eqrefcon1. The de-sparsified estimator may then be defined in the same way as in (\theequation). Then the conditions of Lemma Document are satisfied and this implies that a de-sparsified nodewise square-root Lasso achieves the same rates as the de-sparsified nodewise Lasso and thus is also rate-optimal.

### \thesubsection The thresholded estimator and variable selection

The de-sparsified estimator can be used for variable selection without imposing irrepresentable conditions. Under mild conditions, the procedure leads to exact recovery of the coefficients that are sufficiently larger in absolute value than the noise level. The following corollary is implied by Theorem Document and Lemmas Document and Document. {cor} Let be obtained using the nodewise Lasso and be defined as in (\theequation) with tuning parameters uniformly in , for some . Assume that conditions Document, Document, Document and Document are satisfied. Let be the estimator from Lemma Document and assume that for some . Then there exists some constant such that

 limn→∞P(maxi,j=1,…,p|^Tij−Θ0ij|/^σij≥Cτ√logp/n)=0.

If, in addition, the rows of are -dsitributed and is instead the estimator from Lemma Document, then there exist constants such that

 P(maxi,j=1,…,p|^Tij−Θ0ij|/^σij≥Cτ√logp/n)≤c1p1−c2τ.

Corollary Document implies that we may define the re-sparsified estimator

 ^Tthreshij:=^Tij1|^Tij|>Cτ^σij√logp/n,

where is defined as in Corollary Document. Denote Denote Then it follows directly from Corollary Document that with high probability

 Sact0⊂^Sthresh⊂S0.

The inclusion represents that correctly identifies all the non-zero parameters which are above the noise level. The inclusion means that there are no false positives. If for all it holds

 |Θ0ij|≥2Cτσij√logp/n, (\theequation)

then we have exact recovery, i.e. with high probability:

## \thesection Further comparison to previous work

Closely related is the paper [jvdgeer14], where asymptotically normal estimation of elements of the concentration matrix is considered based on the graphical Lasso. While the analysis follows the same principles, the estimation method used here does not require the irrepresentability condition \citepravikumar that is assumed in [jvdgeer14]. Hence we are able to show that our results hold uniformly over the considered class. Furthermore, regarding the computational cost, our method uses Lasso regressions, which can be implemented using fast algorithms as in [lars]. In comparison, the graphical Lasso method presents a more challenging computational problem, for more details see e.g. [glasso.comp].
Another related work is the paper [zhou]. This paper suggests an estimator for the precision matrix which is shown to have one dimensional asymptotically normal components with asymptotic variance to The assumptions and results used in the paper are essentially identical with our assumptions and theoretical results in the present paper. However, there are some differences. The paper [zhou] assumes Gaussianity of the underlying distribution, while we only require sub-Gaussianity of the margins. Another difference is in the construction of the estimators. Both approaches use regression to estimate the elements of the precision matrix, but the paper [zhou] concentrates on estimation of the joint distribution of each pair of variables for Thus it is computationally more intensive as it requires high-dimensional regressions (see [zhou]), while our methodology only requires

## \thesection Simulation results

In this section we report on the performance of our method on simulated data and provide a comparison to another methodology. The random sample satisfies , where the precision matrix is defined by

We consider the settings and . The second setting is further adjusted by randomly perturbing each non-zero off-diagonal element of by adding a realization from the uniform distribution on the interval We denote this new perturbed model by Hence the second precision matrix was chosen randomly. The sparsity assumption requires We have chosen the sample sizes for numerical experiments according to the sparsity assumption (for this purpose, we ignored possible constants in the sparsity restriction), i.e. .

### \thesubsection Asymptotic normality and confidence intervals for individual parameters

#### \thesubsubsection The Gaussian setting

In this section, we consider normally-distributed observations, for In Figure 1 we display histograms of for where is defined in (\theequation) and the empirical variance is estimated as suggested by Lemma Document. Superimposed is the density of
Secondly, we investigate the properties of confidence intervals constructed using the de-sparsified nodewise Lasso. For comparison, we also provide results using confidence intervals based on the de-sparsified graphical Lasso introduced in [jvdgeer14]. The coverage and length of the confidence interval were estimated by their empirical versions,

 ^αij:=PN1{Θ0,ij∈Iij,α}and^ℓij:=PN2Φ−1(1−α/2)^σij/√n,

respectively, using random samples. For a set , we define the average coverage over the set (and analogously average length ) as

 avgcovA:=∑(i,j)∈A^αij/|A|.

We report average coverages over the sets and . These are denoted by and respectively. Similarly, we calculate average lengths of confidence intervals for each parameter from iterations and report and .
The results of the simulations are shown in Tables 1 and 2. The target coverage level is The methodology for the choice of the tuning parameters was used as follows (see [zhou]), for both methods,

 ^s=√n/logp,B=qt(1−^s/(2p),n−1),λ=B/√n−1+B2, (\theequation)

where denotes the -quantile of a -distribution with degrees of freedom.

#### \thesubsubsection A sub-Gaussian setting

In this section, we consider a design matrix with rows having a sub-Gaussian distribution other than the Gaussian distribution. Let be an matrix with jointly independent entries generated from a continuous uniform distribution on the interval . Further consider a matrix and let Then we define

 Xi:=Σ1/20Ui

for Then the expectation of is zero and the covariance matrix of is exactly and the precision matrix is It follows by Hoeffding’s inequality that defined as above is sub-Gaussian with a universal constant
A further difference compared to the simulations in Section Document is that we now estimate the variance of the de-sparsified estimator using the formula proposed in \eqrefvar.gene for sub-Gaussian settings:

 ^σ2ij:=1nn∑k=1(^ΘTiXkXTk^Θj)2−^Θ2ij, (\theequation)

where is the nodewise Lasso. The regularization parameters for the nodewise Lasso are used in accordance with \eqreftp. Figure 2 again displays the histograms related to several entries of the de-sparsified nodewise Lasso. Results related to the constructed confidence intervals are summarized in Table 3. The results demonstrate that the de-sparsified nodewise Lasso performs relatively well even under this non-Gaussian setting.

### \thesubsection Variable selection

For variable selection as suggested in Corollary Document, we compare the de-sparsified nodewise Lasso and the de-sparsified graphical Lasso. The setting is again as in Section Document. Average true positives and false positives over 100 repetitions are reported. Choice of the tuning parameters is according to (\theequation) and the thresholding level is given by

 λthresh=^σij√2νlogpn, (\theequation)

taking for the de-sparsified nodewise regression, for the de-sparsified graphical Lasso. We take as in Lemma Document. The results of this simulation experiment are summarized in Table 4.