Regularized rank-based estimation of high-dimensional nonparanormal graphical models\thanksrefT1

Regularized rank-based estimation of high-dimensional nonparanormal graphical models\thanksrefT1

\fnmsLingzhou \snmXuelabel=e1]lzxue@stat.umn.edu [    \fnmsHui \snmZou\correflabel=e2]zouxx019@umn.edu [ University of Minnesota School of Statistics
University of Minnesota
Minneapolis, Minnesota 55455
USA
\printeade1
E-mail: \printead*e2
\smonth2 \syear2012\smonth8 \syear2012
\smonth2 \syear2012\smonth8 \syear2012
\smonth2 \syear2012\smonth8 \syear2012
Abstract

A sparse precision matrix can be directly translated into a sparse Gaussian graphical model under the assumption that the data follow a joint normal distribution. This neat property makes high-dimensional precision matrix estimation very appealing in many applications. However, in practice we often face nonnormal data, and variable transformation is often used to achieve normality. In this paper we consider the nonparanormal model that assumes that the variables follow a joint normal distribution after a set of unknown monotone transformations. The nonparanormal model is much more flexible than the normal model while retaining the good interpretability of the latter in that each zero entry in the sparse precision matrix of the nonparanormal model corresponds to a pair of conditionally independent variables. In this paper we show that the nonparanormal graphical model can be efficiently estimated by using a rank-based estimation scheme which does not require estimating these unknown transformation functions. In particular, we study the rank-based graphical lasso, the rank-based neighborhood Dantzig selector and the rank-based CLIME. We establish their theoretical properties in the setting where the dimension is nearly exponentially large relative to the sample size. It is shown that the proposed rank-based estimators work as well as their oracle counterparts defined with the oracle data. Furthermore, the theory motivates us to consider the adaptive version of the rank-based neighborhood Dantzig selector and the rank-based CLIME that are shown to enjoy graphical model selection consistency without assuming the irrepresentable condition for the oracle and rank-based graphical lasso. Simulated and real data are used to demonstrate the finite performance of the rank-based estimators.

[
\kwd
\doi

10.1214/12-AOS1041 \volume40 \issue5 2012 \firstpage2541 \lastpage2571

\runtitle

Rank-based estimation of nonparanormal graphical models \thankstextT1Some results in this paper were reported in a research proposal funded by the Office of Naval Research in November 2010. Supported in part by NSF Grant DMS-08-46068 and a grant from ONR.

{aug}

and

class=AMS] \kwd[Primary ]62G05 \kwd62G20 \kwd[; secondary ]62F12 \kwd62J07 CLIME \kwdDantzig selector \kwdgraphical lasso \kwdnonparanormal graphical model \kwdrate of convergence \kwdvariable transformation

1 Introduction.

Estimating covariance or precision matrices is of fundamental importance in multivariate statistical methodologies and applications. In particular, when data follow a joint normal distribution, the precision matrix can be directly translated into a Gaussian graphical model. The Gaussian graphical model serves as a noncausal structured approach to explore the complex systems consisting of Gaussian random variables, and it finds many interesting applications in areas such as gene expression genomics and macroeconomics determinants study [Friedman (2004), Wille et al. (2004), Dobra, Eicher and Lenkoski (2010)]. The precision matrix plays a critical role in the Gaussian graphical models because the zero entries in precisely capture the desired conditional independencies, that is, if and only if [Lauritzen (1996), Edwards (2000)].

The sparsity pursuit in precision matrices was initially considered by Dempster (1972) as the covariance selection problem. Multiple testing methods have been employed for network exploration in the Gaussian graphical models [Drton and Perlman (2004)]. With rapid advances of the high-throughput technology (e.g., microarray, functional MRI), estimation of a sparse graphical model has become increasingly important in the high-dimensional setting. Some well-developed penalization techniques have been used for estimating sparse Gaussian graphical models. In a highly-cited paper, Meinshausen and Bühlmann (2006) proposed the neighborhood selection scheme which tries to discover the smallest index set for each variable satisfying . Meinshausen and Bühlmann (2006) further proposed to use the lasso [Tibshirani (1996)] to fit each neighborhood regression model. Afterwards, one can summarize the zero patterns by aggregation via union or intersection. Yuan (2010) considered the Dantzig selector [Candes and Tao (2007)] as an alternative to the lasso penalized least squares in the neighborhood selection scheme. Peng et al. (2009) proposed the joint neighborhood lasso selection. Penalized likelihood methods have been studied for Gaussian graphical modeling [Yuan and Lin (2007)]. Friedman, Hastie and Tibshirani (2008) developed a fast blockwise coordinate descent algorithm [Banerjee, El Ghaoui and d’Aspremont (2008)] called graphical lasso for efficiently solving the lasso penalized Gaussian graphical model. Rate of convergence under the Frobenius norm was established by Rothman et al. (2008). Ravikumar et al. (2011) obtained the convergence rate under the elementwise norm and the spectral norm. Lam and Fan (2009) studied the nonconvex penalized Gaussian graphical model where a nonconvex penalty such as SCAD [Fan and Li (2001)] is used to replace the lasso penalty in order to overcome the bias issue of the lasso penalization. Zhou et al. (2011) proposed a hybrid method for estimating sparse Gaussian graphical models: they first infer a sparse Gaussian graphical model structure via thresholding neighborhood selection and then estimate the precision matrix of the submodel by maximum likelihood. Cai, Liu and Luo (2011) recently proposed a constrained minimization estimator called CLIME for estimating sparse precision matrices and established its convergence rates under the elementwise norm and Frobenius norm.

Critical value Cramer–von Mises Lilliefors Shapiro–Francia
Raw data 0.05 30 30 35
0.0539 24 26 28
Log data 0.05 29 24 33
0.0539 14 12 16
Table 1: Testing for normality of the gene expression measurements in the Arabidposis thaliana data. This table illustrates the number out of 39 genes rejecting the null hypothesis of normality at the significance level of

Although the normality assumption can be relaxed if we only focus on estimating a precision matrix, it plays an essential role in making the neat connection between a sparse precision matrix and a sparse Gaussian graphical model. Without normality, we ought to be very cautious when translating a good sparse precision matrix estimator into an interpretable sparse Gaussian graphical model. However, the normality assumption often fails in reality. For example, the observed data are often skewed or have heavy tails. To illustrate the issue of nonnormality in real applications, let us consider the gene expression data to construct isoprenoid genetic regulatory network in Arabidposis thaliana [Wille et al. (2004)], including 16 genes from the mevalonate (MVA) pathway in the cytosolic, 18 genes from the plastidial (MEP) pathway in the chloroplast and 5 encode proteins in the mitochondrial. This dataset contains gene expression measurements of 39 genes assayed on Affymetrix GeneChip microarrays. This dataset was analyzed by Wille et al. (2004), Li and Gui (2006) and Drton and Perlman (2007) in the context of Gaussian graphical modeling after taking the log-transformation of the data. However, the normality assumption is still inappropriate even after the log-transformation. To show this, we conduct the normality test at the significance level of as in Table 1. It is clear that at most 9 out of 39 genes would pass any of three normality tests. Even after log-transformation, at least genes reject the null hypothesis of normality. With Bonferroni correction there are still over genes that fail to pass any normality test. Figure 1 plots the histograms of two key isoprenoid genes MECPS in the MEP pathway and MK in the MVA pathway after the log-transformation, clearly showing the nonnormality of the data after the log-transformation.

Using transformation to achieve normality is a classical idea in statistical modeling. The celebrated Box–Cox transformation is widely used in regression analysis. However, any parametric modeling of the transformation suffers from model mis-specification which could lead to misleading inference. In this paper we take a nonparametric transformation strategy to handle the nonnormality issue. Let be the CDF of a continuous random variable and be the inverse of the CDF of . Consider the transformation from to by . Then it is easy to see that is standard normal regardless of . Motivated by this simple fact, we consider modeling the data by the following nonparanormal model:

The nonparanormal model: follows a -dimensional nonparanormal distribution if there exists a vector of unknown univariate monotone increasing transformations, denoted by , such that the transformed random vector follows a multivariate normal distribution with mean 0 and covariance ,

(1)

where without loss of generality the diagonals of are equal to 1.

Figure 1: Illustration of the nonnormality after the log-transformation preprocessing.

Note that model (1) implies that is a standard normal random variable. Thus, must be where is the CDF of . The marginal normality is always achieved by transformations, so model (1) basically assumes that those marginally normal-transformed variables are jointly normal as well. We follow Liu, Lafferty and Wasserman (2009) to call model (1) the nonparanormal model, but model (1) is in fact a semiparametric Gaussian copula model. The semiparametric Gaussian copula model is a nice combination of flexibility and interpretability, and it has generated a lot of interests in statistics, econometrics and finance; see Song (2000), Chen and Fan (2006), Chen, Fan and Tsyrennikov (2006) and references therein. Let . By the joint normality assumption of , we know that if and only if Interestingly, we have that

Therefore, a sparse can be directly translated into a sparse graphical model for presenting the original variables.

In this work we primarily focus on estimating which is then used to construct a nonparanormal graphical model. As for the nonparametric transformation function, by the expression , we have a natural estimator for the transformation function of the th variable as where is a Winsorized empirical CDF of the th variables. Note that the Winsorization is used to avoid infinity value and to achieve better bias-variance tradeoff; see Liu, Lafferty and Wasserman (2009) for detailed discussion. In this paper we show that we can directly estimate without estimating these nonparametric transformation functions at all. This statement seems to be a bit surprising because a natural estimation scheme is a two-stage procedure: first estimate and then apply a well-developed sparse Gaussian graphical model estimation method to the transformed data . Liu, Lafferty and Wasserman (2009) have actually studied this “plug-in” estimation approach. They proposed a Winsorized estimator of the nonparametric transformation function and used the graphical lasso in the second stage. They established convergence rate of the “plug-in” estimator when is restricted to a polynomial order of . However, Liu, Lafferty and Wasserman (2009) did not get a satisfactory rate of convergence for the “plug-in” approach, because the rate of convergence can be established for the Gaussian graphical model even when grows with almost exponentially fast [Ravikumar et al. (2011)]. As noted in Liu, Lafferty and Wasserman (2009), it is very challenging, if not impossible, to push the theory of the “plug-in” approach to handle exponentially large dimensions. One might ask if using a better estimator for the transformation functions could improve the rate of convergence such that could be allowed to be nearly exponentially large relative to . This is a legitimate direction for research. We do not pursue this direction in this work. Instead, we show that we could use a rank-based estimation approach to achieve the exact same goal without estimating these transformation functions at all.

Our estimator is constructed in two steps. First, we propose using the adjusted Spearman’s rank correlation to get a nonparametric sample estimate of . As the second step, we compute a sparse estimator from the rank-based sample estimate of . For that purpose, we consider several regularized rank estimators, including the rank-based graphical lasso, the rank-based neighborhood Dantzig selector and the rank-based CLIME. The complete methodological details are presented in Section 2. In Section 3 we establish theoretical properties of the proposed rank-based estimators, regarding both precision matrix estimation and graphical model selection. In particular, we are motivated by the theory to consider the adaptive version of the rank-based neighborhood Dantzig selector and the rank-based CLIME, which can select the true support set with an overwhelming probability without assuming a stringent irrepresentable condition required for the oracle and rank-based graphical lasso. Section 4 contains numerical results and Section 5 has some concluding remarks. Technical proofs are presented in an Appendix.

A referee informed us in his/her review report that Liu et al. (2012) also independently used the rank-based correlation in the context of nonparametric Gaussian graphical model estimation. A major focus in Liu et al. (2012) is the numerical demonstration of the robustness property of the rank-based methods using both Spearman’s rho and Kendall’s tau when data are contaminated. In the present paper we provide a systematic analysis of the rank-based estimators, and our theoretical analysis further leads to the rank-based adaptive Dantizg selector and the rank-based adaptive CLIME in order to achieve improved sparsity recovery properties. Our theoretical analysis of the rank-based adaptive Dantizg selector is of independent interest. Although the theory is established for the rank-based estimators using Spearman’s rho, the same analysis can be easily adopted to prove the theoretical properties of the rank-based estimators using Kendall’s tau rank correlation.

2 Methodology.

We first introduce some necessary notation. For a matrix , we define its entry-wise norm as , and its entry-wise norm as . For a vector , we define its norm as and its norm as . To simplify notation, define as the sub-matrix of with row indexes and column indexes , and define as the sub-vector of with indexes . Let be the index set . Denote by the sub-matrix of with both th row and column removed, and denote by the vector including all the covariances associated with the th variable. In the same fashion, we can also define , , and so on.

2.1 The “oracle” procedures.

Suppose an oracle knows the underlying transformation vector; then the oracle could easily recover “oracle data” by applying these true transformations, that is, . Before presenting our rank-based estimators, it is helpful to revisit the “oracle” procedures that are defined based on the “oracle data.”

  • The oracle graphical lasso. Let be the sample covariance matrix for the “oracle” data, and then the “oracle” log-profile-likelihood becomes . The “oracle” graphical lasso solves the following penalized likelihood problem:

    (2)
  • The oracle neighborhood lasso selection. Under the nonparanormal model, for each , the “oracle” variable given is normally distributed as which can be written as with and Notice that and are closely related to the precision matrix , that is, and Thus for the th variable, and share the same sparsity pattern. Following Meinshausen and Bühlmann (2006), the oracle neighborhood lasso selection obtains the solution from the following lasso penalized least squares problem:

    (3)

    and then the sparsity pattern of can be estimated by aggregating the neighborhood support set of () via intersection or union.

    We notice the fact that

    Then (3) can be written in the following equivalent form:

    (4)
  • The oracle neighborhood Dantzig selector. Following Yuan (2010) the lasso least squares in (3) can be replaced with the Dantzig selector

    (5)

    Then the sparsity pattern of can be similarly estimated by aggregating via intersection or union. Furthermore, we notice that

    Then (5) can be written in the following equivalent form:

    (6)
  • The oracle CLIME. Following Cai, Liu and Luo (2011) we can estimate precision matrices by solving a constrained minimization problem,

    (7)

    Cai, Liu and Luo (2011) compared the CLIME with the graphical lasso, and showed that the CLIME enjoys nice theoretical properties without assuming the irrepresentable condition of Ravikumar et al. (2011) for the graphical lasso.

2.2 The proposed rank-based estimators.

The existing theoretical results in the literature can be directly applied to these oracle estimators. However, the “oracle data” are unavailable and thus the above-mentioned “oracle” procedures are not genuine estimators. Naturally we wish to construct a genuine estimator that can mimic the oracle estimator. To this end, we can derive an alternative estimator of based on the actual data and then feed this genuine covariance estimator to the graphical lasso, the neighborhood selection or CLIME. To implement this natural idea, we propose a rank-based estimation scheme. Note that can be viewed as the correlation matrix as well, that is, . Let () be the observed values of variable . We convert them to ranks denoted by . Spearman’s rank correlation is defined as Pearson’s correlation between and . Spearman’s rank correlation is a nonparametric measure of dependence between two variables. It is important to note that are the ranks of the “oracle” data. Therefore, is also identical to the Spearman’s rank correlation between the “oracle” variables . In other words, in the framework of rank-based estimation, we can treat the observed data as the “oracle” data and avoid estimating nonparametric transformation functions. We make a note here that one may consider other rank correlation measures such as Kendall’s tau correlation. To fix the idea we use Spearman’s rank correlation throughout this paper.

The nonparanormal model implies that follows a bivariate normal distribution with correlation parameter . Then a classical result due to Kendall (1948) [see also Kruskal (1958)] shows that

(8)

which indicates that is a biased estimator of . To correct the bias, Kendall (1948) suggested using the adjusted Spearman’s rank correlation

(9)

Combining (8) and (9) we see that is an asymptotically unbiased estimator of . Naturally we define the rank-based sample estimate of as follows:

In Section 3 we show is a good estimator of . Then we naturally come up with the following rank-based estimators of by using the graphical lasso, the neighborhood Dantzig selector and CLIME:

  • The rank-based graphical lasso:

    (10)
  • The rank-based neighborhood Dantzig selector: A rank-based estimate of can be solved by

    (11)

    The support of can be estimated from the support of via aggregation by union or intersection. We can also construct the rank-based precision matrix estimator with

    and

    (). We can symmetrize by solving the following optimization problem [Yuan (2010)]:

    Theoretical analysis of the rank-based neighborhood Dantzig selector in Section 3.2 motivated us to consider using the adaptive Dantzig selector in the rank-based neighborhood estimation in order to achieve better graphical model selection performance. See Section 3.2 for more details.

  • The rank-based CLIME:

    (12)

    Let ’s be the natural basis in . By Lemma 1 in Cai, Liu and Luo (2011) the above optimization problem can be further decomposed into subproblems of vector minimization,

    (13)

    for . Then is exactly equivalent to . Note that could be asymmetric. Following Cai, Liu and Luo (2011) we consider

    with In the original paper Cai, Liu and Luo (2011) proposed to use hard thresholding for graphical model selection. Borrowing the basic idea from Zou (2006), we propose an adaptive version of the rank-based CLIME in order to achieve better graphical model selection. See Section 3.3 for more details.

2.3 Rank-based neighborhood lasso?

One might consider the rank-based neighborhood lasso defined as follows:

(14)

However, there is a technical problem for the above definition. The Spearman’s rank correlation matrix is always positive semidefinite, but the adjusted correlation matrix could become indefinite. To our best knowledge, Devlin, Gnanadesikan and Kettenring (1975) were the first to point out the indefinite issue of the estimated rank correlation matrix. Here we also use a toy example to illustrate this point. Consider the correlation matrix

Note that is positive-definite with eigenvalues , and , but becomes indefinite with eigenvalues , and . The negative eigenvalues will make (14) an ill-defined optimization problem. Fortunately, the positive definite issue does not cause any problem for the graphical lasso, Dantzig selector and CLIME. Notice that the diagonal elements of are obviously strictly positive, and thus Lemma 3 in Ravikumar et al. (2011) suggests that the rank-based graphical lasso always has a unique positive definite solution for any regularization parameter . The rank-based neighborhood Dantzig selector and the rank-based CLIME are still well defined, even when becomes indefinite, and the according optimization algorithms also tolerate the indefiniteness of . One might consider a positive definite correction of for implementing the neighborhood lasso estimator. However, the resulting estimator shall behave similarly to the rank-based neighborhood Dantzig selector because the lasso penalized least squares and Dantzig selector, in general, work very similarly [Bickel, Ritov and Tsybakov (2009), James, Radchenko and Lv (2009)].

3 Theoretical properties.

For a vector , let denote the minimum absolute value, that is, . For a matrix , we define the following matrix norms: the matrix norm , the matrix norm and the Frobenius norm . For any symmetric matrix, its matrix norm coincides its matrix norm. Denote by and the smallest and largest eigenvalues of , respectively. Define as the true covariance matrix, and let be its inverse. Let be the true support set of the off-diagonal elements in . Let be the maximal degree over the underlying graph corresponding to , and let be the total degree over the whole graph.

In this section we establish theoretical properties for the proposed rank-based estimators. The main conclusion drawn from these theoretical results is that the rank-based graphical lasso, neighborhood Dantzig selector and CLIME work as well as their oracle counterparts in terms of the rates of convergence. We first provide useful concentration bounds concerning the accuracy of the rank-based sample correlation matrix.

Lemma 1

Fix any , and let . Then there exists some absolute constant , and we have the following concentration bounds:

Lemma 1 is a key ingredient of our theoretical analysis. It basically shows that the rank-based sample estimator of works as well as the usual sample covariance estimator of based on the “oracle data.”

3.1 Rank-based graphical lasso.

Denote by the minimal entry of in the absolute scale. Define and . Define as the Kronecker product .

Theorem 1

Assume for . {longlist}[(a)]

Element-wise maximal bound: if is chosen such that

with probability at least , the rank-based graphical lasso estimator satisfies that for any and

Graphical model selection consistency: picking a regularization parameter  to satisfy that

then with probability at least , is sign consistent satisfying that for any and for any .

In Theorem 1, the condition is also referred as the irrepresentable condition for studying the theoretical properties of the graphical lasso [Ravikumar et al. (2011)]. We can obtain a straightforward understanding of Theorem 1 by considering its asymptotic consequences.

Corollary 1

Assume that there is a constant such that . Suppose that and are both fixed constants. {longlist}[(a)]

Rates of convergence: assume , and pick a regularization parameter such that . Then we have

Furthermore, the convergence rates in both Frobenius and matrix -norms can also be obtained as follows:

Graphical model selection consistency: assume is also fixed and . Pick a such that Then we have , and , .

Under the same conditions of Theorem 1 and Corollary 1, by the results in Ravikumar et al. (2011), we know that the conclusions of Theorem 1 and Corollary 1 hold for the oracle graphical lasso. In other words, the rank-based graphical lasso estimator is comparable to its oracle counterpart in terms of rates of convergence.

3.2 Rank-based neighborhood Dantzig selector.

We define , and . For each variable , define the corresponding active set with the maximal cardinality . Then we can organize and with respect to as and

Likewise we can partition and with respect to .

Theorem 2

Pick the such that and . With probability at least , there exists depending on , and only such that

Corollary 2

Suppose that , and are all fixed. Let , and pick such that . Then we have

Yuan (2010) established the rates of convergence of the neighborhood Dantzig selector under the norm, which can be directly applied to the oracle neighborhood Dantzig selector under the nonparanormal model. Comparing Theorem 2 and Corollary 2 to the results in Yuan (2010), we see that the rank-based neighborhood Dantzig selector and the oracle neighborhood Dantzig selector achieve the same rates of convergence.

Dantzig selector and the lasso are closely related [Bickel, Ritov and Tsybakov (2009), James, Radchenko and Lv (2009)]. Similarly to the lasso, the Dantzig selector tends to over-select. Zou (2006) proposed the adaptive weighting idea to develop the adaptive lasso which improves the selection performance of the lasso and corrects its bias too. The very same idea can be used to improve the selection performance of Dantzig selector which leads to the adaptive Dantzig selector [Dicker and Lin (2009)]. We can extend the rank-based Dantzig selector to the rank-based adaptive Dantzig selector. Given adaptive weights , consider

(15)

where denotes the Hadamard product, and denotes the set of entrywise inequalities for ease of notation. In both our theoretical analysis and numerical implementation, we utilize the optimal solution of the rank-based Dantzig selector to construct the adaptive weights by

(16)

Define and let Thus the support of exactly coincides with that of , and then it is further equivalent to the active set . Define , and for . Let .

Theorem 3

For each , we pick as in (11) satisfying that and , and pick as in (15) such that and In addition, we also choose as in (16) for each . Then with a probability at least , for each , the rank-based adaptive Dantzig selector finds the unique solution with and , and thus the rank-based neighborhood adaptive Dantzig selector is consistent for the graphical model selection.

Corollary 3

Suppose , , , , and () are all constants. Assume that and . Pick the tuning parameters and such that and . Then with probability tending to , for each , the rank-based adaptive Dantzig selector with as in (16) finds the unique optimal solution with and , and thus the rank-based neighborhood adaptive Dantzig selector is consistent for the graphical model selection.

The sign-consistency of the adaptive Dantzig selector is similar to that of the adaptive lasso [van de Geer, Bühlmann and Zhou (2011)]. Based on Theorem 2 we construct the adaptive weights in (16) which is critical for the success of the rank-based adaptive Dantzig selector in the high-dimensional setting. It is important to mention that the rank-based adaptive Dantzig selector does not require the strong irrepresentable condition for the rank-based graphical lasso to have the sparsity recovery property. Our treatment of the adaptive Dantzig selector is fundamentally different from Dicker and Lin (2009). Dicker and Lin (2009) focused on the canonical linear regression model and constructed the adaptive weights as the inverse of the absolute values of ordinary least square estimator. Their theoretical results only hold in the classical fixed setting. In our problem can be much bigger than . The choice of adaptive weights in (16) plays a critical role in establishing the graphical model selection consistency for the adaptive Dantzig selector under the high-dimensional setting where is at a nearly exponential rate to . Our technical analysis uses some key ideas such as the strong duality and the complementary slackness from the linear optimization theory [Bertsimas and Tsitsiklis (1997), Boyd and Vandenberghe (2004)].

3.3 Rank-based CLIME.

Compared to the graphical lasso, the CLIME can enjoy nice theoretical properties without assuming the irrepresentable condition [Cai, Liu and Luo (2011)]. This continues to hold when comparing the rank-based graphical lasso and the rank-based CLIME.

Theorem 4

Recall that . Pick a regularizing parameter such that . With a probability at least ,

Moreover, assume that , and suppose is a fixed constant. Pick a regularization parameter satisfying . Then we have

Theorem 4 is parallel to Theorem 6 in Cai, Liu and Luo (2011) which can be used to establish the rate of convergence of the oracle CLIME.

To improve graphical model selection performance, Cai, Liu and Luo (2011) suggested an additional thresholding step by applying the element-wise hard-thresholding rule to ,

(17)

where is the threshold, and is given in Theorem 4. Here we show that consistent graphical model selection can be achieved by an adaptive version of the rank-based CLIME. Given an adaptive weight matrix we define the rank-based adaptive CLIME as follows:

(18)

where is a simplified expression for the set of inequalities (for all ). Write . By Lemma 1 in Cai, Liu and Luo (2011) the above linear programming problem in (18) is exactly equivalent to vector minimization subproblems,

In both our theory and implementation, we utilize the rank-based CLIME’s optimal solution to construct an adaptive weight matrix by

(19)

We now prove the graphical model selection consistency of the rank-based adaptive CLIME. Denote as