Testing independence with high-dimensional correlated samples

Xi Chen (e-mail: xchen3@stern.nyu.edu)
Stern School of Business, New York University, 44 West 4th Street, New York, NY, USA

Weidong Liu (e-mail: weidongl@sjtu.edu.cn)
Department of Mathematics, Institute of Natural Sciences and MOE-LSC, Shanghai Jiao Tong University, Shanghai, China
Abstract

Testing independence among a number of (ultra) high-dimensional random samples is a fundamental and challenging problem. By arranging $n$ identically distributed $p$-dimensional random vectors into a $p \times n$ data matrix, we investigate the problem of testing independence among columns under the matrix-variate normal modeling of the data. We propose a computationally simple and tuning-free test statistic, characterize its limiting null distribution, analyze the statistical power, and prove its minimax optimality. As an important by-product of the test statistic, a ratio-consistent estimator for the quadratic functional of a covariance matrix from correlated samples is developed. We further study the effect of correlation among samples on an important high-dimensional inference problem: large-scale multiple testing of Pearson's correlation coefficients. Indeed, blindly using classical inference results based on the assumed independence of samples will lead to many false discoveries, which suggests the need for conducting independence testing before applying existing methods. To address the challenge arising from correlation among samples, we propose a "sandwich estimator" of Pearson's correlation coefficient obtained by de-correlating the samples. Based on this approach, the resulting multiple testing procedure asymptotically controls the overall false discovery rate at the nominal level while maintaining good statistical power. Both simulated and real data experiments are carried out to demonstrate the advantages of the proposed methods.

Research supported by NSFC Grants No. 11322107 and No. 11431006, the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning, the Shanghai Shuguang Program, and the 973 Program (2015CB856004).

MSC subject classifications: primary 62F05; secondary 62H10.

Keywords: independence test, multiple testing of correlations, false discovery rate, matrix-variate normal, quadratic functional estimation, high-dimensional sample correlation matrix.

1 Introduction

The independence among samples is a fundamental assumption in most statistical modeling, upon which numerous estimation and inference methods and theories have been developed. Indeed, from classical statistical inference (e.g., Student's $t$-test) to popular topics in modern statistics (e.g., high-dimensional problems such as regression, matrix estimation and inference), this assumption of independence occurs widely. Consider $n$ samples $X_1, \ldots, X_n$, where each sample $X_i$ is a $p$-dimensional vector from the same population distribution with mean $\mu$ and covariance $\Sigma$. It is often convenient to pool the samples together to form a $p \times n$ data matrix $\mathbf{X} = (X_1, \ldots, X_n)$. More specifically, for example in microarray data, $\mathbf{X}$ is an expression level matrix for $p$ genes measured on $n$ subjects. Such data are usually high-dimensional; thus, we mainly consider the setting where $p$ is much larger than $n$. Most existing works in the high-dimensional literature make the independence assumption among columns of $\mathbf{X}$, which serves as the starting point of methodology development and technical analysis. However, recent studies have shown that there are correlation structures among subjects in various microarray datasets (see, e.g., Teng and Huang (2009); Efron (2009); Allen and Tibshirani (2012); Kim et al. (2012)), demonstrating the potential risk of making the seemingly natural assumption of independence. Therefore, given a data matrix $\mathbf{X}$, it is important to first test whether the samples $X_1, \ldots, X_n$ are indeed independent before applying any method that assumes independence.

A data matrix is known as transposable data when both rows and columns are potentially correlated (Lazzeroni and Owen, 2002; Allen and Tibshirani, 2012). For a transposable data matrix $\mathbf{X}$, it is commonly assumed that $\mathbf{X}$ follows a matrix-variate normal distribution, which has been widely applied to model microarray data (see, e.g., Teng and Huang (2009); Efron (2009); Muralidharan (2010); Allen and Tibshirani (2012); Yin and Li (2012); Kim et al. (2012); Zhou (2014)). The matrix-variate normal distribution is a natural generalization of the familiar vector-variate normal distribution (Dawid, 1981). In particular, let $\operatorname{vec}(\mathbf{X})$ be the vectorization of the matrix $\mathbf{X}$ obtained by stacking the columns of $\mathbf{X}$ on top of each other. We say $\mathbf{X}$ follows a matrix-variate normal distribution with mean matrix $\mathbf{M}$ and covariance matrix $\Sigma \otimes \Delta$ (denoted by $\mathbf{X} \sim N_{p,n}(\mathbf{M}, \Sigma \otimes \Delta)$) if and only if $\operatorname{vec}(\mathbf{X}') \sim N(\operatorname{vec}(\mathbf{M}'), \Sigma \otimes \Delta)$. Here, $\mathbf{X}'$ denotes the transpose of $\mathbf{X}$, $\otimes$ is the Kronecker product, and the $n \times n$ matrix $\Delta$ is the covariance matrix of the row vectors of $\mathbf{X}$. Given a matrix-variate normal $\mathbf{X} \sim N_{p,n}(\mathbf{M}, \Sigma \otimes \Delta)$, each column $X_i \sim N(\mathbf{M}_i, \Delta_{ii}\Sigma)$ for $i = 1, \ldots, n$, where $\mathbf{M}_i$ is the $i$-th column of the mean matrix $\mathbf{M}$. Recall our problem setup: each $X_i$ follows the same population distribution with mean vector $\mu$ and covariance $\Sigma$. Thus, we have $\mathbf{M} = \mu \mathbf{1}'$, where $\mathbf{1}$ is the $n$-dimensional all-one column vector, and $\Delta_{ii} = 1$ for $i = 1, \ldots, n$. Under the matrix-variate normal modeling of the data, the independence testing problem is equivalent to the global test of whether $\Delta$ is a diagonal matrix, i.e.,

\[
H_0:\ \Delta = \mathbf{I}_n \quad \text{versus} \quad H_a:\ \Delta \neq \mathbf{I}_n. \tag{1}
\]
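To make the Kronecker covariance structure concrete, the following minimal sketch (ours, in Python with NumPy; not from the paper) simulates a data matrix under this model. The columns are independent exactly when $\Delta$ is diagonal.

```python
import numpy as np

def sample_matrix_normal(M, Sigma, Delta, rng):
    """Draw X with Cov(vec(X)) = Delta (x) Sigma: columns (samples) are
    coupled through Delta, rows (variables) through Sigma."""
    A = np.linalg.cholesky(Sigma)     # p x p factor with A A' = Sigma
    B = np.linalg.cholesky(Delta)     # n x n factor with B B' = Delta
    Z = rng.standard_normal(M.shape)  # i.i.d. N(0, 1) entries
    return M + A @ Z @ B.T            # vec(A Z B') = (B (x) A) vec(Z)

rng = np.random.default_rng(0)
p, n = 50, 20
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(1)-type
X = sample_matrix_normal(np.zeros((p, n)), Sigma, np.eye(n), rng)     # under H_0
```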

The testing problem in (1) is closely related to the following correlation test problem

\[
H_0':\ \rho_{ij} = 0 \ \text{ for all } 1 \le i < j \le p \quad \text{versus} \quad H_a':\ \rho_{ij} \neq 0 \ \text{ for some } 1 \le i < j \le p, \tag{2}
\]

where $\rho_{ij}$ is the Pearson correlation coefficient between the $i$-th and $j$-th variables. The testing problem in (2) is a classical problem in multivariate analysis (Nagao, 1973; Anderson, 2003). It has also been extensively studied in the past decade under the high-dimensional setting (e.g., Johnstone (2001), Ledoit and Wolf (2002), Jiang (2004), Schott (2005), Liu, Lin and Shao (2008), Bai et al. (2009), Cai and Jiang (2011), Jiang and Yang (2013), Han and Liu (2014)). However, the reported results are based on the assumption that the samples are independent. In fact, our problem in (1) is equivalent to the testing problem (2) with correlated samples. To see this, note that when treating each row of $\mathbf{X}$ as an individual sample, the roles of $\Sigma$ and $\Delta$ interchange since $\mathbf{X}' \sim N_{n,p}(\mathbf{M}', \Delta \otimes \Sigma)$, i.e., the matrix $\Sigma$ models the correlations among the row samples while $\Delta$ becomes the population covariance matrix. For many types of data (e.g., genetic data, financial data), there exists a complicated correlation structure among variables. Thus, $\Sigma$ will not be a diagonal matrix and the row vectors are not independent. Our problem in (1) essentially tests the correlation structure among the coordinates of the row vectors while the row samples themselves are correlated. The correlation among samples makes our problem more challenging, and the aforementioned methods for testing (2), which are based on the assumption of sample independence, cannot be applied to our problem.

The classical methods for testing independence among samples commonly assume that the dimension $p$ is fixed and are usually designed only for time series data. This is also known as the serial independence test; see Hong (1998) and the references therein. In such a framework, the methods require that the samples under the alternatives come from some time series. These samples satisfy an ordering structure such that the dependence between two samples decays as the distance between their indices increases. In our setting, there is no structural assumption among samples. Without any structural assumption, we will show in Theorem 2.7 that no test can have power tending to 1 uniformly over a large class of alternatives when the dimension $p$ is small (e.g., fixed or growing too slowly with $n$). On the other hand, when $p \to \infty$ but $p$ is small compared to $n$, the independence test is relatively easy. In fact, if $\Sigma$ is known, the data matrix can be transformed as $\Sigma^{-1/2}\mathbf{X}$, whose rows are independent; thus the independence test can be directly carried out using existing approaches (e.g., Jiang (2004); Liu, Lin and Shao (2008)). One can apply such an approach with a plug-in estimator $\widehat{\Sigma}$. However, as we will explain later in Section 5, when $p$ is large, even the optimal convergence rate of the estimator $\widehat{\Sigma}$ is not fast enough to solve this problem. In fact, although we have more information (i.e., more row samples) as $p$ becomes larger, the number of unknown parameters in $\Sigma$ increases accordingly, which makes the problem challenging. Therefore, the high-dimensional setting is the most interesting case and will be the main focus of the paper.
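In code, this oracle transformation is straightforward (a sketch assuming $\Sigma$ is known and well-conditioned; names are ours):

```python
import numpy as np

def decorrelate_rows(X, Sigma):
    """Compute Sigma^{-1/2} X for a known SPD Sigma; the rows of the
    result are independent, so classical correlation tests apply."""
    w, U = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T
    return Sigma_inv_sqrt @ X
```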

Although the testing of independence among high-dimensional samples is an important and fundamental problem, few existing works address it. Based on matrix-variate normal modeling of the data, some inference approaches were proposed by Efron (2009) and Muralidharan (2010). However, these works do not explore the limiting null distributions or the validity and power of the test. Pan, Gao and Yang (2014) proposed a statistic for this problem based on random matrix theory. However, it requires the condition that $n$ is proportionally as large as $p$, and thus cannot be applied to cases where $p$ grows much faster than $n$ or, as in the ultra high-dimensional setting, where $p$ grows exponentially with a power of $n$; both scenarios are common in genetic applications. Further, the method in Pan, Gao and Yang (2014) requires splitting the samples into two parts, and differences in splitting could lead to different test results.

In this paper, we consider the (ultra) high-dimensional setup and propose a minimax optimal test procedure in terms of statistical power for the testing problem in (1). We show that the distribution of the proposed max-type test statistic converges to a type I extreme value distribution under the null (Theorem 2.4). Therefore, the proposed test has the pre-specified significance level asymptotically. We also investigate the statistical power. Roughly speaking, we show that under some very mild conditions on the off-diagonal elements of $\Delta$, the power will converge to 1. Further, we prove that the proposed test is minimax rate-optimal over a large class of $\Delta$ (Theorems 2.5 and 2.6).

Our construction of the test statistic combines a bias correction and a variance correction based on the sample covariance matrix computed by treating each row of $\mathbf{X}$ as a sample. The bias correction technique allows us to handle the ultra high-dimensional case. Moreover, the variance correction technique deals with the correlation structure among the "row samples" of $\mathbf{X}$, which is specified by $\Sigma$. To characterize the strength of correlation among the row samples, we identify a key quantity $V$ involving $\|\Sigma\|_F^2$ and $\operatorname{tr}(\Sigma)$ (see (8) below), which comes from the asymptotic variance of our bias-corrected statistic. Here, $\|\cdot\|_F$ denotes the Frobenius norm and $\operatorname{tr}(\Sigma)$ is the trace of $\Sigma$. To simultaneously control the type I error under the null and maintain the minimax rate-optimal statistical power, we need a ratio-consistent estimator of $V$ regardless of the correlation among samples. Therefore, the remaining task essentially reduces to the problem of estimating $\|\Sigma\|_F^2 = \operatorname{tr}(\Sigma^2)$ from correlated samples.

It is noteworthy that estimating $\operatorname{tr}(\Sigma^2)$ is itself an important problem, known as quadratic functional estimation of $\Sigma$ (see, e.g., Bai and Saranadasa (1996); Chen and Qin (2010); Fan, Rigollet and Wang (2015)). Most existing works are based on the assumption that samples are independent and identically distributed (i.i.d.) and, thus, cannot be directly applied to our problem. Motivated by the thresholding estimator in Fan, Rigollet and Wang (2015), we propose a plug-in estimator for $\operatorname{tr}(\Sigma^2)$ based on a thresholded sample covariance matrix, but we relax the independence assumption among samples. Further, we propose an explicit threshold level, which is adaptive to the amount of correlation among samples and guarantees the consistency of the resulting estimator. Our simulation results demonstrate the superior performance of the proposed estimator of $\operatorname{tr}(\Sigma^2)$ over the existing approaches, which leads to a significant improvement in statistical power.

In summary, we propose a simple max-type test statistic to conduct the global test of independence among high-dimensional random samples in (1). Our approach has the following advantages:

  1. Our construction is direct and computationally attractive: it only requires the row sample covariance matrix and a threshold estimator of $\operatorname{tr}(\Sigma^2)$. Further, our test statistic is completely tuning-free.

  2. The limiting null distribution is characterized and thus the type I error is controlled asymptotically. Further, our test procedure is minimax rate-optimal over a sufficiently large class of $\Delta$, which is enough for most practical purposes.

  3. As an important by-product, we provide a ratio-consistent estimator for the quadratic functional of a covariance matrix from correlated samples.

We would like to note that we only focus on the matrix-variate normal distribution, which is a common assumption for studying a transposable data matrix and is widely used for modeling correlated microarray data. It is of interest to investigate the independence test for more general distributions, e.g., a matrix elliptical distribution (Dawid, 1977; Fang and Zhang, 1990) or $\mathbf{X} = \mathbf{M} + \Sigma^{1/2}\mathbf{Z}\Delta^{1/2}$, where the entries of $\mathbf{Z}$ are i.i.d. random variables with unit variance. We leave the extension to such distributions of data matrices for future work.

After we conduct the independence test, if the samples are indeed correlated, many classical inference approaches cannot be directly applied. We use the multiple testing problem of Pearson's correlation coefficients to illustrate the effect of the correlation among samples, demonstrate why the classical approach fails when samples are correlated, and further develop a new method to de-correlate the samples. In particular, we consider the following large-scale multiple testing problem: for $1 \le i < j \le p$,

\[
H_{0,ij}:\ \rho_{ij} = 0 \quad \text{versus} \quad H_{1,ij}:\ \rho_{ij} \neq 0. \tag{3}
\]

Problem (3) is a natural extension of the global test of independence in (2). In fact, the hypothesis that $\Sigma$ is a diagonal matrix is a strong null hypothesis, which will be rejected in most real data applications (e.g., microarray data, stock data). In contrast, the goal of the multiple testing problem (3) is to identify the pairs of correlated variables; it thus finds many applications in real data analysis, e.g., gene coexpression network analysis (Lee, Hsu and Sajdak, 2004; Carter et al., 2004; Zhu et al., 2005; Hirai, 2007) and brain connectivity analysis (Shaw, 2006). The goal of the testing problem in (3) is consistent with the goal of support recovery of a sparse $\Sigma$. The latter problem has been extensively studied in recent years (e.g., see Rothman, Levina and Zhu (2009), Lam and Fan (2009), Cai and Liu (2011), Bien and Tibshirani (2011)). These works establish consistency results of support recovery from independent samples under certain conditions, for example, that the absolute values of all nonzero $\rho_{ij}$ are bounded away from zero at a certain rate, which might be hard to satisfy in practice. Instead of trying to achieve perfect support recovery, the multiple testing problem (3) offers a more refined control of the type I error rate in support recovery under weaker assumptions. In particular, it usually aims to control the false discovery rate (FDR), which is a useful measure for evaluating the performance of support recovery. We also note that Cai and Liu (2015) recently studied problem (3) in a high-dimensional setting, but their work still requires the independence assumption.

For correlated samples from a matrix-variate normal distribution, we first establish the following result on the limiting distribution of the sample correlation coefficient $\hat\rho_{ij}$ (see Proposition 3.1):

\[
\sqrt{n}\,(\hat\rho_{ij} - \rho_{ij}) \rightsquigarrow N\big(0,\ (1 + \bar\Delta)(1 - \rho_{ij}^2)^2\big), \tag{4}
\]

where $\bar\Delta = \frac{1}{n}\sum_{1 \le k \neq l \le n}\Delta_{kl}^2$ quantifies the strength of the correlation among samples. Eq. (4) subsumes as a special case the classical result on the limiting distribution of $\hat\rho_{ij}$ when the samples are i.i.d. ($\bar\Delta = 0$ in (4)) (see Theorem 4.2.4 in Anderson (2003)). When the correlation is strong to a certain extent such that $\bar\Delta \ge c_0$ for some constant $c_0 > 0$, directly using the sample correlation coefficient or Fisher's $z$-statistic will lead to many false positives; this is verified by our simulations in Section 4.2. In fact, even if $\bar\Delta$ is known and one uses the correct limiting null distribution of $\hat\rho_{ij}$, the variance of $\hat\rho_{ij}$ becomes larger as $\bar\Delta$ increases, which leads to a lower power of the test.

To overcome the side effect of correlation among samples, we propose a "sandwich estimator" of $\rho_{ij}$ by de-correlating the samples; its limiting distribution is that in (4) with $\bar\Delta$ replaced by $0$. The corresponding asymptotic variance does not depend on $\bar\Delta$ and is smaller than that of the naive estimator $\hat\rho_{ij}$. Therefore the proposed "sandwich estimator" yields improved statistical power, especially when the correlation among samples is strong. Based on the proposed "sandwich estimator" of $\rho_{ij}$, the standard multiple testing procedure (Benjamini and Hochberg, 1995) is proven to asymptotically control the FDR at the nominal level (see Theorem 3.2).
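For concreteness, the Benjamini-Hochberg step-up procedure referenced here is standard; a minimal implementation of our own:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean rejection mask aiming to control the FDR at level
    alpha (valid under independence or suitable dependence of p-values)."""
    pvals = np.asarray(pvals)
    m = pvals.size
    order = np.argsort(pvals)
    below = pvals[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True   # reject the k smallest p-values
    return reject
```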

Finally, we introduce some necessary notation. For a positive integer $m$, let $[m] = \{1, \ldots, m\}$. For a square matrix $\mathbf{A}$, let $\operatorname{tr}(\mathbf{A})$ denote the trace of $\mathbf{A}$, $\lambda_{\max}(\mathbf{A})$ the maximum eigenvalue of $\mathbf{A}$, and $\lambda_{\min}(\mathbf{A})$ the minimum eigenvalue of $\mathbf{A}$. Let $I\{\cdot\}$ be the indicator function that takes the value one when the event is true and zero otherwise. For a given set $\mathcal{A}$, let $|\mathcal{A}|$ be the cardinality of $\mathcal{A}$. For any two real numbers $a$ and $b$, let $a \vee b = \max(a, b)$ and $a \wedge b = \min(a, b)$. We use $\limsup$ and $\liminf$ to denote the limit superior and limit inferior, respectively. Throughout the paper, we use $\mathbf{I}$ to denote the identity matrix, and use $C$, $C_1$, $c$, etc. to denote constants whose values might change from place to place and do not depend on $p$ and $n$.

The rest of the paper is organized as follows. In Section 2, we study the global test in (1). The test statistic is proposed in Section 2.1. In Section 2.2, we provide the ratio-consistent estimators of $\operatorname{tr}(\Sigma^2)$ and the variance quantity $V$ from correlated samples. The estimation error is characterized in Theorem 2.1. We further provide the limiting null distribution of the test statistic and the power analysis (Theorems 2.4–2.7). Section 3 studies the multiple testing of correlations in (3) from correlated samples. Experimental results are given in Section 4, followed by a discussion in Section 5. The proofs of our results as well as some additional experimental results are provided in the Appendix.

2 Sample independence test

We study the global testing problem of sample independence in (1) given the data matrix $\mathbf{X} = (X_1, \ldots, X_n) \sim N_{p,n}(\mu\mathbf{1}', \Sigma \otimes \Delta)$.

2.1 Construction of the test statistic

Recall that $X_j$ denotes the $j$-th sample for $j \in [n]$ and let $\bar{X} = \frac{1}{n}\sum_{j=1}^{n} X_j$. Define

\[
\hat\sigma_{ij} = \frac{1}{p}\,(X_i - \bar X)'(X_j - \bar X), \qquad 1 \le i, j \le n. \tag{5}
\]

In fact, from the proof, the statistic $\hat\sigma_{ij}$ is the sample covariance coefficient corresponding to $\Delta_{ij}$. Further, under the null $H_0$, we can show that

\[
\hat\sigma_{ij} = \frac{1}{p}\,(X_i - \mu)'(X_j - \mu) - \frac{\operatorname{tr}(\Sigma)}{np} + R_{ij}, \tag{6}
\]
where the remainder $R_{ij}$ is asymptotically negligible.

The first term has mean $0$ and variance $\|\Sigma\|_F^2/p^2$. The bias term $-\operatorname{tr}(\Sigma)/(np)$ comes from the centralization by $\bar X$ in (5). When $p = o(n^2)$, this bias is negligible relative to the standard deviation $\|\Sigma\|_F/p$, and $p\,\hat\sigma_{ij}/\|\Sigma\|_F$ can be shown to converge to a normal distribution. However, we are interested in the ultra high-dimensional case where $p$ can grow exponentially with a power of $n$: when $p$ becomes larger such that $p/n^2 \to \infty$, we have $p\,\hat\sigma_{ij}/\|\Sigma\|_F \to -\infty$ in probability under the null. To enable the applicability of our test statistic in the ultra high-dimensional setting, we first propose the following bias-corrected quantity:
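For completeness, the expectation computation behind the bias term (our own one-line derivation), writing $\varepsilon_i = X_i - \mu$ so that $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. $N(0, \Sigma)$ under $H_0$ and $\bar\varepsilon = \frac{1}{n}\sum_{k=1}^{n}\varepsilon_k$: for $i \neq j$,
\[
E[\hat\sigma_{ij}]
= \frac{1}{p}\,E\big[(\varepsilon_i - \bar\varepsilon)'(\varepsilon_j - \bar\varepsilon)\big]
= \frac{1}{p}\Big(0 - \frac{\operatorname{tr}(\Sigma)}{n} - \frac{\operatorname{tr}(\Sigma)}{n} + \frac{\operatorname{tr}(\Sigma)}{n}\Big)
= -\frac{\operatorname{tr}(\Sigma)}{np},
\]
using $E[\varepsilon_i'\bar\varepsilon] = E[\bar\varepsilon'\varepsilon_j] = E[\bar\varepsilon'\bar\varepsilon] = \operatorname{tr}(\Sigma)/n$.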

\[
T_{ij} = \hat\sigma_{ij} + \frac{\hat\sigma_{ii} + \hat\sigma_{jj}}{2(n-1)}, \tag{7}
\]

where $\hat\sigma_{ii}$ is the sample variance corresponding to $X_i$. Since the first term in (6) has variance $\|\Sigma\|_F^2/p^2$, the asymptotic variance of $T_{ij}$ is $V\operatorname{tr}^2(\Sigma)/p^3$, where

\[
V = \frac{p\,\|\Sigma\|_F^2}{\operatorname{tr}^2(\Sigma)} \tag{8}
\]

quantifies the strength of correlations among the row vectors of $\mathbf{X}$. Note that $V \ge 1$ by the Cauchy-Schwarz inequality, with equality when all eigenvalues of $\Sigma$ are equal, so a larger $V$ indicates stronger correlation among the rows.

Given $V$ in (8), we will show that, under the null, as $n, p \to \infty$,

\[
\sqrt{\frac{p}{V}}\cdot\frac{T_{ij}}{\sqrt{\hat\sigma_{ii}\,\hat\sigma_{jj}}} \rightsquigarrow N(0, 1) \tag{9}
\]

for $1 \le i < j \le n$, where the factor $\sqrt{V}$ plays the role of a variance correction for $T_{ij}$. The remaining task is to develop a ratio-consistent estimator for $V$. In addition, to maintain the statistical power, the estimator should also be consistent for correlated samples. In Section 2.2, we will develop such an estimator $\hat V$ for $V$. Given the estimator $\hat V$ (see (14)), we propose the following test statistic for the independence test in (1):

\[
T = \max_{1 \le i < j \le n} \frac{p\,T_{ij}^2}{\hat V\,\hat\sigma_{ii}\,\hat\sigma_{jj}}. \tag{10}
\]
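To fix ideas, the following minimal sketch (ours, not from the paper) computes the statistic following (5), (7) and (10); the estimator $\hat V$ is taken as given (see Section 2.2):

```python
import numpy as np

def independence_test_statistic(X, V_hat):
    """Max-type statistic T of (10) computed from a p x n data matrix X."""
    p, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)                 # X_i - X_bar, column-wise
    S = Xc.T @ Xc / p                                      # sigma_hat_{ij}, eq. (5)
    d = np.diag(S)
    T_mat = S + (d[:, None] + d[None, :]) / (2 * (n - 1))  # bias correction, eq. (7)
    stat = p * T_mat ** 2 / (V_hat * np.outer(d, d))       # standardization, eq. (10)
    return stat[np.triu_indices(n, k=1)].max()             # max over pairs i < j
```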

2.2 Estimation of $\operatorname{tr}(\Sigma^2)$ and $V$ from correlated samples

The estimation of $\operatorname{tr}(\Sigma^2)$ finds many applications and has been studied in several works (Bai and Saranadasa, 1996; Chen and Qin, 2010; Fan, Rigollet and Wang, 2015). However, all these works rely on the sample independence assumption. In particular, Fan, Rigollet and Wang (2015) proved that the simple plug-in procedure based on threshold estimators is minimax optimal over a large class of covariance matrices. Moreover, the threshold level in Fan, Rigollet and Wang (2015) involves a constant that needs to be carefully tuned to achieve good performance in practice. A cross-validation (CV) procedure was suggested; however, there is no theoretical justification for such a CV procedure. In this section, we introduce a threshold estimator for $\operatorname{tr}(\Sigma^2)$ with an explicit threshold level, which is completely data-driven without any tuning and automatically adaptive to the correlation among samples. We will show in Theorem 2.1 that the obtained estimator is ratio-consistent for correlated samples.

Let us define the (column) sample covariance matrix $\widehat{\Sigma} = (\tilde\sigma_{kl})_{p \times p}$ with $\tilde\sigma_{kl} = \frac{1}{n}\sum_{i=1}^{n}(X_{ki} - \bar X_k)(X_{li} - \bar X_l)$ and the sample correlation coefficient $\hat\rho_{kl} = \tilde\sigma_{kl}/\sqrt{\tilde\sigma_{kk}\tilde\sigma_{ll}}$ for $k, l \in [p]$. Further, define

\[
\bar\Delta = \frac{1}{n}\sum_{1 \le i \neq j \le n} \Delta_{ij}^2, \tag{11}
\]

which quantifies the average correlation among samples. It can be shown that $\bar\Delta$ admits a ratio-consistent estimator $\widehat{\bar\Delta}$ (see Proposition 3.1 in Section 3, and note that the sample correlation coefficients are consistent in probability). We propose the following threshold estimator $\Sigma^{\star} = (\sigma^{\star}_{kl})_{p \times p}$, where

\[
\sigma^{\star}_{kl} = \tilde\sigma_{kl}\, I\Bigg\{ |\tilde\sigma_{kl}| \ge \delta_0 \sqrt{\frac{(1 + \widehat{\bar\Delta})\,\tilde\sigma_{kk}\tilde\sigma_{ll}\,\log p}{n}} \Bigg\}. \tag{12}
\]

Here, $\widehat{\bar\Delta}$ is an estimator of $\bar\Delta$, and $\delta_0$ can be any sufficiently large constant (see Theorem 2.1). Using the approach from Bai and Saranadasa (1996), we construct

(13)

Given the threshold estimator in (12), $\operatorname{tr}(\Sigma^2)$ is estimated by $\|\Sigma^{\star}\|_F^2 = \sum_{k,l \in [p]} (\sigma^{\star}_{kl})^2$ and $V$ is estimated by

\[
\hat V = \frac{p\,\|\Sigma^{\star}\|_F^2}{\operatorname{tr}^2(\widehat\Sigma)}. \tag{14}
\]
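A corresponding sketch of the plug-in estimator (ours; it takes the estimate $\widehat{\bar\Delta}$ of (11) and the constant $\delta_0$ of (12) as inputs):

```python
import numpy as np

def quadratic_functional_estimate(X, delta_bar_hat, delta0):
    """Plug-in estimate of tr(Sigma^2) from a thresholded sample covariance.

    delta_bar_hat: estimate of the average sample correlation (11);
    delta0: the threshold constant of (12). The usual sqrt(log p / n)
    threshold level is inflated by the factor (1 + delta_bar_hat)."""
    p, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)
    Sigma_hat = Xc @ Xc.T / n                              # p x p sample covariance
    v = np.diag(Sigma_hat).copy()
    lam = delta0 * np.sqrt((1 + delta_bar_hat) * np.outer(v, v) * np.log(p) / n)
    Sigma_star = np.where(np.abs(Sigma_hat) >= lam, Sigma_hat, 0.0)  # eq. (12)
    np.fill_diagonal(Sigma_star, v)                        # keep the diagonal entries
    return np.sum(Sigma_star ** 2)                         # = ||Sigma_star||_F^2
```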

Now we will show that $\|\Sigma^{\star}\|_F^2$ and $\hat V$ are ratio-consistent estimators of $\operatorname{tr}(\Sigma^2)$ and $V$, respectively. We first state three assumptions that are maintained throughout this section. Let $\lambda_1 \ge \cdots \ge \lambda_p$ be the eigenvalues of $\Sigma$ and $\gamma_1 \ge \cdots \ge \gamma_n$ be the eigenvalues of $\Delta$. We make the following standard assumption on the eigenvalues:

(C1) We assume that $C_0^{-1} \le \lambda_p \le \lambda_1 \le C_0$ and $C_0^{-1} \le \gamma_n \le \gamma_1 \le C_0$ for some constant $C_0 \ge 1$.

The condition (C1) is a typical eigenvalue assumption in the high-dimensional covariance estimation literature (see the survey Cai, Ren and Zhou (2016) and the references therein). This assumption is natural for many important classes of covariance matrices, e.g., bandable, Toeplitz, and sparse covariance matrices. There are cases where the assumption (C1) is violated, e.g., when the covariance matrix has an equal correlation structure (i.e., all off-diagonal correlations equal some $\rho > 0$). Our result will not hold for such a setting; please refer to Figure 3 for the experimental illustrations.

We also note that this condition can be weakened by replacing the constant $C_0$ by a sequence growing at a certain rate. However, for the sake of simplicity, we do not intend to seek the optimal rate of this sequence. We only mention that this type of constraint on the eigenvalues is needed in our problem. Without this type of constraint, $T_{ij}$ in (7) will no longer be asymptotically normal because Lindeberg's condition for the central limit theorem (CLT) of independent random variables (see the expression of $T_{ij}$ in Eq. (67) in the Appendix) is violated. Thus, our result on type I error rate control in Proposition 2.3 will no longer hold.

The second condition is a standard assumption on the norms of the rows of $\Sigma$ and $\Delta$.

(C2) For some $0 < q < 2$, assume that $\sum_{l=1}^{p} |\Sigma_{kl}|^{q} \le C$ uniformly over each row $k \in [p]$, and $\sum_{j=1}^{n} |\Delta_{ij}|^{q} \le C$ uniformly over each row $i \in [n]$.

Notably, the upper bounds on the eigenvalues of $\Sigma$ and $\Delta$ in (C1) only imply the $\ell_2$-boundedness of each row of $\Sigma$ and $\Delta$, i.e., $\max_{k}\sum_{l}\Sigma_{kl}^2 \le C$ and $\max_{i}\sum_{j}\Delta_{ij}^2 \le C$. The condition (C2) is stronger than this implication, by noticing that $\|a\|_2 \le \|a\|_q$ for any vector $a$ and $q \le 2$. Moreover, when $q < 1$, this assumption becomes the typical weak sparsity assumption in high-dimensional covariance estimation.

The third assumption is on the relationship between $p$ and $n$.

(C3) We assume that $p \ge c_0\,n$ for some universal constant $c_0 > 0$ that does not depend on $p$ and $n$. We further assume that $\log p = o(n^{\kappa})$ for some constant $0 < \kappa < 1$.

The first condition is quite natural in a high-dimensional setting and the second condition allows us to deal with an ultra high-dimensional setting.

Under these three assumptions, we provide the following theorem, which establishes the ratio consistency of the estimators $\|\Sigma^{\star}\|_F^2$ and $\hat V$.

Theorem 2.1.

Assume that (C1)-(C3) hold. For any sufficiently large constant $\delta_0$, we have $\|\Sigma^{\star}\|_F^2\big/\operatorname{tr}(\Sigma^2) \to 1$ and $\hat V/V \to 1$ in probability as $n, p \to \infty$.

According to Theorem 2.1, we simply set $\delta_0$ to a fixed admissible constant in the estimator $\hat V$ in our experiments. In fact, the experimental results are quite robust with respect to the choice of $\delta_0$. As long as $\delta_0$ is above the required lower bound and does not take a too large value, the experimental results will not be affected.

Due to the factor $(1 + \widehat{\bar\Delta})$ in the thresholding level, our estimator is adaptive to the correlations between the samples. We next show that if we instead use the thresholding level designed for i.i.d. samples, without this correction factor, as in Fan, Rigollet and Wang (2015), the resulting estimator will over-estimate $\operatorname{tr}(\Sigma^2)$ and, hence, reduce the power. In particular, define the thresholding estimator $\Sigma^{\star}_{\mathrm{iid}}$ by replacing the factor $(1 + \widehat{\bar\Delta})$ in (12) with one, and let $Q_{\mathrm{iid}} = \|\Sigma^{\star}_{\mathrm{iid}}\|_F^2$. Fan, Rigollet and Wang (2015) showed that, under the i.i.d. assumption, for a large constant-valued threshold constant (not depending on the data), $Q_{\mathrm{iid}}$ attains the minimax-optimal rate for estimating $\operatorname{tr}(\Sigma^2)$. When the samples are correlated, however, $Q_{\mathrm{iid}}$ is no longer a ratio-consistent estimator for $\operatorname{tr}(\Sigma^2)$ and hence results in a poor estimator for $V$.

Proposition 2.2.

Assume that $\bar\Delta \ge c_0$ for some constant $c_0 > 0$ and that (C1)-(C3) hold. For any fixed $\epsilon > 0$, there is a class of covariance matrices $\Sigma$ satisfying the above conditions such that $P\big(Q_{\mathrm{iid}} \ge (1 + \epsilon)\operatorname{tr}(\Sigma^2)\big) \to 1$ as $n, p \to \infty$.

Proposition 2.2 shows that $Q_{\mathrm{iid}}$ will over-estimate $\operatorname{tr}(\Sigma^2)$ when the samples are correlated. If $Q_{\mathrm{iid}}$ is used to estimate $V$, then the resulting testing approach will be less powerful than the test with our estimator $\hat V$. We will further show the impact of $Q_{\mathrm{iid}}$ on the power in the simulations.

2.3 Type I error rate control and optimality of statistical power

The following proposition gives the limiting distribution of the oracle max-type statistic with the true variance quantity $V$.

Proposition 2.3.

Assume that $n \le p^{r}$ for some constant $r > 0$ (which does not depend on $p$ and $n$) and that (C1) holds. Under the null $H_0$, for any $x \in \mathbb{R}$, we have, as $n, p \to \infty$,
\[
P\Bigg( \max_{1 \le i < j \le n} \frac{p\,T_{ij}^2}{V\,\hat\sigma_{ii}\,\hat\sigma_{jj}} - 4\log n + \log\log n \le x \Bigg) \to \exp\Big( -\frac{1}{\sqrt{8\pi}}\,e^{-x/2} \Big).
\]

In Proposition 2.3, the normalized statistic $T_{ij}/\sqrt{\hat\sigma_{ii}\hat\sigma_{jj}}$ can be viewed as a sample correlation coefficient related to $\Delta_{ij}$. We first note that Proposition 2.3 cannot be implied by Theorem 4 in Cai and Jiang (2011). Let us denote the sample correlation coefficient by

\[
\hat r_{ij} = \frac{\sum_{k=1}^{p} (X_{ki} - \bar X_{\cdot i})(X_{kj} - \bar X_{\cdot j})}{\sqrt{\sum_{k=1}^{p} (X_{ki} - \bar X_{\cdot i})^2}\ \sqrt{\sum_{k=1}^{p} (X_{kj} - \bar X_{\cdot j})^2}}, \qquad \bar X_{\cdot i} = \frac{1}{p}\sum_{k=1}^{p} X_{ki}. \tag{15}
\]

Cai and Jiang (2011) established the limiting distribution of $\max_{1 \le i < j \le n} |\hat r_{ij}|$. Their result requires that the random vectors $(X_{k1}, \ldots, X_{kn})$ for $k \in [p]$, which index the sum in (15), be i.i.d. On the contrary, our statistic is based on $T_{ij}$, which is a sum of potentially correlated random variables, whether under the null or under the alternatives.

In addition, it is worthwhile to note that Cai and Jiang (2012) revealed an interesting phase transition phenomenon in the limiting distribution of the largest off-diagonal entry of the sample correlation matrix. There are different regimes of the growth of the dimension relative to the sample size, in which the limiting distributions are different. In contrast, in our problem there is no such phase transition phenomenon, and the limiting distribution is unified in the high-dimensional setting. To see this more clearly, let us assume that the rows $(X_{k1}, \ldots, X_{kn})$ for $k \in [p]$ are independent so that the results in Cai and Jiang (2012) are valid. Now, the quantity $p$ plays the role of the sample size and $n$ is the dimension. According to Corollary 2.2 of Cai and Jiang (2012), there is a phase transition phenomenon for the distribution of the statistic in Proposition 2.3 between the two regimes $\log n = o(p)$ and $\log n \asymp p$. In our high-dimensional setting, we have $\log n = o(p)$, which belongs to the first regime. Thus, there is no phase transition phenomenon in the high-dimensional setting.

Using Theorem 2.1, we provide the limiting null distribution of our test statistic $T$ in the next theorem.

Theorem 2.4.

Assume that (C1)-(C3) hold. Under the null $H_0$, we have

\[
P\big( T - 4\log n + \log\log n \le x \big) \to \exp\Big( -\frac{1}{\sqrt{8\pi}}\,e^{-x/2} \Big) \tag{16}
\]

for any $x \in \mathbb{R}$, as $n, p \to \infty$.

Remark: In Theorem 2.4, we need the additional growth condition on $\log p$ in (C3), which is used to obtain a ratio-consistent estimator of $\operatorname{tr}(\Sigma^2)$ and $V$. If we consider only the limiting distribution of the test statistic under the null (i.i.d. samples), one may use the method from Chen and Qin (2010) to estimate $\operatorname{tr}(\Sigma^2)$. The estimator from Chen and Qin (2010) does not require this condition. However, in terms of statistical power, as we have shown in our simulations, their estimator will over-estimate $\operatorname{tr}(\Sigma^2)$ (see Figure 6 in the Appendix) and reduce the power (see Figure 1), especially when the correlation among samples is strong. Our estimator is ratio-consistent under both the null and the alternatives (see Theorem 2.1) under this extra condition. For the thresholding estimator, such a condition on $\log p$ is necessary. To see this, if $\log p$ is much larger than $n$, then the thresholding level in (12) is much larger than one. Thus, $\Sigma^{\star}$ becomes diagonal and $\|\Sigma^{\star}\|_F^2$ will no longer be consistent. As a future direction, it would be interesting to construct a consistent estimator for $\operatorname{tr}(\Sigma^2)$ and $V$ under the null and alternatives simultaneously without the restriction on $\log p$.

According to Theorem 2.4, for a given significance level $\alpha \in (0, 1)$, we reject the null hypothesis whenever $T \ge q_\alpha + 4\log n - \log\log n$, where $q_\alpha$ is the $(1-\alpha)$-quantile of the type I extreme value distribution with the cumulative distribution function (CDF) $F(x) = \exp\big(-\frac{1}{\sqrt{8\pi}}e^{-x/2}\big)$, i.e.,

\[
q_\alpha = -\log(8\pi) - 2\log\log\big((1 - \alpha)^{-1}\big). \tag{17}
\]

Theorem 2.4 shows that the proposed test controls the type I error rate at the nominal level $\alpha$ asymptotically.
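In code, the rejection rule is a two-liner (ours, transcribing (17) and the rule above):

```python
import numpy as np

def critical_value(alpha, n):
    """Critical value for T: q_alpha of (17) plus the 4 log n - log log n shift."""
    q_alpha = -np.log(8 * np.pi) - 2 * np.log(np.log(1.0 / (1.0 - alpha)))
    return q_alpha + 4 * np.log(n) - np.log(np.log(n))

# reject H_0 at level alpha when T >= critical_value(alpha, n)
```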

We now turn to the power analysis. For a given pair of matrices $(\Sigma, \Delta)$, let us define the signal-strength quantities

(18)

and

(19)

The next theorem shows that for a large class of $\Delta$, the null hypothesis will be rejected by our test with probability tending to one.

Theorem 2.5.

Assume that (C1)-(C3) hold and suppose that, for some sufficiently large constant and all large enough $n$ and $p$,

(20)

We have $P\big(T \ge q_\alpha + 4\log n - \log\log n\big) \to 1$ as $n, p \to \infty$.

We next show that our test statistic is minimax rate-optimal in terms of statistical power, even when $\Sigma$ and $\mu$ are known. To this end, for a constant $c > 0$, we introduce the following class of covariance matrices:
\[
\mathcal{M}(c) = \Big\{ \Delta:\ \Delta \succ 0,\ \Delta_{ii} = 1 \text{ for all } i,\ \max_{1 \le i < j \le n} |\Delta_{ij}| \ge c\sqrt{\frac{\log n}{p}} \Big\}.
\]

Let $\mathcal{T}_\alpha$ be the set of $\alpha$-level tests with $\Sigma$ and $\mu$ being known, i.e., $\mathcal{T}_\alpha = \{ T_\alpha:\ P_{H_0}(T_\alpha = 1) \le \alpha \}$. Here $T_\alpha = 1$ means the rejection of $H_0$.

Theorem 2.6.

Let $\alpha, \beta > 0$ and $\alpha + \beta < 1$. Assume that (C3) holds. For any sufficiently small constant $c > 0$, we have
\[
\liminf_{n, p \to \infty}\ \inf_{T_\alpha \in \mathcal{T}_\alpha}\ \sup_{\Delta \in \mathcal{M}(c)} P_\Delta\big( T_\alpha \text{ fails to reject } H_0 \big) \ge \beta. \tag{21}
\]

Theorem 2.6 shows that for any $\alpha$-level test and any sufficiently small $c > 0$, there must exist a covariance matrix $\Delta \in \mathcal{M}(c)$ such that the probability of rejecting the null is less than $1 - \beta$ asymptotically. Theorems 2.5 and 2.6 together show that the proposed test based on $T$ is minimax rate-optimal, by noting that $V \le C$ for some constant $C$ according to the condition (C1). In other words, the order $\sqrt{\log n/p}$ of the lower bound on $\max_{1 \le i < j \le n}|\Delta_{ij}|$ cannot be improved, which establishes the minimax-optimal rate for the test. Moreover, when $V \to 1$ (e.g., when $\Sigma$ is close to a multiple of the identity), our test statistic is also minimax constant-optimal.

We further show that (20) defines a rather wide class of $\Delta$, in the sense that if (20) does not hold, it will be safe to assume independence for some applications. In particular, assume that $\Delta$ is a sparse matrix, i.e., the number of nonzero elements in each row of $\Delta$ is bounded from above by $s_n$. Then by (18), we obtain $\bar\Delta \le s_n \max_{1 \le i < j \le n}\Delta_{ij}^2$. Thus, a sufficient condition for (20) to hold is $s_n \log n = o(p)$ and

\[
\max_{1 \le i < j \le n} |\Delta_{ij}| \ge C\sqrt{\frac{\log n}{p}} \tag{22}
\]
for some sufficiently large constant $C > 0$.

Theorem 2.5 shows that under (22), the null hypothesis will be rejected with probability tending to 1. In fact, when (22) does not hold, the samples can be safely treated as independent for some applications. Let us take the multiple testing problem of correlations in (3) as an example. As we discussed in the introduction, the effect of the correlation among samples is quantified by $\bar\Delta$, and when $\bar\Delta \to 0$, the limiting distribution of $\hat\rho_{ij}$ in (4) will be the same as the limiting distribution of $\hat\rho_{ij}$ estimated from independent samples. Indeed, when (22) does not hold and $s_n \log n = o(p)$, then we have $\bar\Delta \to 0$ (note that $\bar\Delta \le s_n \max_{1 \le i < j \le n}\Delta_{ij}^2$ for row-sparse $\Delta$). Thus, the correlation among samples is asymptotically negligible.

We next give a more general result on the relation between the lower bound on $\max_{1 \le i < j \le n}|\Delta_{ij}|$, the sample size $n$, and the dimension $p$. Here we only assume that $n \to \infty$ and $p$ is a function of $n$ (note that $p$ can be a constant). For $b > 0$, let $\mathcal{M}^{*}(b) = \{\Delta:\ \Delta \succ 0,\ \Delta_{ii} = 1,\ \max_{1 \le i < j \le n}|\Delta_{ij}| \ge b\}$.

Theorem 2.7.

Let $\alpha, \beta > 0$ and $\alpha + \beta < 1$. There exist constants $c_0 > 0$ and $\epsilon \in (0, 1)$ such that, for any $b$ and $(n, p)$ satisfying
\[
b \le \min\Big\{ c_0\sqrt{\frac{\log n}{p}},\ 1 - \epsilon \Big\}, \tag{23}
\]
we have
\[
\limsup_{n \to \infty}\ \sup_{T_\alpha \in \mathcal{T}_\alpha}\ \inf_{\Delta \in \mathcal{M}^{*}(b)} P_\Delta\big( T_\alpha = 1 \big) \le 1 - \beta.
\]

Theorem 2.7 shows that when the dimension $p$ is fixed, it is impossible to reject $H_0$ correctly for all $\Delta \in \mathcal{M}^{*}(b)$ with probability greater than $1 - \beta$, even when the lower bound $b$ is close to one. This is easy to understand, since the roles of $n$ and $p$ are interchanged in our setting (we are testing an $n \times n$ covariance matrix with $p$ row samples). It also indicates that the independence test problem (1) is essentially different from the serial independence test in time series analysis. When $p$ grows polynomially with $n$, we must require $b \ge C\sqrt{\log n/p}$ for some sufficiently large constant $C$ so that the independence testing problem (1) is solvable over $\mathcal{M}^{*}(b)$. Note that Pan, Gao and Yang (2014) requires the minimum signal $b$ to be bounded away from zero by a constant, which means that their method fails to deal with the setting $b \to 0$. On the other hand, by (22), such a setting of minimum signal can be handled by the proposed test.

3 Multiple testing of correlations with correlated observations

As we mentioned in the introduction, when the independence hypothesis in (1) is rejected, there is a potential risk in using inference methods developed under the independence assumption. To illustrate the effect of sample correlations, we study an important high-dimensional problem: the large-scale multiple testing of correlations when the samples are correlated, i.e., for $1 \le i < j \le p$,

\[
H_{0,ij}:\ \rho_{ij} = 0 \quad \text{versus} \quad H_{1,ij}:\ \rho_{ij} \neq 0. \tag{24}
\]

When the samples are i.i.d. and normally distributed, the following classical result from Anderson (2003) (Theorem 4.2.4) establishes the limiting distribution of the sample correlation coefficient $\hat\rho_{ij}$:

\[
\sqrt{n}\,(\hat\rho_{ij} - \rho_{ij}) \rightsquigarrow N\big(0,\ (1 - \rho_{ij}^2)^2\big). \tag{25}
\]

However, when the samples are correlated, the limiting distribution in (25) no longer holds. In fact, we can prove the following proposition.

Proposition 3.1.

Assume that the condition (C1) holds. We have
\[
\sqrt{n}\,(\hat\rho_{ij} - \rho_{ij}) \rightsquigarrow N\big(0,\ (1 + \bar\Delta)(1 - \rho_{ij}^2)^2\big), \tag{26}
\]
where $\bar\Delta = \frac{1}{n}\sum_{1 \le k \neq l \le n} \Delta_{kl}^2$.

The term $\bar\Delta$ is the same quantity as in (11), which represents the average correlation among samples. When the sample correlation is strong to the extent that $\bar\Delta \ge c_0$ for some constant $c_0 > 0$, the multiple testing procedure based on (25) (e.g., the Benjamini-Hochberg (BH) procedure, Benjamini and Hochberg (1995)) will lead to many false positives. In fact, even when the correct limiting distribution in (26) is used, the resulting test will lose statistical power. For simplicity, let us consider a single testing problem $H_{0,ij}: \rho_{ij} = 0$. To control the type I error rate when the samples are correlated, we need a larger critical value for $|\hat\rho_{ij}|$, which is linear in $\sqrt{1 + \bar\Delta}$. That is, the rejection region should be $\big\{ |\hat\rho_{ij}| \ge \Phi^{-1}(1 - \alpha/2)\sqrt{(1 + \bar\Delta)/n} \big\}$, where $\Phi$ is the standard normal CDF. Plugging in a ratio-consistent estimator of $\bar\Delta$ (e.g., using the method developed in Section 2.2), we will obtain a test that controls the type I error rate asymptotically. However, such a test will lose statistical power since the length of the acceptance region grows with the strength of the correlation among samples.
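The following small sketch (ours) makes the variance inflation concrete by comparing the naive level-$\alpha$ critical value for $\hat\rho_{ij}$ under (25) with the corrected one under (26):

```python
from scipy.stats import norm

def rho_critical_values(alpha, n, delta_bar):
    """Two-sided level-alpha critical values for rho_hat when rho_ij = 0."""
    z = norm.ppf(1 - alpha / 2)
    naive = z / n ** 0.5                          # i.i.d. samples, eq. (25)
    corrected = z * ((1 + delta_bar) / n) ** 0.5  # correlated samples, eq. (26)
    return naive, corrected

print(rho_critical_values(0.05, 100, 1.0))  # the corrected region is wider
```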

In this section, we propose a multiple testing procedure that asymptotically controls the FDR at the nominal level while maintaining good statistical power. Our method is based on the construction of a "sandwich estimator" of $\rho_{ij}$ by de-correlating the samples. In particular, first assume that $\mu$ and $\Delta$ are known. We transform the data $\mathbf{X}$ into $\mathbf{Y} = (\mathbf{X} - \mu\mathbf{1}')\Delta^{-1/2}$, whose columns $Y_i$ for $i \in [n]$ are i.i.d. from $N(0, \Sigma)$. The corresponding "sample" covariance matrix of $\mathbf{Y}$ is ("sample" is quoted here since $\mu$ and $\Delta$ are unknown in practice and thus this is not a real sample covariance matrix):

\[
\check\Sigma = \frac{1}{n}\,\mathbf{Y}\mathbf{Y}' = \frac{1}{n}\,(\mathbf{X} - \mu\mathbf{1}')\,\Delta^{-1}\,(\mathbf{X} - \mu\mathbf{1}')'. \tag{27}
\]

Let $\check R = (\check\rho_{ij})_{p \times p}$ with $\check\rho_{ij} = \check\sigma_{ij}/\sqrt{\check\sigma_{ii}\check\sigma_{jj}}$ be the "sample" correlation coefficient matrix. By (25), we have

\[
\sqrt{n}\,(\check\rho_{ij} - \rho_{ij}) \rightsquigarrow N\big(0,\ (1 - \rho_{ij}^2)^2\big), \tag{28}
\]

which implies that the performance of the test statistic $\check\rho_{ij}$ from correlated samples is the same as that of $\hat\rho_{ij}$ for independent samples. Comparing (28) with (26), the asymptotic variance of the sandwich estimator $\check\rho_{ij}$ is always smaller than that of the sample correlation coefficient $\hat\rho_{ij}$ when $\bar\Delta > 0$. Therefore, even when $\bar\Delta$ is bounded by a constant, the sandwich estimator is more powerful.
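A minimal sketch of the de-correlation step (ours; it takes an estimate of $\Delta^{-1}$ as given, e.g., from the CLIME estimator described below, together with the sample mean $\hat\mu$):

```python
import numpy as np

def sandwich_correlation(X, Delta_inv, mu_hat):
    """Sandwich estimator of the p x p correlation matrix, eq. (27).

    X: p x n data; Delta_inv: n x n estimate of Delta^{-1} (e.g., CLIME);
    mu_hat: length-p vector of estimated means."""
    n = X.shape[1]
    Xc = X - mu_hat[:, None]                 # center each variable
    S = Xc @ Delta_inv @ Xc.T / n            # de-correlated "sample" covariance
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)                # correlation matrix (rho_check)
```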

To obtain an estimate of $\check\rho_{ij}$, we need to estimate $\mu$ and $\Delta^{-1}$. Let $\hat\mu = \bar X$ be the estimator of $\mu$, where $\hat\mu_k = \frac{1}{n}\sum_{i=1}^{n} X_{ki}$ for $k \in [p]$. For estimating $\Delta^{-1}$, we adopt the CLIME estimator proposed in Cai, Liu and Luo (2011). In particular, following Cai, Liu and Luo (2011), we assume that $\Delta^{-1}$ is a weakly sparse matrix belonging to the class

\[
\mathcal{U}(q, s_0(n), M_n) = \Big\{ \Omega \succ 0:\ \|\Omega\|_{L_1} \le M_n,\ \max_{1 \le i \le n} \sum_{j=1}^{n} |\omega_{ij}|^{q} \le s_0(n) \Big\}, \tag{29}
\]

where $0 \le q < 1$, and the relationship among $q$, $s_0(n)$ and $M_n$ will be specified in the condition of Theorem 3.2. Let $\widehat{\Delta} = \big(\hat\sigma_{ij}/\sqrt{\hat\sigma_{ii}\hat\sigma_{jj}}\big)_{n \times n}$, where $\hat\sigma_{ij}$ is defined in (5), and let $\widehat{\Omega}$ be any optimal solution of the following optimization problem:
\[
\min \|\Omega\|_1 \quad \text{subject to} \quad \big\|\widehat{\Delta}\,\Omega - \mathbf{I}_n\big\|_\infty \le \lambda_n. \tag{30}
\]

Here, $\lambda_n = C M_n \sqrt{\log n / p}$, $C$ is a sufficiently large constant,