Hypothesis Testing of Matrix Graph Model with Application to Brain Connectivity Analysis


Yin Xia
Department of Statistics and Operations Research,
University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, U.S.A.
Email: xiayin@email.unc.edu
and
Lexin Li
Division of Biostatistics,
University of California at Berkeley, Berkeley, CA 94720, U.S.A.
Email: lexinli@berkeley.edu
Abstract

Brain connectivity analysis is now at the foreground of neuroscience research. A connectivity network is characterized by a graph, where nodes represent neural elements such as neurons and brain regions, and links represent statistical dependencies that are often encoded in terms of partial correlations. Such a graph is inferred from matrix-valued neuroimaging data such as electroencephalography and functional magnetic resonance imaging. There have been a good number of successful proposals for sparse precision matrix estimation under the normal or matrix normal distribution; however, this family of solutions does not offer a statistical significance quantification for the estimated links. In this article, we adopt a matrix normal distribution framework and formulate the brain connectivity analysis as a precision matrix hypothesis testing problem. Based on the separable spatial-temporal dependence structure, we develop oracle and data-driven procedures to test the global hypothesis that all spatial locations are conditionally independent, which are shown to be particularly powerful against sparse alternatives. In addition, simultaneous tests for identifying conditionally dependent spatial locations with false discovery rate control are proposed in both oracle and data-driven settings. Theoretical results show that the data-driven procedures perform asymptotically as well as the oracle procedures and enjoy certain optimality properties. The empirical finite-sample performance of the proposed tests is studied via simulations, and the new tests are applied to a real electroencephalography data analysis.


Key Words: Connectivity analysis; False discovery rate; Gaussian graphical model; Matrix-variate normal distribution; Multiple testing.

1 Introduction

In recent years, matrix-valued data are becoming ubiquitous in a wide range of scientific and business applications, including bioinformatics (Yin and Li, 2012), brain imaging analysis (Reiss and Ogden, 2010; Aston and Kirch, 2012; Zhou and Li, 2014), and finance (Leng and Tang, 2012), among many others. Accordingly, the matrix normal distribution is becoming increasingly popular in modeling matrix-variate observations (Zhou, 2014). Our motivating example is an electroencephalography (EEG) dataset, which measures voltage values from electrodes placed at various brain locations over a period of time for a group of alcoholic subjects and normal controls. One scientific goal is to infer the connectivity patterns among those spatial locations. More generally, accurate and informative mapping of the human connectivity network is now at the center stage of neuroscience research. The objective is to infer the brain connectivity network, which is commonly characterized as a graph consisting of nodes and links. Here nodes represent neural elements, from microscopic neurons to macroscopic brain regions, and links represent statistical dependencies between neural components (Johansen-Berg, 2013). Partial correlations, reported by a precision matrix, are frequently employed to describe such statistical dependencies (Fornito et al., 2013). This precision matrix, in turn, is derived from imaging modalities, such as EEG, magnetoencephalography, and functional magnetic resonance imaging. The data from those imaging modalities come in the common form of a two-dimensional matrix, with one spatial dimension and one temporal dimension.

Adopting a matrix normal distribution framework, we formulate the brain connectivity network analysis as a precision matrix inference problem. Specifically, let $X \in \mathbb{R}^{p \times q}$ denote the spatial-temporal matrix data from an imaging modality, e.g., EEG. It is assumed to follow a matrix normal distribution with the Kronecker product covariance structure,

$$\mathrm{vec}(X) \sim \mathrm{Normal}\left(0,\ \Sigma_T \otimes \Sigma_S\right),$$

where the operator $\mathrm{vec}(\cdot)$ stacks the columns of the matrix to a vector, $\otimes$ is the Kronecker product, $\Sigma_S \in \mathbb{R}^{p \times p}$ denotes the covariance matrix of the $p$ spatial locations, and $\Sigma_T \in \mathbb{R}^{q \times q}$ denotes the covariance matrix of the $q$ time points. Correspondingly,

$$\left\{\mathrm{Cov}(\mathrm{vec}(X))\right\}^{-1} = \Omega_T \otimes \Omega_S,$$

where $\Omega_S = \Sigma_S^{-1}$ is the spatial precision matrix, and $\Omega_T = \Sigma_T^{-1}$ is the temporal precision matrix. In brain connectivity analysis, our primary interest is to infer the connectivity network characterized by the spatial precision matrix $\Omega_S$. By contrast, the temporal precision matrix $\Omega_T$ is of little interest here and is treated as a nuisance parameter in our analysis. We also make some remarks regarding the assumptions of our adopted framework. First, the matrix normal assumption has been frequently adopted in various applications (Yin and Li, 2012; Leng and Tang, 2012), and is scientifically plausible in neuroimaging analysis. For instance, the majority of standard neuroimaging processing software, such as SPM (Friston et al., 2007) and FSL (Smith et al., 2004), adopts a framework that assumes the data are normally distributed per voxel (location) with a noise factor and an autoregressive structure, which shares a similar spirit with the matrix normal formulation. Second, it is commonly assumed that the precision matrix is sparse, which we adopt for our inferential procedure as well. Again, this sparsity assumption is scientifically justifiable, as it is known that brain region connections are energy consuming (Olshausen and Field, 2004), and biological units tend to minimize energy-consuming activities (Bullmore and Sporns, 2009).
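To make the Kronecker covariance structure concrete, the following is a minimal NumPy sketch of how matrix-valued samples with covariance $\Sigma_T \otimes \Sigma_S$ can be simulated. The dimensions, the identity spatial covariance, and the autoregressive temporal covariance are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

def sample_matrix_normal(n, Sigma_S, Sigma_T, rng):
    """Draw n i.i.d. p x q matrices X_k with Cov{vec(X_k)} = Sigma_T (x) Sigma_S."""
    # If Z has i.i.d. N(0,1) entries, then A Z B^T has covariance (B B^T) (x) (A A^T).
    A = np.linalg.cholesky(Sigma_S)            # p x p spatial factor
    B = np.linalg.cholesky(Sigma_T)            # q x q temporal factor
    Z = rng.standard_normal((n, Sigma_S.shape[0], Sigma_T.shape[0]))
    return A @ Z @ B.T                         # matmul broadcasts over the n samples

rng = np.random.default_rng(0)
p, q, n = 30, 10, 50                           # illustrative sizes only
Sigma_S = np.eye(p)                            # diagonal Omega_S: the global null holds
Sigma_T = 0.4 ** np.abs(np.subtract.outer(np.arange(q), np.arange(q)))
X = sample_matrix_normal(n, Sigma_S, Sigma_T, rng)   # shape (n, p, q)
```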

In this article, we aim to address the following two hypothesis testing questions. The first is to test whether all spatial locations are conditionally independent; namely, we test the null hypothesis

$$H_0: \Omega_S \text{ is diagonal}, \quad \text{versus} \quad H_1: \Omega_S \text{ is not diagonal}. \tag{1}$$

The second is to identify those conditionally dependent pairs of locations with false discovery rate and false discovery proportion control; i.e., we simultaneously test

$$H_{0,i,j}: \omega_{S,i,j} = 0, \quad \text{versus} \quad H_{1,i,j}: \omega_{S,i,j} \neq 0, \quad \text{for } 1 \le i < j \le p, \tag{2}$$

where $\omega_{S,i,j}$ is the $(i,j)$th element of $\Omega_S = (\omega_{S,i,j})$.

In the literature, there have been a good number of methods proposed to estimate a sparse precision matrix under the normal distribution (Meinshausen and Bühlmann, 2006; Yuan and Lin, 2007; Friedman et al., 2008; Yuan, 2010; Ravikumar et al., 2011; Cai et al., 2011). There are extensions of this line of work from a single precision matrix to multiple precision matrices (Danaher et al., 2014; Zhu et al., 2014), and from the Gaussian distribution to the more flexible class of nonparanormal distributions (Liu et al., 2012). Extensions of sparse precision matrix estimation have also emerged for matrix-valued data under the matrix normal distribution (Yin and Li, 2012; Leng and Tang, 2012; Zhou, 2014). However, all those methods tackle the estimation aspect of the problem and induce a connectivity network from an estimated precision matrix. Only recently have hypothesis testing procedures been developed under the Gaussian graphical model. In particular, Liu (2013) proposed a testing procedure to recover a network in the one-sample case, whereas Xia et al. (2015) proposed a method to test the equality of two precision matrices, so as to infer differential network structures in genomics. Both papers, however, worked with vector normal data instead of matrix normal data.

We aim at hypothesis testing for the spatial precision matrix, under the matrix normal framework, to induce the connectivity network of brain regions. We separate the spatial and temporal dependence structures, then infer the precision matrix through inverse regression models by relating its elements to the regression coefficients. Two procedures are considered. One is to assume the temporal covariance matrix $\Sigma_T$ is known, and we term the resulting method the oracle procedure. The other is to estimate the temporal covariance matrix from the data and plug it in, and accordingly we term it the data-driven procedure. We construct test statistics based on the covariances between the residuals from the inverse regression models. We first develop a global test for (1), and show that it is particularly powerful against a sparse alternative; we then develop a multiple testing procedure for simultaneously testing the hypotheses in (2) with false discovery rate and false discovery proportion control. We study the theoretical properties of the two testing procedures, in both the oracle and data-driven scenarios. We show that the data-driven procedure performs asymptotically as well as the oracle procedure, and enjoys certain optimality under the regularity conditions. Our numerical analysis also supports such findings.

Our contributions are multi-fold. First, brain connectivity analysis is now becoming a central goal of neuroscience research (Fornito et al., 2013), and it constantly calls for statistical significance quantification of the inferred connections between neural elements. However, there is a paucity of systematic hypothesis testing solutions developed for this type of problem in the literature, and our proposal offers a timely response. Second, although various sparse network estimation methods have been successfully applied in neural connectivity analyses, network hypothesis testing is an utterly different problem from estimation. The key to estimation is to seek a bias-variance tradeoff, and many common sparse estimation solutions, such as the graphical lasso (Friedman et al., 2008) and constrained $\ell_1$-minimization for inverse matrix estimation (Cai et al., 2011, clime), are biased estimators. Such estimation methods do not produce a direct quantification of statistical significance for the network edges. By contrast, hypothesis testing starts with a nearly unbiased estimator, and produces an explicit significance quantification. Third, among the few network hypothesis testing solutions, Liu (2013) and Xia et al. (2015) focused on vector-valued data following a normal distribution rather than matrix-valued data. Directly applying their methods to test the spatial conditional dependence, with no regard for the separable structure of $\Sigma_S$ and $\Sigma_T$, is equivalent to assuming the columns of the data matrix are independent. This is clearly not true, as the measurements at different time points can be highly correlated. Thus it is important to separate the spatial-temporal dependence structure before testing.

The following notation is used throughout this article. For the matrix-valued data $X_k \in \mathbb{R}^{p \times q}$, $k = 1, \ldots, n$, $X_{k,i,t}$ denotes the measurement at the $i$th spatial location and the $t$th time point of the $k$th sample, $1 \le i \le p$, $1 \le t \le q$, and $X_{k,-i,t}$ denotes the $t$th column spatial vector with the $i$th entry removed. For a vector $a = (a_1, \ldots, a_p)^\top$, $a_{-i}$ denotes the vector obtained by removing the $i$th entry from $a$. For an $n \times p$ data matrix $A$, $A_{-i}$ denotes the $n \times (p-1)$ submatrix of $A$ with its $i$th column removed. Furthermore, for any $p \times p$ matrix $A = (a_{i,j})$, $A_{i,-j}$ denotes the $i$th row of $A$ with its $j$th entry removed, and $A_{-i,j}$ denotes the $j$th column of $A$ with its $i$th entry removed; $A_{-i,-j}$ denotes the submatrix of $A$ with its $i$th row and $j$th column removed. If $A$ is symmetric, then $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ denote the largest and smallest eigenvalues of $A$. We also use the following norms. For a vector $a$, define its $\ell_w$ norm as $|a|_w = (\sum_{i=1}^p |a_i|^w)^{1/w}$ for $w \ge 1$. For a matrix $A$, the matrix 1-norm is the maximum absolute column sum, $\|A\|_{L_1} = \max_{1 \le j \le p} \sum_{i=1}^p |a_{i,j}|$; the matrix element-wise infinity norm is $\|A\|_\infty = \max_{1 \le i,j \le p} |a_{i,j}|$; and the element-wise $\ell_1$ norm is $\|A\|_1 = \sum_{i=1}^p \sum_{j=1}^p |a_{i,j}|$. Finally, for two sequences of real numbers $\{a_n\}$ and $\{b_n\}$, $a_n = O(b_n)$ if there exists a constant $C$ such that $|a_n| \le C|b_n|$ holds for all $n$; $a_n = o(b_n)$ if $\lim_{n \to \infty} a_n / b_n = 0$; and $a_n \asymp b_n$ if there are positive constants $c$ and $C$ such that $c \le a_n / b_n \le C$ for all $n$.
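For readers who compute, the norms above translate directly into NumPy one-liners; this small sketch (NumPy is our choice of illustration, not part of the paper) shows the correspondence.

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])
L1_norm  = np.abs(A).sum(axis=0).max()   # matrix 1-norm (max abs column sum): 6.0
elem_inf = np.abs(A).max()               # element-wise infinity norm: 4.0
elem_l1  = np.abs(A).sum()               # element-wise l1 norm: 10.0
lam = np.linalg.eigvalsh((A + A.T) / 2)  # lambda_min, lambda_max of a symmetric matrix
```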

2 Methodology

In this section, we derive test statistics and testing procedures for the global hypothesis (1), and the entry-wise hypothesis (2) with false discovery rate control.

2.1 Separation of spatial-temporal dependency

Let $X_1, \ldots, X_n$, each with dimension $p \times q$, denote i.i.d. random samples from a matrix normal distribution. The mean, without loss of generality, is assumed to be zero, and the covariance is of the form $\Sigma_T \otimes \Sigma_S$. Our interest is to infer about $\Omega_S = \Sigma_S^{-1}$, while treating $\Sigma_T$ as a nuisance. To separate the spatial and temporal dependence structures, we build hypothesis testing procedures for (1) and (2) based upon the linear transformation $X_k \Sigma_T^{-1/2}$ of the original samples $X_k$. Specifically, we consider two scenarios. We first treat $\Sigma_T$ as known, and term the resulting testing procedure an oracle procedure. In practice, however, $\Sigma_T$ is rarely known, and as such we plug in an estimator of $\Sigma_T$. When $\mathrm{trace}(\Sigma_S) = p$ holds true, the sample covariance matrix $\hat{\Sigma}_T = (np)^{-1} \sum_{k=1}^n X_k^\top X_k$ is an unbiased estimator of $\Sigma_T$. However, when $\mathrm{trace}(\Sigma_S) \neq p$, $\hat{\Sigma}_T$ is biased and we have $E(\hat{\Sigma}_T) = p^{-1}\mathrm{trace}(\Sigma_S)\,\Sigma_T$. As we will show in Remark 2 in the next section, the value of $\mathrm{trace}(\Sigma_S)$ does not affect our proposed test statistics, and thus we can assume without loss of generality that $\mathrm{trace}(\Sigma_S) = p$ to simplify the notation. Subsequently, we plug in the sample covariance matrix $\hat{\Sigma}_T$, develop the hypothesis testing based on the transformed samples $X_k \hat{\Sigma}_T^{-1/2}$, and term it a data-driven procedure. One may also use other estimators of $\Sigma_T$ or $\Sigma_T^{-1/2}$, and we will briefly discuss those alternatives in Section 6.
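As a minimal sketch of the data-driven transformation (continuing from the simulation sketch in Section 1; the function name and the eigendecomposition route to $\hat{\Sigma}_T^{-1/2}$ are our own choices):

```python
import numpy as np

def temporal_whiten(X):
    """Data-driven transform Y_k = X_k SigmaT_hat^{-1/2} for X of shape (n, p, q).

    Assumes trace(Sigma_S) = p, under which SigmaT_hat is unbiased for Sigma_T;
    by Remark 2, any other scaling cancels in the final standardized statistics.
    """
    n, p, q = X.shape
    SigmaT_hat = np.einsum('kit,kis->ts', X, X) / (n * p)   # (1/(np)) sum_k X_k^T X_k
    w, V = np.linalg.eigh(SigmaT_hat)                       # symmetric eigendecomposition
    SigmaT_inv_half = (V / np.sqrt(w)) @ V.T                # V diag(w^{-1/2}) V^T
    return X @ SigmaT_inv_half

Y = temporal_whiten(X)    # X from the simulation sketch in Section 1
```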

2.2 Test statistics

We first develop the test statistics for the two hypotheses in the oracle case. The development of the data-driven statistics is very similar, so we omit the details and only remark on the differences between the oracle and data-driven cases. For simplicity, we also use the same set of notations for the two scenarios, and will only differentiate them when we study their respective asymptotic properties in Section 3.

It is well established that, under the normal distribution, the precision matrix can be described in terms of regression models (Anderson, 2003, Sec 2.5). Specifically, letting $Y_k = X_k \Sigma_T^{-1/2}$, $k = 1, \ldots, n$, denote the transformed samples, we have, for each spatial location $1 \le i \le p$ and time point $1 \le t \le q$,

$$Y_{k,i,t} = Y_{k,-i,t}^\top\,\beta_i + \epsilon_{k,i,t}, \tag{3}$$

where the error term $\epsilon_{k,i,t}$ is independent of $Y_{k,-i,t}$. The regression coefficient vector $\beta_i$ and the error term $\epsilon_{k,i,t}$ satisfy that

$$\beta_i = -\omega_{S,i,i}^{-1}\,\Omega_{S,-i,i}, \qquad r_{i,j} = \mathrm{Cov}(\epsilon_{k,i,t}, \epsilon_{k,j,t}) = \frac{\omega_{S,i,j}}{\omega_{S,i,i}\,\omega_{S,j,j}}.$$

As such, the elements of $\Omega_S$ can be represented in terms of the error covariances $r_{i,j}$. Next, we construct an estimator of $r_{i,j}$ and its bias-corrected version. We then build on this estimator to obtain an estimator of $\omega_{S,i,j}/(\omega_{S,i,i}\,\omega_{S,j,j})$, upon which we further develop our test statistics.

A natural estimator of $r_{i,j}$ is the sample covariance between the residuals,

$$\tilde{r}_{i,j} = \frac{1}{nq}\sum_{k=1}^n\sum_{t=1}^q \hat{\epsilon}_{k,i,t}\,\hat{\epsilon}_{k,j,t}, \quad \text{with } \hat{\epsilon}_{k,i,t} = Y_{k,i,t} - Y_{k,-i,t}^\top\hat{\beta}_i,$$

where $\hat{\beta}_i$, $1 \le i \le p$, are estimators of $\beta_i$ that satisfy Condition (C1) in the oracle case and Condition (C1*) in the data-driven case; such estimators can be obtained via standard estimation methods such as the Lasso and the Dantzig selector, as will be discussed in Section 3.1. When $i \neq j$, however, $\tilde{r}_{i,j}$ tends to be biased due to the correlation induced by the estimated parameters. We thus consider a bias-corrected estimator of $r_{i,j}$,

$$\hat{r}_{i,j} = -\left(\tilde{r}_{i,j} + \tilde{r}_{i,i}\,\hat{\beta}_{i,j} + \tilde{r}_{j,j}\,\hat{\beta}_{j,i}\right), \quad 1 \le i < j \le p,$$

where $\hat{\beta}_{i,j}$ denotes the entry of $\hat{\beta}_j$ corresponding to the $i$th variable. For $i = j$, we let $\hat{r}_{i,i} = \tilde{r}_{i,i}$, which is a nearly unbiased estimator of $r_{i,i}$. An estimator of the $(i,j)$th entry of the spatial precision matrix, standardized by its diagonal, can then be constructed as

$$T_{i,j} = \frac{\hat{r}_{i,j}}{\hat{r}_{i,i}\,\hat{r}_{j,j}}, \quad 1 \le i < j \le p,$$

which estimates $\omega_{S,i,j}/(\omega_{S,i,i}\,\omega_{S,j,j})$.

To further estimate the variance of $T_{i,j}$, note that

$$\theta_{i,j} = \mathrm{Var}\left\{\epsilon_{k,i,t}\,\epsilon_{k,j,t}/(r_{i,i}\,r_{j,j})\right\}/(nq) = \frac{1 + \beta_{i,j}^2\, r_{i,i}/r_{j,j}}{nq\, r_{i,i}\, r_{j,j}}, \tag{4}$$

where $\beta_{i,j}$ denotes the entry of $\beta_j$ corresponding to the $i$th variable. Then $\theta_{i,j}$ can be estimated by

$$\hat{\theta}_{i,j} = \frac{1 + \hat{\beta}_{i,j}^2\, \hat{r}_{i,i}/\hat{r}_{j,j}}{nq\, \hat{r}_{i,i}\, \hat{r}_{j,j}}.$$

Given that the statistics $T_{i,j}$ are heteroscedastic and can possibly have a wide range of variability, we standardize $T_{i,j}$ by its standard error, which leads to the standardized statistics

$$W_{i,j} = \frac{T_{i,j}}{\sqrt{\hat{\theta}_{i,j}}}, \quad 1 \le i < j \le p.$$

In the next section, we test the hypotheses (1) and (2) based on $\{W_{i,j}: 1 \le i < j \le p\}$.
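To make the construction concrete, here is a minimal NumPy sketch that maps inverse-regression residuals to the standardized statistics $W_{i,j}$. The function name and the indexing convention, with Bhat[i, j] holding the coefficient of variable $i$ in the regression of variable $j$ (so that Bhat[i, j] corresponds to $\hat{\beta}_{i,j}$ above), are our own illustrative choices; the residuals and coefficients can come, e.g., from the node-wise Lasso sketch in Section 4.

```python
import numpy as np

def standardized_stats(resid, Bhat):
    """Standardized statistics W[i, j] from inverse-regression residuals.

    resid: (N, p) residual matrix, rows indexing the N = n*q (sample, time)
           pairs of the transformed data.
    Bhat:  (p, p) coefficient matrix, Bhat[i, j] = coefficient of variable i
           in the regression of variable j, with zero diagonal.
    """
    N, p = resid.shape
    r_tilde = resid.T @ resid / N                 # sample covariances of residuals
    d = np.diag(r_tilde)                          # r_tilde[i, i]
    # Bias correction: r_hat[i,j] = -(r_tilde[i,j] + r_tilde[i,i]*Bhat[i,j]
    #                                 + r_tilde[j,j]*Bhat[j,i]) for i != j.
    r_hat = -(r_tilde + d[:, None] * Bhat + d[None, :] * Bhat.T)
    np.fill_diagonal(r_hat, d)                    # r_hat[i,i] = r_tilde[i,i]
    T = r_hat / np.outer(d, d)                    # estimates omega_ij/(omega_ii omega_jj)
    theta = (1.0 + Bhat**2 * np.outer(d, 1.0 / d)) / (N * np.outer(d, d))
    W = T / np.sqrt(theta)                        # only entries with i < j are used
    return W
```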

Remark 1

Construction of the test statistics for the data-driven procedure is almost the same as that for the oracle procedure, except that the oracle procedure starts with the transformed samples $Y_k = X_k \Sigma_T^{-1/2}$ in (3), whereas the data-driven one replaces them with $\hat{Y}_k = X_k \hat{\Sigma}_T^{-1/2}$. Furthermore, the regression coefficients vary slightly at different time points in the data-driven scenario, and we shall replace (3) by $\hat{Y}_{k,i,t} = \hat{Y}_{k,-i,t}^\top\,\beta_{i,t} + \epsilon_{k,i,t}$, for $1 \le t \le q$.

Remark 2

When $\Sigma_T$ is unknown, $E(\hat{\Sigma}_T) = p^{-1}\mathrm{trace}(\Sigma_S)\,\Sigma_T$. If $\mathrm{trace}(\Sigma_S) = cp$, with $c \neq 1$, an unbiased estimator of $\Sigma_T$ becomes $\hat{\Sigma}_T / c$. Accordingly, if we use the transformed data $\hat{Y}_k = X_k \hat{\Sigma}_T^{-1/2}$ directly, for $k = 1, \ldots, n$, then the bias-corrected estimators become $c^{-1}\hat{r}_{i,j}$, which in turn lead to $c\,T_{i,j}$ and $c^2\,\hat{\theta}_{i,j}$. Thus, the standardized statistic $W_{i,j}$ remains the same, as the constant $c$ cancels. Therefore, $\mathrm{trace}(\Sigma_S)$ does not affect our final test statistics, and for notational simplicity, we set $\mathrm{trace}(\Sigma_S) = p$ from the beginning, without loss of any generality.

2.3 Global testing procedure

We propose the following test statistic for testing the global null hypothesis $H_0$: $\Omega_S$ is diagonal,

$$M_{nq} = \max_{1 \le i < j \le p} W_{i,j}^2.$$

Furthermore, we define the global test $\Phi_\alpha$ by

$$\Phi_\alpha = I\left(M_{nq} \ge q_\alpha + 4\log p - \log\log p\right),$$

where $I(\cdot)$ is the indicator function, and $q_\alpha$ is the $1-\alpha$ quantile of the type I extreme value distribution with the cumulative distribution function $\exp\{-(8\pi)^{-1/2}e^{-t/2}\}$, i.e.,

$$q_\alpha = -\log(8\pi) - 2\log\log(1-\alpha)^{-1}.$$

The hypothesis $H_0$ is rejected whenever $\Phi_\alpha = 1$.

The above test is developed based on the asymptotic properties of $\{W_{i,j}\}$, which will be studied in detail in Section 3.2. Intuitively, the $W_{i,j}$ are approximately standard normal variables under the null distribution, and are only weakly dependent under suitable conditions. Thus $M_{nq}$ is the maximum of the squares of $(p^2-p)/2$ such random variables, and its value should be close to $4\log p$ under $H_0$. We will later show that, under certain regularity conditions, $M_{nq} - 4\log p + \log\log p$ converges to a type I extreme value distribution under $H_0$.
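In code, the global test is a few lines on top of the matrix of standardized statistics from the sketch in Section 2.2; `global_test` is an illustrative name, and the critical value follows the display above.

```python
import numpy as np

def global_test(W, alpha=0.05):
    """Reject H0 (diagonal Omega_S) iff M >= q_alpha + 4*log(p) - log(log(p))."""
    p = W.shape[0]
    M = np.max(W[np.triu_indices(p, k=1)] ** 2)          # max over i < j of W_ij^2
    # 1 - alpha quantile of the limiting type I extreme value distribution.
    q_alpha = -np.log(8 * np.pi) - 2 * np.log(np.log(1 / (1 - alpha)))
    return M, bool(M >= q_alpha + 4 * np.log(p) - np.log(np.log(p)))
```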

2.4 Multiple testing procedure

Next we develop a multiple testing procedure for (2), so as to identify those pairs of spatial locations that are conditionally dependent. The test statistic $W_{i,j}$ defined in Section 2.2 is employed. Since there are $(p^2-p)/2$ simultaneous hypotheses to test, it is important to control the false discovery rate. Let $t$ be the threshold level such that $H_{0,i,j}$ is rejected if $|W_{i,j}| \ge t$, and let $\mathcal{H}_0 = \{(i,j): \omega_{S,i,j} = 0,\ 1 \le i < j \le p\}$ be the set of true nulls. Denote by $R_0(t) = \sum_{(i,j) \in \mathcal{H}_0} I(|W_{i,j}| \ge t)$ the total number of false positives, and by $R(t) = \sum_{1 \le i < j \le p} I(|W_{i,j}| \ge t)$ the total number of rejections. The false discovery proportion and false discovery rate are then defined as

$$\mathrm{FDP}(t) = \frac{R_0(t)}{\max\{R(t), 1\}}, \qquad \mathrm{FDR}(t) = E\{\mathrm{FDP}(t)\}.$$

An ideal choice of $t$ would reject as many true positives as possible while controlling the false discovery rate and false discovery proportion at the pre-specified level $\alpha$. That is, we select

$$t_0 = \inf\left\{0 \le t \le 2\sqrt{\log p}: \mathrm{FDP}(t) \le \alpha\right\}.$$

We shall estimate $R_0(t)$ by $2\{1-\Phi(t)\}\,\ell_0$, where $\Phi(\cdot)$ is the standard normal cumulative distribution function and $\ell_0 = |\mathcal{H}_0|$. Note that $\ell_0$ can be estimated by $(p^2-p)/2$ due to the sparsity of $\Omega_S$. This leads to the following multiple testing procedure.

  1. Calculate the test statistics $W_{i,j}$, $1 \le i < j \le p$.

  2. For given $0 \le \alpha \le 1$, calculate

    $$\hat{t} = \inf\left\{0 \le t \le 2\sqrt{\log p}:\ \frac{2\{1-\Phi(t)\}\,(p^2-p)/2}{\max\{R(t), 1\}} \le \alpha\right\}.$$

    If $\hat{t}$ does not exist, set $\hat{t} = 2\sqrt{\log p}$.

  3. For $1 \le i < j \le p$, reject $H_{0,i,j}$ if and only if $|W_{i,j}| \ge \hat{t}$. (A code sketch of this threshold search is given below.)
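A minimal sketch of the threshold search in step 2, using a grid over $[0, 2\sqrt{\log p}]$ (the grid resolution is our own choice; the procedure's infimum is over the continuum):

```python
import numpy as np
from scipy.stats import norm

def fdr_threshold(W, alpha=0.1, grid_size=2000):
    """Return t_hat for the multiple testing procedure at FDR level alpha."""
    p = W.shape[0]
    absW = np.abs(W[np.triu_indices(p, k=1)])
    n_hyp = p * (p - 1) / 2                       # number of simultaneous hypotheses
    t_max = 2 * np.sqrt(np.log(p))
    for t in np.linspace(0.0, t_max, grid_size):
        R = max(np.sum(absW >= t), 1)             # total number of rejections R(t)
        if 2 * (1 - norm.cdf(t)) * n_hyp / R <= alpha:
            return t                              # estimated FDP first drops below alpha
    return t_max                                  # t_hat does not exist: use 2*sqrt(log p)

# Reject H0_ij iff |W_ij| >= t_hat:
# t_hat = fdr_threshold(W); rejected = np.abs(W) >= t_hat
```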

3 Theory

In this section, we analyze the theoretical properties of the global and multiple testing procedures for both the oracle and data-driven scenarios. We show that the data-driven procedures perform asymptotically as well as the oracle procedures and enjoy certain optimality under the regularity conditions. For separate treatment of the oracle and data-driven procedures, we now distinguish the notations of the two, and add the superscript "o" to denote the statistics and tests of the oracle procedures, e.g., $W^o_{i,j}$, $M^o_{nq}$ and $\Phi^o_\alpha$, and the superscript "d" to denote those of the data-driven procedures, e.g., $W^d_{i,j}$, $M^d_{nq}$, and $\Phi^d_\alpha$.

3.1 Regularity conditions

For the oracle procedure, we require the following set of regularity conditions.

  1. (C1) Assume that, as $nq \to \infty$, $\max_{1 \le i \le p} |\hat{\beta}_i - \beta_i|_1 = o_p\{(\log p)^{-1}\}$, and $\max_{1 \le i \le p} |\hat{\beta}_i - \beta_i|_2 = o_p\{(nq\,\log p)^{-1/4}\}$.

  2. (C2) Assume that $\log p = o\{(nq)^{1/5}\}$, and there are constants $c_0, c_1 > 0$ such that $c_0^{-1} \le \lambda_{\min}(\Omega_S) \le \lambda_{\max}(\Omega_S) \le c_0$, and $c_1^{-1} \le \lambda_{\min}(\Omega_T) \le \lambda_{\max}(\Omega_T) \le c_1$.

  3. (C3) Let $D_S$ be the diagonal of $\Sigma_S$ and let $D_S^{-1/2}\,\Sigma_S\,D_S^{-1/2}$ have elements $\eta_{i,j}$. Assume that $\max_{1 \le i < j \le p} |\eta_{i,j}| \le \eta$, for some constant $0 < \eta < 1$.

For the data-driven procedure, we replace the above condition (C1) with a slightly different one, (C1*), then introduce a new condition (C4).

  1. (C1*) Assume that $\max_{1 \le i \le p,\,1 \le t \le q} |\hat{\beta}_{i,t} - \beta_{i,t}|_1 = o_p\{(\log p)^{-1}\}$, and $\max_{1 \le i \le p,\,1 \le t \le q} |\hat{\beta}_{i,t} - \beta_{i,t}|_2 = o_p\{(nq\,\log p)^{-1/4}\}$.

  2. (C4) Assume that the matrix 1-norms $\|\Sigma_S\|_{L_1}$ and $\|\Sigma_T\|_{L_1}$ satisfy growth rate conditions, in terms of $n$, $p$ and $q$, that ensure the plug-in $\hat{\Sigma}_T^{-1/2}$ is a sufficiently accurate substitute for $\Sigma_T^{-1/2}$.

A few remarks are in order. An estimator $\hat{\beta}_i$ satisfying (C1) can be easily obtained via standard estimation methods such as the Lasso and the Dantzig selector. For instance, if one uses the Lasso estimator, then (C1) is satisfied under (C2) and a suitable sparsity condition on the regression coefficients $\beta_i$. Similarly, estimators $\hat{\beta}_{i,t}$ satisfying (C1*) can be obtained by the Lasso if (C4) holds and the data-driven regression coefficients satisfy a similar sparsity condition. Conditions (C2) and (C3) are regularity conditions that are commonly used in the high-dimensional hypothesis testing setting (Cai et al., 2013; Liu, 2013; Xia et al., 2015). (C4) is a mild technical condition. If $n$, $p$ and $q$ satisfy suitable mutual growth rate conditions, then the conditions on the matrix 1-norms can be relaxed to conditions related only to $\Sigma_S$ and $\Sigma_T$.

3.2 Oracle global testing procedure

We next analyze the limiting null distribution of the oracle global test statistic $M^o_{nq}$ and the power of the corresponding test $\Phi^o_\alpha$. We are particularly interested in the power of the test under the alternative when $\Omega_S$ is sparse, and show that the power is minimax rate optimal.

The following theorem states the asymptotic null distribution of $M^o_{nq}$, and indicates that, under $H_0$, $M^o_{nq} - 4\log p + \log\log p$ converges weakly to a Gumbel random variable with distribution function $\exp\{-(8\pi)^{-1/2}e^{-t/2}\}$.

Theorem 1

Assume (C1), (C2) and (C3). Then under $H_0$, for any $t \in \mathbb{R}$,

$$P\left(M^o_{nq} - 4\log p + \log\log p \le t\right) \to \exp\left\{-(8\pi)^{-1/2}e^{-t/2}\right\}, \quad \text{as } nq \to \infty.$$

Under $H_0$, the above convergence is uniform for all data-generating distributions satisfying (C1)-(C3).

We next study the power of the corresponding test $\Phi^o_\alpha$. We define the following class of precision matrices for the spatial locations:

$$\mathcal{U}(c) = \left\{\Omega_S: \max_{1 \le i < j \le p} \frac{|\omega_{S,i,j}|/(\omega_{S,i,i}\,\omega_{S,j,j})}{\sqrt{\theta_{i,j}}} \ge c\,\sqrt{\log p}\right\}. \tag{5}$$

This class includes all precision matrices such that there exists at least one standardized off-diagonal entry with magnitude exceeding $c\sqrt{\log p}$. By the definition in (4), $\theta_{i,j}$ is of the order $(nq\,r_{i,i}\,r_{j,j})^{-1}$, and thus we only require one of the off-diagonal entries $\omega_{S,i,j}$ to have magnitude larger than $C\{\log p/(nq)\}^{1/2}$ for some constant $C > 0$, where $C$ is fully determined by $c_0$ and $c_1$ in Condition (C2). Then if we choose the constant $c = 4$, that is, if there exists one standardized off-diagonal entry with magnitude larger than or equal to $4\sqrt{\log p}$, the next theorem shows that the null parameter set in which $\Omega_S$ is diagonal is asymptotically distinguishable from $\mathcal{U}(4)$ by the test $\Phi^o_\alpha$. That is, $H_0$ is rejected by the test $\Phi^o_\alpha$ with overwhelming probability if $\Omega_S \in \mathcal{U}(4)$.

Theorem 2

Assume (C1) and (C2). Then,

$$\inf_{\Omega_S \in \mathcal{U}(4)} P\left(\Phi^o_\alpha = 1\right) \to 1, \quad \text{as } nq \to \infty.$$

The next theorem further shows that this lower bound $4\sqrt{\log p}$ is rate-optimal. Let $\mathcal{T}_\alpha$ be the set of all $\alpha$-level tests, i.e., $P(T_\alpha = 1) \le \alpha$ under $H_0$ for all $T_\alpha \in \mathcal{T}_\alpha$.

Theorem 3

Suppose that $\log p = o(nq)$. Let $\alpha, \beta > 0$ and $\alpha + \beta < 1$. Then there exists a constant $c_2 > 0$ such that, for all sufficiently large $nq$ and $p$,

$$\inf_{\Omega_S \in \mathcal{U}(c_2)}\ \sup_{T_\alpha \in \mathcal{T}_\alpha} P\left(T_\alpha = 1\right) \le 1 - \beta.$$

As Theorem 3 indicates, if the constant $c_2$ is sufficiently small, then any $\alpha$-level test is unable to reject the null hypothesis correctly uniformly over $\mathcal{U}(c_2)$ with probability tending to one. So the order $\sqrt{\log p}$ in the lower bound of (5) cannot be further improved.

3.3 Oracle multiple testing procedure

We next investigate the properties of the oracle multiple testing procedure. The following theorem shows that the oracle procedure controls the false discovery proportion and false discovery rate at the pre-specified level asymptotically.

Theorem 4

Assume (C1) and (C2), and let

$$\mathcal{S}_\rho = \left\{(i,j): 1 \le i < j \le p,\ \frac{|\omega_{S,i,j}|/(\omega_{S,i,i}\,\omega_{S,j,j})}{\sqrt{\theta_{i,j}}} \ge (\log p)^{1/2+\rho}\right\}.$$

Suppose for some $\rho, \delta > 0$, $|\mathcal{S}_\rho| \ge \left\{1/(\sqrt{8\pi}\,\alpha) + \delta\right\}\sqrt{\log\log p}$. Suppose further that $p \le (nq)^r$ for some $r > 0$, and $\ell_0 = |\mathcal{H}_0| \ge c\,p^2$ for some $c > 0$. Letting $\ell = (p^2-p)/2$, then

$$\frac{\mathrm{FDR}(\hat{t})}{\alpha\ell_0/\ell} \to 1, \qquad \frac{\mathrm{FDP}(\hat{t})}{\alpha\ell_0/\ell} \to 1$$

in probability, as $(nq, p) \to \infty$.

We comment that the condition on $|\mathcal{S}_\rho|$ in Theorem 4 is mild, because there are $(p^2-p)/2$ hypotheses in total and this condition only requires a few entries of $\Omega_S$ to have standardized magnitude exceeding $(\log p)^{1/2+\rho}$ for some constant $\rho > 0$.

3.4 Data-driven procedures

We next turn to data-driven procedures for both the global testing and the multiple testing. We show that they perform as well as the oracle testing procedures asymptotically.

Theorem 5

Assume (C1*) and (C2)-(C4).

  1. Under $H_0$, for any $t \in \mathbb{R}$,

    $$P\left(M^d_{nq} - 4\log p + \log\log p \le t\right) \to \exp\left\{-(8\pi)^{-1/2}e^{-t/2}\right\}, \quad \text{as } nq \to \infty.$$

    Under $H_0$, the above convergence is uniform for all data-generating distributions satisfying (C1*) and (C2)-(C4).

  2. Furthermore,

    $$\inf_{\Omega_S \in \mathcal{U}(4)} P\left(\Phi^d_\alpha = 1\right) \to 1, \quad \text{as } nq \to \infty.$$

This theorem shows that $M^d_{nq}$ has the same limiting null distribution as the oracle test statistic $M^o_{nq}$, and that the corresponding test $\Phi^d_\alpha$ performs as well as the oracle test $\Phi^o_\alpha$ and is thus minimax rate optimal. The same observation applies to Theorem 6 below, which shows that the data-driven multiple testing procedure also performs as well as the oracle procedure, and controls the false discovery proportion and false discovery rate at the pre-specified level asymptotically.

Theorem 6

Assume (C1*) and (C4). Then under the same conditions as in Theorem 4,

$$\frac{\mathrm{FDR}(\hat{t}^{\,d})}{\alpha\ell_0/\ell} \to 1, \qquad \frac{\mathrm{FDP}(\hat{t}^{\,d})}{\alpha\ell_0/\ell} \to 1$$

in probability, as $(nq, p) \to \infty$, where $\hat{t}^{\,d}$ denotes the data-driven threshold.

4 Simulations

We study in this section the finite-sample performance of the proposed testing procedures. For the global testing of (1), we measure the size and power of the oracle test $\Phi^o_\alpha$ and the data-driven version $\Phi^d_\alpha$, and for the multiple testing of (2), we measure the empirical false discovery rate and power. We compare the oracle and data-driven testing procedures, as well as a simple alternative developed by Xia et al. (2015) under the normal instead of the matrix normal distribution, which ignores the separable spatial-temporal structure. The temporal covariance matrix $\Sigma_T = (\sigma_{T,i,j})$ is constructed with an autoregressive structure, $\sigma_{T,i,j} = 0.4^{|i-j|}$, $1 \le i, j \le q$. Two combinations of the sample size $n$ and the number of time points $q$ are considered, whereas the spatial dimension $p$ varies over four increasing values. We have chosen this setting since our primary interest is in inferring spatial connectivity networks with different spatial dimensions. We keep the temporal dimension small, since it is a nuisance in our setup, and choose a relatively small sample size to reflect the fact that there is usually only a limited sample size in many neuroimaging studies.

For each generated dataset below, we use the Lasso to estimate the regression coefficients $\beta_i$ in (3) as

$$\hat{\beta}_i = \arg\min_{\beta \in \mathbb{R}^{p-1}}\left\{\frac{1}{2nq}\left|\mathbb{Y}_{(i)} - \mathbb{Y}_{(-i)}\,\beta\right|_2^2 + \lambda_{n,i}\,|\beta|_1\right\}, \tag{6}$$

where $\mathbb{Y}$ is the $nq \times p$ data matrix obtained by stacking the columns of the transformed samples $Y_k$, with $Y_k = X_k \Sigma_T^{-1/2}$ for the oracle procedure and $Y_k = X_k \hat{\Sigma}_T^{-1/2}$ for the data-driven procedure, $k = 1, \ldots, n$; $\mathbb{Y}_{(i)}$ denotes the $i$th column of $\mathbb{Y}$, $\mathbb{Y}_{(-i)}$ denotes the remaining columns, $\hat{\sigma}_{S,i,i}$ denotes the $i$th diagonal element of the sample covariance matrix of the $nq$ stacked transformed samples, and $\lambda_{n,i}$ is a tuning parameter.
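A sketch of this node-wise estimation using scikit-learn's Lasso (the stacking layout, function name, and the use of a single common penalty are our own simplifications; note sklearn's `alpha` plays the role of $\lambda_{n,i}$ under its $1/(2N)$ least-squares scaling):

```python
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_lasso(Ymat, lam):
    """Regress each column of Ymat on the remaining columns with the Lasso.

    Ymat: (N, p) matrix stacking the q columns of each transformed sample,
          so its N = n*q rows have (approximate) covariance Sigma_S.
    Returns the (N, p) residual matrix and the (p, p) coefficient matrix Bhat,
    with Bhat[i, j] = coefficient of variable i in the regression of variable j.
    """
    N, p = Ymat.shape
    resid, Bhat = np.empty((N, p)), np.zeros((p, p))
    for j in range(p):
        rest = np.delete(np.arange(p), j)
        fit = Lasso(alpha=lam, fit_intercept=False).fit(Ymat[:, rest], Ymat[:, j])
        Bhat[rest, j] = fit.coef_
        resid[:, j] = Ymat[:, j] - Ymat[:, rest] @ fit.coef_
    return resid, Bhat

# Example usage, continuing the earlier sketches:
# Ymat = Y.transpose(0, 2, 1).reshape(-1, Y.shape[1])   # stack to (n*q, p)
# resid, Bhat = nodewise_lasso(Ymat, lam=0.1)
# W = standardized_stats(resid, Bhat)                   # Section 2.2 sketch
```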

4.1 Global testing simulation

For the global testing, the data are generated from a matrix normal distribution with mean zero and precision matrix $\Omega_T \otimes \Omega_S$, with $\Omega_S = I_p$ under the null. To evaluate the power, let $U$ be a $p \times p$ symmetric matrix with eight random nonzero entries. The locations of four nonzero entries are selected randomly from the upper triangle of $U$, each with a magnitude generated randomly and uniformly of the order $\{\log p/(nq)\}^{1/2}$. The other four nonzero entries in the lower triangle are determined by symmetry. We set $\Omega_S = (I_p + U + \delta I_p)/(1+\delta)$, with $\delta = |\lambda_{\min}(I_p + U)| + 0.05$, and choose the tuning parameter $\lambda_{n,i}$ in (6) as $\lambda_{n,i} = 2\{\hat{\sigma}_{S,i,i}\,\log p/(nq)\}^{1/2}$.
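For illustration, a sketch of one way to build such a sparse alternative $\Omega_S$; the magnitude scale $4\{\log p/(nq)\}^{1/2}$ is our own assumption, and the positive-definiteness adjustment is patterned after the description above.

```python
import numpy as np

def sparse_alternative(p, n, q, n_pairs=4, rng=None):
    """Omega_S = (I + U + delta*I)/(1 + delta) with 2*n_pairs symmetric nonzeros."""
    rng = np.random.default_rng() if rng is None else rng
    mag = 4 * np.sqrt(np.log(p) / (n * q))            # assumed magnitude scale
    rows, cols = np.triu_indices(p, k=1)
    U = np.zeros((p, p))
    for idx in rng.choice(rows.size, n_pairs, replace=False):
        i, j = rows[idx], cols[idx]
        U[i, j] = U[j, i] = rng.choice([-1.0, 1.0]) * mag
    Omega = np.eye(p) + U
    delta = abs(min(np.linalg.eigvalsh(Omega).min(), 0)) + 0.05
    return (Omega + delta * np.eye(p)) / (1 + delta)  # guarantees positive definiteness
```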

The size and power of the global tests, in percentage, are reported in Table 1, based on 1000 data replications and the significance level $\alpha = 0.05$. We see from Table 1 that the empirical sizes of the proposed oracle and data-driven procedures are well controlled under the significance level. However, for the vector normal based procedure that ignores the spatial-temporal dependence structure, there is a serious size distortion across all settings. The empirical sizes of the new procedures are slightly below the nominal level for high dimensions, due to the correlation among the variables; a similar phenomenon has been observed and justified in Cai et al. (2013, Proposition 1). We also see from the table that the new procedures are powerful in all settings, even though the two spatial precision matrices differ in only eight entries, with the magnitude of the difference of the order $\{\log p/(nq)\}^{1/2}$. For both the empirical sizes and powers, the data-driven procedure is seen to perform similarly to the oracle procedure.

 method                spatial dimension p (four increasing values)

 Size, first (n, q) setting
   oracle           3.6   3.5   2.8   2.9
   data-driven      3.8   3.8   2.9   2.9
   vector normal   36.4  56.7  64.7  75.3
 Size, second (n, q) setting
   oracle           3.7   3.5   5.2   4.3
   data-driven      3.5   3.5   5.1   4.1
   vector normal   39.7  64.8  73.5  88.4
 Power, first (n, q) setting
   oracle          77.9  85.6  87.7  90.9
   data-driven     83.1  88.1  87.8  90.7
   vector normal   86.2  94.1  95.0  99.1
 Power, second (n, q) setting
   oracle          62.4  74.3  68.2  75.2
   data-driven     66.1  74.7  68.3  75.5
   vector normal   76.2  92.1  93.1  90.1

Table 1: Empirical sizes and powers (%) for global testing, under two combinations of the sample size n and the number of time points q. Three methods are compared: the proposed oracle and data-driven procedures based on the matrix normal distribution, and the simple alternative based on the vector normal distribution that ignores the spatial-temporal dependency. The results are based on 1000 data replications.

4.2 Multiple testing simulation

For the multiple testing, the data are generated from a matrix normal distribution with mean zero and precision matrix $\Omega_T \otimes \Omega_S$. Three choices of $\Omega_S$ are considered:

  1. A banded matrix, $\Omega_S^{(1)} = (\omega_{i,j})$, where $\omega_{i,i} = 1$, $\omega_{i,i+1} = \omega_{i+1,i} = 0.6$, $\omega_{i,i+2} = \omega_{i+2,i} = 0.3$, and $\omega_{i,j} = 0$ otherwise.

  2. A sparse random matrix, $\Omega_S^{(2)} = (\Omega^{*} + \delta I_p)/(1+\delta)$, where $\Omega^{*} = (\omega^{*}_{i,j})$, $\omega^{*}_{i,i} = 1$, $\omega^{*}_{i,j} = 0.8 \times \mathrm{Bernoulli}(0.05)$ for $i < j$ and $\omega^{*}_{j,i} = \omega^{*}_{i,j}$ otherwise, with $\delta = |\lambda_{\min}(\Omega^{*})| + 0.05$.

  3. A hub matrix, $\Omega_S^{(3)} = (\Omega^{**} + \delta I_p)/(1+\delta)$, where $\Omega^{**} = (\omega^{**}_{i,j})$, $\omega^{**}_{i,i} = 1$, $\omega^{**}_{i,j} = \omega^{**}_{j,i} = 0.5$ for $i = 10(k-1)+1$ and $10(k-1)+2 \le j \le 10k$, $1 \le k \le p/10$, with $\delta = |\lambda_{\min}(\Omega^{**})| + 0.05$.

We select the tuning parameters $\lambda_{n,i}$ in (6) for the Lasso estimation adaptively given the data, following the general principle of making the empirical tail counts of the standardized statistics and their expected values under the null distribution close. The steps of the parameter tuning are summarized as follows.

  1. Let $\lambda_{n,i}(b) = (b/20)\{\hat{\sigma}_{S,i,i}\,\log p/(nq)\}^{1/2}$, for $b = 1, \ldots, 40$. For each $b$, calculate $\hat{\beta}_i(b)$, $1 \le i \le p$, and construct the corresponding standardized statistics $W_{i,j}(b)$.

  2. Choose $\hat{b}$ as the minimizer of

    $$\sum_{s=1}^{10}\left[\frac{\sum_{1\le i<j\le p} I\left\{|W_{i,j}(b)| \ge \Phi^{-1}\left(1 - s\{1-\Phi(\sqrt{\log p})\}/10\right)\right\}}{s\{1-\Phi(\sqrt{\log p})\}\,p(p-1)/10} - 1\right]^2.$$

  3. The tuning parameters $\lambda_{n,i}$ are then set as $\lambda_{n,i} = (\hat{b}/20)\{\hat{\sigma}_{S,i,i}\,\log p/(nq)\}^{1/2}$; a code sketch of this selection loop follows.
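A sketch of the selection loop, under the reconstruction above (the grid $b = 1, \ldots, 40$ and the ten-point discrepancy criterion follow the stated steps; the single common penalty per $b$ is our simplification, and `nodewise_lasso` and `standardized_stats` are from the earlier sketches):

```python
import numpy as np
from scipy.stats import norm

def select_tuning(Ymat):
    """Choose b_hat minimizing the discrepancy between empirical and nominal tails."""
    N, p = Ymat.shape
    tail = 1 - norm.cdf(np.sqrt(np.log(p)))
    best_b, best_loss = 1, np.inf
    for b in range(1, 41):
        lam = (b / 20) * np.sqrt(np.log(p) / N)    # per-location scaling omitted for brevity
        resid, Bhat = nodewise_lasso(Ymat, lam)
        absW = np.abs(standardized_stats(resid, Bhat)[np.triu_indices(p, k=1)])
        loss = sum((np.sum(absW >= norm.ppf(1 - s * tail / 10)) /
                    (s * tail * p * (p - 1) / 10) - 1) ** 2 for s in range(1, 11))
        if loss < best_loss:
            best_b, best_loss = b, loss
    return best_b
```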

For comparison, we also carry out the alternative procedure that ignores the Kronecker product structure by using the stacked original data samples