Tellipsoid: Exploiting intergene correlation for improved detection of differential gene expression
Abstract
Motivation:
Algorithms for differential analysis of microarray data are vital to modern biomedical research. Their accuracy strongly depends on effective treatment of intergene correlation. Correlation is ordinarily accounted for in terms of its effect on significance cutoffs. In this paper it is shown that correlation can, in fact, be exploited to share information across tests, which, in turn, can increase statistical power.
Results:
Vastly and demonstrably improved differential analysis approaches are the result of combining identifiability (the fact that in most microarray data sets, a large proportion of genes can be identified a priori as nondifferential) with optimization criteria that incorporate correlation. As a special case, we develop a method which builds upon the widely used twosample statistic based approach and uses the Mahalanobis distance as an optimality criterion. Results on the prostate cancer data of Singh et al. (2002) suggest that the proposed method outperforms all published approaches in terms of statistical power.
Availability:
The proposed algorithm is implemented in MATLAB and in R. The software, called Tellipsoid, and relevant data sets are available at www.egr.msu.edu/~desaikey.
1 Introduction
Detecting differentially expressed genes in the presence of substantial intergene correlation is a challenging problem. Research has focused largely on understanding the harmful effects of correlation on the threshold settings demarcating null and nonnull genes. In fact, however, the nominally confounding correlation can be exploited to increase statistical power of microarray studies. This observation has engendered a fruitful research direction introduced in this paper.
The literature is not devoid of attempts to develop more powerful summary statistics which exploit correlation, but such efforts have not produced compelling results. We posit that the limitations of such developments are due, at least in part, to neglecting identifiability – the fact that in most microarray data sets, a large proportion of genes requires no technical effort to be identified as explicitly nondifferential.
In this paper, we present a new differential analysis method which exploits identifiability through an optimization criterion which incorporates intergene correlation. The method builds upon the widelyused twosample statistic approach with a minimization of the Mahalanobis distance as the optimality criterion. Although this paper focuses primarily on twosample microarray studies, the framework is readily generalizable for use in studies with multiple or continuous covariates, as well as to other multiple comparison applications.
Let us assume that genes are measured on microarrays, under two different experimental conditions, such as control and treatment. Based on the gene expression matrix , we are interested in identifying a small number of genes that are responsible for class distinction. One widely used strategy [e.g., Tusher et al. (2001) and Efron (2004)] is to represent each gene by its null hypothesis and twosample statistic,^{2}^{2}2null = nondifferential; nonnull = differential say
Null hypotheses:  
Test statistics: 
The magnitudes of the statistics establish a generanking and the
(presumably) genes with the largest scores are reported as statistically significant discoveries. The investigator can either explicitly supply a or rely on the false discovery rate (FDR) calculations to find a maximal with the allowable FDR.
The issue of correctly estimating the FDR in the presence of correlation has received much recent attention because highly correlated tests increase the variance of the FDR leading to unreliable results (Owen, 2005). As discussed in Efron (2007) and Desai et al. (2007), for “over powered” ’s, there may be significantly fewer tailarea null counts than expected, while for “under powered” ’s, the situation can worsen with many more tailarea null counts than expected. Importantly though, techniques for correctly estimating the FDR do not change the generanking, but only the size of the reported list.
The present research was motivated by the notion that, for “under powered” ’s, it might be possible to exploit correlation across ’s to establish a gene ranking that has better statistical power than the raw based ranking. The method that resulted from an exploration of this question indeed seems to improve the statistical power of all ’s. The proposed method uses, (i) a vector of statistics () and (ii) an estimate of the covariance matrix of the vector , to output a substantially revised version of , denoted , whose corresponding entries can be used to establish an improved generanking.
The method is summarized as follows. Let^{3}^{3}3No distributional information nor higher order statistics of are assumed. . Now based on the measured value of , we can estimate , but while doing so we invoke the zero assumption (ZA) (Efron, 2006) that the smallest (%) of ’s are associated with null genes. Based on the ZA, we can set corresponding entries of to zero. For the remaining entries of we obtain minimum Mahalanobis distance (Mahalanobis, 1936) estimates.
Intergene correlation causes to distribute around its center of mass in an hyperellipsoidal manner, and the Mahalanobis distance is a natural way to measure vector distances in such a distribution. In fact, the name Tellipsoid is adopted to emphasize the geometric intuition of tracking the center of an ellipsoid. We have done extensive experimentation with both real and simulated data and found that for a truly null which happens to be in tailarea, the corresponding consistently tends to zero (its theoretical value).
Two prior research efforts were particularly useful in formulating the present approach. Storey et al. (2007), present a more general approach to the notion of sharing information across ’s, which they describe as “borrowing strength across the tests,” for a potential increase in statistical power. Tibshirani and Wasserman (2006), discuss a new statistic called the “correlationshared” statistic, and derive theoretical bounds on its performance; however, their approach requires strong assumptions regarding the nature of correlation between null and nonnull genes which may not hold in many real data sets. The proposed method requires no such assumptions and yet provides considerable increase in power as demonstrated by results reported in Section 4.
The outline of this paper is as follows. Section 2 defines, and obtains closed form expressions for, the minimum Mahalanobis distance estimates in the statistic space. Section 3 builds on the theory of Section 2 to develop the Tellipsoid generanking method. In Section 4 we apply Tellipsoid to real cancer data of Singh et al. (2002) and compare its accuracy with the stateoftheart differential analysis tools EDGE (Storey et al., 2007) and SAM (Tusher et al., 2001). Section 4 also discusses the implications of these results. We conclude with Section 5 summarizing the main ideas.
2 Approach
2.1 Per gene summary statistic
Let be the matrix of gene expressions, for genes and samples. We assume that the samples fall into two groups and and there are samples in group with . We start with the standard (unpaired) statistic:
(1) 
where is the mean of gene in group and is the pooled withingroup standard deviation of gene . If the gene is indeed null, then we expect , where is the number of degrees of freedom, obtained from either the unpaired test theory or the permutation null calculations as in Efron (2007). If gene is nonnull, then we expect . For nonnull genes, the values of and depend on the amount of up / down regulation, the number of samples in each group, and .
2.2 The zero assumption
Without loss of generality, we may assume that the genes are indexed so that
(2) 
Then, a reasonable way to impose identifiability on null genes is through the ZA, namely, that (%) of the genes – those with the smallest statistics – are null. Efron (2006) discusses the use of the ZA in a variety of differential analysis approaches. It plays a central role in the literature on estimating the proportion of null genes, as in Pawitan et al. (2005) and Langaas et al. (2005). The ZA is equally crucial for the twogroup model approach developed in the Bayesian microarray literature, as in Lee et al. (2000), Newton et al. (2001), and Efron et al. (2001). Furthermore, the claim that the method in Storey (2002) improves upon the wellknown Benjamini and Hochberg (1995) FDR procedure (in terms of statistical power) also crucially relies on an adaptive version of the ZA. The use of the ZA is justified in the present situation as long as is sufficiently small so that the bottom (%) genes would almost certainly be declared null for reasonable FDR’s.
Formally, the ZA is stated as follows: Let be the largest integer (gene index) such that , denoted
(3) 
then genes with indices are assumed null. Let us partition the set of statistics into those corresponding to genes declared null under the ZA, , and those for the remaining genes which continue to compete for the nonnull designation, . For convenience, we introduce the following vector notation,
Then the random vector is distributed in the following way:
(4) 
where is the underlying mean vector and the covariance matrix. The corresponding partitions of are denoted
The central hypothesis of this paper is that there is a vector, say , whose elements represent a reordering of the elements of the , such that generanking represented by has better statistical power for detecting nonnull genes than that based on itself.
(In the present paper, we focus on mainly the secondmoment distributional characteristics of . However, in fact, if the gene expressions are normally distributed, then, perhaps, is described more accurately by the multivariate Student distribution. Exploitation of this additional structure will be considered in future work.)
2.3 Estimating
We are interested in obtaining an estimate of the vector mean based on the observation . This requires an appropriate metric in the space of vectors, with which to quantify the distance of the observed from the center of mass , say . The norm induces a useful metric between and provided that we first decorrelate the vector elements as , thus yielding
(5)  
This weighted Euclidean distance is sometimes called the Mahalanobis distance in the pattern classification literature as in Deller Jr et al. (1993), Devijver and Kittler (1982), etc.
We can relate to through the Mahalanobis distance but while doing so we invoke the ZA, which, in turn, implies that the first entries of are zero. This yields the estimate
In effect, combines the identifiability information based on the ZA with the information about the covariance structure of which too can be obtained from the measured itself. Notably the optimization in Eqn. (2.3) enjoys closed form solution:
(7) 
The derivation leading from Eqn. (2.3) to Eqn. (7) is provided in the Appendix.
2.4 Estimating
To estimate the required entries of , we make several observations. The validity of these observations can be established through computer simulations using the MATLAB script testtcov.m available with the Tellipsoid software. (Equation (7) does not require the covariance between two nonnull ’s.)
Observation 1. If genes and both are null, then
(8) 
Efron (2007) and Owen (2005) use this observation for their conditional FDR calculations. This observation maybe intuitive to the reader from Eqn. (1) itself, or it is easily verified through a computer simulation.
Observation 2. Similarly, if the gene is null and nonnull (or conversely), then
(9) 
where denotes the correlation between gene and within group . Equation (9) accommodates the possibility that the correlation between a null and a nonnull gene may change across groups. If this does not occur, then Eqn. (9) reduces to Eqn. (8).
Observation 3. Furthermore, if (which is the case for most microarray studies), then Eqn. (9) simplifies to:
(10) 
Equations (8) and (10) suggest that we may use sample correlation to estimate :
(11) 
where denotes the expression level of the gene measured on the microarray after subtracting the average response within each treatment group. The scale factors cancel in the terms and , so that estimating /(2) [see Eqn. (8)] is not required.
2.5 Tellipsoid equation
In light of Eqn. (11), Eqn. (7) takes the practical form
(12) 
where is the sample correlation matrix of the gene expression matrix (after removing the treatment effects). In most cases computing the full matrix inverse () is not necessary and solving the term through some efficient linear solver reduces the computation considerably. (See Subsection 3.3 for detail).
3 Algorithm
3.1 Tellipsoid
This section outlines a selfcontained differential analysis algorithm based on the ideas discussed in Section 2. Its name Tellipsoid was coined to emphasize the geometric intuition of tracking the center of an ellipsoid.
Tellipsoid takes gene expression matrix and provides a specified number, , of the most differentially expressed genes. In principle, the ranking is based on the set from Eqn. (2.3). In practice, we rely on the estimates from Eqn. (12).
The algorithm begins by reindexing the genes based on their twosample statistics [Eqn. (2)]. Then, based on the ZA, the first genes are identified as null, as specified in Eqn. (3). By default, is set to 50(%). Although the choice 50% is somewhat arbitrary, this fraction has worked well empirically in the data sets tested. Future research may yield more rigorous methods for choosing .
In order to nullify any genuine treatment differences, is converted to by subtracting each gene s average response within each treatment group. The sample correlation matrix of is computed subsequently. The crucial step is to compute based on Eqn. (12). The elements of determine the generanking: A gene with bigger is assigned a higher rank. The first genes are reported as top statistical discoveries.
3.2 Numerical stability
Because the number of samples is often less than the number of genes , the sample correlation matrix turns out to be singular and hence noninvertible. Therefore, we add a very small correction term (= ) to its diagonal entries to make it nonsingular, and in effect, invertible. After this correction, Tellipsoid shows excellent numerical stability.
3.3 Computational complexity
Equation (12) involves matrix inversion, which, if performed in a naive way, could be a prohibitive operation, since microarray data sets may have several tens of thousand genes. Indeed, solving the term as a system of simultaneous linear equations () is much faster than explicitly computing . In particular, we can employ the Cholesky decomposition to exploit the fact that the matrix is symmetric and positive definite. MATLAB implementation of Tellipsoid uses the inbuilt linslove with appropriate settings, which, in turn, uses highly optimized routines of LAPACK (Linear Algebra PACKage—http://www.netlib.org/lapack/).
For the Prostate data (used in Section 4) with genes and samples, Tellipsoid, running on a computer with a 2.2 GHz dualcore AMD Opteron processor with 8 GB of RAM and MATLAB version R2006b, requires just under 40 seconds to report the final gene list. For the same settings, the implementation with explicit matrix inversions takes minutes.
3.4 Summary of Tellipsoid
Tellipsoid: An improved differential analysis algorithm
Input: Labeled gene expression matrix; Size of gene list
Output: The gene list containing top differentially expressed genes

Calculate twosample (unpaired) statistics:

Reindex genes such that

Gather first = ’s in a vector ; By default =50(%)

Convert to by subtracting each gene s average response within each treatment group

Compute = the sample correlation matrix of

Find = , where =

Determine generanking: Assign a gene with bigger a higher rank

Report top genes as statistical discoveries
4 Results and Discussion
We compare Tellipsoid with two of the leading techniques, SAM [Significance Analysis of Microarrays (Tusher et al., 2001)] and EDGE [Extraction and analysis of Differential Gene Expression (Leek et al., 2006)]. SAM adds a small exchangeability factor to the pooled sample variance when computing the twosample statistic: ; whereas EDGE is based on a general framework for sharing information across tests (see Storey et al. (2007)). EDGE is reported to show substantial improvement (in terms of statistical power) over five of the leading techniques including SAM (Storey et al., 2007). The other four are: (i) /–test of Kerr et al. (2000) and Dudoit et al. (2002); (ii) Shrunken /–test of Cui et al. (2005); (iii) The empirical Bayes local FDR of Efron et al. (2001); (iv) The a posteriori probability approach of Lonnstedt and Speed (2002). It is noteworthy that Tellipsoid shows a major improvement over EDGE itself.
To determine the performance quality of various techniques, we focus primarily on the empirical FDR in the reported gene list: Empirical FDR = , where NoFP = the number of false positives. Broadly speaking, smaller the FDR better the technique.
4.1 Prostate data
(Singh et al., 2002) studied genes on oligonucleotide microarrays, comparing healthy males with prostate cancer patients. The purpose of their study was to identify genes that might anticipate the clinical behavior of Prostate cancer. We downloaded the .CEL files from http://wwwgenome.wi.mit.edu/MPR/prostate. The software RMAExpress (Irizarry et al., 2003) was used to obtain high quality gene expressions from these .CEL files. We let RMAExpress apply its inbuilt background adjustment, however, the quantile normalization was skipped. Each gene was represented in the final expression matrix by the logarithm (base 10) of its expression level. Taking the is thought to increase normality and stabilize across group standard deviations (Tsai et al., 2003).
4.1.1 Data with known results
Algorithm testing required an expression matrix with the knowledge of truly nondifferential genes. At the same time, we wanted the intergene correlation in to resemble that in the real microarray data. These two seemingly conflicting requirements were satisfied concurrently by row standardizing a real . The prostate cancer matrix was transformed to by subtracting each gene’s average response within each treatment group, and by normalizing within group sample mean squares. That is, for each group , and . Here, the sum runs over corresponding samples only. With this transformation, all genes have equal energy and yet the same within group intergene correlation structure as the original .^{4}^{4}4Note. Normalizing within group sample mean squares to unity is not implemented in the Tellipsoid algorithm.
To generate a test data set from , its 102 columns were randomly divided into groups of 50 (=) and 52 (=). Next () genes were randomly chosen for up (down) regulation by adding a positive (negative) offset () to the corresponding entries in group 2. Various choices of were tested to represent a range of differential analysis scenarios encountered in practice.
4.1.2 Results for prostate data
Two cases were studied. In the first, the proportion of truly differential genes, say , was taken to be relatively small: 0.01–0.05. The second case employed a larger 0.1. The former simulates microarray studies seeking genes that distinguish subtypes of cancer, diabetes, etc., whereas the latter resembles studies comparing healthy versus diseased cell activity.
Results were obtained using the subroutines samr.r from the package “samr” and statex.r from the package “edge.” Both routines compute their native gene summary statistics given the matrix and corresponding column labels. These statistics, in turn, can be used to determine top genes.
Case 1 [, =200, =100, =0.1, and =0.1]. Figure 1 shows plots of the FDRs for 40 different data sets with the size of the reported list, =. A large value of coincides with an attempt to extract as many differential expressions as possible, a desired goal especially in microarray studies performed to identify genes that are to be explored further – experimentally or computationally – to gain better understanding of underlying gene networks. Since the differential signal =0.1 and =0.1 is rather weak, recovering a good list is not easy as evident from the results – among all methods only Tellipsoid achieved sufficiently low FDRs to rescue a few ’s.
Figure 1 presents results for =100. A smaller would be chosen to identify highquality class distinguishing features for geneexpressionprofilingbased clinical diagnosis and prognosis, where the goal is to build accurate classifiers and predictors. Whereas Singh et al. (2002) build a classifier around only 16 of 12625 features, they do discuss the need to include as many reliable features as possible. Remarkably, for 37 out of 40 matrices, Tellipsoid reports gene lists with no false discoveries at all, while the other techniques fail to obtain a single gene list with an FDR0.5.
Case 2a [, =600, =600, =0.02, and =0.02]. In this set of experiments, is increased, but the differential signal is reduced. This situation also proves to be challenging for the existing techniques. However, Tellipsoid provides the FDR of for =1200, and, again for =300, while it reports most gene lists with no false discoveries at all.
Case 2b [0.1, =600, =600, =0.1, and =0.1]. This subcase is designed to assess the effects of small sample sizes on performance. and are both reduced to 20. We randomly chose 20 columns per group from the original prostate cancer , and then applied the data generation process (including row standardization) detailed in Subsection 4.1.1. Reduction in the number of samples is compensated by increase in the differential signal. The FDRs for Tellipsoid, Fig. 3, are excellent suggesting that Tellipsoid increases power of small sample data sets too.
4.2 Simulated data
Before devising the test data setup of Subsection 4.1.1, Tellipsoid was tested on several simulated data sets. Below we discuss some simulation results that shed further light on the small sample behavior.
Let us denote by the column of a simulated expression matrix . We assume that the random vector is multivariate Gaussian with mean and covariance matrix . Each such column represents =3226 genes with a covariance matrix that introduces roughly the same amount of correlation as found in the BRCA data of Hedenfalk et al. (2001). We choose , and . Figure 4 shows plots of the FDRs for =50 and =100. Table 1 shows results for some realization from Fig. 4. Shown are the top 100 values of and each corresponding original with concomitant rank. With smaller , preeminence of Tellipsoid with respect to existing techniques scales down a bit. Nevertheless, for =50 case, for 25 out of 40 simulated realizations, Tellipsoid achieves a low FDR of 0.1 or less.
–  –  

rank  rank  rank  rank  
1  4.22  5.87  1  51  2.18  3.45  23 
2  4.17  5.55  2  52  2.17  3.05  42 
3  3.93  4.26  5  53  2.16  2.80  82 
4  3.74  4.12  7  54  2.15  2.57  122 
5  3.58  4.49  4  55  2.15  1.96  357 
6  3.49  3.34  28  56  2.14  1.47  751 
7  3.45  4.25  6  57  2.13  2.25  229 
8  3.35  3.87  10  58  2.13  1.77  486 
9  3.33  3.20  35  59  2.13  1.44  785 
10  3.33  3.77  13  60  2.12  2.14  273 
11  3.25  3.42  25  61  2.11  1.48  744 
12  3.16  2.18  260  62  2.10  1.80  453 
13  3.14  4.54  3  63  2.09  2.60  114 
14  3.10  2.87  65  64  2.09  2.05  312 
15  3.08  3.54  17  65  2.09  2.70  96 
16  3.07  2.80  80  66  2.09  2.23  237 
17  3.06  3.49  20  67  2.08  2.34  188 
18  3.02  2.29  213  68  2.08  2.24  232 
19  2.99  3.34  27  69  2.06  2.53  130 
20  2.93  3.13  38  70  2.04  2.11  283 
21  2.92  2.92  57  71  2.04  2.95  54 
22  2.86  3.26  31  72  2.03  3.08  40 
23  2.83  2.82  74  73  2.02  2.30  210 
24  2.82  2.37  180  74  2.01  3.67  15 
25  2.81  2.13  276  75  2.00  2.62  109 
26  2.81  3.48  21  76  1.98  2.38  171 
27  2.79  3.01  47  77  1.98  1.43  795 
28  2.70  2.87  64  78  1.96  1.69  549 
29  2.66  3.15  37  79  1.95  1.47  746 
30  2.58  3.85  11  80  1.95  1.95  361 
31  2.56  2.84  71  81  1.95  2.81  77 
32  2.55  1.72  524  82  1.94  1.41  813 
33  2.54  2.63  106  83  1.94  3.40  26 
34  2.54  2.69  98  84  1.94  1.30  948 
35  2.53  2.30  209  85  1.94  3.27  30 
36  2.48  2.45  148  86  1.93  1.11  1190 
37  2.47  2.29  212  87  1.93  1.37  872 
38  2.46  3.21  33  88  1.93  3.44  24 
39  2.43  2.44  154  89  1.92  3.07  41 
40  2.43  2.71  94  90  1.92  1.50  726 
41  2.40  2.86  66  91  1.90  3.62  16 
42  2.34  2.60  115  92  1.90  2.82  75 
43  2.34  2.98  50  93  1.89  1.25  1007 
44  2.33  3.80  12  94  1.87  3.89  8 
45  2.32  2.06  306  95  1.86  3.49  19 
46  2.29  1.81  444  96  1.86  2.08  300 
47  2.27  1.17  1110  97  1.85  1.20  1074 
48  2.26  1.97  347  98  1.83  2.90  60 
49  2.24  3.75  14  99  1.83  1.39  833 
50  2.20  3.88  9  100  1.82  1.95  367 
Interestingly, with a smaller , SAM outperforms the other two techniques. This is not entirely surprising as a smaller can make the noise in the per gene pooled variance (and possibly the equivalent quantity in the EDGE algorithm) more prominent. Nevertheless, SAM does mitigate this issue in some measure by using the exchangeability factor to adjust the effective pooled variance (Tusher et al., 2001).
4.3 Discussion
By allowing researchers to examine the simultaneous expressions of enormous numbers of genes, microarrays promised to revolutionize the understanding of complex diseases and usher in an era of personalized medicine. However, the shift in perception of that promise is palpable in the literature. A 1999 Nature Genetics article (Lander, 1999) is entitled “Array of hope,” but a 2005 Nature Reviews article (Frantz, 2005) is entitled “An array of problems.” It is not unusual for impacts of new technologies to be overestimated when first deployed, then to have the expectations moderated as the technologies reveal new complexities in the problems they are designed to solve. In the study of microarray data, the need for exceeding care in the design and regularization of experiments and data collection are understood to be critical, but the biggest hindrance to progress has been the data interpretation. In particular, as Efron (2007) and Owen (2005) point out, the biggest challenge seems to be the treatment of intrinsic intergene correlation.
In most microarray data there are at least three vital resources: (i) identifiability (ii) immense parallel structure, and (iii) intergene correlation itself. Thoughtful analyses in the papers by Efron (Efron, 2005, 2000) have suggested this view of the rich information structure inherent in the data. In this light, Tellipsoid can be viewed as exploiting more than correlation as a means of sharing information across tests, as it also involves identifiability.
A crucial step in formulating Tellipsoid was the comprehension of the effects of intergene correlation on . In light of Observations 1–3, the choice of the Mahalanobis distance was intuitive, as it is already known to give computationally attractive solutions through the matrix inversion lemma.
Limited time and resources – and perhaps also the necessity for scientific focus – often require biomedical researchers to work on only a small number of “hot (gene) prospects.” Even under such highly conservative conditions, however, misleading results can occur, as evident in the results of Figs. 1–4. For all their careful development and statistical power, even stateoftheart tools like EDGE and SAM can report spurious gene lists. The extra statistical power of Tellipsoid promises to further guard against anomalous results that can have serious consequences for the trajectory of a study of gene function, causation, and interaction.
5 Conclusion
This paper has reported the development and testing of a novel framework for the detection of differential gene expression. The framework combines the exploitation of intergene correlation to share information across tests, with identifiability – the fact that in most microarray data sets, a large proportion of genes can be identified a priori as nondifferential. When applied to the widely used twosample statistic approach, this viewpoint yielded an elegant differential analysis technique, Tellipsoid, which requires as inputs only a gene expression matrix, related twosample labels, and the size of desired genelist . Tellipsoid was tested on the prostate cancer data of Singh et al. (2002) and some simulated data. Compared to SAM (Tusher et al., 2001), EDGE (Storey et al., 2007), and the raw statistic approach itself, Tellipsoid shows substantial improvement in statistical power. Usually, with increase in microarray samples, power tends to increase considerably, but, even for small sample sizes, Tellipsoid’s performance improvement is noticeable. The software (coded in MATLAB and in R) and test data sets are available at www.egr.msu.edu/~desaikey.
Funding
KD is supported by a graduate research fellowship from the Quantitative Biology Initiative at Michigan State University.
Acknowledgement
We are grateful to the anonymous reviewers of an earlier publication (Desai et al., 2007) for their insightful remarks that seeded this work. We also appreciate the support of the Michigan State University High Performance Computing Center.
Appendix
Suppose that
(A1) 
We also have . Substituting these in Eqn. 2.3 yields:
(A2) 
In Eqn. A2, by setting the gradient w.r.t to 0, we obtain:
(A3) 
Now for , we can appeal to the matrix inversion lemma (Golub and Van Loan, 1996):
where . Plugging this in Eqn. A3 yields:
(A4) 
Combining Eqn. A4 with Eqn. A1 provides the desired expression:
References
 Benjamini and Hochberg (1995) Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B. Methodological, 57(1):289–300, 1995.
 Cui et al. (2005) X. Cui, J.T.G. Hwang, J. Qiu, N.J. Blades, and G.A. Churchill. Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics, 6(1):59–75, 2005.
 Deller Jr et al. (1993) J.R. Deller Jr, J.G. Proakis, and J.H. Hansen. Discrete Time Processing of Speech Signals. Prentice Hall PTR Upper Saddle River, NJ, USA, 1993.
 Desai et al. (2007) K. Desai, J.R. Deller, Jr., and J.J. McCormick. The distribution of the number of false discoveries in highly correlated dna microarray data. Submitted to the Annals of Applied Statstics, preprint at http://www.egr.msu.edu/~desaikey, 2007.
 Devijver and Kittler (1982) P.A. Devijver and J. Kittler. Pattern Recognition: A Statistical Approach. Englewood Cliffs, New Jersey etc, 1982.
 Dudoit et al. (2002) S. Dudoit, Y.H. Yang, M.J. Callow, and T.P. Speed. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica, 12(1):111–139, 2002.
 Efron (2000) B. Efron. RA Fisher in the 21st Century. Statistics for the 21st Century: Methodologies for Applications of the Future, 1:09, 2000.
 Efron (2004) B. Efron. LargeScale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis. Journal of the American Statistical Association, 99(465):96–105, 2004.
 Efron (2005) B. Efron. Bayesians, Frequentists, and Scientists. Journal of the American Statistical Association, 100(469):1–6, 2005.
 Efron (2006) B. Efron. Microarrays, Empirical Bayes, and the TwoGroups Model. Preprint, Dept. of Statistics, Stanford University, 2006.
 Efron (2007) B. Efron. Correlation and LargeScale Simultaneous Significance Testing. Journal of the American Statistical Association, 102(477):93–103, 2007.
 Efron et al. (2001) B. Efron, R. Tibshirani, J.D. Storey, and V. Tusher. Empirical Bayes Analysis of a Microarray Experiment. Journal of the American Statistical Association, 96(456):1151–1161, 2001.
 Frantz (2005) S. Frantz. An array of problems. Nat Rev Drug Discov, 4(5):362–3, 2005.
 Golub and Van Loan (1996) G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1996.
 Hedenfalk et al. (2001) I. Hedenfalk, D. Duggan, Y. Chen, et al. GeneExpression Profiles in Hereditary Breast Cancer. New England Journal of Medicine, 344(8):539–548, 2001.
 Irizarry et al. (2003) R.A. Irizarry, B.M. Bolstad, F. Collin, L.M. Cope, B. Hobbs, and T.P. Speed. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research, 31(4):e15, 2003.
 Kerr et al. (2000) M.K. Kerr, M. Martin, and G.A. Churchill. Analysis of Variance for Gene Expression Microarray Data. Journal of Computational Biology, 7(6):819–837, 2000.
 Lander (1999) E.S. Lander. Array of hope. Nature Genetics, 21(1):3–4, 1999.
 Langaas et al. (2005) M. Langaas, E. Ferkingstad, and B.H. Lindqvist. Estimating the proportion of true null hypotheses, with application to DNA microarray data. J Roy Stat Soc, Series B, 67:555–72, 2005.
 Lee et al. (2000) M.L.T. Lee, F.C. Kuo, GA Whitmore, and J. Sklar. Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridizations. Proceedings of the National Academy of Sciences of the United States of America, 97(18):9834, 2000.
 Leek et al. (2006) J.T. Leek, E. Monsen, A.R. Dabney, and J.D. Storey. EDGE: extraction and analysis of differential gene expression, 2006.
 Lonnstedt and Speed (2002) I. Lonnstedt and T. Speed. Replicated microarray data. Statistica Sinica, 12(1):31–46, 2002.
 Mahalanobis (1936) P.C. Mahalanobis. On the generalized distance in statistics. Proc Natl Inst Sci India, 2(1):49–55, 1936.
 Newton et al. (2001) MA Newton, CM Kendziorski, CS Richmond, F.R. Blattner, and KW Tsui. On Differential Variability of Expression Ratios: Improving Statistical Inference about Gene Expression Changes from Microarray Data. Journal of Computational Biology, 8(1):37–52, 2001.
 Owen (2005) A.B. Owen. Variance of the number of false discoveries. Journal of the Royal Statistical Society Series B, 67(3):411–426, 2005.
 Pawitan et al. (2005) Y. Pawitan, K.R.K. Murthy, S. Michiels, A. Ploner, and O. Journals. Bias in the estimation of false discovery rate in microarray studies. Bioinformatics, 21(20):3865–3872, 2005.
 Singh et al. (2002) D. Singh, P.G. Febbo, K. Ross, et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2):203–209, 2002.
 Storey (2002) J.D. Storey. A direct approach to false discovery rates. Journal of the Royal Statistical Society Series B(Statistical Methodology), 64(3):479–498, 2002.
 Storey et al. (2007) J.D. Storey, J.Y. Dai, and J.T. Leek. The optimal discovery procedure for largescale significance testing, with applications to comparative microarray experiments. Biostatistics, 8(2):414, 2007.
 Tibshirani and Wasserman (2006) R. Tibshirani and L. Wasserman. Correlationsharing for detection of differential gene expression. Arxiv preprint math.ST/0608061, 2006.
 Tsai et al. (2003) C.A. Tsai, Y.J. Chen, J.J. Chen, and O. Journals. Testing for differentially expressed genes with microarray data. Nucleic Acids Research, 31(9):e52, 2003.
 Tusher et al. (2001) V. Tusher, R. Tibshirani, and C Chu. Significance analysis of microarrays applied to ionizing radiation response. Proceedings of the National Academy of Sciences, 98:5116–5121, 2001.