# Testing the disjunction hypothesis using Voronoi diagrams with applications to genetics

## Abstract

Testing of the disjunction hypothesis is appropriate when each gene or location studied is associated with multiple $p$-values, each of which is of individual interest. This can occur when more than one aspect of an underlying process is measured. For example, cancer researchers may hope to detect genes that are both differentially expressed on a transcriptomic level and show evidence of copy number aberration. Currently used methods of $p$-value combination for this setting are overly conservative, resulting in very low power for detection. In this work, we introduce a method to test the disjunction hypothesis by using cumulative areas from the Voronoi diagram of two-dimensional vectors of $p$-values. Our method offers much improved power over existing methods, even in challenging situations, while maintaining appropriate error control. We apply the approach to data from two published studies: the first aims to detect periodic genes of the organism Schizosaccharomyces pombe, and the second aims to identify genes associated with prostate cancer.

DOI: 10.1214/13-AOAS707. Volume 8, Issue 2 (2014), pages 801–823.

Running title: Simultaneous inference with bivariate data. Supported by the National Science Foundation (Award Number ABI-1262538).


Keywords: multiple testing, false discovery rates, Voronoi tessellations, empirical null distributions.

## 1 Introduction

In current genetics and biological research, thousands of hypothesis tests are frequently performed simultaneously. With such a large number of tests, control of the family-wise error rate (FWER) is overly conservative, resulting in low power for detection of true alternative signals. For this reason, False Discovery Rate (FDR) control is an intensely studied topic. Methods such as the Benjamini–Hochberg (B–H) procedure [Benjamini and Hochberg (1995)] control FDR when each hypothesis test is associated with a single $p$-value. Many refinements have been proposed to increase power [Ghosh (2011); Benjamini and Hochberg (2000); Benjamini, Krieger and Yekutieli (2006); Storey (2002)]. Pounds (2006) gives a good summary of these approaches. However, when multiple $p$-values are associated with each hypothesis, these methods alone are insufficient for declaring significance while controlling FDR.

Multiple $p$-values can be considered for each hypothesis test in settings such as repeated experiments in the same study, meta-analysis across multiple studies and measurement of multiple aspects of a single underlying process of interest. To perform a single hypothesis test using multiple $p$-values, techniques for $p$-value combination are frequently used to create a single summarized value. Multiple comparison adjustments can then be applied to these summarized values.

With this move from single $p$-values to vectors of $p$-values, which we refer to here as $p$-vectors, clear specification of the null and alternative hypotheses is critical. If the goal is to pool information, for example, when testing $p$-vectors from repeated experiments or from multiple studies, then conjunction or partial conjunction hypotheses are appropriate. The conjunction null hypothesis is that all $p$-values contained in a $p$-vector are from a null distribution, and rejection is possible when at least one $p$-value shows evidence of being from a nonnull distribution. Rejection of the partial conjunction hypothesis requires at least $u$ of $n$ $p$-values to show evidence of being from nonnull distributions. There are scenarios, however, when the hypothesis associated with each $p$-value of the $p$-vector is of interest individually, and rejection should be possible only when there is evidence that all such hypotheses are nonnull. In this case, the disjunction hypothesis is of primary interest. Distinctions between the conjunction, partial conjunction and disjunction hypotheses are further described in Section 2 of this paper.

Testing of the disjunction hypothesis is appropriate when multiple aspects of a single underlying biological process are measured. For example, there is interest in detection of genes related to cancer progression that are both differentially expressed on a transcriptomic level and show evidence of copy number aberrations in cancerous tissue [Kim et al. (2007); Tsafrir et al. (2006); Fritz et al. (2002); Pollack et al. (2002); Tonon et al. (2005)]. Another motivating example is detection of periodic genes as explored by de Lichtenberg et al. (2005). In this case, the disjunction hypothesis is considered through one $p$-value for periodicity and a second for regulation of expression. The most commonly used summary method for the disjunction hypothesis uses the maximum of all $p$-values for each test [Wilkinson (1951)] and typically has very low power.

This paper presents an approach for $p$-value combination appropriate for testing the disjunction hypothesis when there are two $p$-values associated with each gene or location. The approach considers $p$-vectors as locations on the unit square, where certain challenges absent in the case of single $p$-values arise. First, the strict ordering of $p$-values on the real line is lost. Second, relationships between $p$-vectors are complicated and, third, their components may be correlated. In light of these challenges, a method for large-scale simultaneous testing of the disjunction hypothesis must accomplish three objectives. It must account for the relative positioning of the $p$-vectors in the plane, allow for multiple ordering schemes and, finally, allow for FDR control under any correlation structure of the test statistics used to calculate the $p$-vectors' components.

The approach proposed here addresses these challenges through the use of Voronoi diagrams, flexible incorporation of ordering schemes and empirical null distributions [Efron (2004)]. This paper is organized as follows. Section 2 details the disjunction hypothesis framework while Section 3 describes background on control of FDR and existing univariate procedures. Sections 4 and 5 introduce Voronoi diagrams and multiple ordering schemes for $p$-vectors. Section 6 describes a technique for summarizing $p$-vectors to a single value and details how these values can be used to control FDR. We explore properties of the procedure through simulations in Section 7. A possible extension for higher dimensional $p$-vectors is discussed in Section 8. We apply the procedure to two genomic studies in Sections 9 and 10.

## 2 The disjunction of null hypotheses

When multiple $p$-values are associated with a hypothesis test, the interpretation of significance depends on the specification of null and alternative hypotheses. Consider $m$ $p$-vectors, each of length $n$, denoted

$$\mathbf{p}_i = (p_{i1}, \ldots, p_{in}), \qquad i = 1, \ldots, m. \tag{1}$$

In the context of a genomic study, $i$ is the index of the individual gene, while $n$ is the number of $p$-values associated with each gene. We employ notation used by Benjamini and Heller (2008) to describe null and alternative hypotheses. Testing the global null hypothesis, also known as the conjunction of the null hypotheses, is equivalent to testing that *at least one* of the $p$-values is significant:

$$H_0^{1/n}: \text{all } n \text{ component hypotheses are null} \quad \text{versus} \quad H_1^{1/n}: \text{at least one component hypothesis is nonnull}.$$

$p$-value combination methods for testing the conjunction null include the well-known Fisher's and Stouffer's methods for combining $p$-values [Fisher (1932); Stouffer et al. (1949)]. A comparison of these and other methods is presented by Loughin (2004). Rejection of the conjunction null can result from the influence of a single highly significant $p$-value even when all other $p$-values show no evidence for the alternative hypothesis. In this setting, the scientific conclusion from rejection is not as strong as it would be if a level of increased consistency across $p$-values was enforced.

| | Fail to reject | Reject | Total |
|---|---|---|---|
| True null | $U$ | $V$ | $m_0$ |
| True alternative | $T$ | $S$ | $m_1$ |
| Total | $m - R$ | $R$ | $m$ |

Benjamini and Heller proposed techniques for addressing this weakness through testing of the partial conjunction hypothesis:

$$H_0^{u/n}: \text{fewer than } u \text{ of the } n \text{ component hypotheses are nonnull} \quad \text{versus} \quad H_1^{u/n}: \text{at least } u \text{ of the } n \text{ are nonnull}.$$

This hypothesis requires a level of consistency of evidence across studies that is unnecessary in the conjunction framework, while still allowing lack of significance for some associated $p$-values. It can be considered a compromise between the conjunction and disjunction hypotheses. The disjunction hypothesis is also referred to as the disjunction of the null hypotheses and can be expressed as follows:

$$H_0^{n/n}: \text{at least one of the } n \text{ component hypotheses is null} \quad \text{versus} \quad H_1^{n/n}: \text{all } n \text{ component hypotheses are nonnull}.$$

This hypothesis is desirable when considering multiple $p$-values per test that are each of individual interest. The established $p$-value combination approach for testing the disjunction hypothesis is to simply select the maximum $p$-value of each $p$-vector [Wilkinson (1951)]. Error control procedures can then be applied to these maximum values. This approach is generally conservative and exhibits low power. The procedure described in this paper is suitable for testing the disjunction hypothesis and results in a gain of power over the maximum method of $p$-value combination.

## 3 The False Discovery Rate and review of existing procedures

To define FDR we consider Table 1, writing $V$ for the number of false rejections and $R$ for the total number of rejections. The False Discovery Rate is defined to be the *expected proportion of false rejections to total rejections*, or $E[V/R]$ for $R > 0$, and 0 if $R = 0$. The FWER is defined to be the *probability of at least one false rejection*, $P(V \geq 1)$. Particularly in studies testing thousands or tens of thousands of hypotheses simultaneously, control of FDR grants additional power for rejection relative to control of FWER. Allowance of a controlled proportion of false positives enables detection of more true signals relative to limiting rejections based on the probability of at least one false rejection as required by FWER control.

We next provide a brief review of two existing procedures that control FDR when each hypothesis test has exactly one $p$-value: the Benjamini–Hochberg (B–H) procedure and the Generalized Benjamini–Hochberg procedure. The latter motivates our approach to $p$-value combination for two-dimensional $p$-vectors.

### 3.1 The Benjamini–Hochberg procedure and its generalization

Proposed by Benjamini and Hochberg (1995), the B–H procedure works as follows. Assume $m$ continuous $p$-values and that low values indicate evidence against the null. Order them as $p_{(1)} \leq \cdots \leq p_{(m)}$ and compare $p_{(i)}$ to the thresholds $i\alpha/m$, $i = 1, \ldots, m$. Define

$$\hat{k} = \max\left\{ i : p_{(i)} \leq \frac{i\alpha}{m} \right\}. \tag{2}$$

If the set in (2) is nonempty, reject the hypotheses associated with $p_{(1)}$ through $p_{(\hat{k})}$, otherwise reject nothing. Benjamini and Hochberg show that this procedure controls the FDR at a nominal level $\alpha$.
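The step-up rule in (2) can be sketched in a few lines of Python. This is a minimal illustration under the assumptions stated above (one $p$-value per hypothesis); the function name `benjamini_hochberg` and the example $p$-values are ours and do not come from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices (into p_values) rejected by the B-H step-up rule."""
    m = len(p_values)
    # order[0] is the index of the smallest p-value, p_(1).
    order = sorted(range(m), key=lambda i: p_values[i])
    k_hat = 0  # stays 0 when the set in (2) is empty
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_hat = rank  # keep the largest rank that meets its threshold
    # Reject the hypotheses attached to p_(1), ..., p_(k_hat).
    return sorted(order[:k_hat])


# Illustrative p-values (hypothetical data).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, alpha=0.05)
```

Because the rule takes the largest rank that meets its threshold, a $p$-value can be rejected even if it exceeds its own threshold, as long as a larger ordered $p$-value passes.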

Ghosh (2011) proposed a family of testing procedures based on the spacings of $p$-values. We present the basic procedure here. Again assume $m$ independent $p$-values and order them as $p_{(1)} \leq \cdots \leq p_{(m)}$. Define the spacings [Pyke (1965)] as

$$\tilde{p}_i = p_{(i)} - p_{(i-1)}, \qquad i = 1, \ldots, m+1, \tag{3}$$

where $p_{(0)} \equiv 0$ and $p_{(m+1)} \equiv 1$. Under the null hypothesis the original $p$-values are distributed as $U(0,1)$, whence the spacings are marginally distributed as $\mathrm{Beta}(1, m)$. It is also simple to calculate certain expectations. Specifically, $E[\tilde{p}_i] = 1/(m+1)$.

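Recall that the B–H procedure defines $\hat{k}$ by comparing $p_{(i)}$ to $i\alpha/m$. These quantities can be redefined as

$$p_{(i)} = \sum_{j=1}^{i} \tilde{p}_j, \qquad \frac{i\alpha}{m} = \alpha\,\frac{m+1}{m} \sum_{j=1}^{i} E[\tilde{p}_j]. \tag{4}$$

By substituting these quantities in the B–H procedure and dividing both sides by $i$, the original $\hat{k}$ can be rewritten in terms of the spacings as follows [Ghosh (2011; 2012)]:

$$\hat{k} = \max\left\{ i : \frac{1}{i}\sum_{j=1}^{i} \tilde{p}_j \leq \alpha\,\frac{m+1}{m}\,E[\tilde{p}_1] \right\}. \tag{5}$$

According to Benjamini and Hochberg, this procedure controls FDR at $\alpha m_0/m$, where $m_0$ is the number of $p$-values associated with the null hypothesis. In equation (5) there is an extra factor $(m+1)/m$. Elimination of this factor results in a slightly more conservative procedure that preserves FDR control. Thus, $\hat{k}$ is defined as

$$\hat{k} = \max\left\{ i : \frac{1}{i}\sum_{j=1}^{i} \tilde{p}_j \leq \alpha\,E[\tilde{p}_1] = \frac{\alpha}{m+1} \right\}. \tag{6}$$

The definition of $\hat{k}$ in (6) hinges on a comparison of the average spacings between ordered $p$-values to the value $\alpha/(m+1)$. When there are numerous significant $p$-values their spacings are small in comparison to the expectation of a spacing under the null hypothesis. Detecting this change from the small spacings of significant $p$-values to the larger spacings of null $p$-values is also the motivation for the procedure described in this paper.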
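To make the comparison in (6) concrete, the following sketch computes the running average of the spacings and compares it with $\alpha/(m+1)$, the level $\alpha$ times the expected null spacing. It is an illustration of the rule as described in this section, not the authors' code; the function name `spacings_k_hat` is hypothetical.

```python
def spacings_k_hat(p_values, alpha=0.05):
    """Largest i whose average spacing up to i is at most alpha / (m + 1)."""
    m = len(p_values)
    ordered = sorted(p_values)
    k_hat = 0
    previous = 0.0      # p_(0) is defined to be 0
    running_sum = 0.0   # sum of the first i spacings (telescopes to p_(i))
    for i, p in enumerate(ordered, start=1):
        running_sum += p - previous  # add the i-th spacing
        previous = p
        if running_sum / i <= alpha / (m + 1):
            k_hat = i
    return k_hat


# Illustrative p-values (hypothetical data).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
k_hat = spacings_k_hat(p, alpha=0.05)
```

Since the threshold $\alpha/(m+1)$ is slightly smaller than the B–H threshold $\alpha/m$ for the same average, this rule can never reject more than B–H on the same data.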

## 4 The Voronoi diagram

In the plane the spacings of two-dimensional $p$-vectors are more difficult to characterize. One generalization of "spacing" is the Voronoi diagram. It is a partition of the plane generated by an input set of points that creates a cell around each input consisting of the set of all points closer to that input than to any other. The basic properties are described by Okabe et al. (2010).

In the setting of two-dimensional $p$-vectors, the Voronoi diagram partitions the unit square. For each $p$-vector, $\mathbf{p}_i$, the diagram creates a cell, $V_i$, consisting of all points closer to $\mathbf{p}_i$ than to any other $p$-vector. As Jiménez and Yukich (2002) discuss, Voronoi diagrams are suitable for extension of the concept of one-dimensional spacings into higher dimensions. An illustration of the diagram for a sample set of points is presented in Figure 1(b). We follow this example in Section 5 before switching to a larger sample for subsequent sections.

Voronoi cells have many desirable properties that extend spacings to the plane. Their area and shape reflect the relative positioning of the input points. For example, clusters of inputs will have smaller cell areas than uniformly distributed inputs. Similarly, if the inputs have correlated components, there will be an increased concentration of $p$-vectors along the diagonal of the unit square. The Voronoi cells for the inputs along this diagonal will be smaller than those near the edge of the clustering. Our procedure uses the areas of the Voronoi cells generated by the set of $p$-vectors to account for their relative positions. To compute each diagram and calculate the cell areas, we use the R package deldir developed by Turner (2013) and available through CRAN (http://cran.r-project.org/web/packages/deldir/index.html).
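For readers who want to experiment without R, the cell areas can be approximated by a direct construction: each cell is the unit square intersected with the half-planes of points closer to one input than to each other input. The sketch below is a stand-in for deldir, not a port of it; it assumes distinct input points, costs on the order of $m^2$ clipping steps, and all function names are ours.

```python
def clip_halfplane(polygon, a, b, c):
    """Keep the part of a convex polygon where a*x + b*y <= c."""
    clipped = []
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        s1 = a * x1 + b * y1 - c
        s2 = a * x2 + b * y2 - c
        if s1 <= 0:
            clipped.append((x1, y1))  # vertex lies inside the half-plane
        if (s1 < 0 < s2) or (s2 < 0 < s1):
            t = s1 / (s1 - s2)        # edge crosses the boundary line
            clipped.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return clipped


def polygon_area(polygon):
    """Shoelace formula; returns 0.0 for an empty polygon."""
    total = 0.0
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def voronoi_cell_areas(points):
    """Area of each point's Voronoi cell restricted to the unit square."""
    areas = []
    for i, (xi, yi) in enumerate(points):
        cell = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
        for j, (xj, yj) in enumerate(points):
            if j != i:
                # Closer to point i than to point j:
                # 2(xj - xi)x + 2(yj - yi)y <= xj^2 + yj^2 - xi^2 - yi^2
                a, b = 2 * (xj - xi), 2 * (yj - yi)
                c = xj ** 2 + yj ** 2 - xi ** 2 - yi ** 2
                cell = clip_halfplane(cell, a, b, c)
        areas.append(polygon_area(cell))
    return areas


# Two points split the square along the line x = 0.5.
areas = voronoi_cell_areas([(0.25, 0.5), (0.75, 0.5)])
```

For data sets with thousands of $p$-vectors a dedicated Voronoi library is the better choice; the quadratic approach here is only meant to make the definition of a cell area tangible.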

## 5 Multiple ordering schemes

Recall from Section 3.1 that spacings were defined as the difference between consecutive $p$-values. This definition depends on an ordering of the $p$-values that is unique on the real line. However, this uniqueness is lost when $p$-vectors are considered as bivariate locations in the unit square. We present and test multiple ordering schemes for the plane, while continuing to assume that small values of $p_{i1}$ and $p_{i2}$ indicate evidence against the null. For this reason the orderings begin at the origin, and each $p$-vector is ranked according to increasing values of $d_i$, its distance from the origin. Each scheme defines $d_i$ differently. Thus, for each definition $\mathbf{p}_{(1)}$ is the $p$-vector with the smallest value of $d_i$, and $\mathbf{p}_{(m)}$ the one with the largest. Here we describe the definition of $d_i$ for each scheme:

1. Euclidean ordering results in a movement from the origin in contours with the shape of circles. Define $d_i$ as the Euclidean distance from the origin

   $$d_i = \sqrt{p_{i1}^2 + p_{i2}^2}. \tag{7}$$

2. Maximum ordering results in contours with the shape of squares. Define $d_i$ as

   $$d_i = \max(p_{i1}, p_{i2}). \tag{8}$$

3. Summation ordering is equivalent to beginning at the origin and moving out in contours of right isosceles triangles. In this case, $d_i$ is defined as

   $$d_i = p_{i1} + p_{i2}. \tag{9}$$

4. de Lichtenberg ordering is a ranking scheme proposed by de Lichtenberg et al. (2005). The scheme defines

   $$d_i = p_{i1}\, p_{i2} \left[1 + \left(\frac{p_{i1}}{0.001}\right)^2\right] \left[1 + \left(\frac{p_{i2}}{0.001}\right)^2\right]. \tag{10}$$

   Note that $d_i$ consists of four multiplicative factors. The first two weight according to the value of each individual component, and the last two penalize $p$-vectors that have only one very small component. For typical $p$-vectors the values for $d_i$ are very large as a result of division by 0.001 of both $p_{i1}$ and $p_{i2}$. This magnitude is not a concern, as only the relative values matter for the purpose of ranking. The contour lines for this ordering scheme move from the origin in lines approximating an inverse function such as $y = 1/x$.

Figure 2 illustrates these four ordering schemes using the sample set of 200 $p$-vectors from Section 4. Table 2 presents a numerical example using five $p$-vectors.

| Euclidean $d$ (rank) | Maximum $d$ (rank) | Summation $d$ (rank) | de Lichtenberg (rank) |
|---|---|---|---|
| 0.99 (3) | 0.85 (3) | 1.36 (4) | (4) |
| 1.21 (5) | 0.91 (4) | 1.71 (5) | (5) |
| 1.00 (4) | 0.97 (5) | 1.20 (3) | (3) |
| 0.71 (2) | 0.62 (1) | 0.96 (2) | (2) |
| 0.63 (1) | 0.63 (2) | 0.79 (1) | (1) |

It is noteworthy that three of the four ranking schemes described have concave contour lines: Euclidean, Maximum and Summation. The remaining scheme, de Lichtenberg, has convex contour lines. As we will see, these characteristics have important implications for error control.
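The four ordering schemes of this section can be written as small distance functions plus a generic ranking helper. This is a sketch for illustration: the function names are ours, and the de Lichtenberg score is written as we read the four-factor description above (product of the two components times two penalty terms with the constant 0.001), which should be checked against de Lichtenberg et al. (2005) before use.

```python
import math


def euclidean(p1, p2):
    return math.hypot(p1, p2)       # circular contours


def maximum(p1, p2):
    return max(p1, p2)              # square contours


def summation(p1, p2):
    return p1 + p2                  # right isosceles triangle contours


def de_lichtenberg(p1, p2):
    # Assumed form: two component factors and two penalty factors.
    return p1 * p2 * (1 + (p1 / 0.001) ** 2) * (1 + (p2 / 0.001) ** 2)


def rank_by(distance, p_vectors):
    """Ranks (1 = closest to the origin), returned in input order."""
    order = sorted(range(len(p_vectors)),
                   key=lambda i: distance(*p_vectors[i]))
    ranks = [0] * len(p_vectors)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


# Hypothetical p-vectors: the schemes need not agree on the ranking.
vectors = [(0.85, 0.51), (0.20, 0.10), (0.05, 0.90)]
ranks_euclidean = rank_by(euclidean, vectors)
ranks_maximum = rank_by(maximum, vectors)
```

In this small example the Euclidean and Maximum orderings disagree on which of the first and third vectors is farther from the origin, which is the kind of difference the simulations in later sections examine.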

## 6 Summarizing $p$-vectors and declaring significance

The described ranking schemes can be combined with Voronoi cell areas to summarize each ranked $p$-vector as a single value in the interval $[0, 1]$. Define $A_{(i)}$ as the area of the Voronoi cell associated with the ordered $p$-vector $\mathbf{p}_{(i)}$ and the cumulative sum of these ordered areas as

$$C_{(i)} = \sum_{j=1}^{i} A_{(j)}, \qquad i = 1, \ldots, m. \tag{11}$$

These cumulative sums serve as combined $p$-values, in the same way that cumulative spacings make up the ordered $p$-values in one dimension. They reflect both the relative positioning of the $p$-vectors in space and their distance from the origin, and they can be used to make decisions in the hypothesis testing framework. Figure 3 illustrates a sample set of 1000 $p$-vectors and a histogram of their cumulative areas. In this example the components of the $p$-vectors are independent, and 10% of $p$-vectors are associated with an alternative hypothesis.
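The cumulative sum in (11) is simple to compute once a distance and a cell area are available for every $p$-vector. The sketch below assumes both are supplied as plain lists in the same order; the function name `cumulative_areas` and the numbers are illustrative only.

```python
def cumulative_areas(distances, areas):
    """Cumulative Voronoi cell areas, returned in the original input order.

    distances[i] is the ordering score of p-vector i under the chosen
    scheme, and areas[i] is the area of its Voronoi cell.
    """
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    combined = [0.0] * len(distances)
    running = 0.0
    for i in order:
        running += areas[i]   # add the cell of the next-closest p-vector
        combined[i] = running
    return combined


# Hypothetical values: the second p-vector is closest to the origin.
combined = cumulative_areas([0.9, 0.1, 0.5], [0.5, 0.2, 0.3])
```

Because the cells partition the unit square, the largest combined value is 1 (up to rounding), just as the last cumulative spacing in one dimension is 1.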

| | Euclidean | Maximum | Summation | de Lichtenberg | Existing procedure |
|---|---|---|---|---|---|
| | 0.200 | 0.189 | 0.216 | 0.205 | 0.005 |
| | 0.772 | 0.760 | 0.788 | 0.796 | 0.098 |
| | 0.976 | 0.975 | 0.977 | 0.979 | 0.744 |

### 6.1 Multiple hypothesis testing under independence

When the components of the $p$-vectors are assumed to be independent, standard multiple comparisons procedures such as B–H can be applied to the cumulative areas with very good results. Simulation studies were performed to test the properties of FDR control and the power of this method. For each study 100 sets of test statistics were generated. In each set, 10% of statistics were associated with an alternative hypothesis, while the remaining 90% were null. $p$-vectors were formed from the 2-sided $p$-values of the two components of each test statistic.

The proposed method was applied to each data set, using the four ordering schemes from Section 5. Additionally, the existing $p$-value combination technique based on the maximum was applied. After applying the B–H procedure to each set of summary values, the FDR and 1 minus the non-discovery rate (1-NDR) were recorded. Using notation from Table 1, 1-NDR is defined as $E[S/m_1]$, the expected proportion of true alternative hypotheses that are rejected, where $S$ is the number of correctly rejected hypotheses and $m_1$ the number of true alternatives. This quantity can be viewed as a measure of power.
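For a single simulated data set, the two recorded quantities reduce to simple counts. The sketch below computes the false discovery proportion and the proportion of true alternatives rejected; averaging them over the simulated data sets gives estimates of FDR and 1-NDR. The function name and the boolean-list interface are our assumptions, not part of the paper.

```python
def fdp_and_power(rejected, is_alternative):
    """False discovery proportion and proportion of alternatives rejected.

    Both arguments are lists of booleans with one entry per hypothesis.
    """
    total_rejections = sum(rejected)
    false_rejections = sum(r and not a
                           for r, a in zip(rejected, is_alternative))
    true_rejections = total_rejections - false_rejections
    n_alternatives = sum(is_alternative)
    # The proportion is defined to be 0 when nothing is rejected.
    fdp = false_rejections / total_rejections if total_rejections else 0.0
    power = true_rejections / n_alternatives if n_alternatives else 0.0
    return fdp, power


# Hypothetical outcome for five hypotheses.
fdp, power = fdp_and_power(
    rejected=[True, True, False, True, False],
    is_alternative=[True, False, False, True, True],
)
```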

Tables 3 and 4 summarize the results of these studies. The results show that, under all ordering schemes, the proposed combination method results in greatly increased power. All concave schemes (Euclidean, Maximum and Summation) control FDR at the desired level $\alpha = 0.05$, but the convex de Lichtenberg scheme does not. This difference becomes more pronounced when $p$-vectors with correlated components are considered. Additional simulations using correlated test statistics show that application of the B–H procedure to combined values is insufficient to control FDR when the correlation between test statistics surpasses 0.2. This loss of FDR control is a result of the increased concentration of $p$-vectors along the diagonal of the unit square under correlation, which changes the characteristics of the cumulative areas. In Section 6.2 we discuss approaches appropriate for multiple testing in these conditions.

| | Euclidean | Maximum | Summation | de Lichtenberg | Existing procedure |
|---|---|---|---|---|---|
| | 0.041 | 0.037 | 0.048 | 0.041 | 0.000 |
| | 0.042 | 0.038 | 0.049 | 0.056 | 0.000 |
| | 0.042 | 0.040 | 0.045 | 0.053 | 0.000 |

### 6.2 Multiple hypothesis testing under dependence

In certain settings the individual components of $p$-vectors may be correlated. For example, correlation between components may occur when different but related aspects of an underlying biological process are measured. Any technique used for testing the disjunction hypothesis in this setting should be robust to this structure. Using an empirical null approach [Efron (2004; 2007)] in place of the B–H procedure results in FDR control for all positive correlation structures, although the trade-off is decreased power in the case of independent components.

The use of an empirical null for determining statistical significance was proposed by Efron (2004). We consider a transformation of the summarized cumulative areas as defined in (11):

$$z_i = \Phi^{-1}\left(C_{(i)}\right), \tag{12}$$

where $\Phi$ is the cumulative distribution function for the standard normal random variable. In the empirical null framework, these transformed values are assumed to be from a mixture distribution

$$f(z) = \pi_0 f_0(z) + (1 - \pi_0) f_1(z), \tag{13}$$

where $f_0$ is the null, or "uninteresting" distribution, $f_1$ is the alternative, or "interesting" distribution, and $\pi_0$ is the proportion of null values. Under the theoretical null hypothesis, $f_0$ is the $N(0,1)$ distribution. However, in large-scale multiple testing problems the majority of values may behave differently. When this is the case, use of an empirically determined $f_0$ in place of $N(0,1)$ has important implications for the resulting inference. For example, if $f_0$ is estimated to have variance greater than 1, then inference based on the assumptions of $N(0,1)$ would result in elevated type I error. Similarly, if the empirical null has smaller variance, then its use results in a gain of power without sacrificing type I error control.

Figure 4 illustrates the effect of the transformation from $C_{(i)}$ to $z_i$ for a sample set of 1000 $p$-vectors. The transformation makes it much easier to detect deviations from the null hypothesis, as true alternative hypotheses appear as a second, smaller peak to the left of the null distribution instead of as a single spike in the original values. The bivariate test statistics used to calculate these $p$-vectors had a correlation of 0.7. The histogram of transformed values in Figure 4(c) shows evidence of a null distribution that differs from $N(0,1)$, as the dependence structure of the $p$-vectors results in thicker tails than the theoretical null predicts. For this reason, it is desirable to use an empirical null as a basis for our inference when the components of the $p$-vectors show evidence of correlation.
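The transformation in (12) is the standard normal quantile function applied to each cumulative area. The sketch below uses Python's standard library for it. One practical detail is our own assumption and not taken from the paper: the largest cumulative area equals 1, where the quantile is infinite, so the values are clipped to a small distance `eps` from 0 and 1 before transforming.

```python
from statistics import NormalDist


def to_z_scores(cumulative_areas, eps=1e-12):
    """Map cumulative areas in [0, 1] to the standard normal scale."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_scores = []
    for c in cumulative_areas:
        # Clipping is an assumption of this sketch; it keeps the
        # quantile finite at the end points 0 and 1.
        clipped = min(max(c, eps), 1.0 - eps)
        z_scores.append(std_normal.inv_cdf(clipped))
    return z_scores


# Small cumulative areas map to strongly negative z-values.
z = to_z_scores([0.025, 0.5, 1.0])
```

Small cumulative areas, which correspond to $p$-vectors near the origin, end up in the left tail, which is why the left-tail quantities are the relevant ones for inference.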

Given the choice of an empirical null, there are several options for declaring significance while controlling FDR. Efron (2004) defined the *local false discovery rate* (fdr) as the posterior probability of a value being from the null distribution, given the value of $z$:

$$\mathrm{fdr}(z) = P(\text{null} \mid z) = \frac{\pi_0 f_0(z)}{f(z)}. \tag{14}$$

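Another important probability that can be estimated using an empirical null is the left-tail False Discovery Rate. For each value $z$ the corresponding left-tail FDR is defined as

$$\mathrm{FDR}(z) = P(\text{null} \mid Z \leq z) = \frac{\pi_0 F_0(z)}{F(z)}, \tag{15}$$

where $F_0$ and $F$ are the cumulative distribution functions corresponding to the null density $f_0$ and the mixture density $f$.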
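To see how the local fdr and the left-tail FDR differ numerically, the following sketch evaluates both for a two-component normal mixture with known parameters. In practice the null proportion and the two distributions are estimated from the data (for example by mixFdr); here they are hypothetical inputs, and the function name is ours.

```python
from statistics import NormalDist


def local_and_tail_fdr(z, null_proportion, null_dist, alt_dist):
    """Return (local fdr, left-tail FDR) at z for a two-group mixture."""
    alt_proportion = 1.0 - null_proportion
    # Mixture density and mixture distribution function at z.
    density = (null_proportion * null_dist.pdf(z)
               + alt_proportion * alt_dist.pdf(z))
    cdf = (null_proportion * null_dist.cdf(z)
           + alt_proportion * alt_dist.cdf(z))
    local_fdr = null_proportion * null_dist.pdf(z) / density
    tail_fdr = null_proportion * null_dist.cdf(z) / cdf
    return local_fdr, tail_fdr


# Hypothetical mixture: 90% N(0, 1) null, 10% alternative centered at -3.
null_dist = NormalDist(0.0, 1.0)
alt_dist = NormalDist(-3.0, 1.0)
local_fdr, tail_fdr = local_and_tail_fdr(-3.0, 0.9, null_dist, alt_dist)
```

In this example the left-tail FDR is smaller than the local fdr at the same $z$, because it averages over all values at or below $z$, including those even further into the tail.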

Inference can be made based on estimated fdr or FDR values. A variety of approaches have been developed for estimation of empirical null distributions and related values [Muralidharan (2010); Efron (2004); Pounds and Morris (2003); Strimmer (2008); Jin and Cai (2007)]. We use an R package (mixFdr) developed by Muralidharan (2010) and available through CRAN (http://cran.r-project.org/web/packages/mixfdr), which uses an empirical Bayes mixture method to fit an empirical null and to estimate effect sizes, fdr and FDR. The use of other packages or techniques is certainly possible. The function we used in simulation studies, mixFdr, includes two tuning parameters: $J$, the number of distributions to be estimated, and $P$, a penalization parameter. A higher value of $P$ encourages estimation of a larger null group and closer estimation of the central peak.

Careful calibration of $J$ and $P$, and even experimentation with other techniques for empirical null estimation, are desirable when a single data set is under consideration. The function mixFdr estimates the left-tail False Discovery Rate for each $z_i$, and we declare significant all $p$-vectors with these estimates of left-tail FDR less than 0.05. This approach results in appropriate error control that is robust to correlation in the components of $p$-vectors. Section 7 describes a simulation study performed to explore properties of power and FDR control when this approach is applied to the cumulative areas from ordered $p$-vectors.

## 7 Simulation study when components are correlated

To illustrate properties of the procedure, we ran three simulation studies: one each for strong, moderate and weak alternative signals. We were interested in evaluating FDR control and power. For each simulated data set we generated bivariate test statistics whose components have correlation $\rho$, with the alternative mean chosen separately for the strong, moderate and weak signals. 10% of test statistics for each data set were generated from the alternative distribution. $p$-vectors were formed from 2-sided $p$-values.

For each simulation study $\rho$ varied from 0 to 0.8 in increments of 0.1, and 100 data sets were simulated for each value of $\rho$. We performed the procedure using all four ordering schemes on each simulated data set, using mixFdr to estimate empirical null distributions and left-tail FDR for each data set, and rejecting all hypotheses associated with $p$-vectors whose estimated left-tail FDR was less than 0.05. We used the same value of $J$ for all data sets. After calibrating the fit of several example empirical nulls for weak, moderate and strong signals, we set $P$ separately for the respective simulations, using 800 for moderate and 1000 for strong signals. Additionally, the B–H procedure was applied to the maximum values from each $p$-vector to compare the proposed method to an existing approach. Figure 5 summarizes the results of all three simulation studies. We include the R code used to perform these simulations as a supplementary file [Phillips and Ghosh (2014a)].

The proposed technique for combining $p$-values has improved power when compared to the existing procedure. This improvement is greatest for weak and moderate alternatives. These simulations further show that the ordering scheme matters, although the differences between the three concave (Euclidean, Maximum, Summation) ordering schemes are small compared to the difference between them and the de Lichtenberg ordering. The de Lichtenberg ordering displays characteristics that differ considerably from the other three. Specifically, it shows a tendency to lose FDR control as the components of the $p$-vectors become more correlated. We present simulation results for this ordering for the sake of completeness, but we do not recommend using it for data analysis when testing the disjunction null. The nature of its contour lines suggests this ordering scheme is in fact more appropriate for testing the conjunction or partial conjunction hypothesis, as these lines resemble the contour lines for Fisher's or Stouffer's $p$-value combination techniques from Figure 1 of Owen (2009).

Further simulations using the empirical null allowed evaluation of the proposed approach in the presence of $p$-vectors constructed from test statistics with means $(\mu, 0)$ or $(0, \mu)$, that is, with only one nonnull component. These $p$-vectors should not be found significant in the disjunction setting, but should be under the conjunction framework. In the presence of up to 20% of such vectors, and across the correlation structures considered, FDR control was maintained below the nominal 0.05 level, and power properties showed a clear advantage over existing methods. Detailed descriptions and results are included in supplementary materials [Phillips and Ghosh (2014b)].

## 8 Extension to higher dimensions

The approach described in this paper is suitable when there are two $p$-values associated with each hypothesis test. In many situations, however, three or more $p$-values will be available. In theory, the procedure can be extended to higher dimensions by replacing cumulative areas with cumulative volumes, hypervolumes, etc. In practice, however, the computational complexity for Voronoi cells increases quickly with dimension: average time complexity is considerably higher in 3-space than in the plane [Okabe et al. (2010)]. To avoid this disadvantage, we consider an alternative extension using the sets of all possible pairs of components. Consider a set of 3-dimensional $p$-vectors:

$$\mathbf{p}_i = (p_{i1}, p_{i2}, p_{i3}), \qquad i = 1, \ldots, m. \tag{16}$$

Then define three sets of 2-dimensional $p$-vectors constructed via a pairwise combination of components of $\mathbf{p}_i$:

$$\mathbf{p}_i^{(12)} = (p_{i1}, p_{i2}), \qquad \mathbf{p}_i^{(13)} = (p_{i1}, p_{i3}), \qquad \mathbf{p}_i^{(23)} = (p_{i2}, p_{i3}). \tag{17}$$

For each of these sets of two-dimensional $p$-vectors the Voronoi diagram is computed and cell areas saved. Thus, each $p$-vector is associated with three individual cell areas, $A_i^{(12)}$, $A_i^{(13)}$ and $A_i^{(23)}$, as well as an average area $\bar{A}_i = \left(A_i^{(12)} + A_i^{(13)} + A_i^{(23)}\right)/3$. This average area can then be used in conjunction with an ordering scheme to create the summarized areas used for inference. Define $\mathbf{p}_{(1)}, \ldots, \mathbf{p}_{(m)}$ to be the $p$-vectors ranked according to a specified ordering scheme such as Euclidean distance from the origin, and $\bar{A}_{(1)}, \ldots, \bar{A}_{(m)}$ to be the corresponding average areas. Then the cumulative average areas are defined as $\bar{C}_{(i)} = \sum_{j=1}^{i} \bar{A}_{(j)}$.
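The pairwise construction can be sketched as follows. The helper takes any function that maps a list of two-dimensional points to their Voronoi cell areas; that callable, its name `cell_areas`, and the stub used in the example are assumptions for illustration and are not part of the paper.

```python
def average_pairwise_areas(p_vectors, cell_areas):
    """Average, per p-vector, of the cell areas from the three 2-D projections.

    p_vectors: list of 3-tuples (p1, p2, p3).
    cell_areas: hypothetical callable mapping a list of 2-D points to a list
        of Voronoi cell areas in the same order.
    """
    component_pairs = [(0, 1), (0, 2), (1, 2)]
    areas_per_pair = []
    for a, b in component_pairs:
        projected = [(p[a], p[b]) for p in p_vectors]
        areas_per_pair.append(cell_areas(projected))
    # zip(*...) groups the three areas belonging to the same p-vector.
    return [sum(triple) / 3.0 for triple in zip(*areas_per_pair)]


def stub_areas(points):
    """Stand-in for a real Voronoi routine, used only to show the data flow."""
    return [x + y for x, y in points]


averages = average_pairwise_areas([(0.1, 0.2, 0.3), (0.4, 0.4, 0.4)], stub_areas)
```

With a real Voronoi routine in place of the stub, the returned averages can be ranked and accumulated exactly as in the two-dimensional case.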

Multiple testing can then be performed on these summarized cumulative average areas using the methods described in Sections 6.1 and 6.2. Further investigation into the properties of this approach is necessary, as well as research on other possible extensions for higher dimensions. A preliminary simulation study using three-dimensional $p$-vectors with independent components was conducted with weak, moderate and strong alternative test statistics.

Three-dimensional $p$-vectors were formed from 2-sided $p$-values. The resulting $p$-vectors were ordered according to Euclidean distance from the origin. Hypothesis testing was performed using the B–H procedure on the summarized cumulative average areas. The existing technique of applying the B–H procedure to the set of maximum $p$-values from each $p$-vector was also performed for comparison. Table 5 summarizes the findings of the simulations.

| | Proposed extension | | | Existing approach | | |
|---|---|---|---|---|---|---|
| FDR | 0.023 | 0.004 | 0.023 | 0.000 | 0.000 | 0.000 |
| 1-NDR | 0.098 | 0.730 | 0.986 | 0.005 | 0.007 | 0.610 |

## 9 Application to Schizosaccharomyces pombe data

In 2004 and 2005, three papers were published investigating the periodicity of genes in the fission yeast cell Schizosaccharomyces pombe. Specifically, Oliva et al. (2005) produced three data sets including time points for three complete cell cycles using two different synchronization techniques. In their paper they identified 750 genes determined to be periodically expressed based on a ranking scheme and cutoff. We apply our approach to test for periodicity in a hypothesis testing framework using Fisher’s exact G statistic to measure evidence of periodicity.

### 9.1 The data

Three microarray data sets from Schizosaccharomyces pombe from Oliva et al. were used: Elutriation a, Elutriation b and Cdc25. The first two were produced using Elutriation synchronization, and the last using a Cdc25 block-release synchronization technique. We apply our technique on the two Elutriation sets, using Fisher's exact G statistic to calculate the $p$-vector for each gene. This test statistic requires evenly spaced time points, necessitating omission of any measurements that occur at uneven intervals. The Elutriation a data set includes 50 time points, but only 33 are at regular intervals of 8 minutes. For Elutriation b, many of the time points are technical repeats. We keep the first measurement in each case, leaving 31 evenly spaced time points for each gene taken at intervals of 10 minutes. The Cdc25 data set has a total of 51 evenly spaced time points, taken at intervals of 15 minutes. Only genes with complete measurements for all selected time points are considered.

| | Elutriation a | Elutriation b | Cdc25 |
|---|---|---|---|
| Complete genes | 3050 | 2394 | 3724 |
| Evenly spaced time points | 33 | 31 | 51 |
| Genes (%) with $p$-values < 0.05 | 868 (22.8%) | 546 (26.4%) | 2458 (66.0%) |
| Significant genes (%) using B–H | 527 (17.3%) | 155 (6.5%) | 2252 (60.5%) |

### 9.2 Results using existing procedures

The data show evidence of widespread periodicity. Considered separately, Elutriation a, Elutriation b and Cdc25 have 22.8%, 26.4% and 66% of $p$-values less than 0.05. Even controlling FDR using the B–H procedure on each set independently results in a very high rate of rejection. Table 6 presents a summary of the data and marginal analysis of all three data sets. Figure 6 presents histograms of the $p$-values when the data sets are considered independently.

Consider the $p$-vectors formed using $p$-values generated by Elutriation a data and Elutriation b data. Note that these two Elutriation data sets were both generated using the same synchronization technique, and the $p$-values generated by each repetition have roughly comparable marginal distributions. To test the disjunction hypothesis for Elutriation a and Elutriation b with the existing technique, the maximum $p$-value for each gene is retained and the B–H procedure is then applied to these maximum values. The resulting number of rejections is 15, which is surprisingly low. Figure 7(a) helps to explain this result. The $p$-vectors' components do not show evidence of correlation, so considering only the maximum of each $p$-vector's components gives a distribution that is very different from either of the marginal distributions. Our proposed approach uses information from both $p$-values and gives a different result.

### 9.3 Results using Voronoi p-value combination on Elutriation data

We apply our p-value combination method using the Euclidean, Maximum and Summation ordering schemes to the p-vectors formed from the Elutriation a and b experiments. The p-vectors are plotted in Figure 7(a). Because the components of the p-vectors do not show evidence of high correlation, we apply the B–H procedure directly to the cumulative areas generated from each ordering scheme. This application results in 225, 213 and 249 rejections of the disjunction hypothesis using Euclidean, Maximum and Summation orderings, respectively.
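As a rough illustration of how cumulative areas arise from an ordering scheme, the sketch below approximates each p-vector's Voronoi cell area inside the unit square by assigning grid points to their nearest p-vector, then accumulates the areas in the chosen order. The grid approximation is a stand-in for an exact tessellation (the paper's reference list cites the `deldir` R package), and the function names, the restriction to the unit square, and the convention that a p-vector's cumulative area includes its own cell are assumptions of this sketch.

```python
import math

# The three orderings named in the text, as scores to sort by.
ORDERINGS = {
    "euclidean": lambda p: math.hypot(p[0], p[1]),
    "maximum": lambda p: max(p),
    "summation": lambda p: p[0] + p[1],
}


def approx_cell_areas(p_vectors, grid=200):
    """Approximate Voronoi cell areas within the unit square on a grid."""
    counts = [0] * len(p_vectors)
    for i in range(grid):
        x = (i + 0.5) / grid
        for j in range(grid):
            y = (j + 0.5) / grid
            # Each grid cell centre belongs to its nearest p-vector.
            nearest = min(range(len(p_vectors)),
                          key=lambda k: (p_vectors[k][0] - x) ** 2
                          + (p_vectors[k][1] - y) ** 2)
            counts[nearest] += 1
    return [c / grid ** 2 for c in counts]


def cumulative_areas(p_vectors, ordering="summation", grid=200):
    """Cumulative cell area for each p-vector under the given ordering."""
    areas = approx_cell_areas(p_vectors, grid)
    score = ORDERINGS[ordering]
    order = sorted(range(len(p_vectors)), key=lambda k: score(p_vectors[k]))
    cumulative = [0.0] * len(p_vectors)
    running = 0.0
    for k in order:
        running += areas[k]
        cumulative[k] = running
    return cumulative


# Toy p-vectors (hypothetical values).
toy = [(0.01, 0.02), (0.5, 0.5), (0.9, 0.1), (0.2, 0.8), (0.05, 0.03)]
combined = cumulative_areas(toy, "summation")
```

The resulting values lie in (0, 1] and play the role of one combined value per gene, to which a procedure such as B–H can then be applied.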

Application of an empirical null approach to the combined areas yields a very different result. Because of the high amount of periodicity detected in the experiments, the empirical null is estimated to have a negative mean. This shift to the left of up to results in rejection of far fewer genes: 15, 12 and 11 for the three concave ordering schemes. These genes could be considered significantly more periodic than the rest, although other genes also show evidence of periodicity.

The two considerations of the combined values reflect two different scientific questions. By using the B–H procedure on the combined areas, the genes found are those that show significant periodic expression in both elutriation experiments. The genes found using the empirical null procedure are those genes that are significantly periodic in both experiments relative to the majority of genes. This distinction explains the difference in numbers of genes found significant.
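The contrast between the two analyses can be made concrete with a small sketch of an empirical null recalibration. Here the cumulative areas are mapped to z-scores with the standard normal quantile function, and the null mean and standard deviation are estimated by the median and a scaled median absolute deviation. This robust estimator is a simple substitute chosen for illustration and is not necessarily the estimator used in the paper; the function name is hypothetical.

```python
from statistics import NormalDist, median

STD_NORMAL = NormalDist()


def empirical_null_p_values(cumulative_areas, eps=1e-12):
    """Recalibrate cumulative areas against an estimated (empirical) null.

    Returns (p_values, mu0, sigma0), where mu0 and sigma0 are the estimated
    null mean and standard deviation on the z-score scale.
    """
    # Keep values strictly inside (0, 1) so the quantile function is defined.
    clipped = [min(1.0 - eps, max(eps, a)) for a in cumulative_areas]
    z = [STD_NORMAL.inv_cdf(a) for a in clipped]
    mu0 = median(z)
    # Scaled MAD: consistent for the standard deviation under normality.
    mad = median([abs(v - mu0) for v in z])
    sigma0 = mad / STD_NORMAL.inv_cdf(0.75)
    if sigma0 <= 0:
        raise ValueError("cannot estimate a null scale from these values")
    null = NormalDist(mu0, sigma0)
    # Small areas give very negative z, so the lower tail is the p-value.
    return [null.cdf(v) for v in z], mu0, sigma0


# Hypothetical input skewed toward small values, mimicking widespread signal.
skewed = [((i - 0.5) / 99) ** 3 for i in range(1, 100)]
recalibrated, mu0, sigma0 = empirical_null_p_values(skewed)
```

When most values are small, the estimated null mean is negative, so the recalibrated p-values are larger than the raw cumulative areas and a subsequent B–H step rejects fewer genes, in line with the behavior described above.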

### 9.4 Extension of procedure to include Cdc25

The extension to three dimensions described in Section 8 can be applied to the three-dimensional p-vectors formed from the Oliva et al. data. We order the three-dimensional p-vectors according to their Euclidean distance from the origin and calculate the three Voronoi cell areas associated with each p-vector. From these we calculate each p-vector’s average cell area and then cumulative average areas. Figure 7(c) presents a histogram of these values. Note that many of these cumulative areas are quite small as a result of the high number of very small p-values from the Cdc25 experiment.
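The averaging and accumulation steps just described can be sketched as below. The three cell areas per p-vector are taken as given inputs, since their construction is defined in Section 8; the function name and the toy numbers are hypothetical.

```python
import math


def cumulative_average_areas(p_vectors, areas_per_vector):
    """Average each p-vector's cell areas, then accumulate in Euclidean order.

    p_vectors        -- list of 3-tuples of p-values, one per gene
    areas_per_vector -- list of 3-tuples of Voronoi cell areas, one per gene
    """
    average = [sum(a) / len(a) for a in areas_per_vector]
    # Order genes by Euclidean distance of the p-vector from the origin.
    order = sorted(range(len(p_vectors)),
                   key=lambda k: math.sqrt(sum(c * c for c in p_vectors[k])))
    cumulative = [0.0] * len(p_vectors)
    running = 0.0
    for k in order:
        running += average[k]
        cumulative[k] = running
    return cumulative


# Toy inputs (hypothetical values).
toy_p = [(0.5, 0.5, 0.5), (0.01, 0.02, 0.03), (0.9, 0.1, 0.2)]
toy_areas = [(0.3, 0.4, 0.5), (0.1, 0.1, 0.1), (0.6, 0.5, 0.4)]
combined = cumulative_average_areas(toy_p, toy_areas)
```

The p-vector closest to the origin receives the smallest cumulative average area, so genes with uniformly small p-values across all three experiments are ranked as most significant.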

Application of the B–H procedure to the cumulative average areas formed using the Euclidean ordering scheme results in rejection of 165 disjunction hypotheses. These 165 genes are those that show significant evidence of periodic expression in all three of the experiments performed by Oliva et al. The existing procedure using the maximum values yields a mere 12 rejections for these experiments. Using an empirical null approach on the transformed cumulative average areas yields results similar to those discussed in Section 9.3. Because of the evidence of widespread periodicity throughout the experiment, only 8 genes show behavior that is significantly more periodic in comparison to the majority of genes when all three experiments are considered.

## 10 An application related to prostate cancer

Identification of genes implicated in cancer progression is a research topic of great interest. Several studies have sought to identify genes that show both alterations in copy number and evidence of differential expression in cancerous tumors [Kim et al. (2007); Tsafrir et al. (2006); Fritz et al. (2002); Pollack et al. (2002); Tonon et al. (2005)]. We applied our method to data produced by Kim et al. in a study on prostate cancer progression. Data on copy number and gene expression were gathered for 7534 genes using prostate cell populations from low-grade and high-grade samples of cancerous tissue. For details on data acquisition and cleaning, see the Kim et al. paper.

We calculated test statistics for gene expression and copy number aberrations comparing tissue types. For each of the 7534 genes we compute a two-dimensional p-vector from the resulting two-sided p-values based on these statistics. Figures 8(a) and 8(b) present histograms of the expression and copy number p-values, while Figure 8(c) presents a representation of the resulting p-vectors. Close inspection reveals that the smallest copy number p-values are much smaller than the smallest gene expression p-values. Thus, application of the B–H procedure to the copy number p-values yields 62 significant genes, while application to the expression p-values fails to yield any. It is unsurprising then that application of the B–H procedure to the set of maximum p-values for each gene also produces no significant results.

Using the Voronoi p-value combination followed by the B–H procedure on the summarized values at gives 12, 14 and 25 rejections for Euclidean, Maximum and Summation orderings, respectively. Guided by the results of simulation, we consider the rejections made using the Summation ordering. Of these 25, four were mapped to official gene names, and all four were listed in the COSMIC database of cancer genes [Forbes et al. (2011)]. These four genes are CABLES2, PAK1IP1, CAMKV and TSHZ1. The COSMIC results suggest that mutations in these four genes are found in a variety of cancers, which strengthens the evidence that these genes are putative oncogenes in prostate cancer.

To use DAVID [Huang, Sherman and Lempicki (2008; 2009)] for further investigation of our results, a larger gene list was necessary. For this purpose, we performed the B–H procedure on the combined p-values at . Under the summation ordering, this yielded 306 rejections, 102 of which could be mapped to recognized genes by DAVID. The functional annotation tool found significant enrichment (adjusted p-value of 0.019) in the Fibrinolysis pathway. Fibrinolysis has been associated with prostate cancer for decades [Tagnon, Whitmore and Shulman (1952)]. Tumor classifications for different malignancies have been proposed based on the behavior of this pathway [Zacharski et al. (1992)]. Results for functional classification of the 102 genes are summarized in Table 7.

Number of genes | Enrichment score | Keywords |
---|---|---|
16 | 0.73 | Peptidase; Serine; Endopeptidase; Kringle |
11 | 0.58 | Transmembrane; Membrane; Extracellular; Cytoplasmic |
3 | 0.50 | Transport: protein, intracellular, vesicle; Golgi apparatus |
3 | 0.36 | Catabolic process; Proteasome; Proteolysis |
12 | 0.27 | Lumen: nuclear, intracellular, organelle, membrane-enclosed; Phosphoprotein; nucleolus; ATP binding |
3 | 0.22 | GTP-binding; Nucleotide binding: guanyl, purine; Ribonucleotide binding: guanyl, purine |
20 | 0.12 | Transmembrane; Membrane; Glycoprotein; |

## 11 Discussion

In this paper we have presented a novel approach to p-value combination for testing the disjunction hypothesis when two p-values are considered for each test. The approach uses Voronoi cell areas, an extension of one-dimensional spacings, in combination with concave ordering schemes to define cumulative areas suitable for multiple testing techniques. When the majority of p-vectors have independent components, techniques such as the B–H procedure can be directly applied. If the components are correlated, empirical null techniques are more suitable. Simulation studies showed that the approach has appropriate error control properties and results in a gain of power over the existing method. This increased power is of particular interest for detection of genes related to biological processes or implicated in cancer progression.

Four candidate ordering schemes were described, and simulations were used to test their performance in several settings. The concave up ordering proposed by de Lichtenberg et al. (2005) failed to control FDR in the paradigm of the disjunction hypothesis. As discussed in Section 6.1, we suspect that concavity of an ordering’s contour lines is vital to its FDR control characteristics. Specifically, as contours become increasingly concave down, the procedure is more conservative. The reverse applies when considering concave up schemes. For this reason, we recommend using the summation ordering in practice, as it represents the boundary case between concave up and down. This offers the least conservative, and thus most powerful, procedure that retains appropriate FDR control.

This approach can be extended in several meaningful directions. The conjunction or partial conjunction hypotheses could be tested by defining suitable ordering schemes such as the minimum, or product. Extension to higher dimensions is also of utmost interest, particularly considering the scale of current biological and genomic experiments. In Section 8 we described a potential extension to three or more dimensions, but further investigation of this and other techniques is necessary.

## Acknowledgments

The authors would like to thank the area Editor and an anonymous referee whose comments greatly improved the quality of this paper.

Supplement A: Summarized results of additional simulation studies (DOI: 10.1214/13-AOAS707SUPPA; file aoas707_supp.pdf). We present the summarized results of the proposed procedure’s performance in the challenging situations described in Section 7. Results include estimated FDR and 1-NDR for each of the two settings.

Supplement B: Supplementary code and data (DOI: 10.1214/13-AOAS707SUPPB; file aoas707_supp.zip). R code including the functions required to perform the procedure described in this paper, to replicate the described simulation studies and to perform the described data analysis. The relevant data sets are also included.

### References

- Benjamini, Y. and Heller, R. (2008). Screening for partial conjunction hypotheses. Biometrics 64 1215–1222. doi:10.1111/j.1541-0420.2007.00984.x
- Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 57 289–300.
- Benjamini, Y. and Hochberg, Y. (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics. J. Educ. Behav. Statist. 25 60–83.
- Benjamini, Y., Krieger, A. M. and Yekutieli, D. (2006). Adaptive linear step-up procedures that control the false discovery rate. Biometrika 93 491–507. doi:10.1093/biomet/93.3.491
- de Lichtenberg, U., Jensen, L. J., Fausbøll, A., Jensen, T. S., Bork, P. and Brunak, S. (2005). Comparison of computational methods for the identification of cell cycle-regulated genes. Bioinformatics 21 1164–1171. doi:10.1093/bioinformatics/bti093
- Efron, B. (2004). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc. 99 96–104. doi:10.1198/016214504000000089
- Efron, B. (2007). Correlation and large-scale simultaneous significance testing. J. Amer. Statist. Assoc. 102 93–103. doi:10.1198/016214506000001211
- Fisher, R. A. (1932). Statistical Methods for Research Workers, 4th ed. Oliver & Boyd, Edinburgh.
- Forbes, S. A., Bindal, N., Bamford, S., Cole, C., Kok, C. Y., Beare, D., Jia, M., Shepherd, R., Leung, K., Menzies, A. et al. (2011). COSMIC: Mining complete cancer genomes in the catalogue of somatic mutations in cancer. Nucleic Acids Res. 39 D945–D950.
- Fritz, B., Schubert, F., Wrobel, G., Schwaenen, C., Wessendorf, S., Nessling, M., Korz, C., Rieker, R. J., Montgomery, K., Kucherlapati, R. et al. (2002). Microarray-based copy number and expression profiling in dedifferentiated and pleomorphic liposarcoma. Cancer Res. 62 2993–2998.
- Ghosh, D. (2011). Generalized Benjamini–Hochberg procedures using spacings. J. Indian Soc. Agricultural Statist. 65 213–220, 262.
- Ghosh, D. (2012). Incorporating the empirical null hypothesis into the Benjamini–Hochberg procedure. Stat. Appl. Genet. Mol. Biol. 11 Art. 11, front matter+19. doi:10.1515/1544-6115.1735
- Huang, D. W., Sherman, B. T. and Lempicki, R. A. (2008). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protoc. 4 44–57.
- Huang, D. W., Sherman, B. T. and Lempicki, R. A. (2009). Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37 1–13.
- Jiménez, R. and Yukich, J. E. (2002). Asymptotics for statistical distances based on Voronoi tessellations. J. Theoret. Probab. 15 503–541. doi:10.1023/A:1014819112010
- Jin, J. and Cai, T. T. (2007). Estimating the null and the proportion of nonnull effects in large-scale multiple comparisons. J. Amer. Statist. Assoc. 102 495–506. doi:10.1198/016214507000000167
- Kim, J. H., Dhanasekaran, S. M., Mehra, R., Tomlins, S. A., Gu, W., Yu, J., Kumar-Sinha, C., Cao, X., Dash, A., Wang, L. et al. (2007). Integrative analysis of genomic aberrations associated with prostate cancer progression. Cancer Res. 67 8229–8239.
- Loughin, T. M. (2004). A systematic comparison of methods for combining p-values from independent tests. Comput. Statist. Data Anal. 47 467–485. doi:10.1016/j.csda.2003.11.020
- Muralidharan, O. (2010). An empirical Bayes mixture method for effect size and false discovery rate estimation. Ann. Appl. Stat. 4 422–438. doi:10.1214/09-AOAS276
- Okabe, A., Boots, B., Sugihara, K. and Chiu, S. N. (2010). Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, 2nd ed. Wiley, Chichester.
- Oliva, A., Rosebrock, A., Ferrezuelo, F., Pyne, S., Chen, H., Skiena, S., Futcher, B. and Leatherwood, J. (2005). The cell cycle-regulated genes of Schizosaccharomyces pombe. PLoS Biol. 3 e225. doi:10.1371/journal.pbio.0030225
- Owen, A. B. (2009). Karl Pearson’s meta-analysis revisited. Ann. Statist. 37 3867–3892.
- Phillips, D. and Ghosh, D. (2014a). Supplement to “Testing the disjunction hypothesis using Voronoi diagrams with applications to genetics.” doi:10.1214/13-AOAS707SUPPA
- Phillips, D. and Ghosh, D. (2014b). Supplement to “Testing the disjunction hypothesis using Voronoi diagrams with applications to genetics.” doi:10.1214/13-AOAS707SUPPB
- Pollack, J. R., Sørlie, T., Perou, C. M., Rees, C. A., Jeffrey, S. S., Lonning, P. E., Tibshirani, R., Botstein, D., Børresen-Dale, A.-L. and Brown, P. O. (2002). Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc. Natl. Acad. Sci. USA 99 12963–12968.
- Pounds, S. B. (2006). Estimation and control of multiple testing error rates for microarray studies. Brief. Bioinf. 7 25–36.
- Pounds, S. and Morris, S. W. (2003). Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values. Bioinformatics 19 1236–1242.
- Pyke, R. (1965). Spacings. J. R. Stat. Soc. Ser. B Stat. Methodol. 27 395–449.
- Storey, J. D. (2002). A direct approach to false discovery rates. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 479–498. doi:10.1111/1467-9868.00346
- Stouffer, S. A., Suchman, E. A., DeVinney, L. C., Star, S. A. and Williams Jr., R. M. (1949). The American Soldier: Adjustment During Army Life. Stud. Soc. Psychol. World War II 1. Princeton Univ. Press, Princeton.
- Strimmer, K. (2008). A unified approach to false discovery rate estimation. BMC Bioinformatics 9 303. doi:10.1186/1471-2105-9-303
- Tagnon, H. J., Whitmore, W. F. and Shulman, N. R. (1952). Fibrinolysis in metastatic cancer of the prostate. Cancer 5 9–12.
- Tonon, G., Wong, K.-K., Maulik, G., Brennan, C., Feng, B., Zhang, Y., Khatry, D. B., Protopopov, A., You, M. J., Aguirre, A. J. et al. (2005). High-resolution genomic profiles of human lung cancer. Proc. Natl. Acad. Sci. USA 102 9625–9630.
- Tsafrir, D., Bacolod, M., Selvanayagam, Z., Tsafrir, I., Shia, J., Zeng, Z., Liu, H., Krier, C., Stengel, R. F., Barany, F. et al. (2006). Relationship of gene expression and chromosomal abnormalities in colorectal cancer. Cancer Res. 66 2129–2137.
- Turner, R. (2013). deldir: Delaunay triangulation and Dirichlet (Voronoi) tessellation.
- Wilkinson, B. (1951). A statistical consideration in psychological research. Psychol. Bull. 48 156.
- Zacharski, L. R., Wojtukiewicz, M. Z., Costantini, V., Ornstein, D. L., Memoli, V. A. et al. (1992). Pathways of coagulation/fibrinolysis activation in malignancy. Semin. Thromb. Hemost. 18 104.