Subset Testing and Analysis of Multiple Phenotypes (STAMP)

# Subset Testing and Analysis of Multiple Phenotypes (STAMP)

Andriy Derkach Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville MD 20850, USA Ruth M. Pfeiffer pfeiffer@mail.nih.gov Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville MD 20850, USA
###### Abstract

Meta-analysis of multiple genome-wide association studies (GWAS) is effective for detecting single or multi marker associations with complex traits. We develop a flexible procedure (“STAMP”) based on mixture models to perform region based meta-analysis of different phenotypes using data from different GWAS and identify subsets of associated phenotypes. Our model framework helps distinguish true associations from between-study heterogeneity. As a measure of association we compute for each phenotype the posterior probability that the genetic region under investigation is truly associated. Extensive simulations show that STAMP is more powerful than existing approaches when the proportion of truly associated outcomes is below 70%. For other settings, the power of STAMP is similar to that of existing methods. We illustrate our method on two examples, the association of a region on chromosome 9p21 with risk of fourteen cancers, and the associations of expression of quantitative traits loci (eQTLs) from two genetic regions with their cis-SNPs measured in seventeen tissue types using data from The Cancer Genome Atlas (TCGA). Meta-analysis; Gene based test; Mixture model; Heterogeneity.

\externaldocument

SuppMaterial

## 1 Introduction

Sometimes it is of interest to assess the association of genetic variation within a pre-specified region with different, possibly related, phenotypes, and to quantify heterogeneity of the associations. For example, in a recent analysis, Li and others [12] studied the associations of single nucleotide polymorphisms (SNPs) in a chromosome 9p21 region that includes interferon (IFN) genes and several tumor suppressor genes, from eight genome-wide association (GWAS) studies with seven cancers. The authors conducted SNP-level analyses for each cancer and used a subset-based statistical approach (ASSET) [3] to combine SNP-level P-values across cancers. In another example, several recent studies reported single SNP associations between with expression quantitative trait loci (eQTL) expression measured in multiple tissues [5].

Standard meta-analytic approaches to combine summary information from a single SNP are not powerful when the SNP has an effect in only a subset of phenotypes or in opposite directions for some phenotypes. Multiple methods are available to assess the association of common genetic variants such as GWAS SNPs with risk of multiple phenotypes measured on the same samples [26, 17, 23] but only few methods are available based on the summary statistic. Methods such as ASSET and CPBayes [14] use summary statistics from each study to identify subsets of studies associated with a particular SNP, but they do not allow one to readily combine information from multiple SNPs in a locus. Information stemming from linkage disequilibrium (LD) is not utilized when analyzing each SNP in a region separately. Several adaptive gene-based approaches are available to study multiple SNPs simultaneously [20, 22, 9] and accommodate heterogeneous SNP effects, or effects that are only present in subset of studies. However, these approaches only give global measures of association and do not identify the subset of associated studies.

We therefore propose a new approach to identify a subset of phenotypes that are associated with a particular genomic region and to explore genetic heterogeneity of associations for that region with different outcomes. First, for each outcome separately, we combine the SNP specific estimates using an aggregated level test statistic. We then assume that the test statistics arise from a mixture distribution with two components, one under the null hypothesis of no association of the study specific outcome with the genetic region, and one assuming that there is an association. We use a hierarchical model to describe SNP effects (Section 2) that can accommodate varying levels of between-phenotype heterogeneity. We then test if the mixture distribution provides a better fit to the data than a single component density, estimate the parameters of the mixture and compute posterior probabilities that a particular outcome is associated with the genomic region (Section 3). To illustrate our method, we analyzed the association of the 9p21 region (using GWAS SNPs) with various cancers, and the genetic associations of expression of quantitative traits loci (eQTLs) from two genetic regions measured in seventeen different tissue types (Section 4). We study our method in simulations (Section 5) before closing with a discussion (Section 6).

## 2 Data and models

### 2.1 Association models

We now describe the model assumed to govern the association between a particular phenotype and genotypes for SNPs measured in a genomic region, where or denotes the number of minor alleles at locus . We consider the generalized linear model (GLM) setting [see e.g. 15], and assume that the conditional expectation of given is

 EF(Ys|Xs,\boldmathγs)=h(γs0+p∑i=1γsiXsi)=h(γs0+\boldmathγ′sXs), (1)

where is a known function and a vector of parameters. If the k-th SNP is not associated with , . Additional covariates can easily be accommodated in model (1) through We assume that is a probability density or mass function from the exponential family for some known functions , and , where is a dispersion parameter. is called the canonical parameter and is related to the moments of through and [15]. Letting , in (1) is given by . The subscript indicates the multi-variable mean model in (1).

### 2.2 Properties of estimates obtained from marginal SNP models

In GWAS studies the estimate for the association of the jth SNP with outcome is typically obtained by maximizing a marginal likelihood that only includes the genotype for the jth SNP in the specification of the mean function instead of the whole vector ,

 EG(Ys|X,βs0j,βsj)=h(βs0j+βsjXsj), (2)

where denotes the same function as in (1). We use the subscript to denote the misspecified marginal mean probability model that uses only individual SNP genotypes. We show in the Appendix LABEL:sec:bias that, conditional on , the estimate based on (2) converges to that satisfies the equation

 βsj=∑pi=1γiCov(Xsi,Xsj)Var(Xsj), (3)

where is the true associate parameter for SNP in (1), when is the identity link function and the logistic link, under both, prospective and retrospective sampling, i.e. for case-control data, under the assumption of a rare outcome. As can be seen directly from (3), when there is no association, i.e. for all SNPs , then also for all , and when the SNPs are uncorrelated, then .

Letting

 Ωs =⎡⎢ ⎢ ⎢ ⎢ ⎢⎣Cov(Xs1,Xs1)Var(Xs1)⋯Cov(Xs1,Xsp)Var(Xs1)⋮⋮⋮Cov(Xsp,Xs1)Var(Xsp)⋯Cov(Xsp,Xsp)Var(Xsp)⎤⎥ ⎥ ⎥ ⎥ ⎥⎦, (4)

conditional on the vector of true effects , the estimates from the marginal model (2) have the following limiting distribution

 ^\boldmathβs|\boldmathγs∼N(% \boldmathβs,Σs)=N(Ωs\boldmathγs,Σs), (5)

where which is typically not known for the marginal estimates. For small effects , following [8], we thus use the approximation

 Σs≈diag(σs1,...,σsp)Υsdiag(σs1,...,σsp), (6)

where is the correlation matrix between the SNPs that is assumed to be known. Letting , can also be written as where is a diagonal matrix of standard error estimates of , . Yang and others [25] derived similar results to (3) using a least square approach for the linear model and extended it to case-control data using a liability threshold model.

### 2.3 Hierarchical model for SNP effects

To accommodate heterogeneity between associations for different phenotypes , we assume that the true association parameters in (1) for a particular phenotype arise from normal distributions,

 γsj∼N(μj,τj) for j=1,…,p, (7)

where and are independent identically distributed random variables that are also independent of each other. Thus conditionally on and , the , are independent. We do not assume any specific distributions for the s and s, but require that and .

The parameters of interest that characterize the strength of associations between and the genetic region are and . We consider two different assumptions for and , termed “null hypotheses”, when there is no association of the region with a particular phenotype. Under the first one, the “strong null hypothesis ()”, that has been used in approaches proposed for meta analysis of single or multiple SNPs [6, 10, 21, 19], , and for all SNPs in a region, and thus , , without any variation. The alternative hypothesis ( appropriate for the strong null hypothesis, and studied previously for variance component testing in random effect models [e.g. 13, 10], is that . Under the second, weaker set of assumptions , we only assume that for all SNPs . Thus, under , for some SNPs due to different LD in different populations, measurement error or other sources of variation. Under the alternative hypothesis, that has been used in the context of meta-analysis [ e.g. 6, 21, 19], .

Based on equations (5) and (7), the distribution of for phenotype conditional on the vectors and is

 ^\boldmathβs|\boldmathμ,\boldmathτ∼N(Ωs\boldmathμ,ΩsΛΩ′s+Σs), (8)

where is a diagonal matrix with for and is given in (6). To recover the true association parameters and , we rotate the estimates, to obtain with distribution

 ^\boldmathβ∗s|\boldmathμ,\boldmathτ=Ω−1s^\boldmathβs|\boldmathμ,\boldmathτ∼N(\boldmathμ,Λ+Ω−1sΣsΩ′−1s)=N(\boldmathμ,Λ+Σ∗s), (9)

where . Under local alternatives, i.e. small effects , [21, 25] showed that , where denotes the residual variance and the sample size of study . Under the null hypotheses the distributions of the rotated estimates of effect sizes in (9) simplify to

 ^\boldmathβ∗s|\boldmathμ,\boldmathτ\lx@stackrelHst0∼N(0,Σ∗s) and ^\boldmathβ∗s|% \boldmathμ,\boldmathτ\lx@stackrelHw0∼N(0,Λ+Σ∗s), (10)

respectively.

## 3 Assessing the association of a genetic region with multiple phenotypes

For each phenotype we combine the linearly transformed values using a linear or quadratic statistic , which are asymptotically equivalent to variance component tests to assess high dimensional alternatives [4, 21, 11]

Linear tests have good power if a large proportion of SNPs in the region under consideration are associated and have effects in the same direction , while quadratic test statistics are robust to different signs of effect estimates and are more powerful when the proportion of associated SNPs in the region is small [4, e.g.] Under heterogeneity of associations of phenotypes , , we assume that arises from a mixture model that we present next.

### 3.1 Mixture model

If only a proportion of the phenotypes are associated with the genetic region under investigations, we assume the observed test statistics arise from a mixture distribution

 f(Ts)∼(1−π)φ0(Ts)+πφ1(Ts). (11)

In (11), denotes the density of under the null hypothesis of no association of that particular genetic region with , and is the density when the region is associated with the phenotype. The mixing proportion can be interpreted as the prior probability of a phenotype having genetic associations. Functional information can be incorporated into , e.g. by using a covariate that captures biologically relevant data through .

For both, our linear and our quadratic summary statistics , and can be approximated by normal densities. We discuss the parameterization of the first two moments and of and the estimation of model (11) in detail in Sections 3.2 and 3.3, respectively.

The basic steps for assessing heterogeneity of associations for phenotypes estimating the association parameters and , and for identifying the subset of outcomes associated with a genomic region are as follows.

1. Test if the statistics , arise from the mixture model in (11) or from a single density that could be either or . Details on the testing are given in Section 3.4.

2. If there is no evidence of heterogeneity, use a standard region based meta analysis [10, 21] to combine evidence across studies.

3. If there is evidence of heterogeneity, use the mixture model to compute the probability that the region is associated with a particular phenotype , i.e. the posterior probability

 ^ps=P(Ts arises from component 1|T1,…,TS)=^π^φ1(Ts)(1−^π)^φ0(Ts)+^π^φ1(Ts).
4. If for some prespecified threshold , e.g. , then phenotype is considered to be associated with the region.

### 3.2 A linear summary test statistic, TLs

We first propose and study a linear test statistic to combine transformed SNP effects across all SNPs in a region,

 TLs=\boldmath1′Σ∗−1s^% \boldmathβ∗s{\boldmath1′diag(Σ∗−1s)\boldmath1}1/2. (12)

As is asymptotically normally distributed, conditional on and is normally distributed with moments

 E\boldmathμ,Λ(TLs)=\boldmath1′Σ∗−1s\boldmathμ{\boldmath1′diag(Σ∗−1s)\boldmath1}1/2 and Var%\boldmath$μ$,Λ(TLs)=\boldmath1′Σ∗−1s\boldmath1+\boldmath1′Σ∗−1sΛΣ∗−1s\boldmath1{\boldmath1′diag(Σ∗−1s)\boldmath1}1/2. (13)

The unconditional mean and variance of are

 E(TLs)=\boldmath1′Σ∗−1sE(μ){\boldmath1′diag(Σ∗−1s)\boldmath1}1/2 and Var(TLs)=\boldmath1′Σ∗−1s\boldmath1+{E(τ)+Var(μ)}\boldmath1′Σ∗−2s\boldmath1{\boldmath1′diag(Σ∗−1s)\boldmath1}1/2. (14)

The numerator of the variance of under the strong null is and under the weak null hypothesis it is . Under the alternative hypothesis and in (14) do not simplify further. To test the weak null hypothesis, we need to estimate four parameters under the alternative hypothesis and one under the null model (see Table 1 for parameters estimated under the various hypotheses).

### 3.3 A quadratic summary test statistic, TQs

The linear test statistic in the Section 3.2 has the disadvantage that is sensitive to the directions of the associations, i.e. the signs of the , and is not powerful when signal is sparse in a region, i.e. comes from only a few SNPs. To overcome these limitations we also combine the SNP estimates for phenotype using a quadratic form,

 TQs=^\boldmathβ∗′sΣ∗−1sWsΣ∗−1s^\boldmathβ∗s, (15)

where is a preselected weight matrix. Since the have an asymptotically multivariate normal distribution, is a linear combination of independent non-central chi-squared random variables [4, 24] where the non-centrality parameters depend on and . Within the normal mixture framework in Section 3.1 we utilize that if the number of SNPs in a region is large, is approximately normally distributed with mean and variance . Note that for , corresponds to the Hotelling’s test statistic [4, 21]. Here, we let , where denotes identity matrix. This choice may improve power because it assigns bigger weights to the largest principal components of , which are likely to explain a large proportion of the phenotypic variation. For small , is asymptotically equivalent to the C-alpha test for rare variants under local alternatives [16]. Other choices of based on MAFs were proposed in Wu and others [24] and Basu and Pan [2] in the context of rare variant analysis.

Conditional on and the diagonal matrix , the moments of (omitting the subscript ) are

 E\boldmathμ,Λ(TQ)=\boldmathμ′Σ∗−2\boldmathμ+tr(Σ∗−1)+tr(ΛΣ∗−2) and (16)
 Var\boldmathμ,Λ(TQ)=2tr{(Σ∗−1+Σ∗−1ΛΣ∗−1)2}+4\boldmathμ′Σ∗−1(Σ∗−1+Σ∗−1ΛΣ∗−1)Σ∗−1\boldmathμ. (17)

The unconditional moments are

 E(TQ)=tr(Σ∗−1)+eψtr(Σ∗−2)+{E(μ)}2\boldmath1′Σ∗−2\boldmath1, (18)

where and , and

 (19)

Letting , (18) and (19) depend on the following moments of and : , , , , and . The moments of for a general matrix are given in the Appendix LABEL:sec:moments.

Under the strong null hypothesis (18) and (19) simplify to , and under the weak null hypothesis to and

The identifiability of the parameters in the first two moments of under either null hypothesis can be seen immediately. Here we thus discuss identifiability of , , and from (18) and (19) under the alternative hypotheses. The signs of and are not identifiable. The identifiability of and is ensured from the form of if there are at least two studies with different matrices . Similarly and are identifiable from the second moments of if there are at least two studies with different matrices . If (e.g. the SNPs are independent), we cannot distinguish between effects of and . This special case is further discussed in Appendix LABEL:sec:indep_case.

### 3.4 Testing for heterogeneity of associations among studies

Testing for heterogeneity of associations among phenotypes corresponds to assessing if or arise from a single density or a mixture of densities. We thus use a likelihood ratio test statistic and propose two parametric bootstrap procedures to compute p-values, one under the strong and one under the weak null. Under the strong null, for each bootstrap replication , we generate rotated estimates for . Then we recalculate the test statistic and obtain a new value of based on the vector of or .

Under the weak null, however, the replication procedure is more complicated, as the distribution of the marginal estimates depends on the diagonal matrix , i.e. the second moment of the . We consider two different procedures for and . For the linear statistic, we directly generate from a normal distribution with mean and covariance matrix where is the estimate obtained from moments of the linear statistic (14).

We do not generate directly from a normal distribution, because when , the number of SNPs is small, or LD is high in the region, the normal approximation may not be appropriate. Instead, we generate the estimates of the effect sizes as functions of as follows. We estimate and by solving two unbiased estimation equations under the restriction that the estimates cannot be negative,

 S∑i=1p∑j=1⎧⎪⎨⎪⎩^β2ijσ2sj−1−E(τ)σ2sj⎫⎪⎬⎪⎭=0 and S∑i=1p∑j=1⎧⎨⎩^β4ijσ4sj−1−3E(τ2)σ4sj−6E(τ)σ2sj⎫⎬⎭=0, (20)

to obtain . We then draw the elements of the diagonal matrix from an inverse-gamma distribution with the first two moments equal to and . Transformed marginal estimates are then generated and used to calculate the quadratic statistics .

For both procedures, p-values are calculated as .

## 4 Data examples

We illustrate our method on one data example that uses binary and one based on continuous .

### 4.1 Association of a chromosome 9p21 region with multiple cancers

We used data from GWAS studies in dbGaP to assess the association of a region on chromosome 9p21 with fourteen different cancers (see Supplemental Table LABEL:tab:cancers) We applied LD pruning of the SNPs in the region with LD thresholds (e.g. pairwise LD) 0.25, 0.5 and 0.75. As we had access to individual level data from all studies we first estimated the log-odds ratio and standard error for each SNP for each cancer separately, from logistic regression models adjusted for gender, age, study and 10 principle component scores to control for population stratification. SNPs were coded as 0, 1,or 2 minor alleles in these models. Additionally we obtained study-specific estimates in (4). We then computed and using the cancer specific , standard error estimates and , under the mixture model assumption. We also computed tests and under the assumption of a single density, given by

 TLMeta=∑Ss=1TLs/Var0(TLs)√∑Ss=11/Var0(TLs), (21)

where is calculated under the strong null hypothesis and

 TQMeta=S∑s=1^\boldmathβ∗′sΣ∗−2s^\boldmathβ∗s=S∑s=1TQs. (22)

To test the weak null hypothesis, we used pancreatic cancer as a negative control as there is no known association between this cancer and SNPs in the region of interest.

Results from the various methods are presented in Tables 2 and 3 for LD thresholds 0.5 and 0.75. The lowest single study p-value for the linear statistic was 0.21, observed for breast cancer. When we tested the strong null with the linear under the mixture and single density assumption (), we did not detect statistical significant associations between the genetic region and any of the cancers, and the overall p-values were and , respectively. In contrast, the quadratic test detected an statistically significant association between the region and esophageal cancer, with p-values 0.0001 and 0.001 for LD threshold 0.5 and 0.75, respectively, and suggestive p-values for stomach cancer and glioma. Using standard meta analysis with , we did not detect an overall association, regardless of LD threshold. However, the quadratic test under the mixture model detected an association between the region and esophageal cancer, with a posterior probability estimated close to 1, while providing suggestive evidence for stomach cancer (posterior probability ). The same results were observed for SNPs selected using the LD threshold of 0.25 (see Supplemental Tables LABEL:tab:R2).

Lastly, we tested the weak null hypothesis with and under mixture model using pancreatic cancer as a negative control outcome. For the LD threshold 0.5, the parameters in mixture model were , , and . The small value of corresponding to , indicates that signal is likely sparse in the region. We observed extremely low estimates of the heterogeneity parameters and because only one cancer outcome had a strong association with the region.

Similarly to the results for the strong null, only under the mixture model detected the association with esophagus cancer and provided suggestive evidence for stomach cancer (See Supplemental Table LABEL:tab:WR1).

### 4.2 Associations of two genetic regions with expression of quantitative trait loci (eQTL) data from multiple tissues

To illustrate our method for continuous phenotypes , we used genotype and total gene expression data based on RNA sequencing for 17 tumor tissues from The Cancer Genome Atlas (TCGA) project (http://cancergenome.nih.gov/). Normalization for the RNA-seq data and genotype QC are described in Supplementary Materials of [7]. Here we focused on eQTL data from two genes, CTSW and LARS2, and the association with SNPs in their cis region (i.e. less than 1000,0000 base pairs from the target gene).

We first estimated linear regression coefficients and standard errors for each SNP for each tumor tissue using linear regression models, adjusted for sex, age and the top five principle component scores, and obtained outcome-specific estimates for genotype correlations in (4). We then computed standard meta analytic tests, , and and under the mixture model based on the tissues specific , and .

Results for the eQTL data for the seventeen tissues and SNPs from LARS2 gene are presented in Table 5 for the LD thresholds 0.5 (see Supplemental Table LABEL:tab:R2Sb for results with the LD threshold 0.75). The linear tests an did not detect an association between tissues and the genetic region, while both quadratic tests did and based on the posterior probabilities for all tissues were equal to one. The parameter estimates in the mixture model were for the proportion of associated studies, and for the mean values of the genetic effect sizes, and and for the mean values of the heterogeneity parameters. Large values of these parameters indicate that a single density with heavy tails is the best fit to the data. Therefore, our approach may have may have lower specificity when proportion of associated studies and estimated effects are heterogeneous as indicated by a large posterior probability for PAAD tissue, which had a marginal p-value of 0.41.

For this example, we did not test the weak null hypothesis as we did not have knowledge about a negative control study.

## 5 Simulations

### 5.1 Setup

We assessed the power of our approach for both binary and continuous outcomes, . To generate realistic patterns of LD, we used genotypes of common SNPs (e.g. MAF 5%) on chromosome 6 observed in the 4631 controls from the glioma study [18] also used in Section 4.1. We applied LD pruning to ensure that the maximal pairwise LD between SNPs was no larger than 0.5. For each outcome setting we generated studies, of which and studies had SNPs associated with (termed “causal SNPs”). We investigated the type 1 error and the power of the mixture method for two LD patterns. For the “high LD pattern” setting we used genotypes for 210 common SNPs in the region from 29600054bp to 31399945bp on chromosome 6 (HLA I class region). For the “low LD pattern” commonly observed in a genetic region, we also selected 210 SNPs in the region from 110391bp to 1525603b on chromosome 6 with pairwise LD smaller than 0.5. We also studied the impact of sample size of the studies with no signal on power. For binary , the sample size for studies with causal SNPs was , and the sample sizes of studies with no signal was , and . For continuous outcomes, the sample size for studies with causal SNPs was , and the sample sizes of studies with no signal was , and .

For studies under the strong null, we generated phenotypes from and for binary , we randomly assigned 2500 cases and 2500 controls to 5000 genotypes. For the studies with causal SNPs, we randomly selected from SNPs and generated for in model (1) from , where and where denotes a normal distribution truncated at 0. Continuous phenotypes are generated from , where . For simulations based on case control data, we generate , where with for a large cohort and then sampled cases and controls.

Under the weak null, for null SNPs we generated from and . For the studies with randomly selected causal SNPs, we generated for using the hierarchical structure discussed for the strong null.

For both, the strong and weak null hypotheses, we investigated the type 1 error () and the power of the mixture based approach for and within our framework for and . We used two estimates for the matrix of correlations among SNPs, in (4): 1) a global external estimate obtained from the original 4250 original controls and 2) and internal estimates obtained separately for each study (, ) based on observed genotypes.

We compared the power of and under the mixture model to that of and in and in Section 4.1, and additionally to the sum of Hotelling tests, The asymptotic distributions for these tests are calculated under the strong null hypothesis [21, 10]. For and , we used log-likelihood a ratio test statistic similar to that used for the mixture models for and , but with under the alternative hypothesis.

### 5.2 Simulation results

#### 5.2.1 Type 1 error for testing for heterogeneity of associations

The empirical type 1 error rates for our test statistics under the mixture framework with binary and continuous outcomes are presented in Supplemental Tables LABEL:type1 and LABEL:type1_cont for and , for the strong and weak null hypothesis. The mixture model with had the nominal type 1 error, regardless of the LD pattern, type of estimate of or type of null hypothesis. For the mixture model with when LD was low, the empirical type 1 error was slightly conservative for both internal and external estimates of . However, when LD was high, the empirical type 1 error estimates were more conservative for both null hypothesis for external estimates of that do not capture LD patterns as well as internally estimated .

#### 5.2.2 Power to test the strong null

Here, we focus on findings for binary . Results for continuous phenotypes were qualitatively similar and are presented in Supplemental Figures LABEL:FigC1-LABEL:FigC4. The results from our power studies are summarized in Figures 2-4 and Supplemental Figures LABEL:Fig7 - LABEL:Fig14. The mixture approach had better power than other methods (panels of and of Figures 2 - 4 and Supplemental Figures LABEL:Fig7 - LABEL:Fig14 ) when the proportion of studies with associated SNPs was below 50%. When proportion of studies with signal was above 50%, and based on the single component outperform and based on mixture model (panels of and of Figures 2 - 4 and Supplemental Figures LABEL:Fig7 - LABEL:Fig14). For the same settings the linear tests, and , outperformed other tests when effect sizes were low and in the same directions (panels of and of Figures 2 and 4 and Supplemental Figures LABEL:Fig7 - LABEL:Fig10). But, as expected and were not powerful when the genetic effects were heterogeneous (Figures 2 and 4 and Supplemental Figures LABEL:Fig11 - LABEL:Fig14). The empirical power of and under the mixture model was not significantly affected by the sample size of the studies not associated with the region. Similarly, was not affected by the sample size of the null studies, because it is explicitly assigns the same weight to each study. In contrast, the power of was higher when the sample sizes of the studies with associated SNPs were larger than those of studies with no signal (Supplemental Figures LABEL:Fig7 - LABEL:Fig10). In summary, the power of our test statistic did not strongly depend on the sample sizes of the studies with and without outcome associated SNPs. When LD in a region was high, using external estimates for resulted in more conservative Type 1 error and thus decreased of power of and slightly lower power of . When was estimated from study specific data, the power of the tests was similar for both, the high and low LD pattern (see Figures 2-4 and Supplemental Figures LABEL:Fig7 - LABEL:Fig14 ).

#### 5.2.3 Power to test the weak null

Similarly to testing under the strong null hypothesis, the LD pattern did not noticeably impact the power when was estimated internally (Figures 6 and 6 and Supplemental Figures LABEL:Fig15-LABEL:Fig18). The quadratic test statistic under the mixture also had higher power than all other tests when the proportion of studies with associated SNPs was small (panel and of Figures 6 and 6). However, the power of dropped noticeably when more than 50% studies had associated SNPs. Additionally, the power of under the mixture model was depended on sample sizes of the studies with causal SNPs (Figures 6, 6 and Supplemental Figures LABEL:Fig15-LABEL:Fig18). The reason for this loss of power is that when is large, the variance of the component density of the mixture (11) is much larger than the variance for , which makes it challenging to identify heterogeneity of associations, as a single component density may fit the observed data as well as the mixture. We saw similar results for the eQTL data and SNPs from the LARS2 gene (see Section 4.2). Similarly, has low power for all simulation scenarios under the weak null (Figures 6 and 6). The power of the linear tests under both, the mixture and single component density models, increased as the number of studies with signal increased (see Figures 6 and 6), because , the density of linear test statistic has mean equal to 0 under null hypothesis. Lastly, the power for testing the weak null was much higher when the null studies had larger sample sizes, because they provide more informative on the true amount of heterogeneity captured by .

## 6 Discussion

We proposed a novel approach based on a mixture model to assess the heterogeneity of associations of genetic variation in a pre-specified region with different phenotypes, and to identify the subset of phenotypes associated with the region. Our simulations and the eQTL data example show that when the proportion of phenotypes with associations in the region is not too large, e.g. , combining region specific estimates using a quadratic test statistic under the mixture model assumption had much better power to identify truly associated outcomes than standard meta analytic approaches. However, when the proportion of associated outcomes was high, standard meta analytic methods proved more powerful than the mixture model approach. Similar conclusions were reached in the context of testing rare variants, where using linear tests with data driven weights [4, e.g.] worked well when the proportion of variants with signal was low, but a simple sum test had better power when the proportion was high.

There are many tests for associations between a genetic region and a single phenotype for common [e.g. 27, 22] and rare SNPs [ e.g. 16, 11]. Aggregated level methods for common variants for testing gene- and pathway level associations typically are based on p-values. Few methods exist to assess cross-phenotype associations using summary statistics. Bhattacharjee and others [3] extended fixed effects meta analysis for a single SNP by allowing some subsets of outcomes to have no associations. Our method expands this work in two ways. First, we aggregate association estimates from multiple SNPs measured in a region, and thus utilize information stemming from LD. We also quantify heterogeneity between associations for different phenotypes. Another advantage of our approach is that it allows one to incorporate prior or external information on the likelihood that a phenotype exhibits associations with a region via the mixing proportion, which can improve identification of associated outcomes. Our framework also extends a recently proposed Bayesian method (CPBayes) for testing the association between a single SNP and multiple phenotypes [14]. CPBayes imposes a spike and slab prior on the genetic SNP effect and uses a mixture of two normal distributions to represent the SNP effect under the null and alternative hypotheses. When a single SNP is analyzed (), the mixture set up is the same as that of our method. However, we estimate the amount of heterogeneity between outcome specific associations, captured by the parameter , directly from the data, while in [14] it is pre-specified. Mis-specifying the amount of heterogeneity will lower power, sensitivity and specificity of the procedure in [14].

Our approach also differs from other recently proposed methods for gene-based testing with multiple phenotypes that require phenotypes to be measured on the same individuals to estimate between phenotype correlations [22, 20, 9]. For primary cancer outcomes one could relax this assumption, as it is exceedingly unlikely to be diagnosed with two primary cancers, and simply assume outcomes are uncorrelated. As a result, one could apply these methods to the summary statistics from multiple studies to test whether there is at least one study that shows associations. However, these methods cannot identify which particular outcomes are associated with the SNPs in a gene/region.

Our work extends beyond only testing the presence of any association between SNPs in a region and outcomes. Using the weak null hypothesis, we also assess if associations are due to common signal or due to heterogeneity. A limitation is that to test this hypothesis, we require availability of a study without association, i.e. a negative control known to arise from the null hypothesis. The control phenotype study helps distinguish between-study heterogeneity from true underlying associations. Another limitation of our method is that if study specific estimates of are not available, one would need to use publicly available genetic data such as [1] to estimate , which results in somewhat lower power.

Several problem remain to be addressed in future work. Majumdar and others [14] and Bhattacharjee and others [3] extended their approaches to accommodate shared controls between studies. While it is straight forward to extend our framework for linear test statistics to shared controls, the extension for the quadratic test requires more work. To extend region specific calculations to a genome-wide setting, more efficient permutation approaches are needed to compute p-values for our model.

## 7 Software

Software in the form of R code, together with a sample input data set and complete documentation is available at https://github.com/derkand/STAMP.

## Acknowledgments

We used the computational resources of the NIH HPC Biowulf cluster and thank Drs. Mitchell Gail and Joshua Sampson for helpful comments.

Conflict of Interest: None declared.

## References

• 1000 Genomes Project Consortium [2010] 1000 Genomes Project Consortium. (2010). A map of human genome variation from population-scale sequencing. Nature 467(7319), 1061–1073.
• Basu and Pan [2011] Basu, S. and Pan, W. (2011). Comparison of statistical tests for disease association with rare variants. Genetic Epidemiology 35(7), 606–619.
• Bhattacharjee and others [2012] Bhattacharjee, S., Rajaraman, P., Jacobs, K.B., Wheeler, W.A., Melin, B.S., Hartge, P., Yeager, M., Chung, C.C., Chanock, S.J. and Chatterjee, N. (2012). A subset-based approach improves power and interpretation for the combined analysis of genetic association studies of heterogeneous traits. The American Journal of Human Genetics 90(5), 821–835.
• Derkach and others [2014] Derkach, A., Lawless, J. F. and Sun, L. (2014). Pooled association tests for rare genetic variants: A review and some new results. Statistical Science 29(2), 302–321.
• Flutre and others [2013] Flutre, T., Wen, X., Pritchard, J. and Stephens, M. (2013). A statistical framework for joint eQTL analysis in multiple tissues. PLoS Genetics 9(5), e1003486.
• H. and E. [2011] H., Buhm and E., Eleazar. (2011). Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. The American Journal of Human Genetics 88(5), 586 – 598.
• Heller and others [2017] Heller, R., Chatterjee, N., Krieger, A. and Shi, J. (2017). Post-selection inference following aggregate level hypothesis testing in large scale genomic data. bioRxiv.
• Hu and others [2013] Hu, Y-J., Berndt, S. I., Gustafsson, S., Ganna, A., Genetic Investigation of ANthropometric Traits (GIANT) Consortium, Hirschhorn, J., North, K. E., Ingelsson, E. and Lin, D-Y. (2013). Meta-analysis of gene-level associations for rare variants based on single-variant statistics. The American Journal of Human Genetics 93(2), 236 – 248.
• Kwak and Pan [2017] Kwak, I-Y. and Pan, W. (2017). Gene- and pathway-based association tests for multiple traits with GWAS summary statistics. Bioinformatics 33(1), 64.
• Lee and others [2013] Lee, S., Teslovich, T. M., Boehnke, M. and Lin, X. (2013). General framework for meta-analysis of rare variants in sequencing association studies. The American Journal of Human Genetics 93(1), 42 – 53.
• Lee and others [2012] Lee, S., Wu, M. C. and Lin, X. (2012). Optimal tests for rare variant effects in sequencing association studies. Biostatistics 13(4), 762–775.
• Li and others [2014] Li, W.-Q., Pfeiffer, R.M., Hyland, P.L., Shi, J., Gu, F., Wang, Z., Bhattacharjee, S., Luo, J., Xiong, X., Yeager, M., Deng, X., Hu, N., Taylor, P.R., Albanes, D., Caporaso, N.E., Gapstur, S.M., Amundadottir, L., Chanock, S.J., Chatterjee, N., Landi, M.T., Tucker, M.A., Goldstein, A.M. and others. (2014). Genetic polymorphisms in the 9p21 region associated with risk of multiple cancers. Carcinogenesis 35(12), 2698–2705.
• Lin [1997] Lin, X. (1997). Variance component testing in generalised linear models with random effects. Biometrika 84(2), 309–326.
• Majumdar and others [2017] Majumdar, A., Haldar, T., Bhattacharya, S. and Witte, J. (2017). An efficient Bayesian meta-analysis approach for studying cross-phenotype genetic associations. bioRxiv.
• McCullagh and Nelder [1989] McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd ed). London: Chapman and Hall.
• Neale and others [2011] Neale, B.M., Rivas, M. A., Voight, B. F., Altshuler, D., Devlin, B., Orho-Melander, M., Kathiresan, S., Purcell, S. M., Roeder, K. and Daly, M. J. (2011). Testing for an unusual distribution of rare variants. PLoS Genetics 7(3), e1001322.
• O’Reilly and others [2012] O’Reilly, P. F., Hoggart, C. J., Pomyen, Y., Calboli, F. C. F., Elliott, P., Jarvelin, M.-Ri. and Coin, L. J. M. (2012). Multiphen: Joint model of multiple phenotypes can increase discovery in GWAS. PLoS ONE 7(5), e34861+.
• Rajaraman and others [2012] Rajaraman, P., Melin, B. S., Wang, Z., McKean-Cowdin, R., Michaud, D. S., Wang, S. S., Bondy, M., Houlston, R., Jenkins, R. B., Wrensch, M. and others. (2012). Genome-wide association study of glioma and meta-analysis. Human Genetics 131(12), 1877–1888.
• Shi and Lee [2016] Shi, J. and Lee, S. (2016). A novel random effect model for GWAS meta-analysis and its application to trans-ethnic meta-analysis. Biometrics 72(3), 945–954.
• Tang and Ferreira [2012] Tang, C. S. and Ferreira, M. A. R. (2012). A gene-based test of association using canonical correlation analysis. Bioinformatics 28(6), 845.
• Tang and Lin [2014] Tang, ZZ. and Lin, DY. (2014). Meta-analysis of sequencing studies with heterogeneous genetic associations. Genetic Epidemiology 38(5), 389–401.
• Van der Sluis and others [2015] Van der Sluis, S., Dolan, C. V., Li, J., Song, Y., Sham, P., Posthuma, D. and Li, M.-X. (2015). Mgas: a powerful tool for multivariate gene-based genome-wide association analysis. Bioinformatics 31(7), 1007.
• van der Sluis and others [2013] van der Sluis, S., Posthuma, D. and Dolan, C.V. (2013). Tates: Efficient multivariate genotype-phenotype analysis for genome-wide association studies. PLoS Genetics 9(1), e1003235+.
• Wu and others [2011] Wu, M.C., Lee, S., Cai, T., Li, Y., Boehnke, M. and Lin, X. (2011). Rare-variant association testing for sequencing data with the sequence kernel association test. The American Journal of Human Genetics 89(1), 82–93.
• Yang and others [2012] Yang, J., Ferreira, T., Morris, A. P., Medland, S.E., Genetic Investigation of ANthropometric Traits (GIANT) Consortium, DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium, Madden, P. A. F., Heath, A. C., Martin, N. G., Montgomery, G. W., Weedon, M.l N., Loos, R. J., Frayling, T. M., McCarthy, M. I., Hirschhorn, J. N., Goddard, M. E. and others. (2012). Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nature Genetics 44(4), 369â75, S1â3.
• Yang and others [2016] Yang, J.J., Li, J., Williams, L. K. and Buu, A. (2016). An efficient genome-wide association test for multivariate phenotypes based on the Fisher combination function. BMC Bioinformatics 17(1).
• Zaykin and others [2002] Zaykin, D. V., Zhivotovsky, Lev A., Westfall, P. H. and Weir, B. S. (2002). Truncated product method for combining p-values. Genetic Epidemiology 22(2), 170–185.