Block-based Bayesian epistasis association mapping with application to WTCCC type 1 diabetes data\thanksrefTITL

Block-based Bayesian epistasis association mapping with application to WTCCC type 1 diabetes data\thanksrefTitl

[ [    [ [    [ [ Pennsylvania State University, Yale University and Harvard University Y. Zhang
Department of Statistics
Pennsylvania State University
422A Thomas
University Park, Pennsylvania 16802
J. Zhang
Department of Statistics
Yale University
24 Hillhouse Ave.
New Haven, Connecticut 06511
J. Liu
Department of Statistics
Harvard University
Science Center
1 Oxford St.
Cambridge, Massachusetts 02138
\smonth4 \syear2010\smonth12 \syear2010
\smonth4 \syear2010\smonth12 \syear2010
\smonth4 \syear2010\smonth12 \syear2010

Interactions among multiple genes across the genome may contribute to the risks of many complex human diseases. Whole-genome single nucleotide polymorphisms (SNPs) data collected for many thousands of SNP markers from thousands of individuals under the case–control design promise to shed light on our understanding of such interactions. However, nearby SNPs are highly correlated due to linkage disequilibrium (LD) and the number of possible interactions is too large for exhaustive evaluation. We propose a novel Bayesian method for simultaneously partitioning SNPs into LD-blocks and selecting SNPs within blocks that are associated with the disease, either individually or interactively with other SNPs. When applied to homogeneous population data, the method gives posterior probabilities for LD-block boundaries, which not only result in accurate block partitions of SNPs, but also provide measures of partition uncertainty. When applied to case–control data for association mapping, the method implicitly filters out SNP associations created merely by LD with disease loci within the same blocks. Simulation study showed that this approach is more powerful in detecting multi-locus associations than other methods we tested, including one of ours. When applied to the WTCCC type 1 diabetes data, the method identified many previously known T1D associated genes, including PTPN22, CTLA4, MHC, and IL2RA. The method also revealed some interesting two-way associations that are undetected by single SNP methods. Most of the significant associations are located within the MHC region. Our analysis showed that the MHC SNPs form long-distance joint associations over several known recombination hotspots. By controlling the haplotypes of the MHC class II region, we identified additional associations in both MHC class I (HLA-A, HLA-B) and class III regions (BAT1). We also observed significant interactions between genes PRSS16, ZNF184 in the extended MHC region and the MHC class II genes. The proposed method can be broadly applied to the classification problem with correlated discrete covariates.


10.1214/11-AOAS469 \volume5 \issue3 2011 \firstpage2052 \lastpage2077


Block-based Bayesian epistasis association mapping


TITLThis study makes use of data generated by the Wellcome Trust Case–Control Consortium. A full list of the investigators who contributed to the generation of the data is available from Funding for the project was provided by the Wellcome Trust under Award 076113. All the chromosomal positions are in NCBI build 35 coordinates.


A]\fnmsYu \snmZhang\correflabel=e1]\thanksrefaut1, B]\fnmsJing \snmZhanglabel=e2]\thanksrefaut2 and C]\fnmsJun S. \snmLiulabel=e3]\thanksrefaut2


aut1Supported by NIH Grant R01-HG004718. \thankstextaut2Supported in part by the NIH Grant R01-HG02518-02 and the NSF Grant DMS-07-06989.

Disease association study \kwdepistasis \kwdLD block \kwdBayesian methods.

1 Introduction

A recent genome-wide association (GWA) study of14,000 cases of seven human genetic diseases and 3,000 shared controls by the Welcome Trust Case Control Consortium [WTCCC (2007)] represents a thorough validation of the GWA approach. By testing hundreds of thousands of single nucleotide polymorphisms (SNPs) in the human population (Affymetrix 500k SNP), the study has identified many SNPs associated with seven complex diseases [WTCCC (2007)]. Each SNP consists of two alleles taking values 0 or 1, and there are three possible combinations of the two alleles: , , , disregarding the order. Each combination is called a genotype, representing wild homozygote, heterozygote, and mutant homozygote, respectively. In a case–control study, a SNP is said to be associated with the disease if the genotype (or allele) distribution at the SNP is different between cases and controls. In addition to testing individual SNPs, it has also been anticipated that epistatic interactions among SNPs, defined as multiple SNPs jointly associated with the disease, may be responsible for significantly elevating the risks of some human complex diseases. Due to computational and methodological limitations, however, efforts on detecting disease-related epistatic interactions among SNPs in the WTCCC study have been limited.

In the past few years, many approaches have been developed for case–control studies to detect epistasis associations. Most methods cannot be applied to GWA studies due to their computational limitations except for some recently developed methods, such as the stepwise logistic regression method [Marchini, Donnelly and Cardon (2005)] and the Bayesian epistasis association mapping (BEAM) algorithm [Zhang and Liu (2007)]. It has been demonstrated that BEAM is capable of detecting high-order interactions in GWA studies and is more powerful than other existing methods [Zhang and Liu (2007)]. A limitation of BEAM, however, is its model assumption that a Markov chain can capture the dependence structure of the SNPs in the data. It is well known that linkage disequilibrium (LD) between adjacent SNPs exhibits block-wise structure in the human genome [The International HapMap Consortium (2005), Reich et al. (2001)]. SNPs within blocks are highly correlated and the correlation is broken down by recombination events at block boundaries. A simple Markov model cannot capture this important block structure when analyzing dense SNPs.

Previous studies have shown that using haplotypes, defined as allele combinations over multiple nearby SNPs inherited from one of the parents, cannot only reduce the high computation cost in GWA studies, but also improve the detection power of association mapping [Kuno et al. (2004); Zöllner and Pritchard (2005); Johnson et al. (2001); Zhang et al. (2002a); de Bakker et al. (2005)]. In particular, Nielsen et al. (2004) showed that when moderate to high levels of LD exist, haplotype tests tend to be substantially more powerful. Kuno et al. (2004) demonstrated in a real disease study that single-SNP tests were not significant even at SNP loci close to the mutation site (APRT*J), whereas the haplotype block data yielded sufficient statistical significance.

Similar to haplotypes, we define diplotypes as genotype combinations over multiple nearby SNPs. Diplotypes are directly observed in GWA studies, whereas haplotypes have to be inferred using computationally expensive algorithms. Throughout the paper, we discuss our method and results on diplotype associations with the disease, although the method is directly applicable to haplotype data. Our focus on diplotype association is mainly due to the computational concern, where inferring unobserved haplotypes will be extremely time consuming. It is also possible that testing diplotype associations could be more powerful than testing haplotype associations, depending on the underlying disease model. On the other hand, if haplotype associations are of the interests, users can first infer haplotypes using an available haplotype inference algorithm, and then input the inferred haplotypes into our method. The degrees of freedom of our model will automatically accommodate the different inputs.

In this paper we extend the BEAM model to address the block structures in the human genome. We refer to SNP block structures as LD-blocks. By partitioning SNPs into LD-blocks, a naïve extension of BEAM is to treat each LD-block as a genetic marker, with diplotypes in the block being treated as alleles. This approach, however, may not be optimal for association mapping. First, criteria utilized in existing block-partitioning algorithms do not directly aim at optimizing the power of association mapping. Second, many regions in the human genome demonstrate vague structural patterns, of which a measure of uncertainty in block structures should be provided. Simulation studies have shown that LD-block structures can be affected by marker density [Wang et al. (2002); Pillips et al. (2003); Wall and Pritchard (2003)], population structure [Wang et al. (2002); Stumpf and Goldstein (2003); Zhang et al. (2003); Anderson and Slatkin (2004)], and gene conversion [Przeworski and Wall (2001)].

We propose a Bayesian model to simultaneously infer LD-blocks and select SNPs within blocks for disease association mapping. The model partitions the genome into discrete blocks, within which the diversity of diplotypes is limited. Block structures are iteratively updated such that disease associations are detected and summarized from a variety of likely block partitions. This approach takes into account the uncertainty in block structures and optimizes the detection power by searching for the best partitions around disease-associated SNPs. Our method detects combinations of SNPs within and between blocks for marginal and epistatic associations to the disease status. Using LD-blocks, our method also automatically filters out artificial associations created merely by LD with nearby authentically associated SNPs. We show that the new method, BEAM2, is more powerful than the original BEAM (renamed BEAM1 henceforth).

By applying BEAM2 to the type 1 diabetes (T1D) data from WTCCC (2007), we obtained all the previously identified single SNP associations in the WTCCC T1D data. We further observed some interesting two-way joint associations not detectable by single-SNP methods. The strongest T1D associations occur in the well-known Major Histocompatibility Complex (MHC) region, in which we observed long-distance joint association patterns over several millions of base pairs (Mb). Since this pattern may be partially caused by the extended LD from the MHC class II region, we further controlled the structure of MHC class II using a logistic regression model, and tested additional effects of SNPs over the extended MHC region as well as the MHC class I and class III regions. We observed strong associations in the MHC class I and class III regions, and found significant interaction associations between genes PRSS16, ZNF184 in the extended MHC region and the MHC class II genes.

The article is organized as follows. In Section 2 we first introduce a LD-block model for the genotype distribution among multiple SNPs. We model the genotype distribution within a block of SNPs by multinomial distributions and assume that the joint distribution of all SNPs is the product of the distributions of individual blocks (i.e., assuming block independence). In Section 3 we extend the LD-block model to incorporate disease associated SNPs and epistasis and describe Monte Carlo Markov chain (MCMC) algorithms to make inference from the joint model for both LD-blocks and disease associations. In Section 4 we briefly review our previously developed Bayes factor-based test statistic, which is used to further evaluate the statistical significance of candidate epistasis detected by BEAM2, and discuss extensions of BEAM and BEAM2 for general classification problems. In Section 5 we demonstrate the superior performance of BEAM2 by simulation studies and real data applications. In Section 6 we report our results from applying BEAM2 to the T2D data from WTCCC (2007). We conclude the article with a short discussion in Section 7. More implementation details can be found in Supplemental Material [Zhang, Zhang and Liu (2011)].

2 A Bayesian model for LD-block inference

The data of interest consist of genotypes at a total of SNP markers (or covariates, each taking on 3 possible values) observed in case and control individuals. Let denote a matrix of case genotypes, and denote a matrix of control genotypes, where and denote vectors of genotypes observed at SNP across individuals.

2.1 Bayesian LD-block model

Here we introduce a Bayesian LD-block partition model without considering disease association. Hence, we treat cases ( and controls ( as coming from the same population and use the combined data () to describe the model. A diplotype for an LD-block of SNPs is defined as a particular genotype combination of all the SNPs in that block. We seek to partition the markers into consecutive blocks, so that the number of observed diplotypes within each block is small (strong correlation), and correlation between SNPs in different blocks is weak. Block partitions can be quite ambiguous in many genomic regions and can vary across samples in details. Diplotype block structures obtained from available software are often based on ad hoc criteria that neither result in a proper uncertainty measure nor optimize the association mapping power.

The block variable in our model consists of binary indicators corresponding to the SNPs in the data. An indicator is equal to 1 if the corresponding SNP is the start position of a block, and 0 otherwise. As a result, uniquely defines a partition of SNPs into consecutive blocks. For a diplotype block [ consisting of SNPs , we let ( denote the counts of a particular diplotype in cases and controls, respectively. There are 3 possible diplotypes in the block. We assume that the diplotype of each individual follows independently from a multinomial distribution with frequency parameters {, and { follows a Dirichlet prior distribution, Dir({, where { denotes a hyper-parameter (i.e., pseudo-counts). More precisely, letting { denote the combined counts of diplotype observed in cases and controls, we have

where the subscript ‘’ denotes the sum of values over all subscripts. We can integrate out by

Noting that the integrant on the right-hand side is proportional to the density function of Dir( up to its normalizing constant, we obtain the marginal probability of the combined data of block [ as


We set for diplotype , and by default, we let . Based on the likelihood equivalence principle [Heckerman, Geiger and Chickering (1995)], is chosen to be inversely proportional to the total number of possible diplotypes in a block, and, hence, the sum remains a constant over different block sizes. Note that formula (2.1) can also be used to model case data ( or control data ( only, which is simply done by letting , or , respectively, for all . Further assuming independence between blocks (which is not entirely true, but serves as a good approximation), the probability function of genotypes of all blocks can be expressed as the product of individual block probabilities defined in formula (2.1).

2.2 Bayesian inference of SNP association based on blocks

A saturated test of disease association for a diplotype block of z consecutive SNPs involves free parameters. When is moderately large (, the power of the test becomes exceedingly low. We propose a Bayesian model of disease associations using only a subset of markers in the block. Here, we only discuss joint associations for SNPs within a diplotype block, and we will address epistatic interactions in the next section. It therefore suffices to describe our model for only one block.

Let denote the set of SNPs in the block. We assume that only a subset of SNPs of size (often is 0 or 1) are truly associated with the disease, and the SNPs in are not associated with the disease given {. Thus, the diplotypes of SNPs in { are distributed differently and hence are modeled by two different distributions for cases and controls, respectively. Conditional on the diplotypes of SNPs in {, the diplotypes of SNPs in {, however, follow a common distribution between cases and controls. The joint probability of the block data can therefore be expressed as


Here, and are modeled by two independent multinomial-Dirichlet distributions specified in formula (2.1), treating all SNPs in { jointly as a block. To model , we combine the diplotypes of SNPs in in cases and controls together. These diplotypes are not directly associated with the disease given {, and thus have the same conditional distributions between cases and controls. Conditional on each possible diplotype of SNPs in {, we model the conditional diplotype distribution of SNPs in { again by a multinomial-Dirichlet distribution. It is straightforward to derive the expression

where and are specified in formula (2.1), treating and as blocks, respectively. We note, however, that the marker set is unknown a priori, and needs to be inferred jointly with other parameters in our Bayesian model.

3 Joint inference of diplotype blocks and disease association

3.1 The joint model

To further incorporate epistatic interactions in formula (2), and to identify which SNPs are associated with the disease (response), we partition all SNPs (not blocks) into three groups as in BEAM [Zhang and Liu (2007)]. We introduce a latent -dimensional indicator variable ( to represent the group memberships of the markers. For each marker , denotes three possible group memberships. SNPs belonging to group-2 are assumed to be jointly associated with the disease, that is, epistasis, which are modeled by two joint multinomial distributions on the diplotypes over all group-2 SNPs—one for cases and one controls. SNPs belonging to group-1 are assumed to be marginally associated with the disease if they belong to different blocks, and are modeled by mutually independent multinomial distributions conditional on the case–control status and the block structure. If multiple group-1 SNPs fall into one block, we model their diplotypes jointly, that is, group-1 SNPs within blocks become dependent of each other. If there are both group-1 and group-2 SNPs within one block, we model the diplotypes of group-1 SNPs within the block conditional on the diplotypes of the group-2 SNPs within the block. SNPs belonging to group-0 are the remaining SNPs unrelated to the disease status. We again model the distribution of group-0 SNPs within a block by multinomial distributions, with common parameters for cases and controls. We further assume conditional independence of group-0 SNPs between blocks, conditional on the group-1 and group-2 SNPs. More precisely, within each block, we let denote the set of group-2 SNPs, let { denote the union of the group-1 and group-2 SNPs, and let { denote all SNPs. We revise formula (2) to take the form of a conditional probability function:


Thus, group-0 and group-1 SNPs are no longer mutually independent as in BEAM1, but are related to each other via the block structure. With epistasis considered, the mutual independence between blocks in model (2) becomes conditional independence given group-2 SNPs. For notational simplicity, we omit variable in (3.1), but both and are determined by .

Given a particular block partition and SNP group memberships , we express the joint probability function of the entire case–control data as

where and denote the case and control genotypes of group-2 SNPs, respectively, and the product term is defined in formula (3.1).

3.2 Choice of prior distributions

We set the prior distribution of the block variable as the product of independent Bernoulli probabilities , where denotes the sum of indicators in . According to the block distributions estimated in European and Asian populations by Gabriel et al. (2002), we assume that there are 50,000 blocks in the human genome a priori, and thus we set . Here,  denotes the length of the region spanned by the SNPs, and is the length of the human genome. A smaller value of will help the method identify larger blocks, and a larger will tend to identify smaller blocks. As the sample size (number of individuals) increases, however, the impact of the prior choices diminishes quickly. To avoid overfitting the blocks, we further impose a restriction that the maximum number of observed distinct diplotypes in a block must be smaller than (.

We set the prior distribution of the SNP membership variable as a product of independent multinomial distributions, , where denote the prior probability of each SNP belonging to group 0, 1, and 2, respectively. By default, we set , and . That is, we assume there are 10 SNPs associated with the disease a priori, where 5 are marginally associated with the disease, and 5 are associated through epistasis. Our choice of the prior reflects that there are just a few SNPs truly associated with the disease in a GWA study (where many other significant SNPs are due to LD effects). Increasing this prior (and also increasing the significance level) in the BEAM2 program may help identify additional SNPs of moderate to low effects. To avoid overfitting in interaction mapping, we further set an upper bound to the order of interactions by . For example, when the sample size is 1,000, our method can detect up to 4-way interactions. Overall, changing the values of  may affect the posterior distribution of SNPs in groups 1 and 2, but the effects will diminish as the sample size increases.

Finally, the joint model of the observed genotype data in cases () and controls (), the block variable (), and the SNP membership (), is written as


where the conditional distribution of () given () is specified in formula (3.1).

3.3 MCMC updates

The parameters of interest in our model are the block partition and SNP membership . We develop Metropolis–Hastings (MH) algorithms [Liu (2001)] to update , and, simultaneously, we develop a mix of a Gibbs sampler and Metropolis–Hastings algorithm to update . The posterior distribution of ( is then output for further analysis.

To explore all possible block partitions, we propose the following MH-moves: given a current block configuration , we randomly select a block and {longlist}[(1)]

divide the block into two new blocks at a random position;

merge two adjacent blocks into one block; and

randomly shift a block boundary to either left or right by SNPs, where the shifting amount is constrained by other block boundaries. The proposed move produces a new block partition , and the move is accepted with probability , where denotes the probability of updating from to , and is calculated from the full model (5). In our implementation, we chose the three types of MH moves with probabilities 0.1, 0.1, and 0.8, respectively, and we require a block to contain at least one SNP.

To update the SNP membership variable , we updated the membership  of SNP by calculating the posterior distribution of given all other model parameters and the data. We also propose a MH-move to switch the group memberships of two SNPs and accept the move based on MH-ratios. Per MCMC iteration, we first run the Gibbs sampler to update the memberships of all SNPs once, and then we run the MH-sampler to switch each SNP in group-1 and group-2 once with SNPs in other groups.

4 Follow-up tests and generalization of the method

4.1 A test of significance based on the Bayes factor

Although inference can be directly made from the posterior probabilities output by BEAM2, the users may want to further evaluate the statistical significance of the results in a frequentist way. In BEAM [Zhang and Liu (2007)], we developed a novel Bayes factor, called B-stat, to evaluate whether a SNP or a set of SNPs are significantly associated with the disease, where the SNP set is selected by BEAM2 in our case.

For a set of SNPs to be tested, the null hypothesis is that all SNPs are not associated with the disease. Here, represents single-SNP, 2-way, and 3-way interactions, etc. B-stat for the set of SNPs is defined as


Here, denotes the null genotype distribution (i.e., no disease association) at the SNPs in cases and controls, and denotes the alternative genotype distribution (disease association). Under the null model, we assume that the genotypes in both cases and controls follow the same distribution, whereas under the alternative model, they follow different distributions. We choose both and as an equal mixture of two distributions: one that assumes independence among the SNPs in controls (and also in cases under the null model), which yields the product terms in formula (6), and the other that assumes a saturated joint distribution of all the SNPs. Note that the form of each term in formula (6) is defined in formula (2.1).

An interesting feature of B-stat is that it uses a mixture model to accommodate the possibility that the SNPs may or may not be in linkage equilibrium (independence). As a result, using B-stat will be more powerful than using a standard likelihood ratio test or a chi-square test of associations when the SNPs under testing are in LD in controls.

We have previously shown that, under the null hypothesis of no disease association, B-stat follows asymptotically a shifted chi-square distribution with degrees of freedom [Zhang and Liu (2007)]. The shifting parameter can be computed explicitly, which is determined by the sample size (, the interaction size , and the Dirichlet hyper-parameter . Briefly speaking, the shifting parameter is proportional to , and, thus, the larger the number of individuals collected, or the more SNPs involved in an interaction, the smaller the shifting parameter will be. In addition, if large hyper-parameters for the diplotype frequency parameters are used, the shifting parameter will be large too. Note that we want the B-stat to be small (e.g., 0) when the SNPs are not associated with the disease, and, hence, the users should use small values for , such as the default values we used in our model.

4.2 Generalization to classification problems with discrete covariates

Let be the binary response vector, and let be the covariates matrix, with each covariate taking on discrete (ordinal or categorical) values. The standard case–control genetic study setting can be viewed as using response variables (i.e., case–control status) to fish out relevant predictors (i.e., SNPs). The epistasis mapping methods BEAM attempt to find those ’s that interactively affect . Both BEAM and the block-based method BEAM2 can be easily extended to infer a classification model.

The idea of both BEAM [Zhang and Liu (2007)] and BEAM2 is to partition the covariates in into three nonoverlapping groups, such that one group contains covariates unrelated with , and the other groups contain covariates either independently or jointly related with . The partition of the covariates is an unobserved latent structure. Given a particular group partition of the covariates , we can compute as in Zhang and Liu (2007), which is analogous to that in a naïve Bayes model. BEAM2 further segments the covariates into “blocks” of highly correlated variables, and treats blocks as mutually independent. This is achieved by introducing a block indicator variable , which is updated iteratively together with the variable selection indicator .

To predict the classification of a new observation with covariates based on the training data , we compute and , respectively, using the BEAM (or BEAM2) algorithm, and obtain the odds ratio

The prior can be estimated from the prior knowledge of class distribution, such as the prevalence of a particular disease in the population, which then leads to the posterior predictive probability for . A computationally more attractive way to do the computation is to output the latent variable partition and block partition structures ( and in our case) from their joint posterior distribution inferred by the MCMC procedure of BEAM, and then average the conditional odds over all the sampled  and :

We tested this latter approach in a preliminary study and found the results quite satisfactory.

The effect of BEAM2 is somewhat analogous to that of elastic net [Zou and Hastie (2005)] and group lasso [Yuan and Lin (2006)]. All methods attempt to address the phenomena that groups of covariates tend to demonstrate associations with the response together, and within groups the covariates are highly correlated. Different from elastic net and group lasso, BEAM2 infers the covariate groups and also the informative covariates within groups jointly in a coherent probability framework. As a consequence, BEAM2 allows sparse variable selection at both the group level and the individual variable level within groups, whereas elastic net and group lasso do sparse selection only at the group level. In a recent technical report, Friedman, Hastie and Tibshirani (2010) attempt to achieve a similar sparse selection effect as BEAM2 (sparse selection at both group and individual levels) by introducing an additional penalty term.

Other important distinctions between BEAM2 (or BEAM) and those lasso-based methods are the following: (a) the use of the naïve Bayes framework to model given to greatly alleviate the overfitting problem; (b) the ability to incorporate interaction terms without incurring a huge computational burden (with MCMC iterations); and (c) the adoption of the Bayesian variable selection principle, which is equivalent to using a more desirable penalty. The cost of these advantages is that both BEAM and BEAM2 have to compute via MCMC without a guarantee of always finding the optimal solution. Empirically, however, the computational speed of BEAM and BEAM2 is no worse than that required by lasso-type algorithms when the number of covariates is large.

5 Simulation studies and algorithm comparisons

5.1 Block partition of HLA data

We first used the HLA region on human chromosome 6 to evaluate the block partitions inferred by our method. The HLA region is one of the few regions in the human genome in which recombination hotspots have been experimentally verified [Jeffreys, Ritchie and Neumann (2000); Jeffreys, Kauppi and Neumann (2001)]. We downloaded the genotype data of 50 unrelated UK Caucasian semen donors from Jeffreys AJ’s website. The data covers a 216 kb region with 296 genotyped biallelic markers spanning from the upstream of gene HLA-DNA to gene TAP2 in the MHC Class II region. It is known that this region contains several prominent recombination hotspots [Jeffreys, Ritchie and Neumann (2000); Jeffreys, Kauppi and Neumann (2001)]. We therefore examined the relationship between the experimentally verified recombination hotspots and the SNP-block boundaries inferred by BEAM2. We used both haplotypes and genotypes to evaluate our method, where the HLA haplotypes were first inferred by CHB [Zhang, Niu and Liu (2006)] from the genotype data. We also simulated 1,000 individuals from the inferred HLA haplotypes using HAPGEN [Marchini et al. (2007)] to evaluate the performance of BEAM2 with a larger sample size. As a comparison, we applied HapBlock [Zhang et al. (2002b)], a dynamic programming-based algorithm of block partitioning, to the same sets of data.

With 100 haplotypes inferred from the 50 individuals, BEAM2 produced accurate block partitions that correspond well to the visual blocks displayed by Haploview [Barrett et al. (2005)]. The block boundaries also coincide with the known recombination hotspots within the HLA region (Figure 1). It is further observed that, for the haplotype data, the blocks inferred by BEAM2 are very similar to those obtained by HapBlock. Unlike our model-based method, HapBlock requires the user to specify ad hoc block partition rules, which can result in undesirable partitions. We used three different rules to define blocks: (1) common haplotypes, defined as a haplotype 5% in the sample, cover 80% of samples in a block; (2) at least 80% SNP pairs with in a block; or (3) four-gamete test on common haplotypes (5%) in a block. The blocks partitioned by HapBlock using each rule are also shown in Figure 1.

Figure 1: Block partition results on the HLA data. The top panel shows the pairwise SNP LD calculated from 100 HLA haplotypes by Haploview. The second, third, and forth panels show the block partition results by BEAM2 (BM) and HapBlock (HB1, HB2, HB3) on three HLA data sets, respectively. HB1 uses rule 1: common haplotypes; HB2 uses rule 2: pairwise ; and HB3 uses rule 3: four-gamete test. Experimentally verified recombination rates are shown as dashed red lines (the unit cM/Mb in the natural logarithm scale is shown on the right).

Using the genotype data of the 50 individuals, we obtained very different results between BEAM2 and HapBlock, and between the three different rules of block partitions. Except for the first rule of HapBlock, all other methods produced a large number of small blocks. Small blocks generated by BEAM2 are due to the small sample size of 50 individuals, based on which the correlation between SNPs is hard to detect using our likelihood model. The posterior probabilities of block boundaries output by BEAM2, however, can be used as a measure of uncertainty in block partitions. In comparison, HapBlock and many other block partitioning methods only provide a single partition solution without measuring block uncertainty.

Using the simulated genotypes of 1,000 individuals, we again observed very different results shown in Figure 1. BEAM2 produced the cleanest block partitions that corresponded well to the visual block boundaries and to the known recombination hotspots. The rule and the four-gamete test rule via HapBlock again failed to produce reasonable partitions, of which most blocks were singletons.

Data BEAM2 BEAM2-10p HapBlock-1 HapBlock-2 HapBlock-3
100 haplotypes 32 39 20 030 034
50 genotypes 81 87 46 147 106
1,000 genotypes 35 36 56 184 268

[]“BEAM2-10p” denotes BEAM2 applying a 10 times larger prior probability than the default prior on the block boundary variable. HapBlock-1, 2, 3 denotes HapBlock applying three different block partition rules.

Table 1: Number of blocks inferred in HLA data sets

We further show in Table 1 the number of blocks inferred by each method in the three data sets. We observed that BEAM2 performed well (by which we roughly mean that the number of estimated blocks is small, as is true in the HLA region) in both the haplotype data and the 1,000 individuals’ genotype data. Because modeling genotypes (diplotypes) requires a much larger set of parameters than modeling haplotypes, BEAM2 is expected to perform worse in the 50 individuals’ genotype data. As the number of individuals increased to 1,000, however, our model-based approach produced very similar partitions as that obtained in the haplotype data. In comparison, HapBlock only preformed reasonably well in the haplotype data, but produced many small blocks and singletons in the other two data sets for all three block partition rules applied. HapBlock performed the worst in the 1,000 individuals’ genotype data, indicating that the ad hoc rules applied by HapBlock do not produce consistent block partitions as sample size increases. We further show in Table 1 additional results by BEAM2 using a 10 times larger prior on the block boundary variable, that is, we expect 10 times more blocks a priori. We observed that the estimated posterior number of blocks did not increase much, particularly in the 1,000 individuals’ genotype data, indicating that BEAM2 is insensitive to the prior choice of block boundary variables. We also ran multiple MCMC chains to ensure proper convergence.

5.2 Simulation study using HapMap data

To mimic real genetic data observed in human populations, we first randomly select a region in the human genome that contains 1,000 Illumina HapMap 300k tagSNPs. The region also contains about the same number of additional SNPs from HapMap PhaseII tagged by these tagSNPs, which we refer to as nontagging SNPs. Two nontagging SNPs in the region are randomly selected as disease SNPs. Given a disease model, we set the marginal effect size (log odds ratio minus 1) per disease SNP at 0.5 and choose a disease minor allele frequency (MAF) per locus from . Given a marginal effect size and a choice of MAF, we then calculate the diplotype frequencies over the disease SNPs in cases and controls, respectively [this is similarly done as presented in Zhang and Liu (2007)]. According to the case–control diplotype frequencies over the disease SNPs, we randomly sample 1,000 cases and 1,000 controls from a pool of individuals without replacement. The pool consists of 10,000 control individuals generated by HAPGEN [Marchini et al. (2007)] using HapMap European sample (parents only) at odds , that is, no disease association. Our simulation procedure is more economical than a direct approach that generates one individual at a time and determines its disease status conditional on the disease genotypes and penetrance, because the direct approach may generate many more controls before obtaining enough cases. Finally, we remove all nontagging SNPs from the data including the two disease SNPs (which are typically unobserved in a GWA study), and obtain a case–control data set containing 1,000 Illumina HapMap tagSNPs.



Table 2: Disease models used in simulation study
Model 1
Model 2
Model 3

[]Each table cell lists the relative risk of the corresponding genotype combination. Genotypes with risks equal to 1 have no effects to the disease. The parameter is computed according to the specified marginal effects (0.5 in our simulation) and disease MAFs .

To evaluate the association mapping performance of our method, we simulated case–control data sets based on the HapMap sample under three disease models shown in Table 2. Each disease model assumes 2 loci in the genome contributing to the disease risk. While the first model assumes no interactions, the other two models assume different types of interactions. Using the simulated data sets, we compared the performance of BEAM2 to BEAM1. We also implemented a third method that maps associations and interactions based on predetermined block structures. This third method serves as an intermediate method between BEAM1, which is not block-based, and BEAM2, which infers block structures and maps associations simultaneously. We used three different levels of parameters (from stringent to liberal) to define blocks using existing software, and we treated the diplotypes within each inferred block as genetic alleles. The third method using three different block definitions are hereafter referred to as Block1, Block2, and Block3, respectively [more details of the third method can be found in the Supplementary Material, Zhang, Zhang and Liu (2011)]. To compare the performance of all methods, we ranked SNPs according to the association posterior probabilities output by each method estimated for each data set. We then calculated how often a method ranked the disease related SNPs among the top SNPs. A SNP is regarded as being correctly identified as disease related if it is within 5 SNPs on either side of a true disease locus.

As shown in Figure 2, under all parameter settings, BEAM2 performed the best among all tested methods, where Block1, Block2, Block3, and BEAM1 all performed similarly. When disease allele frequency was low (), the power curves of all methods looked similar, but a closer inspection of the top 5 ranked SNPs showed that BEAM1 only had 50% chance to capture disease related SNPs relative to BEAM2. When disease alleles were common in the population (), the advantage of the BEAM2 model becomes obvious. Comparing the power curves for Model 2 and Model 3, we observed that the power of BEAM2 increased much faster than that of BEAM1 among the top 2 or 3 SNPs. We did not observe this behavior in Model 1, which has no interactions. It thus indicates that using SNP-blocks can increase the power of mapping both single SNP and multi-SNP interaction associations. All methods compared here are Bayesian methods that output posterior probabilities of disease associations. It therefore indicates that our treatment of LD in BEAM2 is more appropriate than using either predetermined blocks or a Markov chain model (as in BEAM1).

Figure 2: Power comparison for BEAM1, BEAM2, Block1, Block2, and Block3 using simulated data from the HapMap data. Under each simulation setting and from 50 data sets, power (y-axis) is calculated as the proportion of disease-associated SNPs (within 5 SNPs of true disease loci) among top SNPs (x-axis), ranked by the posterior probability of association. Each data set contains 1,000 candidate SNPs in 1,000 cases and 1,000 controls. The disease allele frequency is 0.05, 0.10, 0.20, and 0.50, respectively. The marginal effect size of each disease SNP (unobserved) is 0.5. Pink: BEAM1; Red: BEAM2; Blue: Block1; Green: Block2; Black: Block3.

To further declare statistical significance, existing significance estimation methods adjusting for multiple comparisons should be used, such as the Bonferroni correction applied to B-stat introduced in BEAM1 [Zhang and Liu (2007)]. We compared BEAM2 with single SNP chi-square tests using the above disease models [Supplementary Table S1, Zhang, Zhang and Liu (2011)], and observed that BEAM2 performed better than the chi-square test for interaction Model 2 and Model 3, but performed the same for the noninteractive Model 1.

Figure 3: Performance of MCMC. (a) Autocorrelation plot obtained by running BEAM2 on a simulated data set of 1,000 cases, 1,000 controls, and 1,000 SNPs under disease Model 2. A total of 1,000 iterations after burn-in are used to calculate the plot. (b) Posterior probabilities (in logarithm scale) of disease association (marginal and epistatic) per SNP compared across 4 independent runs of BEAM2.

We also checked the performance of our MCMC sampling algorithm. As shown in Figure 3, using a simulated data set from disease Model 2, the lag of autocorrelation of our Markov chain is short, indicating fast convergence of the Markov chain. We further compared the posterior distribution of SNP associations from 4 independent runs of BEAM2, and we observed close agreement between runs. In practice, the Markov chain could converge to local modes, particularly if the data contain many SNPs with complicated block structures. If block structures are of primary interest, we suggest running BEAM2 in several runs to check if the block partition results obtained in different runs are consistent. More advanced MCMC algorithms, such as parallel tempering [Liu (2001)], could further alleviate the local mode problems in MCMC sampling.

(a) (b)
Figure 4: Estimated number of disease associated SNPs (-axis) plotted against (a) the number of significant SNPs (0.05 family-wise) and (b) the maximum single SNP chi-square statistics (-axis) within a 100 kb neighborhood per disease locus. In (a), the number of loci in each box plot is further shown on the top. The plots are computed from 200 simulated data sets of disease Model 1.

Using BEAM2, we can estimate the number of disease associated SNPs around a disease locus by the sum of posterior probabilities of associations over all SNPs within a neighborhood of a candidate locus. Given our block-based association model, the number of associated SNPs does not include SNPs whose disease association is merely created by LD, and, hence, our estimates are more appropriate than a naïve count of significant SNPs within the neighborhood. As shown in Figure 4, around a 100-kb neighborhood of every simulated disease locus in disease Model 1, the estimated number of disease associated SNPs by BEAM2 is around 1 when the association signal is sufficiently strong, even if there are many significant SNPs in the neighborhood. The extra significant SNPs created by LD make the localization of disease locus difficult. This result highlights the importance of BEAM2 that performs automatic variable selection within blocks. Rather than reporting diluted small posterior probability of association over many neighboring SNPs in LD, BEAM2 was able to select the strongest contributing SNP within blocks (with large posterior probabilities of association) in our simulation study.

Figure 5: SNP-wise posterior probabilities (-axis) of T1D associations in 22 autosomal chromosomes (-axis, chromosomes are separated by grey dashed lines). Previously reported T1D associated genes are highlighted in green, and candidate T1D associated genes, as defined at T1Dbase, are highlighted in red.

6 Application to WTCCC type 1 diabetes data

We applied BEAM2 to analyze the T1D data set from the WTCCC project [WTCCC (2007)]. The data set contains 2,000 T1D patients, 1,504 controls from 1958 Birth Cohort (58C), and 1,500 additional controls from the National Blood Service (NBS). Given our limited computation resources (computation time, which would require several days to analyze half a million SNPs in this data set; and memory usage, which would require 4 Gb for half million SNPs), we applied BEAM2 to the top 10% SNPs ranked by marginal associations with T1D on all autosomes. We further filtered out SNPs with bad clustering, SNPs violating Hardy–Weinberg equilibrium in controls at 10 significance, and “almost nonpolymorphic” SNPs of which 95% samples have the same genotypes. The final data set contained 42,470 SNPs in 5,004 individuals. We ran BEAM2 with 5 independent MCMC chains. Figure 5 shows the averaged posterior probabilities of T1D associations. Note that selecting top 10% SNPs does not imply sparse and less correlated SNPs, because SNPs in LD tend to be in or out of the top 10% list together. Two alternative ways to reduce the number of SNPs are to run BEAM2 on each individual chromosome for chromosome-wise epistasis, or on previously detected disease associated regions.

6.1 Result summary and highlights

We first compared our results with the SNPs reported in the original WTCCC paper (2007). As expected, we found that the SNPs reported by BEAM2 and by the original WTCCC analysis are highly consistent. All strongly associated SNPs reported in WTCCC are significant in our analysis, and all strongly and moderately associated SNPs reported in WTCCC have posterior probability0.1 by BEAM2 [Supplementary Table S2, Zhang, Zhang and Liu (2011)]. We further compared our results with known T1D associations obtained from T1Dbase ( Among the 55 SNPs (or cluster of SNPs) output by BEAM2 with posterior probabilities greater than 0.1, 17 (31%) overlapped with known T1D associated regions, including some well-known genes such as PTPN22 (1p13), CTLA4 (2q33), MHC (6p21), and IL2RA (10p15).

SNP Position -value T1Dbase Gene
rs6679677\tabnotereftab31 0 Yes PTPN22
rs9405484 1.33e8 No FOXC1
MHC\tabnotereftab31 0 Yes MHC
rs6592988 2.87e7 Yes COBL
rs11984645 7.06e11 No MRPL15
rs11782342 5.70e12 No KCNB2
rs11052552 2.61e7 Yes CLEC2D
rs11171739\tabnotereftab31 2.35e11 Yes ERBB3
rs17696736\tabnotereftab31 0 Yes CCDC63, NAP1
rs12924729\tabnotereftab31 1.01e7 Yes CLEC16A

[]SNPs showing strong associations (e7) with T1D by single SNP test. -value: nominal -value of associations. T1Dbase: whether the locus is documented in T1Dbase. Gene: nearest gene. \tabnotetexttab31Additional SNPs in its neighborhood also show strong marginal associations.

Table 3: Strong single SNP associations with T1D

In addition to the previously reported T1D genes, BEAM2 reported some novel T1D associated loci. A list of likely T1D associations detected by BEAM2, for both single SNPs (if e7) and two-way joint associations (if e10), is shown in Tables 3 and 4, respectively. For example, we detected 7 loci, among which two SNPs in short distance form strong joint associations with T1D (e10). These loci are not identifiable using single SNP tests, but captured by BEAM2 as multi-SNP associations. We also found some likely long-distance and cross-chromosomal interaction associations with T1D (e11). One example is the joint association between SNP rs3132676 in the classic MHC region on chromosome 6 and SNP rs9376523, which is 111 Mb away on the same chromosome. This SNP pair is likely interacting because their genotypes are strongly correlated in cases (nominal -value 8e6 by test of independence), but not in controls (nominal -value 0.91). Although most two-way associations did not pass the Bonferroni adjusted significance level in the genome scale, the short-distance two-way associations are significant if only considering local joint associations.

SNP1 Pos1 Pval1 Gene SNP2 Pos2 Pval2 Gene JointP
rs7525703 2e2 PRKAB2 rs2077749 3e6 PRKAB2 2e12
rs6809441 4e2 ULK4 N/A 7e2 ULK4 5e11
rs330483 4e6 ADAM29 rs330504 1e1 ADAM29 2e11
rs6906469 1e3 DFCC1 rs659964\tabnotereftab41 2e6 ACAD10 4e11
rs3132676\tabnotereftab41 1e5 TRIM40 rs9376523 6e4 TXLNB 3e11
rs9296661 4e4 PKHD1 rs1265566\tabnotereftab41 1e6 CUTL2 5e11
rs13340508 9e3 CCL24 rs17361077 4e5 CCL24 6e12
rs4838140 2e5 NEK6 rs7860360 6e4 SCAI 4e10
rs11104868 1e4 KITLG rs7961663\tabnotereftab41 3e6 CUTL2 1e11
rs1958305 5e5 DHRS2 rs12100601 2e2 DHRS2 0
rs7262414 2e5 PTPRT rs2867064 7e2 PTPRT 0

[]Pairs of SNPs showing strong joint T1D associations (e10) by B-stat. If multiple SNP pairs are located around the same loci, only one pair is shown. Pval1, Pval2, JointP represent nominal -values of SNP1, SNP2, and their joint associations, respectively. \tabnotetexttab41SNP lies in known regions in T1Dbase.

Table 4: Strong two-SNP associations with T1D

We further examined possible confounding effects of population structures in the T1D data using a logistic regression model. The regional information of WTCCC individuals is included as dummy covariates. We observed that the test statistics of the detected SNP associations remained almost unchanged before and after the adjustment of population origins. We further randomly selected 10,000 SNPs genome-wide and compared the distribution of their association statistics with a chi-square distribution. The two distributions agreed well [see Supplementary Figure S1, Zhang, Zhang and Liu (2011)]. We therefore believe that population structure does not incur false positive associations in the WTCCC T1D data.

Figure 6: Frequency (y-axis) of block boundaries between adjacent SNP pairs of certain genetic distance (x-axis), calculated from the T1D data. The genetic distance is obtained from the HapMap CEU sample.

We finally checked the block partition results of BEAM2 on the T1D data. Given the large number of SNPs, we cannot visually inspect the blocks as we did for the HLA data. Alternatively, we computed the genetic distance between adjacent SNP pairs in the T1D data, using a genetic map constructed from the HapMap CEU sample. Then, we checked how frequent a block boundary is inferred between SNP pairs in certain genetic distance. Intuitively, the more distant two SNPs are located, the more likely a block boundary occurs. As shown in Figure 6, our method inferred almost 100% block boundaries between SNP pairs with genetic distance 1 cM, and the frequency of block boundary decreases as the SNP pairs get closer. The block partitions between the 5 runs of BEAM2 are consistent, with an average correlation coefficient of 0.96.

6.2 Joint association patterns in the MHC region

T1D is an autoimmune disease and genes in the MHC region play an important role in the immune system and autoimmunity [Nejentsev et al. (2007); Steenkiste et al. (2007)]. BEAM2 found a large number of SNPs within the MHC region showing extremely strong association signals with T1D. Several multi-locus joint associations within the MHC region are also detected. We therefore examined more closely a 10-Mb MHC region, including the extended MHC region (25–32 Mb) and the classic MHC region (32–35 Mb).

Within this 10 Mb region, we observed that the SNP pairs associated with T1D are more often strongly correlated in cases than in controls [see Supplementary Figure S2, Zhang, Zhang and Liu (2011)]. The joint associations spanned from the classic MHC region to the extended MHC region over a distance as long as 6.5 Mb. It has been previously reported that haplotype blocks containing the most susceptible alleles HLA-DRB1*03 and HLA-DRB1*04 within the HLA-DR-DQ region may extend as long as 2 Mb [Nejentsev et al. (2007)] into the extended MHC region. It is thus arguable that the joint associations observed between the MHC class II region and the extended MHC region are due to extensive LD. We used a more traditional approach, logistic regression, to test two-way interactions among SNPs within MHC conditioning on the HLA-DR-DQ haplotypes. We first used CHB [Zhang, Niu and Liu (2006)] to infer haplotypes over HLA-DR-DQ genes (32.6–32.8 Mb). After collapsing rare haplotypes with frequencies lower than 0.1% into one group and obtaining 91 groups of haplotypes, we regressed the T1D status on the 91 haplotype groups using 90 dummy variables, which resulted in 78 significant haplotype groups over the HLA-DR-DQ genes explaining most of the MHC class II associations with T1D (the 13 insignificant haplotype groups removed from the model only changed the model deviance by 9.7).

Figure 7: Log10 -values (-axis) of joint association test between SNPs in the nonclass II MHC region (-axis) and SNPs in the MHC class II region (32.5–32.8 Mb), with main effects of the MHC class II SNPs subtracted and conditioned on 78 T1D associated haplotypes in HLA-DR-DQ. Each SNP has multiple -values, shown as grey dots, corresponding to interactions with different MHC class II SNPs. The black line for the SNP indicates the -value of the nonclass II SNP’s main effect. A cutoff of 10 for the -values, shown as the dashed line, corresponds to roughly one false positive among all tests.

Conditioning on the 78 haplotype groups, we tested both main and interaction effects between pairs of SNPs, one in the nonclass II region (class I, class III, and the extended MHC region) and one in the class II region. In particular, we regressed the T1D status on every pair of SNPs and the 78 haplotype groups. The association statistic is the change of deviance before and after including the two SNPs in the model. We further subtracted the main effect of the class II SNPs. As shown in Figure 7, we observed several peaks of -values demonstrating significant main and interaction effects within the nonclass II MHC region. For example, we observed significant main effects of gene HIST1H2BD at 26.3 Mb (peak of black lines), interaction effects of gene PRSS16 at 27.3 Mb and ZNF184 at 27.5 Mb (peaks of grey dots), strong marginal effects of region 30–31.4 Mb including class I genes HLA-A and HLA-B (peaks of black lines), and strong interaction effects of gene BAT1 at 31.6 Mb (peaks of grey dots). The associations of PRSS16, ZNF184, and BAT1 have been previously reported [Nejentsev et al. (2007); Viken et al. (2009)], and our analysis further suggested that these genes are associated with T1D mainly through interactive effects with MHC class II genes, controlling the MHC class II haplotypes.

7 Discussion

In this article we proposed a model-based Bayesian method for simultaneous LD-block partitioning and multi-locus epistasis association mapping. Different from many block-based methods, we combined block partitioning and association mapping into a unified Bayesian model, where both block structures and SNP associations are iteratively learned through MCMC sampling. For block partitioning, our simulation study and real data analysis showed that BEAM2 produces accurate and consistent partitions of SNPs that compared favorably with other genetic knowledge than some existing methods. The posterior probabilities of block boundaries output by BEAM2 not only report the most likely block partitions but also measure the uncertainty of blocks. Compared to some existing methods using ad hoc partition rules, BEAM2 has two main advantages: (1) it provides soft partitions rather than hard partitions, that is, it provides a posterior distribution of block partitions given the data, rather than a single block partition result; soft partitions are necessary in regions with no dominant recombination hotspots and block structures; and (2) it scales up well to large sample sizes and produces consistent results. In particular, we showed that our model-based method produces consistent block partitions as the number of individuals increases, whereas the other methods we tested produced drastically different results when more individuals from the same population are included. For association mapping, BEAM2 is more powerful than both the original BEAM algorithm, which accounts for LD using a Markov chain, and the methods using pre-estimated blocks. BEAM2 tests disease associations over uncertain block structures, where most SNPs around disease loci are filtered out, as their associations are created by LD with nearby disease SNPs.

The output of BEAM2 is a list of SNP-wise posterior probabilities of marginal and interaction associations. From a frequentist point of view, the identified SNPs can be further tested for genome-wide statistical significance. We previously introduced a Bayes factor-based statistics, B-stat [Zhang and Liu (2007)], for testing the significance of SNP associations. B-stat performs similarly to a 2-df chi-square test for single SNPs, but is more powerful for testing interactions of multiple SNPs. The same B-stat can be used to test the statistical significance of SNPs selected by the BEAM2 model, as we demonstrated in this paper.

When applied to the WTCCC T1D data, BEAM2 found all 5 statistically significant loci previously reported in Table 3 of WTCCC (2007), and captured 7 moderately (insignificant) associated loci listed in Table 4 of WTCCC (2007) with nontrivial posterior probabilities (0.1). BEAM2 further reported some novel two-way joint associations, including 7 SNP pairs (e10) in short-distance and 4 SNP pairs (e11) in long-distance. The local two-way associations indicate main effects of the related genes, which are, however, not detectable using single SNP tests alone. The long-distance two-way associations did not pass the genome-wide Bonferroni multiple testing control, but they may be justifiable as real interaction associations if a better multiple testing method is used and a replication study is performed. We further analyzed the well-known MHC region, and our analysis conditioning on the MHC class II haplotypes suggested the existence of interaction associations rather than MHC extended linkage alone. Given the complex nature and the important biological role of MHC in the human genome, our analysis is rather limited. Sophisticated analysis is in dire need to further reveal the genetic mechanisms underlying MHC and immune diseases.

The current BEAM model can be further improved in several ways. First, missing genotypes and unobserved SNPs in the case–control sample can be treated via imputation [Zhang (2011)]. Previous studies have shown that imputing untyped SNPs and missing genotypes from a reference panel can improve the power of disease association mapping [Marchini et al. (2007)]. Second, the current model only reports SNP-wise posterior probabilities of associations to the disease without providing a detailed analysis of association structures of the detected SNPs interactions. We have observed in the MHC region a bulk of SNPs exhibiting complex association structures with T1D. A post-analysis of the MHC region is thus needed to delineate fine structures of the selected SNPs and interactions with respect to their disease effects and inter-relationships. Third, from a statistical point of view, it is critical to control the false-discovery rate. This is particularly important for multi-locus tests, which often involve a much larger number of simultaneous comparisons than single SNP tests. We are developing new statistical methods for evaluating genome-wide statistical significance of associations.


Additional Supporting Information and Results
\slink[doi,text=10.1214/11-AOAS469SUPP]10.1214/11-AOAS469SUPP \slink[url] \sdatatype.doc \sdescriptionThe file includes a naïve SNP-block model used in our comparison, verification of population structure in the sample, LD analysis of the MHC region, Chi-square results of our simulation study, and comparison of our results with previous results in the T1D WTCCC1 data.


We are grateful to three anonymous reviewers who helped us substantially improve the manuscript from its original form.


  • Anderson and Slatkin (2004) {barticle}[pbm] \bauthor\bsnmAnderson, \bfnmEric C.\binitsE. C. \AND\bauthor\bsnmSlatkin, \bfnmMontgomery\binitsM. (\byear2004). \btitlePopulation-genetic basis of haplotype blocks in the 5q31 region. \bjournalAm. J. Hum. Genet. \bvolume74 \bpages40–49. \biddoi=10.1086/381040, issn=0002-9297, pii=S0002-9297(07)61943-0, pmcid=1181911, pmid=14681827 \endbibitem
  • Barrett et al. (2005) {barticle}[pbm] \bauthor\bsnmBarrett, \bfnmJ. C.\binitsJ. C., \bauthor\bsnmFry, \bfnmB.\binitsB., \bauthor\bsnmMaller, \bfnmJ.\binitsJ. \AND\bauthor\bsnmDaly, \bfnmM. J.\binitsM. J. (\byear2005). \btitleHaploview: Analysis and visualization of LD and haplotype maps. \bjournalBioinformatics \bvolume21 \bpages263–265. \biddoi=10.1093/bioinformatics/bth457, issn=1367-4803, pii=bth457, pmid=15297300 \endbibitem
  • de Bakker et al. (2005) {barticle}[auto:STB—2011-03-03—12:04:44] \bauthor\bparticlede \bsnmBakker, \bfnmP. I.\binitsP. I., \bauthor\bsnmYelensky, \bfnmR.\binitsR., \bauthor\bsnmPe’er, \bfnmI.\binitsI., \bauthor\bsnmGabriel, \bfnmS. B.\binitsS. B., \bauthor\bsnmDaly, \bfnmM. J.\binitsM. J. \AND\bauthor\bsnmAltshuler, \bfnmD.\binitsD. (\byear2005). \btitleEfficiency and power in genetic association studies. \bjournalNat. Genet. \bvolume37 \bpages1217–1223. \endbibitem
  • Friedman, Hastie and Tibshirani (2010) {bmisc}[auto:STB—2011-03-03—12:04:44] \bauthor\bsnmFriedman, \bfnmJ.\binitsJ., \bauthor\bsnmHastie, \bfnmT.\binitsT. \AND\bauthor\bsnmTibshirani, \bfnmR.\binitsR. (\byear2010). \bhowpublishedA note on the group lasso and a sparse group lasso. Technical report, Stanford Univ. \endbibitem
  • Gabriel et al. (2002) {barticle}[pbm] \bauthor\bsnmGabriel, \bfnmStacey B.\binitsS. B., \bauthor\bsnmSchaffner, \bfnmStephen F.\binitsS. F., \bauthor\bsnmNguyen, \bfnmHuy\binitsH., \bauthor\bsnmMoore, \bfnmJamie M.\binitsJ. M., \bauthor\bsnmRoy, \bfnmJessica\binitsJ., \bauthor\bsnmBlumenstiel, \bfnmBrendan\binitsB., \bauthor\bsnmHiggins, \bfnmJohn\binitsJ., \bauthor\bsnmDeFelice, \bfnmMatthew\binitsM., \bauthor\bsnmLochner, \bfnmAmy\binitsA., \bauthor\bsnmFaggart, \bfnmMaura\binitsM., \bauthor\bsnmLiu-Cordero, \bfnmShau Neen\binitsS. N., \bauthor\bsnmRotimi, \bfnmCharles\binitsC., \bauthor\bsnmAdeyemo, \bfnmAdebowale\binitsA., \bauthor\bsnmCooper, \bfnmRichard\binitsR., \bauthor\bsnmWard, \bfnmRyk\binitsR., \bauthor\bsnmLander, \bfnmEric S.\binitsE. S., \bauthor\bsnmDaly, \bfnmMark J.\binitsM. J. \AND\bauthor\bsnmAltshuler, \bfnmDavid\binitsD. (\byear2002). \btitleThe structure of haplotype blocks in the human genome. \bjournalScience \bvolume296 \bpages2225–2229. \biddoi=10.1126/science.1069424, issn=1095-9203, pii=1069424, pmid=12029063 \endbibitem
  • Heckerman, Geiger and Chickering (1995) {barticle}[auto:STB—2011-03-03—12:04:44] \bauthor\bsnmHeckerman, \bfnmD.\binitsD., \bauthor\bsnmGeiger, \bfnmD.\binitsD. \AND\bauthor\bsnmChickering, \bfnmD.\binitsD. (\byear1995). \btitleLearning Bayesian networks: The combination of knowledge and statistical data. \bjournalMach. Learn. \bvolume20 \bpages197–243. \endbibitem
  • The International HapMap Consortium (2005) {bmisc}[auto:STB—2011-03-03—12:04:44] \borganizationThe International HapMap Consortium (\byear2005). \bhowpublishedA haplotype map of the human genome. Nature 437 1299–1320. \endbibitem
  • Jeffreys, Kauppi and Neumann (2001) {barticle}[pbm] \bauthor\bsnmJeffreys, \bfnmA. J.\binitsA. J., \bauthor\bsnmKauppi, \bfnmL.\binitsL. \AND\bauthor\bsnmNeumann, \bfnmR.\binitsR. (\byear2001). \btitleIntensely punctate meiotic recombination in the class II region of the major histocompatibility complex. \bjournalNat. Genet. \bvolume29 \bpages217–222. \biddoi=10.1038/ng1001-217, issn=1061-4036, pii=ng1001-217, pmid=11586303 \endbibitem
  • Jeffreys, Ritchie and Neumann (2000) {barticle}[pbm] \bauthor\bsnmJeffreys, \bfnmA. J.\binitsA. J., \bauthor\bsnmRitchie, \bfnmA.\binitsA. \AND\bauthor\bsnmNeumann, \bfnmR.\binitsR. (\byear2000). \btitleHigh resolution analysis of haplotype diversity and meiotic crossover in the human TAP2 recombination hotspot. \bjournalHum. Mol. Genet. \bvolume9 \bpages725–733. \bidissn=0964-6906, pii=ddd086, pmid=10749979 \endbibitem
  • Johnson et al. (2001) {barticle}[pbm] \bauthor\bsnmJohnson, \bfnmG. C.\binitsG. C., \bauthor\bsnmEsposito, \bfnmL.\binitsL., \bauthor\bsnmBarratt, \bfnmB. J.\binitsB. J., \bauthor\bsnmSmith, \bfnmA. N.\binitsA. N., \bauthor\bsnmHeward, \bfnmJ.\binitsJ., \bauthor\bsnmGenova, \bfnmG. Di\binitsG. D., \bauthor\bsnmUeda, \bfnmH.\binitsH., \bauthor\bsnmCordell, \bfnmH. J.\binitsH. J., \bauthor\bsnmEaves, \bfnmI. A.\binitsI. A., \bauthor\bsnmDudbridge, \bfnmF.\binitsF., \bauthor\bsnmTwells, \bfnmR. C.\binitsR. C., \bauthor\bsnmPayne, \bfnmF.\binitsF., \bauthor\bsnmHughes, \bfnmW.\binitsW., \bauthor\bsnmNutland, \bfnmS.\binitsS., \bauthor\bsnmStevens, \bfnmH.\binitsH., \bauthor\bsnmCarr, \bfnmP.\binitsP., \bauthor\bsnmTuomilehto-Wolf, \bfnmE.\binitsE., \bauthor\bsnmTuomilehto, \bfnmJ.\binitsJ., \bauthor\bsnmGough, \bfnmS. C.\binitsS. C., \bauthor\bsnmClayton, \bfnmD. G.\binitsD. G. \AND\bauthor\bsnmTodd, \bfnmJ. A.\binitsJ. A. (\byear2001). \btitleHaplotype tagging for the identification of common disease genes. \bjournalNat. Genet. \bvolume29 \bpages233–237. \biddoi=10.1038/ng1001-233, issn=1061-4036, pii=ng1001-233, pmid=11586306 \endbibitem
  • Kuno et al. (2004) {barticle}[pbm] \bauthor\bsnmKuno, \bfnmShin-Ichi\binitsS.-I., \bauthor\bsnmTaniguchi, \bfnmAtsuo\binitsA., \bauthor\bsnmSaito, \bfnmAkira\binitsA., \bauthor\bsnmTsuchida-Otsuka, \bfnmSanae\binitsS. \AND\bauthor\bsnmKamatani, \bfnmNaoyuki\binitsN. (\byear2004). \btitleComparison between various strategies for the disease-gene mapping using linkage disequilibrium analyses: Studies on adenine phosphoribosyltransferase deficiency used as an example. \bjournalJ. Hum. Genet. \bvolume49 \bpages463–473. \biddoi=10.1007/s10038-004-0175-y, issn=1434-5161, pmid=15278765 \endbibitem
  • Liu (2001) {bbook}[mr] \bauthor\bsnmLiu, \bfnmJun S.\binitsJ. S. (\byear2001). \btitleMonte Carlo Strategies in Scientific Computing. \bpublisherSpringer, \baddressNew York. \bidmr=1842342 \endbibitem
  • Marchini, Donnelly and Cardon (2005) {barticle}[pbm] \bauthor\bsnmMarchini, \bfnmJonathan\binitsJ., \bauthor\bsnmDonnelly, \bfnmPeter\binitsP. \AND\bauthor\bsnmCardon, \bfnmLon R.\binitsL. R. (\byear2005). \btitleGenome-wide strategies for detecting multiple loci that influence complex diseases. \bjournalNat. Genet. \bvolume37 \bpages413–417. \biddoi=10.1038/ng1537, issn=1061-4036, pii=ng1537, pmid=15793588 \endbibitem
  • Marchini et al. (2007) {barticle}[pbm] \bauthor\bsnmMarchini, \bfnmJonathan\binitsJ., \bauthor\bsnmHowie, \bfnmBryan\binitsB., \bauthor\bsnmMyers, \bfnmSimon\binitsS., \bauthor\bsnmMcVean, \bfnmGil\binitsG. \AND\bauthor\bsnmDonnelly, \bfnmPeter\binitsP. (\byear2007). \btitleA new multipoint method for genome-wide association studies by imputation of genotypes. \bjournalNat. Genet. \bvolume39 \bpages906–913. \biddoi=10.1038/ng2088, issn=1061-4036, pii=ng2088, pmid=17572673 \endbibitem
  • Nejentsev et al. (2007) {bmisc}[pbm] \bauthor\bsnmNejentsev, \bfnmSergey\binitsS., \bauthor\bsnmHowson, \bfnmJoanna M M\binitsJ. M. M., \bauthor\bsnmWalker, \bfnmNeil M.\binitsN. M., \bauthor\bsnmSzeszko, \bfnmJeffrey\binitsJ., \bauthor\bsnmField, \bfnmSarah F.\binitsS. F., \bauthor\bsnmStevens, \bfnmHelen E.\binitsH. E., \bauthor\bsnmReynolds, \bfnmPamela\binitsP., \bauthor\bsnmHardy, \bfnmMatthew\binitsM., \bauthor\bsnmKing, \bfnmErna\binitsE., \bauthor\bsnmMasters, \bfnmJennifer\binitsJ., \bauthor\bsnmHulme, \bfnmJohn\binitsJ., \bauthor\bsnmMaier, \bfnmLisa M.\binitsL. M., \bauthor\bsnmSmyth, \bfnmDeborah\binitsD., \bauthor\bsnmBailey, \bfnmRebecca\binitsR., \bauthor\bsnmCooper, \bfnmJason D.\binitsJ. D., \bauthor\bsnmRibas, \bfnmGloria\binitsG., \bauthor\bsnmCampbell, \bfnmR. Duncan\binitsR. D., \bauthor\bsnmClayton, \bfnmDavid G.\binitsD. G., \bauthor\bsnmTodd, \bfnmJohn A.\binitsJ. A. \AND\borganizationThe Wellcome Trust Case Control Consortium (\byear2007). \bhowpublishedLocalization of type 1 diabetes susceptibility to the MHC class I genes HLA-B and HLA-A. Nature 450 887–892. \biddoi=10.1038/nature06406, issn=1476-4687, mid=UKMS27162, pii=nature06406, pmcid=2703779, pmid=18004301 \endbibitem
  • Nielsen et al. (2004) {barticle}[pbm] \bauthor\bsnmNielsen, \bfnmDahlia M.\binitsD. M., \bauthor\bsnmEhm, \bfnmMargaret G.\binitsM. G., \bauthor\bsnmZaykin, \bfnmDmitri V.\binitsD. V. \AND\bauthor\bsnmWeir, \bfnmBruce S.\binitsB. S. (\byear2004). \btitleEffect of two- and three-locus linkage disequilibrium on the power to detect marker/phenotype associations. \bjournalGenetics \bvolume168 \bpages1029–1040. \biddoi=10.1534/genetics.103.022335, issn=0016-6731, pii=168/2/1029, pmcid=1448814, pmid=15514073 \endbibitem
  • Phillips et al. (2003) {barticle}[pbm] \bauthor\bsnmPhillips, \bfnmM. S.\binitsM. S., \bauthor\bsnmLawrence, \bfnmR.\binitsR., \bauthor\bsnmSachidanandam, \bfnmR.\binitsR., \bauthor\bsnmMorris, \bfnmA. P.\binitsA. P., \bauthor\bsnmBalding, \bfnmD. J.\binitsD. J., \bauthor\bsnmDonaldson, \bfnmM. A.\binitsM. A., \bauthor\bsnmStudebaker, \bfnmJ. F.\binitsJ. F., \bauthor\bsnmAnkener, \bfnmW. M.\binitsW. M., \bauthor\bsnmAlfisi, \bfnmS. V.\binitsS. V., \bauthor\bsnmKuo, \bfnmF-S\binitsF.-S., \bauthor\bsnmCamisa, \bfnmA. L.\binitsA. L., \bauthor\bsnmPazorov, \bfnmV.\binitsV., \bauthor\bsnmScott, \bfnmK. E.\binitsK. E., \bauthor\bsnmCarey, \bfnmB. J.\binitsB. J., \bauthor\bsnmFaith, \bfnmJ.\binitsJ., \bauthor\bsnmKatari, \bfnmG.\binitsG., \bauthor\bsnmBhatti, \bfnmH. A.\binitsH. A., \bauthor\bsnmCyr, \bfnmJ. M.\binitsJ. M., \bauthor\bsnmDerohannessian, \bfnmV.\binitsV., \bauthor\bsnmElosua, \bfnmC.\binitsC., \bauthor\bsnmForman, \bfnmA. M.\binitsA. M., \bauthor\bsnmGrecco, \bfnmN. M.\binitsN. M., \bauthor\bsnmHock, \bfnmC. R.\binitsC. R., \bauthor\bsnmKuebler, \bfnmJ. M.\binitsJ. M., \bauthor\bsnmLathrop, \bfnmJ. A.\binitsJ. A., \bauthor\bsnmMockler, \bfnmM. A.\binitsM. A., \bauthor\bsnmNachtman, \bfnmE. P.\binitsE. P., \bauthor\bsnmRestine, \bfnmS. L.\binitsS. L., \bauthor\bsnmVarde, \bfnmS. A.\binitsS. A., \bauthor\bsnmHozza, \bfnmM. J.\binitsM. J., \bauthor\bsnmGelfand, \bfnmC. A.\binitsC. A., \bauthor\bsnmBroxholme, \bfnmJ.\binitsJ., \bauthor\bsnmAbecasis, \bfnmG. R.\binitsG. R., \bauthor\bsnmBoyce-Jacino, \bfnmM. T.\binitsM. T. \AND\bauthor\bsnmCardon, \bfnmL. R.\binitsL. R. (\byear2003). \btitleChromosome-wide distribution of haplotype blocks and the role of recombination hot spots. \bjournalNat. Genet. \bvolume33 \bpages382–387. \biddoi=10.1038/ng1100, issn=1061-4036, pii=ng1100, pmid=12590262 \endbibitem
  • Przeworski and Wall (2001) {barticle}[pbm] \bauthor\bsnmPrzeworski, \bfnmM.\binitsM. \AND\bauthor\bsnmWall, \bfnmJ. D.\binitsJ. D. (\byear2001). \btitleWhy is there so little intragenic linkage disequilibrium in humans? \bjournalGenet. Res. \bvolume77 \bpages143–151. \bidissn=0016-6723, pmid=11355570 \endbibitem
  • Reich et al. (2001) {barticle}[pbm] \bauthor\bsnmReich, \bfnmD. E.\binitsD. E., \bauthor\bsnmCargill, \bfnmM.\binitsM., \bauthor\bsnmBolk, \bfnmS.\binitsS., \bauthor\bsnmIreland, \bfnmJ.\binitsJ., \bauthor\bsnmSabeti, \bfnmP. C.\binitsP. C., \bauthor\bsnmRichter, \bfnmD. J.\binitsD. J., \bauthor\bsnmLavery, \bfnmT.\binitsT., \bauthor\bsnmKouyoumjian, \bfnmR.\binitsR., \bauthor\bsnmFarhadian, \bfnmS. F.\binitsS. F., \bauthor\bsnmWard, \bfnmR.\binitsR. \AND\bauthor\bsnmLander, \bfnmE. S.\binitsE. S. (\byear2001). \btitleLinkage disequilibrium in the human genome. \bjournalNature \bvolume411 \bpages199–204. \biddoi=10.1038/35075590, issn=0028-0836, pii=35075590, pmid=11346797 \endbibitem
  • Steenkiste et al. (2007) {bmisc}[auto:STB—2011-03-03—12:04:44] \bauthor\bsnmSteenkiste, \bfnmA.\binitsA., \bauthor\bsnmValdes, \bfnmAM\binitsA., \bauthor\bsnmFeolo, \bfnmM.\binitsM., \bauthor\bsnmHoffman, \bfnmD.\binitsD., \bauthor\bsnmConcannon, \bfnmP.\binitsP., \bauthor\bsnmNoble, \bfnmJ.\binitsJ., \bauthor\bsnmSchoch, \bfnmG.\binitsG., \bauthor\bsnmHansen, \bfnmJ.\binitsJ., \bauthor\bsnmHelmberg, \bfnmW.\binitsW., \bauthor\bsnmDorman, \bfnmJ. S.\binitsJ. S., \bauthor\bsnmThomson, \bfnmG.\binitsG. \AND\bauthor\bsnmPugliese, \bfnmA.\binitsA. (\byear2007). \bhowpublished13th IHWS 1 diabetes component participating investigators. 14th International HLA and Immunogenetics Workshop: Report on the HLA component of type 1 diabetes. Tissue Antigens 69 214–225. \endbibitem
  • Stumpf and Goldstein (2003) {barticle}[pbm] \bauthor\bsnmStumpf, \bfnmMichael P H\binitsM. P. H. \AND\bauthor\bsnmGoldstein, \bfnmDavid B.\binitsD. B. (\byear2003). \btitleDemography, recombination hotspot intensity, and the block structure of linkage disequilibrium. \bjournalCurr. Biol. \bvolume13 \bpages1–8. \bidissn=0960-9822, pii=S0960982202014045, pmid=12526738 \endbibitem
  • Viken et al. (2009) {barticle}[auto:STB—2011-03-03—12:04:44] \bauthor\bsnmViken, \bfnmM. K.\binitsM. K., \bauthor\bsnmBlomhoff, \bfnmA.\binitsA., \bauthor\bsnmOlsson, \bfnmM.\binitsM., \bauthor\bsnmAkselsen, \bfnmH. E.\binitsH. E., \bauthor\bsnmPociot, \bfnmF.\binitsF., \bauthor\bsnmNerup, \bfnmJ.\binitsJ., \bauthor\bsnmKockum, \bfnmI.\binitsI., \bauthor\bsnmCambon-Thomsen, \bfnmA.\binitsA., \bauthor\bsnmThorsby, \bfnmE.\binitsE., \bauthor\bsnmUndlien, \bfnmD. E.\binitsD. E. \betalet al. (\byear2009). \btitleReproducible association with type 1 diabetes in the extended class I region of the major histocompatibility complex. \bjournalGenes and Immune \bvolume10 \bpages303–333. \endbibitem
  • Wall and Pritchard (2003) {barticle}[pbm] \bauthor\bsnmWall, \bfnmJeffrey D.\binitsJ. D. \AND\bauthor\bsnmPritchard, \bfnmJonathan K.\binitsJ. K. (\byear2003). \btitleAssessing the performance of the haplotype block model of linkage disequilibrium. \bjournalAm. J. Hum. Genet. \bvolume73 \bpages502–515. \biddoi=10.1086/378099, issn=0002-9297, pii=S0002-9297(07)62014-X, pmcid=1180676, pmid=12916017 \endbibitem
  • Wang et al. (2002) {barticle}[pbm] \bauthor\bsnmWang, \bfnmNing\binitsN., \bauthor\bsnmAkey, \bfnmJoshua M.\binitsJ. M., \bauthor\bsnmZhang, \bfnmKun\binitsK., \bauthor\bsnmChakraborty, \bfnmRanajit\binitsR. \AND\bauthor\bsnmJin, \bfnmLi\binitsL. (\byear2002). \btitleDistribution of recombination crossovers and the origin of haplotype blocks: The interplay of population history, recombination, and mutation. \bjournalAm. J. Hum. Genet. \bvolume71 \bpages1227–1234. \biddoi=10.1086/344398, issn=0002-9297, pii=S0002-9297(07)60418-2, pmcid=385104, pmid=12384857 \endbibitem
  • WTCCC (2007) {bmisc}[pbm] \borganizationThe Wellcome Trust Case Control Consortium (\byear2007). \bhowpublishedGenome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447 661–678 \biddoi=10.1038/nature05911, issn=1476-4687, mid=UKMS4894, pii=nature05911, pmcid=2719288, pmid=17554300. \endbibitem
  • Yuan and Lin (2006) {barticle}[mr] \bauthor\bsnmYuan, \bfnmMing\binitsM. \AND\bauthor\bsnmLin, \bfnmYi\binitsY. (\byear2006). \btitleModel selection and estimation in regression with grouped variables. \bjournalJ. R. Stat. Soc. Ser. B Stat. Methodol. \bvolume68 \bpages49–67. \biddoi=10.1111/j.1467-9868.2005.00532.x, issn=1369-7412, mr=2212574 \endbibitem
  • Zhang et al. (2002a) {barticle}[auto:STB—2011-03-03—12:04:44] \bauthor\bsnmZhang, \bfnmK.\binitsK., \bauthor\bsnmCalabrese, \bfnmP.\binitsP., \bauthor\bsnmNordborg, \bfnmM.\binitsM. \AND\bauthor\bsnmSun, \bfnmF.\binitsF. (\byear2002a). \btitleHaplotype structure and its applications to association studies: Power and study designs. \bjournalAm. J. Hum. Genet. \bvolume71 \bpages1386–1394. \endbibitem
  • Zhang et al. (2002b) {barticle}[auto:STB—2011-03-03—12:04:44] \bauthor\bsnmZhang, \bfnmK.\binitsK., \bauthor\bsnmDeng, \bfnmM.\binitsM., \bauthor\bsnmChen, \bfnmT.\binitsT., \bauthor\bsnmWaterman, \bfnmM. S.\binitsM. S. \AND\bauthor\bsnmSun, \bfnmF.\binitsF. (\byear2002b). \btitleA dynamical programming algorithm for haplotype block partitioning. \bjournalProc. Natl. Acad. Sci. USA \bvolume99 \bpages7335–7339. \endbibitem
  • Zhang et al. (2003) {barticle}[pbm] \bauthor\bsnmZhang, \bfnmKun\binitsK., \bauthor\bsnmAkey, \bfnmJoshua M.\binitsJ. M., \bauthor\bsnmWang, \bfnmNing\binitsN., \bauthor\bsnmXiong, \bfnmMomiao\binitsM., \bauthor\bsnmChakraborty, \bfnmRanajit\binitsR. \AND\bauthor\bsnmJin, \bfnmLi\binitsL. (\byear2003). \btitleRandomly distributed crossovers may generate block-like patterns of linkage disequilibrium: An act of genetic drift. \bjournalHum. Genet. \bvolume113 \bpages51–59. \biddoi=10.1007/s00439-003-0941-5, issn=0340-6717, pmid=12677424 \endbibitem
  • Zhang (2011) {bmisc}[auto:STB—2011-03-03—12:04:44] \bauthor\bsnmZhang, \bfnmY.\binitsY. (\byear2011). \bhowpublishedBayesian epistasis association mapping via SNP imputation. Biostatistics 12 211–222. \endbibitem
  • Zhang and Liu (2007) {barticle}[pbm] \bauthor\bsnmZhang, \bfnmYu\binitsY. \AND\bauthor\bsnmLiu, \bfnmJun S.\binitsJ. S. (\byear2007). \btitleBayesian inference of epistatic interactions in case–control studies. \bjournalNat. Genet. \bvolume39 \bpages1167–1173. \biddoi=10.1038/ng2110, issn=1061-4036, pii=ng2110, pmid=17721534 \endbibitem
  • Zhang, Niu and Liu (2006) {barticle}[pbm] \bauthor\bsnmZhang, \bfnmYu\binitsY., \bauthor\bsnmNiu, \bfnmTianhua\binitsT. \AND\bauthor\bsnmLiu, \bfnmJun S.\binitsJ. S. (\byear2006). \btitleA coalescence-guided hierarchical Bayesian method for haplotype inference. \bjournalAm. J. Hum. Genet. \bvolume79 \bpages313–322. \biddoi=10.1086/506276, issn=0002-9297, pii=S0002-9297(07)63138-3, pmcid=1559491, pmid=16826521 \endbibitem
  • Zhang, Zhang and Liu (2011) {bmisc}[auto:STB—2011-03-03—12:04:44] \bauthor\bsnmZhang, \bfnmY.\binitsY., \bauthor\bsnmZhang, \bfnmJ.\binitsJ. \AND\bauthor\bsnmLiu, \bfnmJ. S.\binitsJ. S. (\byear2011). \bhowpublishedSupplement to “Block-based Bayesian epistasis association mapping with application to WTCCC Type 1 diabetes data.” DOI:10.1214/11-AOAS469SUPP. \endbibitem
  • Zöllner and Pritchard (2005) {barticle}[pbm] \bauthor\bsnmZöllner, \bfnmSebastian\binitsS. \AND\bauthor\bsnmPritchard, \bfnmJonathan K.\binitsJ. K. (\byear2005). \btitleCoalescent-based association mapping and fine mapping of complex trait loci. \bjournalGenetics \bvolume169 \bpages1071–1092. \biddoi=10.1534/genetics.104.031799, issn=0016-6731, pii=genetics.104.031799, pmcid=1449137, pmid=15489534 \endbibitem
  • Zou and Hastie (2005) {barticle}[mr] \bauthor\bsnmZou, \bfnmHui\binitsH. \AND\bauthor\bsnmHastie, \bfnmTrevor\binitsT. (\byear2005). \btitleRegularization and variable selection via the elastic net. \bjournalJ. R. Stat. Soc. Ser. B Stat. Methodol. \bvolume67 \bpages301–320. \biddoi=10.1111/j.1467-9868.2005.00503.x, issn=1369-7412, mr=2137327 \endbibitem
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description