Narrowing the gap on heritability of common disease by direct estimation in case-control GWAS
Department of Statistics and Operations Research, Tel-Aviv University.
Summary paragraph
One of the major developments in recent years in the search for the missing heritability of human phenotypes is the adoption of linear mixed-effects models (LMMs) to estimate the heritability due to genetic variants that are not significantly associated with the phenotype (Yang et al., 2010). A variant of the LMM approach has been adapted to case-control studies and applied to many major diseases (Lee et al., 2011; Do et al., 2011; Lee et al., 2012, 2013), successfully accounting for a considerable portion of the missing heritability. For example, for Crohn’s disease the LMM estimate of heritability was 22%, compared to 50–60% from family studies. In this letter we propose to estimate the heritability of disease directly, by regressing phenotype similarities on genotype correlations, corrected to account for ascertainment. We refer to this method as genetic correlation regression (GCR). Using GCR, we estimate the heritability of Crohn’s disease at 34% from the same data. We demonstrate through extensive simulation that our method yields unbiased heritability estimates, which are consistently higher than LMM estimates. Moreover, we develop a heuristic correction to LMM estimates, which can be applied to published LMM results. Applying our heuristic correction increases the estimated heritability of multiple sclerosis from 30% (Lee et al., 2013) to 52.6%.
Main text
“Missing heritability” is the term commonly used to denote the gap between the expected heritability of many common diseases, as estimated by family and twin studies, and the overall additive (narrow-sense) heritability obtained by accumulating the effects of all single-nucleotide polymorphisms (SNPs) found to be significantly associated with these conditions in genome-wide association studies (GWASs) (Bloom et al., 2012; Eichler et al., 2010; Maher, 2008; Manolio et al., 2009). Many diseases that make up a considerable portion of the healthcare burden display such a gap, including type 1 and type 2 diabetes, bipolar disorder, schizophrenia, Alzheimer’s disease, multiple sclerosis and Parkinson’s disease.
Researchers have proposed several hypothetical solutions to this mystery. These include rare causative variants that go undetected by current GWAS methodology; common variants with small effects that do not pass the significance threshold and are therefore unaccounted for; gene-gene and gene-environment interactions, which are overlooked by the additive model assumed in GWAS; epigenetic effects; and more (Eichler et al., 2010; Manolio et al., 2009; Zuk et al., 2012).
Clearly, different theories have major implications for our understanding of human disease, and also dictate different strategies for discovering the underlying genetic causes of disease. For example, identifying rare variants requires a focus on deep sequencing (Nielsen et al., 2011), while detecting small effects of common variants requires dramatically increasing sample sizes or conducting massive meta-analyses (Visscher, 2008). Effective planning of genetic research can therefore only be guided by a satisfactory and well-founded allocation of the missing heritability to its various possible sources. Additionally, accurate estimates of heritability can facilitate better personalized genetic risk prediction (Dudbridge, 2013).
Yang et al. (2010) pioneered the use of LMMs to estimate the heritability of continuous traits (e.g. height) from GWAS data, accounting for both significantly and non-significantly associated SNPs, thus providing an estimate of the total heritability explained by common SNPs. Their method has been applied to numerous phenotypes, including human height, body mass index, von Willebrand factor (Yang et al., 2011b), gene expression (Price et al., 2011) and intelligence (Visscher et al., 2008). The LMM method was later adapted to dichotomous disease phenotypes by Lee et al. (2011), assuming the well-known liability threshold model (Dempster and Lerner, 1950). They showed that for a random (unascertained) population sample, the LMM method can be used to estimate heritability by applying it directly to the observed phenotype as if it were continuous, and then transforming the resulting “observed scale” estimate to the liability scale as suggested by Dempster and Lerner (1950).
The ascertained case-control scenario, which is relevant for most GWASs, proved more challenging, as the enrichment of cases violates the assumption of normality of the genetic component that is critical for LMM estimation. Lee et al. proposed a complex mathematical solution for this case and demonstrated that it worked in limited simulations. They applied their method to three phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) (Burton et al., 2007): Crohn’s disease (CD), bipolar disorder (BD) and type 1 diabetes (T1D). It has since been applied to many major diseases, including schizophrenia (Lee et al., 2012), Alzheimer’s disease and multiple sclerosis (Lee et al., 2013), and Parkinson’s disease (Do et al., 2011). In all these cases, accounting for non-significant SNPs resulted in much higher estimates of heritability than those obtained by accumulating the SNPs found significant in GWAS.
The basic idea behind heritability estimation methods in GWAS is that individuals who are more correlated genetically are more likely to have similar phenotypes. The strength of this connection depends on the heritability: higher heritability implies a stronger connection. The genetic correlation of every pair of individuals can be estimated from their genotypes. Since the individuals in the study are unrelated, the correlations between their genotypes are typically small, but non-zero. By accumulating the information across all pairs of individuals in the study, one can leverage these minor differences to separate the phenotypic variance into its genetic and environmental components, resulting in an estimate of heritability.
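To make the pairwise genetic correlations concrete, here is a minimal sketch that estimates them from a matrix of allele counts by standardizing each SNP with its sample allele frequency (in the spirit of Yang et al.’s normalization) and averaging the cross-products over SNPs. The function name, the use of sample rather than population frequencies, and dropping monomorphic SNPs are our own assumptions for illustration.

```python
import numpy as np


def genetic_relationship_matrix(genotypes):
    """Estimate pairwise genetic correlations from an (n, m) matrix of allele counts (0/1/2).

    Each SNP is centred by twice its sample allele frequency f and scaled by
    sqrt(2 f (1 - f)); the result is X X^T / m.
    """
    geno = np.asarray(genotypes, dtype=float)
    freq = geno.mean(axis=0) / 2.0
    # Monomorphic SNPs have zero variance and carry no information: drop them.
    polymorphic = (freq > 0.0) & (freq < 1.0)
    geno, freq = geno[:, polymorphic], freq[polymorphic]
    X = (geno - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    return X @ X.T / X.shape[1]


rng = np.random.default_rng(1)
maf = rng.uniform(0.05, 0.5, size=2000)        # illustrative allele frequencies
toy = rng.binomial(2, maf, size=(50, 2000))    # 50 unrelated individuals
G = genetic_relationship_matrix(toy)
off_diag = G[~np.eye(len(G), dtype=bool)]
print(G.shape, np.diag(G).mean(), off_diag.std())
```

For unrelated individuals the diagonal is close to 1, while the off-diagonal entries scatter around 0 with a spread of roughly one over the square root of the number of SNPs: small, but not zero.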
Ascertained case-control studies are considerably harder to deal with theoretically than non-ascertained (prospective or observational) studies, as the over-representation of cases creates a wide range of artifacts. Oversampling of cases violates the normality of the genetic component assumed by the liability threshold model, as well as the assumption that the genetic and environmental components are independent. Methods that critically rely on these assumptions, like LMM, can no longer be applied without complex and potentially questionable adjustments.
We therefore developed GCR for estimating the heritability of polygenic phenotypes in ascertained case-control studies. The key novelty of our method is that we model the selection process directly, accounting for ascertainment by conditioning on the selection of the observed individuals. This contrasts with the existing methods of Lee et al. (2011) and Zhou et al. (2013), which apply methodologies suited for prospective studies of a continuous normal phenotype and then attempt to correct the estimates to account for ascertainment.
Technically, we adopt the commonly used liability threshold model (Lee et al., 2011; Dempster and Lerner, 1950) and analytically derive the relationship between the genetic correlation of any two individuals and the similarity between their phenotypes, conditional on both individuals having been selected for the study. While this relationship has no closed-form expression, a first-order approximation yields a linear relationship between the expected phenotypic similarity and the genetic correlation, whose slope depends on the heritability. Our approach then entails regressing the phenotype similarities between pairs of individuals on their genetic correlations, as estimated from GWAS data. The slope of the regression is then transformed into an estimate of the heritability (Online Methods).
We applied GCR to the same three WTCCC phenotypes (CD, BD and T1D) and to four additional ones: type 2 diabetes (T2D), coronary artery disease (CAD), rheumatoid arthritis (RA) and hypertension (HT). Our method yields considerably higher estimates than the LMM method (Lee et al., 2011) for CD and BD (Table 1).
To explore the nature of this considerable gap between our estimates and previously published estimates of the additive heritability, we conducted a wide range of realistic simulations. We simulated an end-to-end generative model, starting from minor allele frequencies (MAFs) and individual SNP effects, through genotypes and phenotypes, and finally the entire selection process (Online Methods).
Our simulations differ from all previously reported simulations of case-control studies in the context of heritability estimation in that they yield a realistic, rather than degenerate, correlation structure among the individuals in the study. Since we simulated genotypes, the correlation between individuals is not restricted to a small subset of possible values, as in the simulations of Lee et al. (2011), whose assumed correlation structure is highly degenerate, with as many as 99.98% of correlations being exactly 0. Additionally, when both heritability and ascertainment are high, cases tend to be more genetically similar than expected by chance, which was not accounted for in Lee et al.’s simulations. For further discussion of the different simulation setups see Online Methods and Supp. Material.
Our simulations showed that applying the LMM method to case-control studies yielded considerably downward-biased heritability estimates, while our method consistently generated unbiased estimates (Fig. 1 and Supp. Figs. 1–7).
The simulations also demonstrated a strong relation between the bias of the LMM approach and the increased variance of the genetic effects due to ascertainment. This allowed us to derive a heuristic correction for the LMM heritability estimates, which successfully corrected their bias in all our simulation scenarios (Online Methods, Fig. 2 and Supp. Figs. 12–15). To validate our heuristic correction, we applied it to the LMM heritability estimates for all seven WTCCC phenotypes and compared the corrected estimates to those obtained using GCR. The correlation between the corrected LMM estimates and the GCR estimates was , indicating that our heuristic correction generalizes beyond our particular simulations (see Online Methods and Supp. Material for more details).
We then applied our correction to the results of several published studies that used the LMM approach. As expected, for studies with low ascertainment or phenotypes with low heritability, the corrected estimates were not substantially different from the published ones (Alzheimer’s disease (Lee et al., 2013), endometriosis (Lee et al., 2013), schizophrenia (Lee et al., 2012) and Parkinson’s disease (Do et al., 2011)). However, for multiple sclerosis, the most ascertained study we inspected, the corrected heritability estimate is 52.6%, compared to the uncorrected estimate of 30%. See Supp. Material for more details.
An important aspect of heritability estimation is accounting for known (fixed) effects, such as known associated SNPs, known environmental effects or sex. However, estimating fixed effects from ascertained data under the normality assumption may produce biased estimators of both the fixed effects and the heritability (Burton, 2003; Bowden et al., 2007; Noh et al., 2005). In contrast, our method can be rigorously extended to allow for fixed effects (Online Methods), and our simulations suggest that GCR produces accurate estimates of heritability in their presence (see Supp. Material for more details).
Specifically for the WTCCC phenotypes, a previous analysis (Burton et al., 2007) suggested that there is little population structure after removing individuals of non-British descent. We validated this conclusion using the statistical test described by Patterson et al. (2006) (see Supp. Material for more details). We therefore included only sex as a fixed effect in the heritability estimates in Table 1. For most phenotypes, the LMM and GCR estimates obtained without a sex fixed effect are not substantially different from those in Table 1; the exception is the heart-disease-related phenotypes (results without the fixed effect not shown). For CAD in particular, sex is well known to be a major risk factor: the GCR estimate without the fixed effect is 71.5%, dropping to about 61% once it is added (Table 1).
GCR is computationally very efficient compared to the LMM approach (its running time scales quadratically rather than cubically with the number of individuals), taking seconds rather than hours on the WTCCC cohorts. As the typical size of GWASs continues to increase, this efficiency may prove critical in keeping GCR computationally practical compared to other approaches. The fast running time also allows us to use resampling approaches such as the jackknife (Efron and Tibshirani, 1994) for estimating standard errors and confidence intervals, rather than relying on complex and potentially inaccurate parametric approximations. All standard errors and confidence intervals we report are based on the jackknife.
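A generic jackknife standard error can be sketched as follows. The text does not specify whether single individuals or blocks of individuals were left out, so the optional blocking, the function name and its interface are our assumptions; `estimate_fn` is any routine that re-estimates heritability on a subset of individuals.

```python
import numpy as np


def jackknife_se(estimate_fn, n, n_blocks=None):
    """Delete-one-block jackknife standard error.

    estimate_fn(idx) must return the estimate computed on the individuals in idx.
    With n_blocks=None every individual is its own block (delete-one jackknife).
    """
    all_idx = np.arange(n)
    blocks = np.array_split(all_idx, n if n_blocks is None else n_blocks)
    B = len(blocks)
    # Re-estimate with each block removed.
    leave_out = np.array([estimate_fn(np.setdiff1d(all_idx, b)) for b in blocks])
    return np.sqrt((B - 1) / B * np.sum((leave_out - leave_out.mean()) ** 2))


data = np.array([2.0, 3.5, 1.0, 7.0, 4.5, 6.0])
se = jackknife_se(lambda idx: data[idx].mean(), len(data))
print(se)   # for the sample mean this reproduces the usual s / sqrt(n)
```

For GCR, `estimate_fn` would subset the phenotype vector and the corresponding rows and columns of the genetic correlation matrix before re-estimating. Because each GCR fit is quadratic in the number of individuals, leaving out blocks (for example a few dozen of them) keeps the total cost manageable.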
We also experimented with a more involved version of our approach that included the second-order term of the Taylor series expansion. This did not yield more accurate results than first-order GCR in our simulations (Supp. Material).
Recently, Zuk et al. (2012) showed an intriguing general result connecting the narrow-sense heritability to the derivative of phenotypic similarity with respect to the proportion of identity by descent (IBD), evaluated at the average population IBD, regardless of population structure and genotype-phenotype architecture. While Zuk et al.’s result uses IBD, which is unknown, our method relies on estimated genetic correlations, which are in fact an unbiased estimator of IBD. Hence, our method can produce unbiased estimators of narrow-sense heritability even in the presence of population structure, by multiplying the estimate by a factor of (1 - average population IBD).
In conclusion, our proposed GCR method for estimating heritability in case-control GWAS, based on a regression of phenotype similarities on genotype correlations, is efficient and accurate. It improves on existing methodology and generates substantially higher estimates of heritability for two of the major diseases inspected: Crohn’s disease and bipolar disorder. Moreover, we provide a heuristic correction for published LMM heritability estimates, which suggests that the heritability of multiple sclerosis is also considerably larger than previously thought.
Methods
Note: An implementation of our code for GCR estimation, simulations, and heuristic correction of LMM can be accessed at: https://sites.google.com/site/davidgolanshomepage/gcr
Heritability estimation using genetic correlation regression
Liability threshold model: notation. Denote by $K$ the prevalence of a condition in the population and by $P$ its prevalence in the study.
Under the liability threshold model, we assume that each individual $i$ has an unobserved liability $l_i = g_i + e_i$, where $g_i$ is a genetic random effect, which can be correlated across individuals, and $e_i$ is the environmental random effect, which is assumed to be independent across individuals and of the genetic effects. Both effects are assumed to follow Gaussian distributions, with variances $h^2$ and $1-h^2$ respectively, so that $l_i \sim N(0,1)$. A person is then assumed to be a case if her liability exceeds a threshold $t = \Phi^{-1}(1-K)$, where $\Phi$ is the standard normal cumulative distribution function; i.e. the phenotype is given by $y_i = \mathbb{1}[l_i > t]$. This definition guarantees that the prevalence in the population is indeed $\Pr(y_i = 1) = 1 - \Phi(t) = K$.
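A quick Monte Carlo check of the liability threshold model: drawing independent genetic and environmental effects with variances h² and 1 - h² and thresholding at t = Φ⁻¹(1 - K) should reproduce a population prevalence of K. This is a minimal sketch for unrelated individuals; the function name and parameter values are illustrative.

```python
import numpy as np
from statistics import NormalDist


def simulate_liability_model(n, h2, K, seed=0):
    """Draw phenotypes from the liability threshold model l = g + e, y = 1[l > t]."""
    rng = np.random.default_rng(seed)
    t = NormalDist().inv_cdf(1.0 - K)                  # threshold t = Phi^{-1}(1 - K)
    g = rng.normal(0.0, np.sqrt(h2), size=n)           # genetic effect, variance h2
    e = rng.normal(0.0, np.sqrt(1.0 - h2), size=n)     # environmental effect, variance 1 - h2
    y = (g + e > t).astype(int)                        # case if the liability exceeds t
    return y, t


y, t = simulate_liability_model(200_000, h2=0.5, K=0.05)
print(round(t, 3), y.mean())   # threshold about 1.645; case fraction close to 0.05
```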
Selection probabilities. When the study is observational, the probability of being included in the study is independent of the phenotype. In a case-control study, however, cases are usually greatly over-represented (ascertained). To model this, we define a random indicator variable $s_i$ indicating whether individual $i$ was selected for the study.
The commonly used “full” or “complete” ascertainment assumption (Stene, 1977) is $\Pr(s_i = 1 \mid y_i = 1) = 1$, i.e. every case in the population is included in the study. While this assumption can be relaxed, as discussed later, it simplifies the subsequent analysis.
Suppose the population is of size $N$ and that the expected size of the study is $n$. The expected number of cases in the study is $NK$. Additionally, the proportion of cases in the study is $P$, so:
$$NK = nP,$$
yielding:
$$n = \frac{NK}{P}.$$
Let $p$ denote the probability of a control being included in the study (i.e. $p = \Pr(s_i = 1 \mid y_i = 0)$). The expected number of controls in the study is $N(1-K)p$. Additionally, the proportion of controls in the study is $1-P$, so:
$$N(1-K)p = n(1-P) = \frac{NK(1-P)}{P}.$$
Solving for $p$ yields:
$$p = \frac{K(1-P)}{P(1-K)}.$$
From here it follows that the probability of being included in the study for a given individual (with unknown phenotype) is:
$$\Pr(s_i = 1) = K + (1-K)p = K + \frac{K(1-P)}{P} = \frac{K}{P}.$$
Our results do not depend strictly on the full ascertainment assumption (that is, $\Pr(s_i = 1 \mid y_i = 1) = 1$). The latter assumption can be relaxed such that
$$\Pr(s_i = 1 \mid y_i = 1) = \tau$$
for any $0 < \tau \le 1$, as long as the probability of being selected as a control is multiplied by the same probability, i.e. $\Pr(s_i = 1 \mid y_i = 0) = \tau p$. This can model any step prior to the selection procedure, for example the probability that an individual is approached by the health administration to begin with. For example, a non-ascertained study ($P = K$, so that $p = 1$) might involve only a proportion $\tau$ of the population. In this case, deriving the selection probability of a given individual yields $\Pr(s_i = 1) = \tau K/P = \tau$, as expected.
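Heritability estimation. Next, consider a pair of individuals $i \neq j$ in the study whose genetic effects are correlated, and denote by $G_{ij}$ the correlation between their genetic effects, so that $\operatorname{Cov}(g_i, g_j) = G_{ij}h^2$.
Denote by $Z_{ij}$ the product of the standardized phenotypes:
$$Z_{ij} = \frac{(y_i - P)(y_j - P)}{P(1-P)}.$$
The variable $Z_{ij}$ can take three values:
$$Z_{ij} = \begin{cases} \frac{1-P}{P} & y_i = y_j = 1, \\ \frac{P}{1-P} & y_i = y_j = 0, \\ -1 & y_i \neq y_j. \end{cases}$$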
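We write down the expected value of $Z_{ij}$, conditional on $s_i = s_j = 1$ (both individuals are part of the study) and given $G_{ij}$:
$$E[Z_{ij} \mid s_i = s_j = 1, G_{ij}] = \frac{1-P}{P}\Pr(y_i = y_j = 1 \mid s_i = s_j = 1, G_{ij}) + \frac{P}{1-P}\Pr(y_i = y_j = 0 \mid s_i = s_j = 1, G_{ij}) - \Pr(y_i \neq y_j \mid s_i = s_j = 1, G_{ij}).$$
We apply Bayes’ rule to the first of the three probabilities on the right:
$$\Pr(y_i = y_j = 1 \mid s_i = s_j = 1, G_{ij}) = \frac{\Pr(s_i = s_j = 1 \mid y_i = y_j = 1, G_{ij})\,\Pr(y_i = y_j = 1 \mid G_{ij})}{\Pr(s_i = s_j = 1 \mid G_{ij})}.$$
Under the full ascertainment assumption, $\Pr(s_i = s_j = 1 \mid y_i = y_j = 1, G_{ij}) = 1$, and so
$$\Pr(y_i = y_j = 1 \mid s_i = s_j = 1, G_{ij}) = \frac{\Pr(y_i = y_j = 1 \mid G_{ij})}{\Pr(s_i = s_j = 1 \mid G_{ij})}.$$
Similarly:
$$\Pr(y_i = y_j = 0 \mid s_i = s_j = 1, G_{ij}) = \frac{\Pr(s_i = s_j = 1 \mid y_i = y_j = 0, G_{ij})\,\Pr(y_i = y_j = 0 \mid G_{ij})}{\Pr(s_i = s_j = 1 \mid G_{ij})},$$
and since each control is selected for the study with probability $p$, this boils down to:
$$\Pr(y_i = y_j = 0 \mid s_i = s_j = 1, G_{ij}) = \frac{p^2\,\Pr(y_i = y_j = 0 \mid G_{ij})}{\Pr(s_i = s_j = 1 \mid G_{ij})}.$$
For the case of $y_i \neq y_j$, one individual is a case and is automatically selected, while the other is a control and is selected with probability $p$. Hence:
$$\Pr(y_i \neq y_j \mid s_i = s_j = 1, G_{ij}) = \frac{p\,\Pr(y_i \neq y_j \mid G_{ij})}{\Pr(s_i = s_j = 1 \mid G_{ij})}.$$
Using these results we get:
$$E[Z_{ij} \mid s_i = s_j = 1, G_{ij}] = \frac{\frac{1-P}{P}\Pr(y_i = y_j = 1 \mid G_{ij}) + \frac{P}{1-P}p^2\Pr(y_i = y_j = 0 \mid G_{ij}) - p\Pr(y_i \neq y_j \mid G_{ij})}{\Pr(y_i = y_j = 1 \mid G_{ij}) + p^2\Pr(y_i = y_j = 0 \mid G_{ij}) + p\Pr(y_i \neq y_j \mid G_{ij})},$$
where the denominator is $\Pr(s_i = s_j = 1 \mid G_{ij})$, expanded by the law of total probability.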
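The exact conditional expectation of Z_ij (before any Taylor approximation) can be evaluated numerically, which makes it easy to see how close it is to a linear function of G_ij when G_ij is small. In this sketch the liabilities are standard bivariate normal with correlation G_ij·h², the orthant probabilities are computed by a one-dimensional trapezoidal rule, and p = K(1 - P)/(P(1 - K)) is the control selection probability. The helper names and integration settings are our own choices.

```python
import math
import numpy as np
from statistics import NormalDist

_STD = NormalDist()
_cdf = np.vectorize(_STD.cdf)   # elementwise standard normal CDF


def upper_orthant(a, rho, width=12.0, n_grid=20001):
    """Pr(X > a, Y > a) for a standard bivariate normal pair with correlation rho.

    Uses Pr(Y > a | X = x) = 1 - Phi((a - rho x) / sqrt(1 - rho^2)) and a
    trapezoidal rule over x in [a, a + width].
    """
    x = np.linspace(a, a + width, n_grid)
    dens = np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)
    f = dens * (1.0 - _cdf((a - rho * x) / math.sqrt(1.0 - rho ** 2)))
    dx = x[1] - x[0]
    return dx * (f.sum() - 0.5 * (f[0] + f[-1]))


def expected_Z(G, h2, K, P):
    """E[Z_ij | s_i = s_j = 1, G_ij] under full ascertainment (no Taylor approximation)."""
    t = _STD.inv_cdf(1.0 - K)
    p = K * (1.0 - P) / (P * (1.0 - K))
    rho = G * h2                         # correlation of the two liabilities
    p11 = upper_orthant(t, rho)          # both cases
    p00 = upper_orthant(-t, rho)         # both controls, by symmetry of (-l_i, -l_j)
    p_diff = 1.0 - p11 - p00             # discordant pair
    num = (1.0 - P) / P * p11 + P / (1.0 - P) * p ** 2 * p00 - p * p_diff
    den = p11 + p ** 2 * p00 + p * p_diff   # = Pr(s_i = s_j = 1 | G_ij)
    return num / den


for G in (0.0, 0.002, 0.01):
    print(G, expected_Z(G, h2=0.5, K=0.05, P=0.5))
```

At G_ij = 0 the expectation is zero up to integration error, and for small G_ij it grows essentially linearly, which is what the first-order approximation exploits.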
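Denote the numerator by $f(G_{ij})$ and the denominator by $q(G_{ij})$. We wish to approximate the last equation using a Taylor series around $G_{ij} = 0$. Since the individuals are unrelated, the correlation $G_{ij}$ is expected to be close to 0, and therefore such an approximation is expected to be good. A first-order approximation takes the form:
$$\frac{f(G_{ij})}{q(G_{ij})} \approx \frac{f(0)}{q(0)} + G_{ij}\,\frac{f'(0)\,q(0) - f(0)\,q'(0)}{q(0)^2}.$$
See Supp. Mat. for a discussion of a second-order approximation.
Note that with $G_{ij} = 0$, the phenotypes of the two selected individuals are i.i.d. with mean $P$, and so $E[Z_{ij} \mid s_i = s_j = 1, G_{ij} = 0] = f(0)/q(0) = 0$, i.e. $f(0) = 0$. Therefore, the Taylor approximation can be simplified:
$$E[Z_{ij} \mid s_i = s_j = 1, G_{ij}] \approx G_{ij}\,\frac{f'(0)}{q(0)}.$$
Similarly, with $G_{ij} = 0$ the events of being included in the study are i.i.d. for both individuals, so $q(0) = \Pr(s_i = 1)^2 = K^2/P^2$.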
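All that remains is to find $f'(0)$. Towards that end, we are interested in computing the probabilities of the three possible combinations of phenotypes:
$$\Pr(y_i = y_j = 1 \mid G_{ij}) = \int_t^\infty\!\int_t^\infty \phi_{G_{ij}}(x, y)\,dx\,dy, \qquad \Pr(y_i = y_j = 0 \mid G_{ij}) = \int_{-\infty}^t\!\int_{-\infty}^t \phi_{G_{ij}}(x, y)\,dx\,dy,$$
and
$$\Pr(y_i \neq y_j \mid G_{ij}) = \int_t^\infty\!\int_{-\infty}^t \phi_{G_{ij}}(x, y)\,dx\,dy + \int_{-\infty}^t\!\int_t^\infty \phi_{G_{ij}}(x, y)\,dx\,dy,$$
where $\phi_{G_{ij}}$ is the bivariate Gaussian density of the liabilities $(l_i, l_j)$, namely:
$$\phi_{G_{ij}}(x, y) = \frac{1}{2\pi\sqrt{\det\Sigma}}\exp\left(-\frac{1}{2}(x, y)\,\Sigma^{-1}(x, y)^\top\right),$$
with $\Sigma$ denoting the covariance matrix of the liabilities, given explicitly by:
$$\Sigma = \begin{pmatrix} 1 & G_{ij}h^2 \\ G_{ij}h^2 & 1 \end{pmatrix}.$$
The determinant of $\Sigma$ is $1 - G_{ij}^2h^4$ and its inverse is $\Sigma^{-1} = \frac{1}{1 - G_{ij}^2h^4}\begin{pmatrix} 1 & -G_{ij}h^2 \\ -G_{ij}h^2 & 1 \end{pmatrix}$, and so the density function can be written as:
$$\phi_{G_{ij}}(x, y) = \frac{1}{2\pi\sqrt{1 - G_{ij}^2h^4}}\exp\left(-\frac{x^2 - 2G_{ij}h^2xy + y^2}{2(1 - G_{ij}^2h^4)}\right).$$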
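Differentiating $f$ requires differentiating each of the three double integrals w.r.t. $G_{ij}$. For the first:
$$\frac{d}{dG_{ij}}\Pr(y_i = y_j = 1 \mid G_{ij}) = \int_t^\infty\!\int_t^\infty \frac{\partial}{\partial G_{ij}}\phi_{G_{ij}}(x, y)\,dx\,dy.$$
Setting $G_{ij} = 0$ in the last expression yields:
$$\left.\frac{d}{dG_{ij}}\Pr(y_i = y_j = 1 \mid G_{ij})\right|_{G_{ij}=0} = h^2\int_t^\infty\!\int_t^\infty xy\,\varphi(x)\varphi(y)\,dx\,dy = h^2\left(\int_t^\infty x\varphi(x)\,dx\right)^2 = h^2\varphi(t)^2,$$
where $\varphi$ is the standard normal density.
Explanation: we differentiate and set $G_{ij} = 0$. By the chain rule, the derivative of any expression involving $G_{ij}^2$ is $0$ at $G_{ij} = 0$, and obviously the derivative of any expression that does not depend on $G_{ij}$ is $0$. The only expression whose derivative is therefore not $0$ at $G_{ij} = 0$ is the term $2G_{ij}h^2xy$ in the numerator of the exponent. The denominator of the exponent equals $2$ at $G_{ij} = 0$, and so the derivative of the density at $G_{ij} = 0$ is $h^2xy\,\varphi(x)\varphi(y)$. The last step uses $\int_t^\infty x\varphi(x)\,dx = \varphi(t)$, which follows from $\varphi'(x) = -x\varphi(x)$. Similarly, using $\int_{-\infty}^t x\varphi(x)\,dx = -\varphi(t)$:
$$\left.\frac{d}{dG_{ij}}\Pr(y_i = y_j = 0 \mid G_{ij})\right|_{G_{ij}=0} = h^2\varphi(t)^2,$$
and
$$\left.\frac{d}{dG_{ij}}\Pr(y_i \neq y_j \mid G_{ij})\right|_{G_{ij}=0} = -2h^2\varphi(t)^2.$$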
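The three derivatives at G_ij = 0 can be checked numerically with central finite differences of the orthant probabilities. This minimal sketch assumes standard normal liabilities with correlation G_ij·h² and threshold t = Φ⁻¹(1 - K); the step size and integration grid are our own choices, and the helper is redefined here so the block runs on its own.

```python
import math
import numpy as np
from statistics import NormalDist

_STD = NormalDist()
_cdf = np.vectorize(_STD.cdf)   # elementwise standard normal CDF


def upper_orthant(a, rho, width=12.0, n_grid=20001):
    """Pr(X > a, Y > a) for a standard bivariate normal pair with correlation rho."""
    x = np.linspace(a, a + width, n_grid)
    f = np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi) * (
        1.0 - _cdf((a - rho * x) / math.sqrt(1.0 - rho ** 2)))
    dx = x[1] - x[0]
    return dx * (f.sum() - 0.5 * (f[0] + f[-1]))   # trapezoidal rule


def derivative_check(h2=0.5, K=0.05, delta=0.01):
    """Central differences in G of Pr(both cases), Pr(both controls), Pr(discordant) at G = 0."""
    t = _STD.inv_cdf(1.0 - K)
    probs = {}
    for sign in (+1, -1):
        rho = sign * delta * h2
        p11 = upper_orthant(t, rho)
        p00 = upper_orthant(-t, rho)
        probs[sign] = np.array([p11, p00, 1.0 - p11 - p00])
    numeric = (probs[+1] - probs[-1]) / (2.0 * delta)
    analytic = h2 * _STD.pdf(t) ** 2 * np.array([1.0, 1.0, -2.0])
    return numeric, analytic


numeric, analytic = derivative_check()
print(numeric, analytic)   # the two vectors should agree closely
```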
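Using these results we can write down $f'(0)$:
$$f'(0) = h^2\varphi(t)^2\left(\frac{1-P}{P} + \frac{P}{1-P}p^2 + 2p\right) = h^2\varphi(t)^2\,\frac{1-P}{P(1-K)^2},$$
where the second equality follows by substituting $p = K(1-P)/(P(1-K))$, and so:
$$E[Z_{ij} \mid s_i = s_j = 1, G_{ij}] \approx G_{ij}\,\frac{f'(0)}{q(0)} = G_{ij}\,h^2\,\frac{\varphi(t)^2P(1-P)}{K^2(1-K)^2}.$$
Hence, when the error of the approximation is small, the slope obtained by regressing $Z_{ij}$ on $G_{ij}$ is an unbiased estimator of $h^2\varphi(t)^2P(1-P)/(K^2(1-K)^2)$; dividing it by $\varphi(t)^2P(1-P)/(K^2(1-K)^2)$ yields an unbiased estimator of $h^2$, the liability-scale heritability.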
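A minimal sketch of the resulting estimator, assuming 0/1 phenotypes `y`, a genetic correlation matrix `G`, the population prevalence K and the in-study case proportion P: it forms Z_ij for all pairs i < j, takes the least-squares slope of Z_ij on G_ij, and divides by φ(t)²P(1 - P)/(K²(1 - K)²). Fitting the slope through the origin (the first-order approximation has zero intercept) and defaulting P to the sample case fraction are our assumptions about the fitting details, and the function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def gcr_constant(K, P):
    """phi(t)^2 P (1 - P) / (K^2 (1 - K)^2): the factor multiplying G_ij h^2 in E[Z_ij]."""
    t = NormalDist().inv_cdf(1.0 - K)
    phi_t = NormalDist().pdf(t)
    return phi_t ** 2 * P * (1.0 - P) / (K ** 2 * (1.0 - K) ** 2)


def gcr_heritability(y, G, K, P=None):
    """Liability-scale h2 from a through-the-origin regression of Z_ij on G_ij (i < j)."""
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)
    if P is None:
        P = y.mean()                             # proportion of cases in the study
    z = (y - P) / np.sqrt(P * (1.0 - P))         # standardized phenotypes
    i, j = np.triu_indices(len(y), k=1)          # all distinct pairs
    Z = z[i] * z[j]                              # Z_ij = (y_i - P)(y_j - P) / (P (1 - P))
    g = G[i, j]
    slope = np.dot(Z, g) / np.dot(g, g)          # least squares without intercept
    return slope / gcr_constant(K, P)
```

With no ascertainment (P = K), `gcr_constant` reduces to φ(t)²/(K(1 - K)), the classical observed-to-liability scale factor. The cost is one pass over the n(n - 1)/2 pairs, i.e. quadratic in the number of individuals.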
Extending the liability threshold model to include fixed effects. It is often desirable to include fixed effects in the analysis of a complex phenotype. Such fixed effects might include external information such as sex, diet and exposure to environmental risks, but can also be genetic variants with known effects, or estimates of population structure such as projections onto the top principal components.
Since the liability threshold model is in fact a probit model, fixed effects can be included in the usual manner:
$$l_i = x_i^\top\beta + g_i + e_i,$$
where $x_i$ is a vector of the values of the relevant covariates and $\beta$ is a vector of their respective effect sizes.
An individual is a case if $l_i > t$, as before. However, an equivalent formulation is to subtract the fixed effect from the threshold, rather than adding it to the liability:
$$y_i = \mathbb{1}[g_i + e_i > t_i], \qquad t_i = t - x_i^\top\beta,$$
thus keeping the previous formulation of the liability as a sum of genetic and environmental effects.
Heritability estimation with known fixed effects. Assume first that the fixed effects are known, and so the $t_i$’s are known. The probability of being a case, and of being included in the study, are no longer equal for all the observed individuals. We denote:
$$K_i = \Pr(y_i = 1) = 1 - \Phi(t_i),$$
the probability that the $i$’th individual is a case, and:
$$P_i = \Pr(y_i = 1 \mid s_i = 1),$$
the probability that the $i$’th individual is a case conditional on being selected for the study, where $t_i$ is computed using the fixed effects as described above. We now wish to derive the same first-order approximation while accounting for the newly introduced heterogeneity. We redefine:
$$Z_{ij} = \frac{(y_i - P_i)(y_j - P_j)}{\sqrt{P_i(1-P_i)P_j(1-P_j)}}$$
and follow the same steps as before (see Supplementary Materials for a full derivation). Using the first-order Taylor approximation, we conclude that the slope obtained by regressing $Z_{ij}$ on
$$G_{ij}\,\frac{\varphi(t_i)\varphi(t_j)\sqrt{P_i(1-P_i)P_j(1-P_j)}}{K_i(1-K_i)K_j(1-K_j)}$$
is an estimator of heritability on the liability scale.
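With individual-specific thresholds, the pairwise regressor factorizes as c_ij = a_i·a_j with a_i = φ(t_i)·sqrt(P_i(1 - P_i))/(K_i(1 - K_i)), where K_i = Pr(y_i = 1) and P_i = Pr(y_i = 1 | s_i = 1). The sketch below is our reading of that first-order result, written by following the homogeneous derivation with a common control selection probability; it fits the slope through the origin, and the function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def gcr_heritability_fixed_effects(y, G, K_i, P_i):
    """Liability-scale estimate with individual-specific K_i = Pr(y_i=1), P_i = Pr(y_i=1 | s_i=1)."""
    y, G = np.asarray(y, dtype=float), np.asarray(G, dtype=float)
    K_i, P_i = np.asarray(K_i, dtype=float), np.asarray(P_i, dtype=float)
    std = NormalDist()
    t_i = np.array([std.inv_cdf(1.0 - k) for k in K_i])     # individual thresholds
    phi_t = np.exp(-0.5 * t_i ** 2) / np.sqrt(2.0 * np.pi)  # standard normal density at t_i
    sd_i = np.sqrt(P_i * (1.0 - P_i))
    z = (y - P_i) / sd_i                                     # Z_ij = z_i z_j
    a = phi_t * sd_i / (K_i * (1.0 - K_i))                   # regressor factor: c_ij = a_i a_j
    i, j = np.triu_indices(len(y), k=1)
    Z = z[i] * z[j]
    X = G[i, j] * a[i] * a[j]
    return np.dot(Z, X) / np.dot(X, X)                       # slope through the origin
```

When all K_i equal K and all P_i equal P, the factor a_i·a_j reduces to φ(t)²P(1 - P)/(K²(1 - K)²), so this estimator coincides with the homogeneous GCR estimator.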
Estimating heritability with unknown fixed effects. More often than not, the relevant fixed effects are unknown and must be estimated from the data. However, estimating effect sizes under ascertainment in case-control studies is notoriously problematic. Specifically, under the threshold (probit) model, ignoring the ascertainment yields biased estimators.
A notable exception is logistic regression. In their seminal paper, Prentice and Pyke (1979) proved that using logistic regression to estimate fixed effects from ascertained data yields consistent estimators of these effects in the (unascertained) population, and that the ascertainment only biases the intercept.
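A small simulation illustrates the Prentice and Pyke result. Assuming (for this sketch only) a logistic population model with intercept a and slope b, keeping all cases and a fraction p of controls multiplies the in-sample odds by 1/p, so the fitted slope should stay close to b while the intercept moves to a - log(p). The Newton-Raphson fitter and all parameter values are our own illustrative choices.

```python
import numpy as np


def fit_logistic(x, y, n_iter=30):
    """Logistic regression with an intercept, fitted by Newton-Raphson (IRLS)."""
    X = np.column_stack([np.ones(len(y)), x])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (y - mu))
    return beta


def prentice_pyke_demo(a=-4.0, b=1.0, P=0.5, N=400_000, seed=0):
    """Fit logistic regression to an ascertained sample drawn from a logistic population model."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=N)
    y = rng.random(N) < 1.0 / (1.0 + np.exp(-(a + b * x)))   # population phenotypes
    K = y.mean()                                             # population prevalence
    p = K * (1.0 - P) / (P * (1.0 - K))                      # control selection probability
    selected = y | (rng.random(N) < p)                       # all cases, a fraction p of controls
    beta_hat = fit_logistic(x[selected], y[selected].astype(float))
    return beta_hat, (a - np.log(p), b)


beta_hat, expected = prentice_pyke_demo()
print(beta_hat, expected)   # slope close to b; intercept shifted by -log(p)
```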
We therefore suggest a two-step procedure for estimating heritability. First, we estimate the fixed effects using a logistic regression model, correct for the effect of the ascertainment, and obtain the individual-specific thresholds. Second, we plug the thresholds into the estimation scheme described above.
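More elaborately, by Bayes’ formula:
$$P_i = \Pr(y_i = 1 \mid s_i = 1) = \frac{\Pr(s_i = 1 \mid y_i = 1)\,K_i}{\Pr(s_i = 1)} = \frac{K_i}{\Pr(s_i = 1)},$$
by the complete ascertainment assumption $\Pr(s_i = 1 \mid y_i = 1) = 1$, and according to the selection scheme:
$$\Pr(s_i = 1) = K_i + p(1 - K_i), \qquad \text{so} \qquad P_i = \frac{K_i}{K_i + p(1 - K_i)}.$$
We can thus solve for $K_i$ and express it as a function of $P_i$:
$$K_i = \frac{pP_i}{1 - P_i + pP_i}.$$
We then use logistic regression to obtain $\hat P_i$, a consistent estimator of $P_i$, and use this estimate to obtain an estimate $\hat K_i$ of $K_i$, which is in turn used to estimate the threshold:
$$\hat t_i = \Phi^{-1}(1 - \hat K_i),$$
and the estimates of the individual-wise thresholds are used for estimating the variance of the genetic effect.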
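The mapping from in-sample case probabilities to individual thresholds is a few lines of code. In this sketch, `P_hat` would be the fitted values of a logistic regression on the case-control sample, p = K(1 - P)/(P(1 - K)) is the control selection probability, and K_i = p·P_i/(1 - P_i + p·P_i) inverts P_i = K_i/(K_i + p(1 - K_i)). The function name and the example inputs are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def thresholds_from_sample_probabilities(P_hat, K, P):
    """Map in-sample case probabilities P_i (e.g. logistic fitted values) to K_i and t_i."""
    p = K * (1.0 - P) / (P * (1.0 - K))           # control selection probability
    P_hat = np.asarray(P_hat, dtype=float)
    K_i = p * P_hat / (1.0 - P_hat + p * P_hat)   # invert P_i = K_i / (K_i + p (1 - K_i))
    t_i = np.array([NormalDist().inv_cdf(1.0 - k) for k in K_i])
    return K_i, t_i


K_i, t_i = thresholds_from_sample_probabilities([0.3, 0.5, 0.7], K=0.01, P=0.5)
print(K_i, t_i)   # the middle individual (P_i = P) gets K_i = K and t_i = Phi^{-1}(1 - K)
```

An individual whose in-sample probability equals the overall case proportion P is mapped back to the population prevalence K, as expected when covariates carry no information.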
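Estimating the added variance due to fixed effects. Lastly, the presence of fixed effects increases the variance of the liability, so the variance of the genetic effect, $\sigma_g^2$ (denoted $h^2$ above), no longer equals the heritability. The appropriate definition of heritability is now:
$$h^2 = \frac{\sigma_g^2}{1 + \operatorname{Var}(t_i)},$$
where $\operatorname{Var}(t_i)$ is the variance of the thresholds in the population, and so the estimate of $\sigma_g^2$ can be transformed into an estimate of the heritability simply by dividing it by $1 + \operatorname{Var}(t_i)$. We discuss how $\operatorname{Var}(t_i)$ can be estimated from the data in the Supplementary Material.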
Simulations using a generative model
Our goal was to create realistic simulations covering a wide range of combinations of disease prevalence, case sampling probability and heritability, and, for each combination, to create “natural” genotypes that realistically recreate the complex correlation structure induced by case-control sampling.
Given these parameters, our simulations proceeded as follows:

The MAFs of 10,000 SNPs were randomly sampled from .

SNP effect sizes were randomly sampled from

For each individual, we:

Randomly generated a genotype using the MAFs, and normalized it (according to Yang et al.’s model (Yang et al., 2011a)).

Used the genotype and the effect sizes to compute the genetic effect.

Sampled an environmental effect from $N(0, 1-h^2)$.

Computed liability and phenotype.

If the individual was a case, they were automatically included in the study; otherwise, they were included with probability $p$.


The per-individual steps above were repeated until enough individuals (4,000) had been accumulated.

The genotypes of all included individuals were used to compute $G = \frac{1}{m}XX^\top$, where $X$ is the matrix of normalized genotypes and $m$ is the number of SNPs.
This matrix was then used to estimate heritability with the LMM approach, using the GCTA software (Yang et al., 2011a) and the correction of Lee et al. (2011), and with GCR as described above.
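The generative pipeline can be sketched compactly. Because the MAF and effect-size distributions are not recoverable from this text, the sketch uses hypothetical stand-ins: MAFs drawn from Uniform(0.05, 0.5) and SNP effects drawn from N(0, h²/m), so that the genetic effect has variance close to h². Individuals are i.i.d., so they are generated in batches rather than one at a time; the function name and defaults are our own.

```python
import numpy as np
from statistics import NormalDist


def simulate_case_control(n, m, h2, K, P, batch=2000, seed=0):
    """Simulate an ascertained case-control sample and its genetic correlation matrix.

    ASSUMPTIONS (not from the text): MAFs ~ Uniform(0.05, 0.5) and SNP effects
    ~ N(0, h2 / m), so that the genetic effect has variance close to h2.
    """
    rng = np.random.default_rng(seed)
    t = NormalDist().inv_cdf(1.0 - K)                 # liability threshold
    p = K * (1.0 - P) / (P * (1.0 - K))               # control selection probability
    maf = rng.uniform(0.05, 0.5, size=m)              # hypothetical MAF distribution
    beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)   # hypothetical effect-size distribution
    mean, sd = 2.0 * maf, np.sqrt(2.0 * maf * (1.0 - maf))
    X_parts, y_parts, total = [], [], 0
    while total < n:
        X = (rng.binomial(2, maf, size=(batch, m)) - mean) / sd   # normalized genotypes
        g = X @ beta                                               # genetic effects
        e = rng.normal(0.0, np.sqrt(1.0 - h2), size=batch)         # environmental effects
        y = g + e > t                                              # liability threshold model
        keep = y | (rng.random(batch) < p)                         # cases always kept
        X_parts.append(X[keep])
        y_parts.append(y[keep])
        total += int(keep.sum())
    X = np.vstack(X_parts)[:n]
    y = np.concatenate(y_parts)[:n].astype(int)
    return y, X @ X.T / m                                          # G = X X^T / m


y, G = simulate_case_control(n=1000, m=500, h2=0.5, K=0.1, P=0.5)
print(y.mean(), G.shape)
```

The small `m` and `n` in the usage line keep the example fast; the simulations described above used 10,000 SNPs and 4,000 individuals.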
We note that our choice to work with SNPs in linkage equilibrium was motivated by the analysis of Patterson et al. (2006). They show that, for the purpose of generating correlation matrices, using SNPs in linkage disequilibrium (LD) is equivalent to using a smaller number of SNPs in linkage equilibrium. They also suggest a method for estimating this effective number of SNPs (i.e. the number of SNPs in linkage equilibrium leading to the same distribution of correlation matrices as a given set of SNPs in LD). Applying their method to the WTCCC data suggests that the effective number of SNPs is roughly one tenth of the actual number of SNPs. Hence, using 10,000 SNPs in our simulations is equivalent to simulating roughly 100,000 SNPs with a realistic LD structure.
Heuristic correction for the LMM approach
Denote by $\hat h^2_{LMM}$ the estimate of the heritability on the liability scale obtained through the method of Lee et al. (2011), and by $\hat h^2_{GCR}$ the estimate obtained by our method. Denote the true variance of the genetic effect under ascertainment by $\sigma^2_{g\mid s} = \operatorname{Var}(g_i \mid s_i = 1)$.
We note that a different analytical expression for $\sigma^2_{g\mid s}$ is given in Lee et al. (2011). We validated the correctness of our derivation of this expression numerically.
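One way to obtain a closed form for the variance of the genetic effect among selected individuals, sketched here as our own derivation rather than the authors’ expression: since E[g | l] = h²·l and Var(g | l) = h²(1 - h²), the law of total variance gives Var(g | s = 1) = h²(1 - h²) + h⁴·Var(l | s = 1). The liability of a selected individual is a mixture of the upper (cases) and lower (controls) truncations of N(0, 1) at t, which yields Var(l | s = 1) = 1 + a(t - a) with a = (P - K)φ(t)/(K(1 - K)). The Monte Carlo comparison assumes full ascertainment with control selection probability p = K(1 - P)/(P(1 - K)).

```python
import numpy as np
from statistics import NormalDist


def ascertained_genetic_variance(h2, K, P):
    """Var(g_i | s_i = 1) under full ascertainment, via the law of total variance.

    Uses E[g | l] = h2 * l and Var(g | l) = h2 (1 - h2), with l a mixture of the
    upper (cases) and lower (controls) truncations of N(0, 1) at t.
    """
    std = NormalDist()
    t = std.inv_cdf(1.0 - K)
    a = (P - K) * std.pdf(t) / (K * (1.0 - K))
    var_l = 1.0 + a * (t - a)                  # Var(l | s = 1)
    return h2 * (1.0 - h2) + h2 ** 2 * var_l   # = h2 [1 + h2 a (t - a)]


# Monte Carlo check of the closed form.
rng = np.random.default_rng(0)
h2, K, P, N = 0.5, 0.1, 0.5, 1_000_000
t = NormalDist().inv_cdf(1.0 - K)
g = rng.normal(0.0, np.sqrt(h2), N)
l = g + rng.normal(0.0, np.sqrt(1.0 - h2), N)
p = K * (1.0 - P) / (P * (1.0 - K))
s = (l > t) | (rng.random(N) < p)              # all cases, a fraction p of controls
mc_var = g[s].var()
print(mc_var, ascertained_genetic_variance(h2, K, P))
```

Without ascertainment (P = K) the correction term vanishes and the variance is simply h².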
As detailed in the Supplementary Materials, our simulations demonstrated the following properties of these estimators:

They are both unbiased when there is no ascertainment.

Our estimate $\hat h^2_{GCR}$ remains unbiased in all situations. However, in the presence of ascertainment, $\hat h^2_{LMM}$ is biased and is not linear in the true $h^2$ for fixed $K$ and $P$.

When multiplied by , the estimate becomes linear in for fixed .

The bias of $\hat h^2_{LMM}$ worsens as the ascertainment factor decreases.
We therefore performed an extensive analysis of the relationship between and the bias of the “linearized” estimate in our simulations. Our analysis, as detailed in Supplementary Materials, led us to the following relationship between and the true underlying heritability :
Correcting published results. We derived our heuristic correction using simulations in which the true underlying heritability is known. However, contrary to previously used corrections, our correction is not a linear transformation of the estimate. When attempting to correct published estimates, we only know $\hat h^2_{LMM}$, $K$ and $P$. We define our corrected estimate to be the value of $h^2$ for which
$$E\left[\hat h^2_{LMM} \mid h^2, K, P\right] = \hat h^2_{LMM},$$
where the expectation is computed using the approximate relationship we derived previously. In other words, the corrected estimate of heritability is the value of heritability for which the expected value of the estimator equals the observed estimate, with the expectation calculated using our heuristic correction.
Confidence intervals are derived by applying the same procedure to the lower and upper limits of a confidence interval based on the published standard deviation of the estimate.
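Because the correction is defined implicitly, it amounts to a one-dimensional root search. The sketch below assumes the expected LMM estimate is an increasing function of h² on [0, 1] and solves by bisection; `toy_expectation` is a purely hypothetical stand-in for the fitted relationship, which is not reproduced here, and the Gaussian interval (published estimate ± 1.96 sd) is our assumption about how the published confidence limits are formed.

```python
def invert_expectation(h2_obs, expected_lmm, lo=0.0, hi=1.0, tol=1e-10):
    """Return h2 in [lo, hi] with expected_lmm(h2) == h2_obs, by bisection.

    expected_lmm must be increasing on [lo, hi]; observations outside its range
    are clamped to the nearest endpoint.
    """
    if h2_obs <= expected_lmm(lo):
        return lo
    if h2_obs >= expected_lmm(hi):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_lmm(mid) < h2_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def correct_published(h2_lmm, sd, expected_lmm, z=1.96):
    """Corrected point estimate, plus a CI obtained by correcting the CI endpoints."""
    point = invert_expectation(h2_lmm, expected_lmm)
    lower = invert_expectation(h2_lmm - z * sd, expected_lmm)
    upper = invert_expectation(h2_lmm + z * sd, expected_lmm)
    return point, (lower, upper)


def toy_expectation(h2):
    """Hypothetical, downward-biased expectation of the LMM estimate (illustration only)."""
    return h2 * (1.0 - 0.4 * h2)


print(correct_published(0.30, 0.05, toy_expectation))
```

Since the expectation is non-linear in h², the corrected interval is generally not symmetric around the corrected point estimate.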
Table 1.
Phenotype | Prevalence (%) | LMM estimate (sd) (%) | GCR estimate (sd) (%) | Family studies (%)
--- | --- | --- | --- | ---
CD | 0.1 | 23.2 (3) | 34.1 (5.8) | 50–60
BD | 0.5 | 42.8 (4.1) | 53.8 (6.8) | 71
T2D | 3 | 42 (6.3) | 47.8 (9.9) | 26–69
HT | 5 | 53.1 (7.4) | 52.3 (10.6) | 31–63**
CAD | 30 | 66.9 (12.8) | 61.1 (16.9) | 39–56**
RA | 0.75 | 16.5 (4.6) | 17.9 (7) | 53–65
T1D* | 0.5 | 16.1 (4.2) | 17.2 (5.8) | 72–88
* Analysis does not include chromosome 6.
** Estimates might not be comparable to GCR/LMM estimates due to different phenotype definitions.
References
 23andMe website. https://www.23andme.com/ (online; accessed 20 April 2013).
 Heritability and familiality of type 2 diabetes and related quantitative traits in the Botnia Study. Diabetologia 54 (11), pp. 2811–2819.
 Finding the sources of missing heritability in a yeast cross. arXiv preprint arXiv:1208.2865.
 A two-stage approach to the correction of ascertainment bias in complex genetic studies involving variance components. Annals of Human Genetics 71 (2), pp. 220–229.
 BHF website. http://www.bhf.org.uk/publications/viewpublication.aspx?ps=1002097 (online; accessed 8 May 2013).
 Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447 (7145), pp. 661–678.
 Correcting for non-random ascertainment in generalized linear mixed models (GLMMs), fitted using Gibbs sampling. Genetic Epidemiology 24 (1), pp. 24–35.
 Heritability of threshold characters. Genetics 35 (2), pp. 212.
 Web-based genome-wide association study identifies two novel loci and a substantial genetic component for Parkinson’s disease. PLoS Genetics 7 (6), pp. e1002141.
 Power and predictive accuracy of polygenic risk scores. PLoS Genetics 9 (3), pp. e1003348.
 Heritability of bipolar spectrum disorders. Unity or heterogeneity? Journal of Affective Disorders.
 An introduction to the bootstrap (Chapman & Hall/CRC Monographs on Statistics & Applied Probability).
 Missing heritability and strategies for finding the underlying causes of complex disease. Nature Reviews Genetics 11 (6), pp. 446–450.
 Genetic liability of type 1 diabetes and the onset age among 22,650 young Finnish twin pairs: a nationwide follow-up study. Diabetes 52 (4), pp. 1052–1055.
 Heritability of daytime ambulatory blood pressure in an extended twin design. Hypertension 45 (1), pp. 80–85.
 Concordance rates of insulin dependent diabetes mellitus: a population based study of young Danish twins. BMJ 311 (7010), pp. 913–917.
 Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nature Genetics 44 (3), pp. 247–250.
 Estimation and partitioning of polygenic variation captured by common SNPs for Alzheimer’s disease, multiple sclerosis and endometriosis. Human Molecular Genetics 22 (4), pp. 832–841.
 Estimating missing heritability for disease from genome-wide association studies. The American Journal of Human Genetics 88 (3), pp. 294–305.
 The case of the missing heritability. Nature 456 (7218), pp. 18–21.
 Finding the missing heritability of complex diseases. Nature 461 (7265), pp. 747–753.
 Genotype and SNP calling from next-generation sequencing data. Nature Reviews Genetics 12 (6), pp. 443–451.
 Robust ascertainment-adjusted parameter estimation. Genetic Epidemiology 29 (1), pp. 68–75.
 Population structure and eigenanalysis. PLoS Genetics 2 (12), pp. e190.
 Heritability of type II (non-insulin-dependent) diabetes mellitus and abnormal glucose tolerance: a population-based twin study. Diabetologia 42 (2), pp. 139–145.
 Logistic disease incidence models and case-control studies. Biometrika 66 (3), pp. 403–411.
 Single-tissue and cross-tissue heritability of gene expression via identity-by-descent in related or unrelated individuals. PLoS Genetics 7 (2), pp. e1001317.
 Crohn’s disease: the genetic contribution. Gut 34 (7), pp. 869–871.
 Assumptions for different ascertainment models in human genetics. Biometrics, pp. 523–527.
 Heritability in the genomics era – concepts and misconceptions. Nature Reviews Genetics 9 (4), pp. 255–266.
 Sizing up human height variation. Nature Genetics 40 (5), pp. 489–490.
 The genetic interpretation of area under the ROC curve in genomic profiling. PLoS Genetics 6 (2), pp. e1000864.
 Common SNPs explain a large proportion of the heritability for human height. Nature Genetics 42 (7), pp. 565–569.
 GCTA: a tool for genome-wide complex trait analysis. American Journal of Human Genetics 88 (1), pp. 76.
 Genome partitioning of genetic variation for complex traits using common SNPs. Nature Genetics 43 (6), pp. 519–525.
 Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genetics 9 (2), pp. e1003264.
 The mystery of missing heritability: genetic interactions create phantom heritability. Proceedings of the National Academy of Sciences 109 (4), pp. 1193–1198.