Integrating summarized data from multiple genetic variants in Mendelian randomization: bias and coverage properties of inverse-variance weighted methods
Abstract
Mendelian randomization is the use of genetic variants as instrumental variables to assess whether a risk factor is a cause of a disease outcome. Increasingly, Mendelian randomization investigations are conducted on the basis of summarized data, rather than individual-level data. These summarized data comprise the coefficients and standard errors from univariate regression models of the risk factor on each genetic variant, and of the outcome on each genetic variant. A causal estimate can be derived from these associations for each individual genetic variant, and a combined estimate can be obtained by inverse-variance weighted meta-analysis of these causal estimates. Various proposals have been made for how to calculate this inverse-variance weighted estimate. In this paper, we show that the inverse-variance weighted method as originally proposed (equivalent to a two-stage least squares or allele score analysis using individual-level data) can lead to over-rejection of the null, particularly when there is heterogeneity between the causal estimates from different genetic variants. Random-effects models should be routinely employed to allow for this possible heterogeneity. Additionally, over-rejection of the null is observed when associations with the risk factor and the outcome are obtained in overlapping participants. The use of weights including second-order terms from the delta method is recommended in this case.
Keywords: Mendelian randomization, instrumental variables, summarized data, weak instruments, causal inference
1 Introduction
Mendelian randomization is the use of genetic variants as instrumental variables to investigate the causal effect of a modifiable risk factor on an outcome using observational data (Burgess and Thompson, 2015). Mendelian randomization analyses are increasingly performed using summarized data, rather than individual-level data (Burgess et al., 2015). There are various methods for combining the estimates from multiple genetic variants into a single causal estimate (Burgess, Butterworth and Thompson, 2013). In particular, an inverse-variance weighted method has been proposed (Johnson, 2013) that is equivalent (for a particular choice of weights) to the standard two-stage least squares method usually employed with individual-level data (Burgess, Dudbridge and Thompson, 2015a). However, different authors have used different formulae for estimating the variances of the estimates that are used as weights (Dastani et al., 2012; Shen and Zhan, 2015). Additionally, some authors have used fixed-effect meta-analysis for the combination of estimates from different genetic variants (Nelson et al., 2015), whereas other authors have used random-effects meta-analysis (Ahmad et al., 2015).
In this paper, we compare the bias and coverage properties of estimates from the inverse-variance weighted method for different choices of weights, and using fixed-effect, additive random-effects, and multiplicative random-effects models for combining the estimates. In Section 2, we introduce the inverse-variance weighted method, and demonstrate its equivalence to both a two-stage least squares analysis and to a weighted linear regression of the association estimates. We also present the different versions of the method that are investigated further in this paper. In Section 3, we provide an example analysis that was the motivation for this work. In this example, subtly different choices in the analysis method result in estimates that differ considerably and lead to substantively different conclusions. In Section 4, we perform a simulation study to compare the bias and coverage properties of the different versions of the method. Finally, in Section 5, we discuss the findings of this paper and their relevance to applied practice.
2 Methods
We provide a brief introduction to Mendelian randomization, the use of genetic variants as instrumental variables; further introductory references to the subject area are available (Davey Smith and Ebrahim, 2003; Lawlor et al., 2008; Schatzkin et al., 2009). The objective of Mendelian randomization is to judge whether intervention on a modifiable risk factor would affect a disease outcome. This is achieved by testing whether genetic variants that satisfy the assumptions of an instrumental variable for the risk factor are associated with the outcome. An instrumental variable is a variable that is associated with the risk factor, but not associated with confounders of the risk factor–outcome association, nor is there any causal pathway from the instrumental variable to the outcome except for that via the risk factor (see Greenland (2000); Martens et al. (2006) for further information on instrumental variables). This means that the genetic variant is an unconfounded proxy for variation in the risk factor, and therefore can be treated as similar to treatment assignment in a randomized trial, where the treatment is to change the level of the risk factor (Nitsch et al., 2006). Similarly to an intention-to-treat analysis in a randomized trial, an association between such a genetic variant and the outcome implies a causal effect of the risk factor (VanderWeele et al., 2014). Additionally, under further parametric assumptions, the magnitude of the causal effect of the risk factor on the outcome can be estimated (Didelez, Meng and Sheehan, 2010). In this paper, we assume that the effect of the risk factor on the outcome is linear with no effect modification, and the associations of the genetic variants with the risk factor and with the outcome are linear without effect modification (Didelez and Sheehan, 2007):
(1)  $$\begin{aligned}
E(X \mid G_j = g, U = u) &= \beta_{X0j} + \beta_{Xj}\, g + \beta_{XU}\, u \\
E(Y \mid G_j = g, U = u) &= \beta_{Y0j} + \beta_{Yj}\, g + \beta_{YU}\, u \\
E(Y \mid do(X = x), U = u) &= \beta_0 + \theta\, x + \beta_U\, u
\end{aligned}$$
where $X$ is the risk factor, $G_j$ ($j = 1, \ldots, J$) are the genetic variants, $Y$ is the outcome, $U$ is an unmeasured confounder, $do(X = x)$ is the do-operator of Pearl meaning that the value of the risk factor is set to $x$ by intervention (Pearl, 2000), and the causal effect parameter $\theta = \beta_{Yj} / \beta_{Xj}$ for all $j$. We also assume that the effects of the genetic variants on the risk factor are the same in all individuals. Although these assumptions are not necessary to identify a causal parameter (weaker assumptions have been proposed (Swanson and Hernán, 2013)), alternative assumptions mean that the causal parameters identified by different instrumental variables are likely to be different. While these assumptions are restrictive, a causal estimate has an interpretation as a test statistic for the null hypothesis that the risk factor is not causal for the outcome without requiring the assumptions of linearity and homogeneity of the genetic effects on the risk factor (Burgess, Butterworth and Thompson, 2015).
We assume that summarized data are available in the form of association estimates (beta-coefficients and standard errors) with the risk factor and with the outcome for $J$ genetic variants that are instrumental variables. The association estimates with the risk factor are denoted $\hat\beta_{Xj}$ with standard error $\sigma_{Xj}$; association estimates with the outcome are denoted $\hat\beta_{Yj}$ with standard error $\sigma_{Yj}$. The genetic variants are assumed to be independently distributed (that is, not in linkage disequilibrium).
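2.1 Standard inverse-variance weighted method
The ratio estimate of the causal effect of the risk factor on the outcome based on the $j$th genetic variant is $\hat\beta_{Yj} / \hat\beta_{Xj}$ (Lawlor et al., 2008). We refer to this as $\hat\theta_j$. The variance of the ratio of two random variables can be calculated using the delta method; the formula including first- and second-order terms for the variance of $\hat\theta_j$ is:
(2)  $$\operatorname{var}(\hat\theta_j) = \frac{\sigma_{Yj}^2}{\hat\beta_{Xj}^2} + \frac{\hat\beta_{Yj}^2\, \sigma_{Xj}^2}{\hat\beta_{Xj}^4} - \frac{2 \rho\, \hat\beta_{Yj}\, \sigma_{Yj}\, \sigma_{Xj}}{\hat\beta_{Xj}^3}$$
where $\sigma_{Xj}$ and $\sigma_{Yj}$ are the standard errors of $\hat\beta_{Xj}$ and $\hat\beta_{Yj}$, and $\rho$ is the correlation between $\hat\beta_{Yj}$ and $\hat\beta_{Xj}$ (Thomas, Lawlor and Thompson, 2007). This can be rewritten in terms of the causal estimates as:
(3)  $$\operatorname{var}(\hat\theta_j) = \frac{\sigma_{Yj}^2}{\hat\beta_{Xj}^2} + \frac{\hat\theta_j^2\, \sigma_{Xj}^2}{\hat\beta_{Xj}^2} - \frac{2 \rho\, \hat\theta_j\, \sigma_{Yj}\, \sigma_{Xj}}{\hat\beta_{Xj}^2}$$
Assuming that the correlation between $\hat\beta_{Yj}$ and $\hat\beta_{Xj}$ is zero (this would be the case if the associations with the risk factor and with the outcome were estimated in non-overlapping datasets, known as a two-sample analysis (Pierce and Burgess, 2013)), the variance is:
(4)  $$\operatorname{var}(\hat\theta_j) = \frac{\sigma_{Yj}^2}{\hat\beta_{Xj}^2} + \frac{\hat\beta_{Yj}^2\, \sigma_{Xj}^2}{\hat\beta_{Xj}^4}$$
If only the first-order term from the delta formula is taken, then the variance is:
(5)  $$\operatorname{var}(\hat\theta_j) = \frac{\sigma_{Yj}^2}{\hat\beta_{Xj}^2}$$
The inverse-variance weighted (IVW) estimate is a weighted mean of the causal estimates from each genetic variant considered individually:
(6)  $$\hat\theta_{IVW} = \frac{\sum_j \hat\theta_j\, \operatorname{var}(\hat\theta_j)^{-1}}{\sum_j \operatorname{var}(\hat\theta_j)^{-1}}$$
This is equivalent to meta-analysing the causal estimates from each genetic variant using the standard inverse-variance weighted formula (hence the name “inverse-variance weighted estimate”) under a fixed-effect model (Borenstein et al., 2009). Using the first-order variance estimates (equation 5), the IVW estimate is:
(7)  $$\hat\theta_{IVW} = \frac{\sum_j \hat\beta_{Yj}\, \hat\beta_{Xj}\, \sigma_{Yj}^{-2}}{\sum_j \hat\beta_{Xj}^2\, \sigma_{Yj}^{-2}}$$
This is the same estimate as would be obtained from a weighted linear regression of the $\hat\beta_{Yj}$ coefficients on the $\hat\beta_{Xj}$ coefficients with no intercept term, using the $\sigma_{Yj}^{-2}$ as weights.
Using the first-order weights and assuming a fixed-effect model (Section 2.3), the standard error is:
(8)  $$\operatorname{se}(\hat\theta_{IVW}) = \sqrt{\frac{1}{\sum_j \hat\beta_{Xj}^2\, \sigma_{Yj}^{-2}}}$$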
This is the form of the inverse-variance weighted estimate as it was initially proposed (Johnson, 2013; Ehret et al., 2011; Dastani et al., 2012).
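The fixed-effect calculation described in this section can be sketched in a few lines of Python. This is an illustrative sketch rather than the code from the Appendix: the function name `ivw_fixed`, its arguments, and the three-variant dataset are hypothetical, and the correlation between the two association estimates is assumed to be zero, as in equation (4).

```python
import numpy as np

def ivw_fixed(bx, by, se_x, se_y, second_order=False):
    """Fixed-effect IVW estimate and standard error from summarized data.

    bx, by     : per-variant associations with the risk factor and the outcome
    se_x, se_y : their standard errors
    """
    bx, by, se_x, se_y = (np.asarray(a, dtype=float) for a in (bx, by, se_x, se_y))
    theta_j = by / bx                      # ratio estimate for each variant
    var_j = se_y**2 / bx**2                # first-order delta-method variance
    if second_order:
        # add the second-order term (zero correlation assumed)
        var_j = var_j + by**2 * se_x**2 / bx**4
    w = 1.0 / var_j                        # inverse-variance weights
    estimate = np.sum(w * theta_j) / np.sum(w)
    std_error = np.sqrt(1.0 / np.sum(w))
    return estimate, std_error

# Made-up summarized data for three variants (illustrative values only)
bx = [0.10, 0.20, 0.15]
by = [0.02, 0.05, 0.03]
se_x = [0.01, 0.01, 0.01]
se_y = [0.02, 0.02, 0.02]

est_first, se_first = ivw_fixed(bx, by, se_x, se_y)
est_second, se_second = ivw_fixed(bx, by, se_x, se_y, second_order=True)
```

With first-order weights the result coincides with a no-intercept weighted regression of the outcome associations on the risk factor associations; the second-order standard error is never smaller than the first-order one, because each per-variant variance gains a non-negative term.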
2.2 Equivalence to two-stage least squares estimate
The inverse-variance weighted estimate using first-order weights is also equal to the estimate obtained from the two-stage least squares method that is commonly used with individual-level data (sample size $N$). If we write the risk factor as $X$ (usually an $N \times 1$ matrix, although the result can be generalized for multiple risk factors (Burgess, Dudbridge and Thompson, 2015a)), the outcome as $Y$ (an $N \times 1$ matrix), and the instrumental variables as $G$ (an $N \times J$ matrix), then the two-stage least squares estimate of causal effects (Baum, Schaffer and Stillman, 2003) is:
$$\hat\theta_{2SLS} = [X^T G (G^T G)^{-1} G^T X]^{-1} X^T G (G^T G)^{-1} G^T Y.$$
This estimate can be obtained by sequential regression of the risk factor on the instrumental variables, and then the outcome on fitted values of the risk factor from the first-stage regression.
Regression of $Y$ on $G$ gives beta-coefficients $\hat\beta_Y = (G^T G)^{-1} G^T Y$ with standard errors the square roots of the diagonal elements of the matrix $\sigma^2 (G^T G)^{-1}$, where $\sigma$ is the residual standard error. If the instrumental variables are perfectly uncorrelated, then the off-diagonal elements of $(G^T G)^{-1}$ are all equal to zero. Regression of $X$ on $G$ gives beta-coefficients $\hat\beta_X = (G^T G)^{-1} G^T X$. Weighted linear regression of the beta-coefficients $\hat\beta_Y$ on the beta-coefficients $\hat\beta_X$ using the inverse-variance weights $\sigma^{-2} (G^T G)$ gives an estimate:
$$[\hat\beta_X^T (G^T G) \hat\beta_X]^{-1} \hat\beta_X^T (G^T G) \hat\beta_Y = [X^T G (G^T G)^{-1} G^T X]^{-1} X^T G (G^T G)^{-1} G^T Y = \hat\theta_{2SLS}.$$
The assumption of uncorrelated instrumental variables ensures that the regression coefficients from univariate regressions (as in the regression-based methods) equal those from multivariable regression (as in the two-stage least squares method). In practice, the two-stage least squares and weighted regression-based estimates will differ slightly as there will be non-zero correlations between the genetic variants in finite samples, even if the variants are truly uncorrelated in the population. However, these differences are likely to be slight, and to tend to zero asymptotically (Burgess, Dudbridge and Thompson, 2015b).
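The equivalence can be checked numerically. The sketch below is an illustration under simplifying assumptions that are not taken from the paper: the instruments are constructed to be exactly orthogonal (via a QR decomposition), regressions are fitted without intercepts, and all parameter values are arbitrary. Under exact orthogonality the two estimates agree to floating-point precision.

```python
import numpy as np

rng = np.random.default_rng(1)
n, J = 1000, 5

# Instruments with exactly orthogonal columns of differing scale (demo assumption)
Q, _ = np.linalg.qr(rng.normal(size=(n, J)))
G = Q * rng.uniform(5.0, 20.0, size=J)

u = rng.normal(size=n)                                     # unmeasured confounder
x = G @ rng.uniform(0.05, 0.2, size=J) + u + rng.normal(size=n)
y = 0.3 * x + u + rng.normal(size=n)                       # arbitrary causal effect 0.3

# Two-stage least squares: regress x on G, then y on the fitted values
x_fitted = G @ np.linalg.solve(G.T @ G, G.T @ x)
theta_2sls = (x_fitted @ y) / (x_fitted @ x_fitted)

# Per-variant (univariate, no-intercept) regression coefficients
d = np.sum(G**2, axis=0)                                   # diagonal of G'G
beta_x = (G.T @ x) / d
beta_y = (G.T @ y) / d

# Weighted regression of beta_y on beta_x; the common residual variance cancels,
# so weights proportional to the diagonal of G'G suffice
theta_wls = np.sum(d * beta_x * beta_y) / np.sum(d * beta_x**2)

print(theta_2sls, theta_wls)
```

With real genotype data the columns of $G$ are only approximately uncorrelated, so the two numbers would differ slightly, as noted in the text.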
2.3 Fixed- versus random-effects
A fixed-effect meta-analysis assumes that the causal effects targeted by each genetic variant are all equal. While this would be true if all the genetic variants are valid instrumental variables, and also under the additional linearity assumptions stated above, this may not be true in practice. For instance, genetic variants may affect the exposure via different mechanisms, leading to different magnitudes of effect on the outcome. Alternatively, some variants may have direct effects on the outcome that do not pass via the risk factor, and hence not all genetic variants may be valid instrumental variables. To allow for heterogeneity in the causal effects identified by each genetic variant, a random-effects meta-analysis may be preferred. We outline two ways to model this heterogeneity: an additive random-effects model, and a multiplicative random-effects model.
2.4 Additive and multiplicative random-effects models
In a fixed-effect meta-analysis, we assume that the estimates $\hat\theta_j$ from each instrumental variable can be modelled as normally distributed with common mean $\theta$ and variance $\operatorname{var}(\hat\theta_j)$. In a random-effects meta-analysis, the mean values $\theta_j$ are additionally assumed to vary (Higgins, Thompson and Spiegelhalter, 2009). In an additive random-effects model, the $\theta_j$ are assumed to be normally distributed with mean $\theta$ and variance $\tau^2$. Any additional variability beyond that predicted by the fixed-effect model ($\tau^2 > 0$) is interpreted as heterogeneity between the causal effects targeted by each instrumental variable. An estimate of the heterogeneity parameter $\tau^2$ is often obtained by a method of moments estimator, developed by DerSimonian and Laird (1986).
In a multiplicative random-effects model, the estimates $\hat\theta_j$ are assumed to be normally distributed with mean $\theta$ and variance $\phi^2 \operatorname{var}(\hat\theta_j)$. This model can be fitted by linear regression of the outcome associations $\hat\beta_{Yj}$ on the risk factor associations $\hat\beta_{Xj}$ using the inverse squared standard errors of the outcome associations, $\sigma_{Yj}^{-2}$, as weights. A fixed-effect model can be fitted by setting the residual standard error in the regression model to be one; this can be achieved after fitting the regression model by dividing the standard error by the estimate of the residual standard error (Thompson and Sharp, 1999). A multiplicative random-effects model can be fitted by allowing the residual standard error (which is equivalent to the heterogeneity parameter $\phi$) to be estimated as part of the model. The multiplicative random-effects model is therefore equivalent to an overdispersed regression model. In case of underdispersion (that is, the estimated residual standard error is less than one), the standard errors should be fixed by setting $\hat\phi = 1$, as any underdispersion is assumed to occur by chance, and not to be empirically justified.
The point estimate from a fixed-effect meta-analysis is identical to that from a multiplicative random-effects meta-analysis (Thompson and Sharp, 1999). However, it differs from that of an additive random-effects meta-analysis when $\hat\tau^2 > 0$, as the weights in the random-effects meta-analysis are inflated to account for heterogeneity. As heterogeneity increases, weights become more similar, which results in estimates with low weights being upweighted (relatively speaking) in an additive random-effects meta-analysis.
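The two random-effects models can be sketched as follows, taking per-variant causal estimates and their variances as input. This is a minimal illustration, not the Appendix code: the function names and example data are hypothetical, the additive model uses the DerSimonian and Laird method of moments estimator, and the multiplicative model computes the residual standard error of the weighted regression directly and truncates it at one.

```python
import numpy as np

def additive_random_effects(theta_j, var_j):
    """Additive random-effects pooling with the DerSimonian-Laird estimator."""
    theta_j = np.asarray(theta_j, dtype=float)
    var_j = np.asarray(var_j, dtype=float)
    w = 1.0 / var_j
    fixed = np.sum(w * theta_j) / np.sum(w)
    q = np.sum(w * (theta_j - fixed) ** 2)            # Cochran's Q statistic
    k = len(theta_j)
    denom = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / denom)            # method of moments, truncated at 0
    w_re = 1.0 / (var_j + tau2)                       # heterogeneity added to each variance
    estimate = np.sum(w_re * theta_j) / np.sum(w_re)
    return estimate, np.sqrt(1.0 / np.sum(w_re)), tau2

def multiplicative_random_effects(theta_j, var_j):
    """Multiplicative random-effects pooling (overdispersed weighted regression)."""
    theta_j = np.asarray(theta_j, dtype=float)
    var_j = np.asarray(var_j, dtype=float)
    w = 1.0 / var_j
    estimate = np.sum(w * theta_j) / np.sum(w)        # same point estimate as fixed-effect
    k = len(theta_j)
    phi = np.sqrt(np.sum(w * (theta_j - estimate) ** 2) / (k - 1))
    phi = max(1.0, phi)                               # do not allow underdispersion
    return estimate, phi * np.sqrt(1.0 / np.sum(w)), phi

# Made-up heterogeneous causal estimates with equal variances
theta_j = [0.1, 0.5, -0.2, 0.9]
var_j = [0.01, 0.01, 0.01, 0.01]
est_add, se_add, tau2 = additive_random_effects(theta_j, var_j)
est_mul, se_mul, phi = multiplicative_random_effects(theta_j, var_j)
```

When all variances are equal, as in this toy example, the two models give the same point estimate and standard error; they diverge when the variants differ in precision, because the additive model then changes the relative weights while the multiplicative model does not.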
2.5 Weak instrument bias
Although instrumental variable estimates are consistent (and so they are asymptotically unbiased), they can suffer from substantial bias in finite samples (Staiger and Stock, 1997; Stock, Wright and Yogo, 2002). This bias, known as ‘weak instrument bias’, occurs when the instrumental variables explain a small proportion of variance in the risk factor (Burgess and Thompson, 2011). In a conventional Mendelian randomization analysis in which the risk factor and outcome are measured in the same participants (a one-sample analysis), weak instrument bias is in the direction of the observational association between the risk factor and the outcome (Burgess, Thompson and CRP CHD Genetics Collaboration, 2011). It can also lead to overly narrow confidence intervals and over-rejection of the causal null hypothesis (Stock and Yogo, 2002). Bias from the inverse-variance weighted method using the first-order weights and a fixed-effect model has been shown to be similar to that from the two-stage least squares method in a realistic simulation study (Burgess, Butterworth and Thompson, 2013). However, bias and coverage properties have not been investigated for different choices of the weights or for random-effects models.
3 Motivating example: analysis of the causal effect of early menopause on triglycerides
This paper was motivated by a particular implementation of two versions of the inverse-variance weighted method with different choices of weights that gave substantially different answers. A Mendelian randomization analysis was performed to assess the causal effect of early menopause risk on triglycerides using 47 genetic variants. Associations of the genetic variants with early menopause (and their standard errors) were obtained from Day et al. (2015); associations represent number of years earlier menopause per additional effect allele. Associations of the genetic variants with triglycerides (and their standard errors) were obtained from the Global Lipids Genetics Consortium (2013). These associations are provided in Appendix Table A1 and displayed graphically in Appendix Figure A1. Analyses for the motivating example were performed in Microsoft Excel (Windows 2000 version) and R (version 3.1.2) (R Core Team, 2014).
Fixed-effect inverse-variance weighted methods were performed using the second-order weights (equation 4) and the first-order weights (equation 5). The weights were substantially the same in both cases; 35 out of the 47 weights differed by less than 5%, and 44 of the weights differed by less than 10%. Using the second-order weights (equation 4), the causal effect of early menopause on triglycerides was estimated as 0.0021 (standard error, 0.0037; 95% confidence interval: -0.0052, 0.0095). Using the first-order weights (equation 5), the causal effect estimate was 0.0103 (standard error, 0.0036; 95% confidence interval: 0.0032, 0.0175). These estimates represent the change in triglycerides in standard deviation units per 1 year earlier menopause. The applied implications of this analysis are not the focus of this paper, and depend on the validity of the instrumental variable assumptions for the genetic variants used in the analysis. However, the magnitude of the difference between the estimates (over twice the standard error of the estimates) is striking, and the conclusions from the two analyses would be diametrically opposite. In the first case, the causal null hypothesis that early menopause is not a causal risk factor for triglycerides would not be rejected ($p = 0.57$), whereas in the second case, the causal null hypothesis would be rejected ($p = 0.004$). By comparison, using the first-order weights and a multiplicative random-effects model, the standard error is 0.0103, meaning that the causal null hypothesis would not be rejected ($p = 0.32$).
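The intervals and test conclusions in this paragraph follow from a normal approximation applied to each estimate and standard error. As a check on the arithmetic, the sketch below recomputes them from the rounded figures quoted in the text; the helper `wald_summary` is a hypothetical name, and the two-sided Wald test is an assumption about how the tests were carried out. Because the inputs are rounded, the last digit of a recomputed limit can differ from the published one.

```python
import math

def wald_summary(estimate, std_error):
    """z statistic, two-sided normal p-value and 95% confidence interval."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided tail area of N(0, 1)
    ci = (estimate - 1.96 * std_error, estimate + 1.96 * std_error)
    return z, p, ci

# Rounded estimates and standard errors quoted in the text
z_second, p_second, ci_second = wald_summary(0.0021, 0.0037)   # second-order, fixed-effect
z_first, p_first, ci_first = wald_summary(0.0103, 0.0036)      # first-order, fixed-effect
z_mult, p_mult, ci_mult = wald_summary(0.0103, 0.0103)         # first-order, multiplicative

for label, p in [("second-order fixed", p_second),
                 ("first-order fixed", p_first),
                 ("first-order multiplicative", p_mult)]:
    print(f"{label}: p = {p:.3f}")
```

Only the first-order fixed-effect analysis falls below the conventional 5% threshold, which is the contrast the paragraph draws.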
It turns out that the genetic variant with the greatest difference between the first- and second-order weights is rs704795, the variant that also has the greatest causal estimate. The estimate from this variant is heavily downweighted in the analysis using the second-order weights compared with using the first-order weights. Omitting this variant from the analysis led to similar estimates using the second- and first-order weights (0.0000 versus ). Another interesting observation is that use of the second-order weights reduced heterogeneity between the causal estimates from each genetic variant (for example, in the multiplicative random-effects model, the estimated heterogeneity parameter was 1.69 using the second-order weights compared with 2.83 using the first-order weights). This suggests that, even though the second-order standard errors for the causal estimates from the individual variants will always be greater than the first-order standard errors, precision of the overall causal estimate under a random-effects model may be improved by using the second-order weights when there is heterogeneity between the causal estimates (in this example, in the multiplicative random-effects model using the second-order weights, using the first-order weights).
Estimates from each of the methods are summarized in Table 1.
In general, genetic variants that have large values of the association with the outcome $\hat\beta_{Yj}$ and/or small values of the association with the risk factor $\hat\beta_{Xj}$ will be downweighted by the second-order weights. This means that genetic variants that have large and heterogeneous effects on the outcome compared with other variants and/or are weak will be downweighted. Further methodological work is therefore needed to assess the impact on the bias and coverage properties of inverse-variance weighted methods for Mendelian randomization analyses, and to determine which of the versions of the method should be preferred in applied practice.
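The downweighting can be made concrete by taking the ratio of the second-order weight to the first-order weight for a single variant, which reduces to one over one plus the squared outcome association times the squared risk factor standard error, divided by the squared risk factor association times the squared outcome standard error. The sketch below evaluates this ratio for three made-up variants; the function name and all numerical values are hypothetical, and zero correlation between the association estimates is assumed.

```python
def weight_ratio(bx, by, se_x, se_y):
    """Second-order weight divided by first-order weight for one variant."""
    first_order_var = se_y**2 / bx**2
    second_order_var = first_order_var + by**2 * se_x**2 / bx**4
    return first_order_var / second_order_var   # ratio of weights = inverse ratio of variances

# Illustrative variants (values are not taken from the paper)
typical = weight_ratio(bx=0.10, by=0.01, se_x=0.01, se_y=0.01)
large_outcome_effect = weight_ratio(bx=0.10, by=0.10, se_x=0.01, se_y=0.01)
weak_instrument = weight_ratio(bx=0.02, by=0.01, se_x=0.01, se_y=0.01)

print(round(typical, 3), round(large_outcome_effect, 3), round(weak_instrument, 3))
```

In this toy setting the typical variant keeps about 99% of its first-order weight, the variant with a large outcome association keeps half, and the weak variant keeps 80%.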
4 Simulation study
In this manuscript, we consider estimates from the inverse-variance weighted method using weights from equations (4, second-order) and (5, first-order), and fixed-effect, additive random-effects, and multiplicative random-effects models for combining the estimates from different genetic variants. Code for implementing these methods is provided in the Appendix. Analyses for the simulation study were performed in R (version 3.1.2).
The data-generating model is as follows:
(9)  
Individuals are indexed by . The 20 genetic variants , indexed by , are drawn from binomial distributions, corresponding to single nucleotide polymorphisms (SNPs) with minor allele frequency . The risk factor is a linear combination of the genetic variants, a confounder () that is assumed to be unmeasured, and an independent error term (). The outcome is a linear combination of the genetic variants, the risk factor, confounder, and a further independent error term (). The per allele effects of the genetic variants on the risk factor () are drawn from a normal distribution with mean and variance . The direct effects of the genetic variants on the outcome (, these effects are not via the risk factor) are zero when the genetic variants are valid instrumental variables. The causal effect of the risk factor on the outcome, the main parameter of interest, is . The effect of the confounder on the outcome is .
We consider four scenarios:
1. a one-sample analysis in which the genetic variants are all valid instrumental variables;
2. a one-sample analysis in which the genetic variants have direct effects on the outcome;
3. a two-sample analysis in which the genetic variants are all valid instrumental variables;
4. and a two-sample analysis in which the genetic variants have direct effects on the outcome.
In scenarios 1 and 2, data are generated for participants, and associations with the risk factor and with the outcome are estimated in these participants. In scenarios 3 and 4, data are generated for participants. Associations with the risk factor are estimated in the first 5000 participants, and associations with the outcome in the second 5000 participants. Two-sample analyses are common in Mendelian randomization, particularly when the association estimates are obtained from publicly available data sources (Burgess et al., 2015). In a two-sample analysis, weak instrument bias acts in the direction of the null, and hence should not lead to misleading inferences (Pierce and Burgess, 2013). However, it is common that many participants in large genetic consortia overlap, such that even if the associations with the risk factor and with the outcome are obtained from separate consortia, they may not be estimated in separate participants. Hence, the one-sample and two-sample settings are both of interest in this paper.
In scenarios 1 and 3, the direct-effect parameters are all set to zero, and the genetic variants are all valid instrumental variables. In scenarios 2 and 4, the direct-effect parameters are drawn from a normal distribution with mean 0 and variance . This is a situation known as “balanced pleiotropy” (Bowden et al., 2015). Pleiotropy refers to a genetic variant having an independent effect on the outcome that is not via the risk factor (Davey Smith and Hemani, 2014). Balanced pleiotropy means that the pleiotropic effects for all strengths of instrument have mean zero. Such pleiotropic effects should induce heterogeneity between the causal estimates using different genetic variants. Simulations conducted under a multiplicative random-effects model with balanced pleiotropy have suggested that estimates may not be biased on average (Bowden, Davey Smith and Burgess, 2015). Additional simulations for the case of directional (that is, unbalanced) pleiotropy are considered in the Appendix.
Four sets of parameters are considered – two values of the causal effect: (null causal effect), and (positive causal effect); and two values of the confounder effect: (positive confounding), and (negative confounding). Additionally, four values of instrument strength are considered for each set of parameters: . 10 000 simulated datasets are generated in each case.
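A single simulated one-sample dataset of the kind described in this section could be generated along the following lines. This is a sketch, not the simulation code used for the paper: every numerical default (sample size, minor allele frequency, effect sizes, error variances) is a placeholder assumption chosen for illustration, genotypes are assumed to be Binomial(2, MAF) draws, and the function names are hypothetical.

```python
import numpy as np

def univariate_ols(G, v):
    """Slope and standard error from regressing v on each column of G (with intercept)."""
    Gc = G - G.mean(axis=0)
    vc = v - v.mean()
    sxx = np.sum(Gc**2, axis=0)
    beta = Gc.T @ vc / sxx
    resid = vc[:, None] - Gc * beta                 # residuals, one column per variant
    sigma2 = np.sum(resid**2, axis=0) / (len(v) - 2)
    return beta, np.sqrt(sigma2 / sxx)

def simulate_one_sample(rng, n=5000, n_variants=20, maf=0.3,
                        alpha_mean=0.1, alpha_sd=0.02,
                        theta=0.0, beta_u=1.0, direct_sd=0.0):
    """Summarized data from one simulated dataset (all defaults are assumptions)."""
    G = rng.binomial(2, maf, size=(n, n_variants)).astype(float)
    alpha = rng.normal(alpha_mean, alpha_sd, size=n_variants)   # effects on risk factor
    direct = rng.normal(0.0, direct_sd, size=n_variants)        # pleiotropic effects
    u = rng.normal(size=n)                                      # unmeasured confounder
    x = G @ alpha + u + rng.normal(size=n)
    y = G @ direct + theta * x + beta_u * u + rng.normal(size=n)
    bx, se_x = univariate_ols(G, x)
    by, se_y = univariate_ols(G, y)
    return bx, by, se_x, se_y

bx, by, se_x, se_y = simulate_one_sample(np.random.default_rng(0))
```

Setting `direct_sd` above zero corresponds to the balanced pleiotropy scenarios; a two-sample version would estimate `bx` and `by` in disjoint halves of a larger simulated dataset.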
4.1 Results
Scenarios 1 and 2: Results from scenario 1 (one-sample, valid instruments) and scenario 2 (one-sample, invalid instruments) are presented in Table 2. For each value of the instrument strength, set of parameters, and scenario, the mean estimate and empirical power of the 95% confidence interval (estimate plus or minus 1.96 times the standard error) to reject the null hypothesis is given. The coverage is 100% minus the power; power under the null hypothesis should be 5%. The Monte Carlo standard error for the mean estimate is around 0.001 or less, and for the power is 0.2% under the null, and at most 0.5% otherwise. Additionally, to judge the instrument strength, the mean F statistic and the mean coefficient of determination ($R^2$ statistic) are given in each case.
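The Monte Carlo standard errors quoted for the power follow from the binomial variance of a proportion estimated from 10 000 simulated datasets. A short check, with a hypothetical helper name:

```python
import math

def monte_carlo_se(proportion, n_datasets):
    """Binomial standard error of an estimated rejection rate."""
    return math.sqrt(proportion * (1.0 - proportion) / n_datasets)

n_datasets = 10_000
se_under_null = monte_carlo_se(0.05, n_datasets)   # nominal 5% rejection rate
se_worst_case = monte_carlo_se(0.50, n_datasets)   # binomial variance peaks at 50%

print(f"under the null: {100 * se_under_null:.1f}%")   # under the null: 0.2%
print(f"worst case:     {100 * se_worst_case:.1f}%")   # worst case:     0.5%
```

Differences in empirical rejection rates smaller than about twice these values are therefore within simulation noise.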
With a null causal effect, the results demonstrate the well-known bias and inflated Type 1 error rate of instrumental variable estimates with weak instruments in a one-sample setting. Although bias is similar for both choices of weights (slightly less with the first-order weights), coverage rates are much worse with the first-order weights. Neither the additive nor the multiplicative random-effects models detect heterogeneity in the vast majority of cases (particularly for weaker instruments) with the second-order weights. With the first-order weights, heterogeneity is detected in a greater proportion of simulated datasets. In scenario 1, heterogeneity is not present in the underlying data-generating model, and only estimated by chance; in scenario 2, heterogeneity is expected. For the second-order weights, coverage properties are similar in scenarios 1 and 2; whereas for the first-order weights, coverage properties are worse in scenario 2 for the fixed-effect model, but improved for the random-effects models. For weaker instruments, coverage properties are best using the second-order weights, whereas for stronger instruments, estimates using the first-order weights and a random-effects model perform almost as well, and occasionally better, particularly when there is heterogeneity (scenario 2). However, there is inflation of Type 1 error rates even in the best-case scenarios.
With a positive causal effect, estimates with the first-order weights generally have better power to detect a causal effect than those using the second-order weights, particularly with weaker instruments. However, in the light of the Type 1 error rate inflation, this property should not be overvalued. Making fewer Type 2 errors (fewer false negative findings) at the expense of making more Type 1 errors (more false positive findings) is not generally a desirable trade-off.
Additional results from scenarios 1 and 2 are presented in Appendix Table A3. For each value of the instrument strength, the (Monte Carlo) standard deviation and the mean standard error of estimates are presented. This helps judge whether uncertainty in the effect estimates is correctly accounted for in the standard errors.
The estimates using second-order weights are the least variable throughout, with the lowest standard deviations. The standard deviation of estimates using second-order weights was always less than the mean standard error of the estimates. In contrast, for scenario 1, the estimates using first-order weights were more variable, but generally had lower average standard errors. This was always true for the fixed-effect analyses, and usually true for the random-effects analyses. However, when there was heterogeneity in the causal estimates identified by the instrumental variables (scenario 2), mean standard errors for the random-effects analyses using first-order weights could be greater than those using second-order weights, despite the second-order standard errors for each causal estimate being uniformly greater than the first-order standard errors. In scenario 2, mean standard errors for the fixed-effect analyses were generally similar to those in scenario 1, but the standard deviations of the estimates were increased. For the random-effects analyses using the first-order weights in scenario 2, mean standard errors and standard deviations were similar in magnitude. However, mean standard errors using the second-order weights were typically slightly lower, with no loss in coverage (recall Table 2).
Under the null, standard deviations and mean standard errors are similar whether there is positive or negative confounding, whereas under the alternative, standard errors appear to be wider when confounding is in the same direction as the causal effect, and narrower when confounding is in the opposite direction. This has previously been observed (Burgess and Thompson, 2012); see Figure 3 of that reference for a potential explanation.
Scenarios 3 and 4: Results from scenario 3 (two-sample, valid instruments) and scenario 4 (two-sample, invalid instruments) are presented in Table 3 for the mean and power and in Appendix Table A4 for the standard deviation and standard error. These results demonstrate the well-known bias in the direction of the null in the two-sample setting.
With a null causal effect, no bias is observed. Coverage levels for the second-order weights are conservative, with power below the nominal 5% level. By contrast, in scenario 3, coverage levels with the first-order weights are close to nominal levels, with slight undercoverage for random-effects models. In scenario 4, there is inflation of Type 1 error rates with the first-order weights for a fixed-effect model, but coverage for both the additive and multiplicative random-effects models is close to nominal levels.
With a positive causal effect, bias is in the direction of the null. The bias is more severe using the second-order weights. Power to detect a causal effect is substantially lower using the second-order weights than using the first-order weights, particularly for weaker instruments.
For the first-order weights, mean standard errors are fairly close to the standard deviations of estimates for the fixed-effect model when there is no heterogeneity in the causal effects, and for the random-effects models when there is heterogeneity in the causal effects. In contrast, for the second-order weights, the mean standard errors are larger than the standard deviations throughout. This corresponds with the coverage properties: in a two-sample setting using first-order weights, estimates are unbiased under the null with correct rejection rates, whereas using second-order weights, rejection rates are conservative.
Choice of random-effects model: As for choosing between the additive and multiplicative random-effects models, with the second-order weights, there was little difference between the results, or even with the results for a fixed-effect model. However, as seen in the motivating example, there will be a difference if the level of heterogeneity is increased. With the first-order weights, bias was generally slightly less with the additive random-effects model. Coverage under the null was better with an additive random-effects model, and power to detect a causal effect was better with a multiplicative random-effects model. However, differences were slight. Because of the better properties under the null, we therefore prefer the additive random-effects model for the scenarios considered in this paper, although the preference is not a strong one.
Directional pleiotropy: Results with directional pleiotropy are presented in Appendix Table A5. In brief, the results echo those with no pleiotropy and with balanced pleiotropy: the importance of random-effects models, and the preference for use of second-order weights in a one-sample setting, and first-order weights in a two-sample setting.
4.2 Additional scenario: extreme outlying variants
In the motivating example, the difference between estimates seemed to be driven by a single rogue variant. In order to better evaluate bias and coverage in this scenario, we considered an additional simulation scenario 5. Rather than generating the direct effects of the genetic variants on the outcome (the direct-effect parameters) from a normal distribution with mean 0 and variance , instead we generated them from a $t$ distribution with 2 degrees of freedom, and multiplied the result by 0.02. The $t$ distribution with a small number of degrees of freedom has much heavier tails than a normal distribution, and so extreme outliers will be more frequent. With 2 degrees of freedom, the variance of the distribution is not even defined. Simulation results are only considered in the one-sample setting and under the null ($\theta = 0$), as inflated Type 1 error rates in this scenario are the primary concern.
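The effect of the heavy tails can be illustrated by drawing direct effects from a scaled $t$ distribution with 2 degrees of freedom and comparing the frequency of extreme values with that under a normal distribution. In the sketch below the multiplier 0.02 is taken from the text, while the normal comparator with standard deviation 0.02, the cut-off of three times the scale, the seed, and the number of draws are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2024)
n_draws = 200_000
scale = 0.02                      # multiplier stated in the text

heavy = scale * rng.standard_t(df=2, size=n_draws)   # t with 2 degrees of freedom
normal = rng.normal(0.0, scale, size=n_draws)        # assumed normal comparator

threshold = 3 * scale             # "extreme" cut-off chosen for illustration
frac_heavy = np.mean(np.abs(heavy) > threshold)
frac_normal = np.mean(np.abs(normal) > threshold)

print(f"beyond 3 x scale: t(2) {frac_heavy:.3f}, normal {frac_normal:.4f}")
```

Roughly one draw in ten from the scaled $t$ distribution falls beyond the cut-off, against about three in a thousand for the normal, so a set of 20 variants will often contain one or two outlying direct effects.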
In Table 4, results are given for the inverse-variance weighted methods with different choices of weights and different models for combining the estimates. With a fixed-effect model, coverage rates for the second-order weights are similar to those in scenario 2 with the normally distributed direct effects. For the first-order weights, coverage is substantially worse, with Type 1 error rates well above the nominal 5% level even for the strongest instruments considered in this paper, although bias is similar to that in scenario 2. This corresponds to the motivating example, in which the outlying variant had a large influence on the pooled estimate using the first-order weights, but was heavily downweighted using the second-order weights. For a random-effects model using the second-order weights, particularly with the multiplicative random-effects model and for the additive random-effects model with weaker instruments, results were similar to those with a fixed-effect model. In contrast, for a random-effects model with the first-order weights, mean estimates were generally closer to the null (with one notable exception in scenario 5b that was mostly driven by a single aberrant estimate) and coverage levels were much improved. Coverage levels with a random-effects model were generally slightly better with the first-order weights than with the second-order weights, although not uniformly, and the difference was slight. As observed in the motivating example, and particularly with weaker instruments, heterogeneity is more often detected using the first-order weights, as the second-order weights tend to downweight the influence of the outlying variants.
5 Discussion
Several high-profile Mendelian randomization analyses have employed summarized data and some version of an inverse-variance weighted method. These include analyses of the causal effect of blood pressure on coronary heart disease risk (Ehret et al., 2011), height on coronary heart disease risk (Nelson et al., 2015), adiponectin on type 2 diabetes risk (Dastani et al., 2012), lipids on type 2 diabetes risk (Fall et al., 2015), and telomere length on risk of various cancers (Zhang et al., 2015), amongst several others. The statistical properties of estimates from the inverse-variance weighted method are therefore of considerable interest.
In this paper, we demonstrated that Type 1 error rates for the inverse-variance weighted method as it was initially proposed (first-order weights, fixed-effect model) are likely to be inflated in a one-sample Mendelian randomization setting either when the instruments are weak, or when there is heterogeneity between the causal estimates targeted by different genetic variants. This can be resolved either by using second-order weights or a random-effects model to combine the estimates from multiple genetic variants. These approaches affect the analysis in different ways: the second-order weights tend to downweight the influence of weak and heterogeneous variants on the overall causal estimate, whereas the random-effects models tend to increase standard errors by allowing for heterogeneity between the causal estimates in the model. While both approaches can be applied simultaneously, our simulations indicate that heterogeneity is less substantial when using the second-order weights. However, there is little disadvantage in assuming a random-effects model, as in the absence of heterogeneity, the fixed-effect analysis is recovered, and in the presence of heterogeneity, the random-effects analysis is more appropriate. Our results provide slight preference for an additive random-effects model over a multiplicative random-effects model.
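The distinction between the choice of weights and the choice of model can be made concrete with a short Python sketch. It assumes the usual delta-method variances for a ratio estimate (first-order: the squared standard error of the gene–outcome association divided by the squared gene–risk factor association; second-order: adding the term for uncertainty in the gene–risk factor association, with the correlation term set to zero), a multiplicative model that scales the standard error by the overdispersion but never below the fixed-effect value, and a DerSimonian-Laird moment estimate for the additive model. The function names and the example data are hypothetical.

```python
import numpy as np

def ratio_variances(bx, by, se_x, se_y, order):
    """Delta-method variance of each ratio estimate by/bx (no correlation term)."""
    first_order = se_y**2 / bx**2
    if order == 1:
        return first_order
    return first_order + by**2 * se_x**2 / bx**4

def ivw(bx, by, se_x, se_y, order=1, model="fixed"):
    """Inverse-variance weighted estimate and standard error."""
    ratio = by / bx
    var = ratio_variances(bx, by, se_x, se_y, order)
    w = 1.0 / var
    est = np.sum(w * ratio) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    q = np.sum(w * (ratio - est) ** 2)  # Cochran's Q heterogeneity statistic
    df = len(ratio) - 1
    if model == "multiplicative":
        # Inflate the standard error by the overdispersion, never below 1
        se = se * np.sqrt(max(1.0, q / df))
    elif model == "additive":
        # DerSimonian-Laird moment estimate of the between-variant variance
        tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
        w_re = 1.0 / (var + tau2)
        est = np.sum(w_re * ratio) / np.sum(w_re)
        se = np.sqrt(1.0 / np.sum(w_re))
    return float(est), float(se)

# Hypothetical summarized data; the last variant is an outlier
bx = np.array([0.10, 0.12, 0.08, 0.15, 0.09])
by = np.array([0.020, 0.025, 0.015, 0.031, 0.060])
se_x = np.full(5, 0.01)
se_y = np.full(5, 0.01)

for order in (1, 2):
    for model in ("fixed", "multiplicative", "additive"):
        est, se = ivw(bx, by, se_x, se_y, order, model)
        print(order, model, round(est, 3), round(se, 3))
```

In this sketch both random-effects models reduce to the fixed-effect analysis when the heterogeneity statistic does not exceed its degrees of freedom, which mirrors the point that little is lost by assuming a random-effects model.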
In a two-sample Mendelian randomization setting, weak instruments do not lead to inflated Type 1 error rates but rather to attenuation of estimates towards the null. The use of second-order weights was demonstrated to lead to conservative inference, whereas first-order weights gave correct coverage rates under the null. When there was heterogeneity in the causal estimates from different genetic variants, which was simulated to arise due to genetic variants having pleiotropic effects, a fixed-effect model with first-order weights was shown to lead to undercoverage, although this was corrected by use of a random-effects model.
A conclusion from this paper is the need to assess heterogeneity between the causal estimates from different genetic variants prior to performing a Mendelian randomization analysis based on multiple genetic variants, for example by a scatter plot of the gene–risk factor and gene–outcome associations (Appendix Figure A1). The presence of heterogeneous variants is likely to indicate violation of the instrumental variable assumptions for some of the variants, and can lead to misleading estimates as observed in the motivating example. Assessment for heterogeneity is also relevant when performing an analysis using individual-level data, for example using a two-stage least squares or allele score method.
5.1 Limitation of simulation studies
Our conclusions are limited as they are based on simulation studies. This is by necessity, as the properties of the estimators that we want to assess are finite-sample properties, not asymptotic properties. Our findings might have differed if we had considered a different data-generating mechanism, or more substantial heterogeneity between estimates from genetic variants. However, the findings are in line with theoretical considerations, and we believe the scenarios that we have chosen to be representative of a typical Mendelian randomization investigation in practice.
5.2 Unbalanced pleiotropy and robust methods (Egger regression, median-based approaches)
In this paper, we mostly considered scenarios corresponding to balanced pleiotropy. In the case of unbalanced (or directional) pleiotropy, causal estimates from inverse-variance weighted methods are biased and Type 1 error rates are inflated in all settings, even in the asymptotic limit (Bowden et al., 2015). This can be resolved in a number of ways. In Egger regression, we perform a weighted linear regression of the gene–outcome association estimates on the gene–risk factor association estimates in the same way as in an inverse-variance weighted method, except that an intercept term is included in the regression model. This intercept term represents the average direct effect of the genetic variants on the outcome. (It is additionally required that all genetic variants are orientated such that the gene–risk factor association estimates are all positive, or are all negative.) The causal estimate from Egger regression is the slope parameter from this regression model. It is a consistent estimate of the causal effect under the alternative assumption that the direct effects of the genetic variants are uncorrelated with the instrument strength; this is known as the InSIDE (instrument strength independent of direct effect) assumption. In the notation of the data-generating model of equation (9), the direct-effect parameters must be uncorrelated with the instrument-strength parameters; in the balanced pleiotropy examples of this paper, these parameters are drawn from independent distributions. This is a weaker assumption than the standard instrumental variable assumptions (the direct-effect parameters all equal zero) or the assumption of balanced pleiotropy (the direct-effect parameters have mean zero).
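The Egger regression step can be sketched in Python as a weighted least squares fit with an intercept. The sketch assumes weights equal to the inverse squared standard errors of the gene–outcome associations (corresponding to first-order weights) and orientates every variant so that its gene–risk factor association is positive; the function name `egger_regression` and the example data are hypothetical.

```python
import numpy as np

def egger_regression(bx, by, se_y):
    """Weighted regression of gene-outcome on gene-risk factor associations.

    Returns (intercept, slope): the intercept estimates the average direct
    effect and the slope is the causal estimate.
    """
    # Orientate variants so gene-risk factor associations are all positive
    sign = np.where(bx < 0, -1.0, 1.0)
    bx_o, by_o = bx * sign, by * sign
    w = 1.0 / se_y**2  # assumed first-order weights
    X = np.column_stack([np.ones_like(bx_o), bx_o])
    # Solve the weighted normal equations (X' W X) coef = X' W y
    XtW = X.T * w
    intercept, slope = np.linalg.solve(XtW @ X, XtW @ by_o)
    return float(intercept), float(slope)

# Hypothetical data with a constant direct effect of 0.01 and causal effect 0.3
bx_true = np.array([0.05, 0.08, 0.10, 0.12, 0.15, 0.20])
by_true = 0.01 + 0.3 * bx_true
# Report one variant with respect to the other allele (both signs flipped)
bx_obs, by_obs = bx_true.copy(), by_true.copy()
bx_obs[2], by_obs[2] = -bx_obs[2], -by_obs[2]
se_y_obs = np.array([0.010, 0.012, 0.009, 0.011, 0.010, 0.013])

intercept, slope = egger_regression(bx_obs, by_obs, se_y_obs)
print(round(intercept, 4), round(slope, 4))
```

Dropping the column of ones from `X` would give the corresponding regression through the origin, which is the inverse-variance weighted estimate under the same weights.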
The choice of weights could be considered in the same way for Egger regression; the original proposal was equivalent to using the first-order weights. Informal simulations (not presented) have suggested that the conclusions from this paper also hold for Egger regression (particularly the use of random-effects models). However, a full investigation would require simulating data with unbalanced pleiotropy (potentially both when the InSIDE assumption is satisfied and when it is violated); this is considered to be beyond the scope of this paper.
One notable difference about Egger regression is that if the genetic variants are allowed to have direct effects on the outcome, then heterogeneity in the causal estimates from individual variants is expected. Therefore, while heterogeneity in an inverse-variance weighted analysis is unwelcome and a potential sign that the assumptions are not satisfied, heterogeneity in the Egger method is a natural consequence of weakening the instrumental variable assumptions and does not necessarily invalidate the analysis.
Another approach for dealing with unbalanced pleiotropy is a median-based approach. The median of the causal estimates from each of the genetic variants taken individually is a consistent estimate of the causal effect under the assumption that at least 50% of the genetic variants are valid instrumental variables (Han, 2008). This is a different assumption to the InSIDE assumption, and neither assumption includes all cases of the other. Confidence intervals for the median can be obtained by bootstrapping; we suggest estimating a bootstrap standard error and forming confidence intervals from the standard error (Bowden et al., 2015). A weighted median estimator can also be obtained using inverse-variance weights in a weighted median function (Bowden et al., 2015). This method may have better asymptotic properties than an inverse-variance weighted method in a number of cases, as outlying estimates do not influence the median of the distribution. Simulations performed using second-order and first-order weights from the delta method suggested that weighted median estimates were not sensitive to the particular choice of weighting function. In a median-based approach, the choice of weights influences not only the bias and variability of estimates, but also the identification condition, as the consistency criterion for a weighted median estimator is that 50% of the weight in the analysis corresponds to valid instrumental variables. Hence, in some cases, the simple (unweighted) median estimator may be preferred even if it is less efficient.
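A median-based estimate and its bootstrap standard error can be sketched in Python as follows. The weighted median here is computed by linear interpolation of the weighted empirical distribution of the ratio estimates, and the bootstrap is a parametric one that resamples the association estimates from normal distributions with the reported standard errors; both are assumptions made for illustration, as are the function names and the example data.

```python
import numpy as np

def weighted_median(ratio, weights):
    """Weighted median of ratio estimates; equal weights give the simple median."""
    order = np.argsort(ratio)
    r, w = ratio[order], weights[order]
    # Place each estimate at the midpoint of its share of the cumulative weight
    mid = (np.cumsum(w) - 0.5 * w) / np.sum(w)
    return float(np.interp(0.5, mid, r))

def bootstrap_se(bx, by, se_x, se_y, n_boot=1000, seed=1):
    """Parametric bootstrap standard error of the weighted median."""
    rng = np.random.default_rng(seed)
    w = bx**2 / se_y**2  # first-order inverse-variance weights, held fixed
    ests = np.empty(n_boot)
    for b in range(n_boot):
        bx_b = rng.normal(bx, se_x)
        by_b = rng.normal(by, se_y)
        ests[b] = weighted_median(by_b / bx_b, w)
    return float(np.std(ests, ddof=1))

# Hypothetical summarized data; the last variant is an outlier
bx = np.array([0.10, 0.12, 0.08, 0.15, 0.09])
by = np.array([0.020, 0.025, 0.015, 0.031, 0.060])
se_x = np.full(5, 0.01)
se_y = np.full(5, 0.01)

ratio = by / bx
simple = weighted_median(ratio, np.ones_like(ratio))
weighted = weighted_median(ratio, bx**2 / se_y**2)
se = bootstrap_se(bx, by, se_x, se_y)
print(simple, weighted, se)
print(weighted - 1.96 * se, weighted + 1.96 * se)  # normal-approximation 95% interval
```

Because the weights determine which estimates sit either side of the 50% point, a single variant carrying more than half of the weight would dominate the weighted median, which is the identification issue raised above.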
5.3 Overlap between the samples in a ‘two-sample’ analysis
In practice, before following the recommendation to use first-order weights in a two-sample Mendelian randomization setting, it is advisable to check whether the samples used to estimate the gene–risk factor and the gene–outcome associations truly do not overlap. In the motivating example of the paper, genetic associations with early menopause are obtained from a consortium of 33 studies, and genetic associations with triglycerides from a consortium of 23 studies. Although the consortia appear to be different, in fact, at least 17 of the studies are included in both consortia, meaning that the analysis is not a true two-sample analysis. The exact extent of the overlap is not clear without having the individual-level data, but it is likely to be substantial.
Although the full second-order expression for the variance of a causal estimate (equation 2) includes a term that depends on the overlap between the two datasets, in this paper we have set the correlation parameter in this term to zero even in a one-sample setting. This was undertaken for computational simplicity in the simulation study setting. If the individual-level data were available, an estimate of this correlation could be obtained by bootstrapping the samples, and calculating the correlation between the bootstrapped distributions of the gene–risk factor and gene–outcome association estimates for each genetic variant. However, this was infeasible in the simulation study. Additionally, if the individual-level data are not available, it is unclear how to estimate the correlation. A sensitivity analysis can be performed for the value of the correlation; results for the motivating example of this paper are shown in Appendix Table A2. We see that different choices of its value lead to similar causal estimates and 95% confidence intervals for each of the inverse-variance weighted methods.
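A sensitivity analysis of this kind can be sketched in Python. The sketch assumes the standard delta-method variance of a ratio, in which the overlap enters through the correlation (called `rho` here) between the gene–risk factor and gene–outcome association estimates, and it uses a fixed-effect model throughout. The symbol `rho`, the function names and the data are hypothetical, and the output is not a reproduction of Appendix Table A2.

```python
import numpy as np

def second_order_variance(bx, by, se_x, se_y, rho=0.0):
    """Delta-method variance of the ratio by/bx including the correlation term."""
    return (se_y**2 / bx**2
            + by**2 * se_x**2 / bx**4
            - 2.0 * rho * by * se_x * se_y / bx**3)

def ivw_fixed(bx, by, se_x, se_y, rho):
    """Fixed-effect inverse-variance weighted estimate with second-order weights."""
    w = 1.0 / second_order_variance(bx, by, se_x, se_y, rho)
    est = np.sum(w * by / bx) / np.sum(w)
    return float(est), float(np.sqrt(1.0 / np.sum(w)))

# Hypothetical summarized data
bx = np.array([0.10, 0.12, 0.08, 0.15, 0.09])
by = np.array([0.020, 0.025, 0.015, 0.031, 0.022])
se_x = np.full(5, 0.01)
se_y = np.full(5, 0.01)

# Repeat the analysis over a grid of assumed correlation values
for rho in (0.0, 0.1, 0.2, 0.5):
    est, se = ivw_fixed(bx, by, se_x, se_y, rho)
    print(rho, round(est, 4), round(est - 1.96 * se, 4), round(est + 1.96 * se, 4))
```

Setting `rho` to zero recovers the second-order weights used elsewhere in the paper, so the first row of output serves as the reference analysis.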
5.4 Interpretation of a random-effects estimate
A theoretical concern in recommending the use of random-effects models for Mendelian randomization is the interpretation of the random-effects estimate. Under the assumptions of linearity and no effect modification, and in particular under the stable unit treatment value assumption (SUTVA; Cox, 1958), the causal estimates from different instrumental variables should target the same causal parameter. SUTVA states that the effect on the outcome of modifying the risk factor should be the same for all possible interventions on the risk factor, also expressed as “no multiple versions of treatment” (VanderWeele and Hernán, 2013). However, in reality, taking the context of the motivating example, different interventions on age at menopause (such as oophorectomy, hysterectomy, and hormone therapy) may have different effects on triglyceride levels; similar heterogeneity is expected for genetic variants that affect age at menopause via different biological pathways. By allowing for heterogeneity in causal estimates from different genetic variants, the notion of a single causal effect of the risk factor on the outcome is lost, and it is not clear what intervention on the risk factor the causal estimate is targeting. Additionally, if the choice of genetic variants changes, then the causal parameter also changes, as the random-effects distribution is taken across a different set of variants. The random-effects estimate is correctly interpreted not as targeting a common causal effect, but as targeting the average value of the distribution of causal effects identified by the different variants (Riley et al., 2011). This subtlety is not unique to causal estimation; rather, it is relevant in meta-analysis more widely (Higgins, Thompson and Spiegelhalter, 2009).
However, heterogeneity is more forgivable in meta-analysis; in Mendelian randomization, it could be argued that any deviation from homogeneity should be interpreted as evidence that the instrumental variable assumptions are violated for at least one of the genetic variants, and so a causal estimate based on all the genetic variants should not be presented.
We take a practical approach, and view these theoretical concerns as secondary to the primary concern of obtaining reliable causal inferences (Burgess, Butterworth and Thompson, 2015). Our view is that a literal interpretation of causal effect estimates from Mendelian randomization is rarely justified, due to differences between the way in which genetic variants influence the risk factor and any potential clinical intervention on the risk factor in practice (Burgess et al., 2012). However, if there is substantial heterogeneity, or if there are individual genetic variants that are clear outliers, then the overall causal estimate is likely to be unreliable even as a test of causality, and the instrumental variable assumptions should be examined carefully, particularly for the outlying variants.
5.5 Conclusion
In conclusion, in a Mendelian randomization analysis using summarized data in a (strict) two-sample setting (that is, when there is no overlap between the datasets in which associations with the risk factor and with the outcome are estimated), the inverse-variance weighted method with first-order weights may be preferred, although a random-effects model for combining the causal effects from the individual genetic variants should be used. In a one-sample setting, or if there is any overlap between the datasets, then a random-effects model using the second-order weights should be preferred to avoid false-positive findings. If the overlap is not substantial, then an analysis using the first-order weights may be presented as a sensitivity analysis, as it may have increased power to detect a causal effect.
References
Ahmad, O. S., Morris, J. A., Mujammami, M., Forgetta, V., Leong, A., Li, R., Turgeon, M., Greenwood, C. M. T., Thanassoulis, G., Meigs, J. B. et al. (2015). A Mendelian randomization study of the effect of type-2 diabetes on coronary heart disease. Nature Communications 6 7060. doi:10.1038/ncomms8060
Baum, C. F., Schaffer, M. E. and Stillman, S. (2003). Instrumental variables and GMM: Estimation and testing. Stata Journal 3 1–31.
Borenstein, M., Hedges, L. V., Higgins, J. P. T. and Rothstein, H. R. (2009). Introduction to meta-analysis. Chapter 34: Generality of the basic inverse-variance method. Wiley.
Bowden, J., Davey Smith, G. and Burgess, S. (2015). Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. International Journal of Epidemiology 44 512–525.
Bowden, J., Davey Smith, G., Haycock, P. C. and Burgess, S. (2015). Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Available at https://www.academia.edu/15479132/Consistent.
Burgess, S., Butterworth, A. and Thompson, S. G. (2013). Mendelian randomization analysis with multiple genetic variants using summarized data. Genetic Epidemiology 37 658–665. doi:10.1002/gepi.21758
Burgess, S., Butterworth, A. S. and Thompson, J. R. (2015). Beyond Mendelian randomization: how to interpret evidence of shared genetic predictors. Journal of Clinical Epidemiology. doi:10.1016/j.jclinepi.2015.08.001
Burgess, S., Dudbridge, F. and Thompson, S. G. (2015a). Re: “Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects”. American Journal of Epidemiology 181 290–291.
Burgess, S., Dudbridge, F. and Thompson, S. G. (2015b). Combining information on multiple instrumental variables in Mendelian randomization: comparison of allele score and summarized data methods. Available at https://www.academia.edu/15479109/Combining.
Burgess, S. and Thompson, S. G. (2011). Bias in causal estimates from Mendelian randomization studies with weak instruments. Statistics in Medicine 30 1312–1323. doi:10.1002/sim.4197
Burgess, S., Thompson, S. G. and CRP CHD Genetics Collaboration (2011). Avoiding bias from weak instruments in Mendelian randomization studies. International Journal of Epidemiology 40 755–764. doi:10.1093/ije/dyr036
Burgess, S. and Thompson, S. G. (2012). Improvement of bias and coverage in instrumental variable analysis with weak instruments for continuous and binary outcomes. Statistics in Medicine 31 1582–1600. doi:10.1002/sim.4498
Burgess, S. and Thompson, S. G. (2015). Mendelian randomization: methods for using genetic variants in causal estimation. Chapman & Hall.
Burgess, S., Butterworth, A., Malarstig, A. and Thompson, S. G. (2012). Use of Mendelian randomisation to assess potential benefit of clinical intervention. British Medical Journal 345 e7325. doi:10.1136/bmj.e7325
Burgess, S., Scott, R. A., Timpson, N. J., Davey Smith, G., Thompson, S. G. and EPIC-InterAct Consortium (2015). Using published data in Mendelian randomization: a blueprint for efficient identification of causal risk factors. European Journal of Epidemiology 30 543–552. doi:10.1007/s106540150011z
The Global Lipids Genetics Consortium (2013). Discovery and refinement of loci associated with lipid levels. Nature Genetics 45 1274–1283. doi:10.1038/ng.2797
Cox, D. R. (1958). Planning of experiments. Section 2: Some key assumptions. Wiley.
Dastani, Z., Hivert, M.-F., Timpson, N., Perry, J. R. B., Yuan, X., Scott, R. A., Henneman, P., Heid, I. M., Kizer, J. R., Lyytikäinen, L.-P. et al. (2012). Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: A multi-ethnic meta-analysis of 45,891 individuals. PLOS Genetics 8 e1002607. doi:10.1371/journal.pgen.1002607
Davey Smith, G. and Ebrahim, S. (2003). ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology 32 1–22. doi:10.1093/ije/dyg070
Davey Smith, G. and Hemani, G. (2014). Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Human Molecular Genetics 23 R89–98. doi:10.1093/hmg/ddu328
Day, F. et al. (2015). Large-scale genomic analyses link reproductive aging to hypothalamic signaling, breast cancer susceptibility and BRCA1-mediated DNA repair. Nature Genetics. doi:10.1038/ng.3412
DerSimonian, R. and Laird, N. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials 7 177–188. doi:10.1016/01972456(86)900462
Didelez, V., Meng, S. and Sheehan, N. A. (2010). Assumptions of IV methods for observational epidemiology. Statistical Science 25 22–40. doi:10.1214/09sts316
Didelez, V. and Sheehan, N. (2007). Mendelian randomization as an instrumental variable approach to causal inference. Statistical Methods in Medical Research 16 309–330. doi:10.1177/0962280206077743
Fall, T., Xie, W., Poon, W., Yaghootkar, H., Mägi, R., Knowles, J. W., Lyssenko, V., Weedon, M., Frayling, T. M. and Ingelsson, E. (2015). Using genetic variants to assess the relationship between circulating lipids and type 2 diabetes. Diabetes. doi:10.2337/db141710
The International Consortium for Blood Pressure Genome-Wide Association Studies (2011). Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478 103–109. doi:10.1038/nature10405
Greenland, S. (2000). An introduction to instrumental variables for epidemiologists. International Journal of Epidemiology 29 722–729. doi:10.1093/ije/29.4.722
Han, C. (2008). Detecting invalid instruments using L1-GMM. Economics Letters 101 285–287.
Higgins, J., Thompson, S. G. and Spiegelhalter, D. J. (2009). A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society: Series A (Statistics in Society) 172 137–159. doi:10.1111/j.1467985x.2008.00552.x
Johnson, T. (2013). Efficient calculation for multi-SNP genetic risk scores. Technical Report, The Comprehensive R Archive Network. Available at http://cran.rproject.org/web/packages/gtx/vignettes/ashg2012.pdf [last accessed 2014/11/19].
Lawlor, D. A., Harbord, R. M., Sterne, J. A. C., Timpson, N. and Davey Smith, G. (2008). Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Statistics in Medicine 27 1133–1163. doi:10.1002/sim.3034
Martens, E. P., Pestman, W. R., de Boer, A., Belitser, S. V. and Klungel, O. H. (2006). Instrumental variables: application and limitations. Epidemiology 17 260–267. doi:10.1097/01.ede.0000215160.88317.cb
Nelson, C. P., Hamby, S. E., Saleheen, D., Hopewell, J. C., Zeng, L., Assimes, T. L., Kanoni, S., Willenborg, C., Burgess, S., Amouyel, P., Anand, S., Blankenberg, S., Boehm, B. O., Clarke, R. J., Collins, R., Dedoussis, G., Farrall, M., Franks, P. W., Groop, L., Hall, A. S., Hamsten, A., Hengstenberg, C., Hovingh, G. K., Ingelsson, E., Kathiresan, S., Kee, F., König, I. R., Kooner, J., Lehtimäki, T., März, W., McPherson, R., Metspalu, A., Nieminen, M. S., O’Donnell, C. J., Palmer, C. N. A., Peters, A., Perola, M., Reilly, M. P., Ripatti, S., Roberts, R., Salomaa, V., Shah, S. H., Schreiber, S., Siegbahn, A., Thorsteinsdottir, U., Veronesi, G., Wareham, N., Willer, C. J., Zalloua, P. A., Erdmann, J., Deloukas, P., Watkins, H., Schunkert, H., Danesh, J., Thompson, J. R. and Samani, N. J. (2015). Genetically determined height and coronary artery disease. New England Journal of Medicine 372 1608–1618. doi:10.1056/NEJMoa1404881
Nitsch, D., Molokhia, M., Smeeth, L., DeStavola, B. L., Whittaker, J. C. and Leon, D. A. (2006). Limits to causal inference based on Mendelian randomization: a comparison with randomized controlled trials. American Journal of Epidemiology 163 397–403. doi:10.1093/aje/kwj062
Pearl, J. (2000). Causality: models, reasoning, and inference. Cambridge University Press.
Pierce, B. and Burgess, S. (2013). Efficient design for Mendelian randomization studies: subsample and two-sample instrumental variable estimators. American Journal of Epidemiology 178 1177–1184. doi:10.1093/aje/kwt084
Riley, R. D., Higgins, J. P. T., Deeks, J. J. et al. (2011). Interpretation of random effects meta-analyses. British Medical Journal 342 d549. doi:10.1136/bmj.d549
Schatzkin, A., Abnet, C. C., Cross, A. J., Gunter, M., Pfeiffer, R., Gail, M., Lim, U. and Davey Smith, G. (2009). Mendelian randomization: how it can – and cannot – help confirm causal relations between nutrition and cancer. Cancer Prevention Research 2 104–113. doi:10.1158/1940-6207.capr-08-0070
Shen, X. and Zhan, Y. (2015). Re: The effect on melanoma risk of genes previously associated with telomere length. Journal of the National Cancer Institute 107 djv237. doi:10.1093/jnci/djv237
Staiger, D. and Stock, J. H. (1997). Instrumental variables regression with weak instruments. Econometrica 65 557–586.
Stock, J. H., Wright, J. H. and Yogo, M. (2002). A survey of weak instruments and weak identification in generalized method of moments. Journal of Business and Economic Statistics 20 518–529. doi:10.1198/073500102288618658
Stock, J. H. and Yogo, M. (2002). Testing for weak instruments in linear IV regression. SSRN eLibrary 11 T0284.
Swanson, S. and Hernán, M. (2013). Commentary: how to report instrumental variable analyses (suggestions welcome). Epidemiology 24 370–374. doi:10.1097/ede.0b013e31828d0590
R Core Team (2014). R: A Language and Environment for Statistical Computing. Version 3.1.2 (Pumpkin Helmet). R Foundation for Statistical Computing, Vienna, Austria.
Thomas, D. C., Lawlor, D. A. and Thompson, J. R. (2007). Re: Estimation of bias in nongenetic observational studies using “Mendelian triangulation” by Bautista et al. Annals of Epidemiology 17 511–513. doi:10.1016/j.annepidem.2006.12.005
Thompson, S. G. and Sharp, S. J. (1999). Explaining heterogeneity in meta-analysis: a comparison of methods. Statistics in Medicine 18 2693–2708.
VanderWeele, T. J. and Hernán, M. A. (2013). Causal inference under multiple versions of treatment. Journal of Causal Inference 1 1–20. doi:10.1515/jci-2012-0002
VanderWeele, T., Tchetgen Tchetgen, E., Cornelis, M. and Kraft, P. (2014). Methodological challenges in Mendelian randomization. Epidemiology 25 427–435. doi:10.1097/ede.0000000000000081
Zhang, C., Doherty, J. A., Burgess, S., Hung, R. J., Lindström, S., Kraft, P., Gong, J., Amos, C. I., Sellers, T. A., Monteiro, A. N. A. et al. (2015). Genetic determinants of telomere length and risk of common cancers: a Mendelian randomization study. Human Molecular Genetics. doi:10.1093/hmg/ddv252
Appendix
A.1 Data for motivating example: causal effect of early menopause on triglycerides
Information on the genetic variants included in the motivating analysis is presented in Appendix Table A1: for each variant, we provide the rsid, nearest gene(s), effect allele, other allele, association with early menopause (expressed as number of years earlier menopause) and standard error, and association with triglycerides (in standard deviation units) and standard error. Associations are also displayed visually as a scatter plot in Appendix Figure A1. Associations with early menopause are obtained from Day et al. (2015); larger numbers indicate that individuals with copies of the effect allele have earlier menopause on average compared with carriers of the other allele. These association estimates are available for download as part of the Supplementary Material to Day et al. (Supplementary Table 3). Associations with triglycerides are obtained from the Global Lipids Genetics Consortium (2013), and can be downloaded from http://csg.sph.umich.edu//abecasis/public/lipids2013/.
Genetic variant  Gene region  Effect allele  Other allele  Early menopause in years (SE)  Triglycerides SD difference (SE)
rs10734411  EIF3M  G  A  0.12 (0.02)  0.0017 (0.0047) 
rs10852344  GSPT1/BCAR4  T  C  0.16 (0.02)  0.0030 (0.0047) 
rs10905065  FBXO18  A  G  0.11 (0.02)  0.0056 (0.0047) 
rs10957156  CHD7  G  A  0.14 (0.02)  0.0114 (0.0056) 
rs11031006  FSHB  G  A  0.25 (0.03)  0.0186 (0.0068) 
rs11668344  BRSK1/NLRP11/U2AF2  A  G  0.41 (0.02)  0.0009 (0.0049) 
rs11738223  SH3PXD2B  G  A  0.12 (0.02)  0.0007 (0.0036) 
rs1183272  HELB  T  C  0.31 (0.03)  0.0005 (0.0047) 
rs12142240  RAD54L  C  T  0.13 (0.02)  0.0051 (0.0050) 
rs12196873  REV3L  A  C  0.16 (0.03)  0.0099 (0.0068) 
rs12461110  BRSK1/NLRP11/U2AF2  G  A  0.15 (0.02)  0.0061 (0.0051) 
rs12824058  PIWIL1  A  G  0.14 (0.02)  0.0006 (0.0048) 
rs13040088  SLCO4A1/DIDO1  A  G  0.16 (0.02)  0.0004 (0.0057) 
rs1411478  STX6  A  G  0.13 (0.02)  0.0004 (0.0047) 
rs16858210  PARL/POLR2H  A  G  0.14 (0.02)  0.0023 (0.0055) 
rs16991615  MCM8  A  G  0.88 (0.04)  0.0025 (0.0073) 
rs1713460  APEX1/PARP2/PNP  A  G  0.14 (0.02)  0.0015 (0.0056) 
rs1799949  BRCA1  A  G  0.14 (0.02)  0.0107 (0.0049) 
rs1800932  MSH6  G  A  0.17 (0.03)  0.0020 (0.0060) 
rs2230365  MSH5/HLA  T  C  0.16 (0.03)  0.0202 (0.0046) 
rs2236553  SLCO4A1/DIDO1  C  T  0.16 (0.03)  0.0021 (0.0065) 
rs2241584  UIMC1  A  G  0.14 (0.02)  0.0007 (0.0048) 
rs2277339  PRIM1/TAC3  G  T  0.31 (0.03)  0.0072 (0.0080) 
rs2720044  STAR  C  A  0.29 (0.03)  0.0043 (0.0078) 
rs2941505  STARD3/PGAP3/CDK12  A  G  0.13 (0.02)  0.0074 (0.0035) 
rs349306  POLR2E/KISS1R  G  A  0.23 (0.04)  0.0082 (0.0055) 
rs365132  UIMC1  G  T  0.24 (0.02)  0.0003 (0.0047) 
rs3741604  HELB  T  C  0.29 (0.03)  0.0014 (0.0047) 
rs4246511  RHBDL2/MYCBP  T  C  0.22 (0.02)  0.0093 (0.0056) 
rs427394  PAPD7  G  A  0.13 (0.02)  0.0013 (0.0048) 
rs451417  MCM8  C  A  0.20 (0.03)  0.0019 (0.0081) 
rs4693089  HELQ/FAM175A  G  A  0.20 (0.02)  0.0045 (0.0048) 
rs4879656  APTX  C  A  0.12 (0.02)  0.0033 (0.0049) 
rs4886238  TDRD3  A  G  0.18 (0.02)  0.0009 (0.0050) 
rs551087  SPPL3/SRSF9  A  G  0.13 (0.02)  0.0032 (0.0036) 
rs5762534  CHEK2  C  T  0.16 (0.03)  0.0056 (0.0066) 
rs6484478  FSHB  G  A  0.14 (0.02)  0.0102 (0.0053) 
rs6856693  ASCL1/MLF1IP  A  G  0.16 (0.02)  0.0044 (0.0048) 
rs6899676  SYCP2L/MAK  G  A  0.21 (0.03)  0.0045 (0.0058) 
rs704795  BRE/GTF3C2/EIFB4  G  A  0.16 (0.02)  0.0567 (0.0034) 
rs707938  MSH5/HLA  A  G  0.16 (0.02)  0.0014 (0.0049) 
rs7259376  ZNF729  A  G  0.11 (0.02)  0.0041 (0.0047) 
rs763121  DMC1/DDX17  G  A  0.16 (0.02)  0.0179 (0.0036) 
rs8070740  RPAIN  G  A  0.15 (0.02)  0.0121 (0.0056) 
rs9039  C16orf72/ABAT  C  T  0.12 (0.02)  0.0068 (0.0037) 
rs930036  TLK1/GAD1  A  G  0.19 (0.02)  0.0001 (0.0049) 
rs9393800  SYCP2L/MAK  A  G  0.14 (0.02)  0.0073 (0.0054) 
A.2 Code for implementing methods used in simulation study
R code for performing the methods used in the simulation study is provided below:
library(meta)  # provides metagen() for the additive random-effects models

# Assumes x (risk factor), y (outcome), g (matrix with one column per genetic
# variant) and vars (number of variants) are already defined.
alpx = NULL; alpxsd = NULL  # genetic associations with risk factor and standard errors
alpy = NULL; alpysd = NULL  # genetic associations with outcome and standard errors
for (j in 1:vars) {
  alpx[j]   = lm(x~g[,j])$coef[2]
  alpy[j]   = lm(y~g[,j])$coef[2]
  alpxsd[j] = summary(lm(x~g[,j]))$coef[2,2]
  alpysd[j] = summary(lm(y~g[,j]))$coef[2,2]
}

# First-order weights
reg.first = summary(lm(alpy~alpx-1, weights=alpysd^-2))
betafirst.fixed  = reg.first$coef[1]  # estimate, fixed-effect model
betafirst.mulran = reg.first$coef[1]  # estimate, multiplicative random-effects model
sefirst.fixed    = reg.first$coef[1,2]/reg.first$sigma  # standard error, fixed-effect model
sefirst.mulran   = reg.first$coef[1,2]/min(reg.first$sigma,1)  # standard error, multiplicative random-effects model
betafirst.addran = metagen(alpy/alpx, abs(alpysd/alpx))$TE.random  # estimate, additive random-effects model
sefirst.addran   = metagen(alpy/alpx, abs(alpysd/alpx))$seTE.random

# Second-order weights
reg.second = summary(lm(alpy~alpx-1, weights=(alpysd^2+alpy^2*alpxsd^2/alpx^2)^-1))
betasecond.fixed  = reg.second$coef[1]  # estimate, fixed-effect model
betasecond.mulran = reg.second$coef[1]
sesecond.fixed    = reg.second$coef[1,2]/reg.second$sigma
sesecond.mulran   = reg.second$coef[1,2]/min(reg.second$sigma,1)
betasecond.addran = metagen(alpy/alpx, sqrt(alpysd^2/alpx^2+alpy^2*alpxsd^2/alpx^4))$TE.random
sesecond.addran   = metagen(alpy/alpx, sqrt(alpysd^2/alpx^2+alpy^2*alpxsd^2/alpx^4))$seTE.random

# Second-order weights with correlation term
theta = 0.1  # correlation term from equation (1)
reg.second.theta = summary(lm(alpy~alpx-1,
  weights=(alpysd^2+alpy^2*alpxsd^2/alpx^2-2*theta*alpy*alpxsd*alpysd/alpx)^-1))
betasecond.theta.fixed  = reg.second.theta$coef[1]  # estimate, fixed-effect model
betasecond.theta.mulran = reg.second.theta$coef[1]
sesecond.theta.fixed    = reg.second.theta$coef[1,2]/reg.second.theta$sigma
sesecond.theta.mulran   = reg.second.theta$coef[1,2]/min(reg.second.theta$sigma,1)
betasecond.theta.addran = metagen(alpy/alpx,
  sqrt(alpysd^2/alpx^2+alpy^2*alpxsd^2/alpx^4-2*theta*alpy*alpxsd*alpysd/alpx^3))$TE.random
sesecond.theta.addran   = metagen(alpy/alpx,
  sqrt(alpysd^2/alpx^2+alpy^2*alpxsd^2/alpx^4-2*theta*alpy*alpxsd*alpysd/alpx^3))$seTE.random
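The code above takes the objects x, y, g and vars as given. As a minimal sketch of how such inputs might be generated and used, the following base R example simulates individual-level data and checks that the first-order fixed-effect estimate from the weighted regression equals its closed form. The sample size, number of variants, allele frequency and effect sizes are illustrative assumptions, not the values used in the paper's simulation study.

```r
set.seed(1)
n    <- 2000   # hypothetical sample size
vars <- 10     # hypothetical number of genetic variants

# Genotypes coded 0/1/2 with an assumed minor allele frequency of 0.3
g <- matrix(rbinom(n * vars, 2, 0.3), nrow = n, ncol = vars)
u <- rnorm(n)                               # unmeasured confounder
alpha <- runif(vars, 0.1, 0.3)              # assumed variant effects on the risk factor
x <- as.vector(g %*% alpha) + u + rnorm(n)  # risk factor
y <- 0.2 * x + u + rnorm(n)                 # outcome, assumed causal effect of 0.2

# Summarized associations from one univariate regression per variant
fit_x <- lapply(seq_len(vars), function(j) summary(lm(x ~ g[, j]))$coefficients)
fit_y <- lapply(seq_len(vars), function(j) summary(lm(y ~ g[, j]))$coefficients)
alpx   <- sapply(fit_x, function(m) m[2, 1])
alpxsd <- sapply(fit_x, function(m) m[2, 2])
alpy   <- sapply(fit_y, function(m) m[2, 1])
alpysd <- sapply(fit_y, function(m) m[2, 2])

# First-order inverse-variance weighted estimate via weighted regression
# through the origin
ivw_reg <- unname(coef(lm(alpy ~ alpx - 1, weights = alpysd^-2))[1])

# Equivalent closed form of the same estimate
ivw_closed <- sum(alpx * alpy / alpysd^2) / sum(alpx^2 / alpysd^2)

c(regression = ivw_reg, closed_form = ivw_closed)
```

The two numbers agree because weighted least squares through the origin with weights w has slope sum(w * alpx * alpy) / sum(w * alpx^2).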
A.3 Sensitivity analysis for value of $\theta$ in motivating example
As stated in Section 5.3, in this paper we have assumed that the correlation parameter $\theta$ in the second-order expression for the variance of a causal estimate from the delta method (equation 2) is zero. Computational and practical considerations (the running time of the simulation study, and the difficulty of estimating the parameter using summarized data only) preclude an investigation into the impact of this term in the simulation study. However, we can conduct a sensitivity analysis to consider the impact of the value of $\theta$ on estimates from the motivating example.
We conduct inverse-variance weighted analyses using weights derived from equation (2) and fixed-effect, additive random-effects, and multiplicative random-effects models for a range of values of $\theta$. The causal estimates and 95% confidence intervals from each analysis are presented in Appendix Table A2. The estimates and confidence intervals do not change substantially despite the wide range of values of $\theta$ considered.
The true value of $\theta$ should be zero if the associations with the risk factor and outcome are estimated in non-overlapping samples, and similar to the correlation between the risk factor and the outcome if the associations are estimated in the same individuals. With partial overlap, the value of $\theta$ will be between these two values.
$\theta$  Fixed-effect estimate  95% CI  Additive random-effects estimate  95% CI  Multiplicative random-effects estimate  95% CI
-0.2  0.000  -0.007, 0.007  0.004  -0.010, 0.017  0.000  -0.012, 0.012
-0.1  0.001  -0.006, 0.008  0.005  -0.009, 0.018  0.001  -0.011, 0.013
0  0.002  -0.005, 0.009  0.006  -0.008, 0.019  0.002  -0.010, 0.014
0.1  0.003  -0.004, 0.011  0.007  -0.007, 0.020  0.003  -0.009, 0.016
0.2  0.004  -0.003, 0.012  0.008  -0.006, 0.022  0.004  -0.008, 0.017
0.3  0.005  -0.002, 0.013  0.009  -0.006, 0.023  0.005  -0.008, 0.018
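A sensitivity analysis of this kind can be sketched in a few lines of base R. The example below covers only the fixed-effect model with second-order weights, using the same weight expression as the code in Section A.2. The summarized data are hypothetical values chosen for illustration (they are not the associations from Appendix Table A1), and the function name ivw_second_theta is an assumption introduced here.

```r
# Hypothetical summarized data (illustrative only)
bx <- c(0.12, 0.16, 0.25, 0.41, 0.31)        # associations with the risk factor
sx <- c(0.02, 0.02, 0.03, 0.02, 0.03)        # their standard errors
by <- c(0.004, -0.003, 0.010, 0.002, 0.006)  # associations with the outcome
sy <- c(0.005, 0.005, 0.007, 0.005, 0.005)   # their standard errors

# Fixed-effect inverse-variance weighted estimate using second-order weights
# that include the correlation term theta
ivw_second_theta <- function(bx, by, sx, sy, theta) {
  v   <- sy^2 + by^2 * sx^2 / bx^2 - 2 * theta * by * sx * sy / bx
  fit <- summary(lm(by ~ bx - 1, weights = 1 / v))
  est <- fit$coefficients[1, 1]
  # Dividing by sigma fixes the residual standard error at 1 (fixed-effect model)
  se  <- fit$coefficients[1, 2] / fit$sigma
  c(estimate = est, lower = est - 1.96 * se, upper = est + 1.96 * se)
}

# Repeat the analysis over a grid of values for theta
thetas <- c(-0.2, -0.1, 0, 0.1, 0.2, 0.3)
sens <- t(sapply(thetas, function(th) ivw_second_theta(bx, by, sx, sy, th)))
rownames(sens) <- paste0("theta=", thetas)
round(sens, 4)
```

Each row of sens holds the estimate and 95% confidence limits for one value of theta, so stability across rows can be read off directly.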
A.4 Additional results from simulation study
Additional results from Scenarios 1 and 2 are presented in Appendix Table A3, and from Scenarios 3 and 4 in Appendix Table A4. For each value of the instrument strength, the (Monte Carlo) standard deviation and the mean standard error of estimates are presented. For second-order weights, only results from the fixed-effect analyses are presented: heterogeneity was not detected in the vast majority of datasets, so results from the random-effects analyses were the same up to 3 decimal places in almost all cases.
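To make the two summary measures concrete, the following base R sketch computes a Monte Carlo standard deviation and a mean standard error across simulated datasets. It uses a deliberately simplified data-generating model with hypothetical parameter values (valid instruments, summarized associations drawn directly, no confounding), so it illustrates the calculation only and does not reproduce the paper's scenarios.

```r
set.seed(42)
n_sims <- 500  # hypothetical number of simulated datasets
n_var  <- 20   # hypothetical number of variants per dataset

est <- numeric(n_sims)
se  <- numeric(n_sims)
for (s in seq_len(n_sims)) {
  bx <- rnorm(n_var, 0.2, 0.05)         # assumed associations with the risk factor
  sy <- rep(0.01, n_var)                # assumed standard errors for the outcome
  by <- 0.1 * bx + rnorm(n_var, 0, sy)  # valid instruments, causal effect 0.1
  # First-order fixed-effect inverse-variance weighted estimate and standard error
  est[s] <- sum(bx * by / sy^2) / sum(bx^2 / sy^2)
  se[s]  <- 1 / sqrt(sum(bx^2 / sy^2))
}

mc_sd   <- sd(est)   # Monte Carlo standard deviation of the estimates
mean_se <- mean(se)  # mean of the reported standard errors
c(mc_sd = mc_sd, mean_se = mean_se)
```

When the mean standard error is clearly smaller than the Monte Carlo standard deviation, the reported precision is overstated, which is the pattern associated with over-rejection of the null.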
A.5 Additional simulation with directional pleiotropy
To provide some guidance as to the performance of the inverse-variance weighted method when there is directional pleiotropy, we perform a further simulation under this scenario. The parameters and scenarios are taken to be the same as those in the main body of the paper, except that rather than drawing the genetic effects on the risk factor and the direct effects of the genetic variants on the outcome from independent normal distributions as in Scenarios 2 and 4, we draw them from a bivariate normal distribution. The univariate distributions of these parameters are unchanged (in particular, the direct effects still have mean 0), but the correlation between the two sets of parameters is set to 0.4.
This correlation means that the direct effects of genetic variants on the outcome are greater for those variants that have stronger effects on the risk factor, and so for those variants that receive more weight in the analysis. Hence, although the pleiotropic effects have mean zero overall, the pleiotropic effects of weak and strong instruments separately do not have mean zero. We refer to the one-sample setting with directional pleiotropy as Scenario 6, and the two-sample setting with directional pleiotropy as Scenario 7.
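The effect of this correlation can be illustrated with a short base R sketch that draws the two sets of parameters from a bivariate normal distribution. Only the correlation of 0.4 and the zero mean of the direct effects come from the text; the other means and standard deviations, and the large number of variants, are hypothetical values chosen so that the pattern is visible.

```r
set.seed(2015)
n_var <- 10000  # illustrative; large so that sample summaries are stable
rho   <- 0.4    # correlation stated in the text
mu_a  <- 0.2    # hypothetical mean of genetic effects on the risk factor
sd_a  <- 0.05   # hypothetical standard deviation of those effects
sd_d  <- 0.02   # hypothetical standard deviation of direct effects (mean 0)

# Construct correlated normal draws from two independent standard normals
z1 <- rnorm(n_var)
z2 <- rnorm(n_var)
alpha  <- mu_a + sd_a * z1
direct <- sd_d * (rho * z1 + sqrt(1 - rho^2) * z2)

# The overall mean direct effect is close to zero ...
mean(direct)

# ... but the mean differs between weaker and stronger instruments
strong <- alpha > median(alpha)
c(correlation = cor(alpha, direct),
  weak   = mean(direct[!strong]),
  strong = mean(direct[strong]))
```

Variants with above-median effects on the risk factor have positive direct effects on average, and these are the variants that receive more weight in the inverse-variance weighted analysis.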
Results for the mean estimate and empirical power to detect a causal effect are given in Appendix Table A5. In the one-sample setting (Scenario 6), there is bias in the direction of confounding in all cases. While Type 1 error rates under the null are inflated throughout, there is a clear preference for the use of second-order weights and random-effects models, as well as a slight preference for the additive random-effects model (based on slightly more conservative coverage properties with first-order weights). This mirrors the advice in the main paper. In the two-sample setting (Scenario 7), bias under the null is in the positive direction, whereas bias under the alternative is towards the null. Type 1 error rates under the null with random-effects models are close to nominal levels, with conservative coverage for second-order weights, and slightly anti-conservative coverage for first-order weights. However, the advice from the main paper to use first-order weights in a two-sample setting would not lead to overly misleading inferences, as Type 1 error rates with first-order weights are close to the nominal 5% level. Power to detect a causal effect is greater using first-order weights in this case. Hence, on the basis of these simulations, the advice in the main body of the paper also holds with directional pleiotropy.
In practice, we repeat that estimates from the inverse-variance weighted method will typically be biased if the genetic variants are not valid instruments (and the example of directional pleiotropy considered here is far from extreme), and recommend the use of robust methods (such as the Egger method and median-based methods introduced in the discussion of the paper) as sensitivity analyses for applied Mendelian randomization investigations.