Integrating summarized data from multiple genetic variants in Mendelian randomization: bias and coverage properties of inverse-variance weighted methods

Stephen Burgess (corresponding author), sb452@medschl.cam.ac.uk
Department of Public Health and Primary Care, University of Cambridge
Strangeways Research Laboratory, 2 Worts Causeway, Cambridge, CB1 8RN, UK

Jack Bowden, jack.bowden@bristol.ac.uk
MRC Integrative Epidemiology Unit, University of Bristol
Oakfield House, Oakfield Grove, Bristol, BS8 2BN, UK

Abstract

Mendelian randomization is the use of genetic variants as instrumental variables to assess whether a risk factor is a cause of a disease outcome. Increasingly, Mendelian randomization investigations are conducted on the basis of summarized data, rather than individual-level data. These summarized data comprise the coefficients and standard errors from univariate regression models of the risk factor on each genetic variant, and of the outcome on each genetic variant. A causal estimate can be derived from these associations for each individual genetic variant, and a combined estimate can be obtained by inverse-variance weighted meta-analysis of these causal estimates. Various proposals have been made for how to calculate this inverse-variance weighted estimate. In this paper, we show that the inverse-variance weighted method as originally proposed (equivalent to a two-stage least squares or allele score analysis using individual-level data) can lead to over-rejection of the null, particularly when there is heterogeneity between the causal estimates from different genetic variants. Random-effects models should be routinely employed to allow for this possible heterogeneity. Additionally, over-rejection of the null is observed when associations with the risk factor and the outcome are obtained in overlapping participants. The use of weights including second-order terms from the delta method is recommended in this case.

Running title: Summarized data in Mendelian randomization: bias and coverage

Keywords: Mendelian randomization, instrumental variables, summarized data, weak instruments, causal inference

1 Introduction

Mendelian randomization is the use of genetic variants as instrumental variables to investigate the causal effect of a modifiable risk factor on an outcome using observational data (Burgess and Thompson, 2015). Mendelian randomization analyses are increasingly performed using summarized data, rather than individual-level data (Burgess et al., 2015). There are various methods for combining the estimates from multiple genetic variants into a single causal estimate (Burgess, Butterworth and Thompson, 2013). In particular, an inverse-variance weighted method has been proposed (Johnson, 2013) that is equivalent (for a particular choice of weights) to the standard two-stage least squares method usually employed with individual-level data (Burgess, Dudbridge and Thompson, 2015a). However, different authors have used different formulae for estimating the variances of the estimates that are used as weights (Dastani et al., 2012; Shen and Zhan, 2015). Additionally, some authors have used fixed-effect meta-analysis for the combination of estimates from different genetic variants (Nelson et al., 2015), whereas other authors have used random-effects meta-analysis (Ahmad et al., 2015).

In this paper, we compare the bias and coverage properties of estimates from the inverse-variance weighted method for different choices of weights, and using fixed-effect, additive random-effects, and multiplicative random-effects models for combining the estimates. In Section 2, we introduce the inverse-variance weighted method, and demonstrate its equivalence to both a two-stage least squares analysis and to a weighted linear regression of the association estimates. We also present the different versions of the method that are investigated further in this paper. In Section 3, we provide an example analysis that was the motivation for this work. In this example, subtly different choices in the analysis method result in estimates that differ considerably and lead to substantively different conclusions. In Section 4, we perform a simulation study to compare the bias and coverage properties of the different versions of the method. Finally, in Section 5, we discuss the findings of this paper and their relevance to applied practice.

2 Methods

We provide a brief introduction to Mendelian randomization – the use of genetic variants as instrumental variables; further introductory references to the subject area are available (Davey Smith and Ebrahim, 2003; Lawlor et al., 2008; Schatzkin et al., 2009). The objective of Mendelian randomization is to judge whether intervention on a modifiable risk factor would affect a disease outcome. This is achieved by testing whether genetic variants that satisfy the assumptions of an instrumental variable for the risk factor are associated with the outcome. An instrumental variable is a variable that is associated with the risk factor, but not associated with confounders of the risk factor–outcome association, nor is there any causal pathway from the instrumental variable to the outcome except for that via the risk factor (see Greenland (2000); Martens et al. (2006) for further information on instrumental variables). This means that the genetic variant is an unconfounded proxy for variation in the risk factor, and therefore can be treated as similar to treatment assignment in a randomized trial, where the treatment is to change the level of the risk factor (Nitsch et al., 2006). Similarly to an intention-to-treat analysis in a randomized trial, an association between such a genetic variant and the outcome implies a causal effect of the risk factor (VanderWeele et al., 2014). Additionally, under further parametric assumptions, the magnitude of the causal effect of the risk factor on the outcome can be estimated (Didelez, Meng and Sheehan, 2010). In this paper, we assume that the effect of the risk factor on the outcome is linear with no effect modification, and the associations of the genetic variants with the risk factor and with the outcome are linear without effect modification (Didelez and Sheehan, 2007):

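\begin{align}
E(X \mid G_j = g, U = u) &= \beta_{X0j} + \beta_{Xj} g + \beta_{XU} u \tag{1} \\
E(Y \mid G_j = g, U = u) &= \beta_{Y0j} + \beta_{Yj} g + \beta_{YU} u \quad \text{for } j = 1, \ldots, J \notag \\
E(Y \mid do(X = x), U = u) &= \beta_0 + \beta x + \beta_U u \notag
\end{align}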

where $X$ is the risk factor, $G_1, \ldots, G_J$ are the genetic variants, $Y$ is the outcome, $U$ is an unmeasured confounder, $do(X = x)$ is the do-operator of Pearl meaning that the value of the risk factor is set to $x$ by intervention (Pearl, 2000), and the causal effect parameter $\beta = \beta_{Yj} / \beta_{Xj}$ for all $j = 1, \ldots, J$. We also assume that the effects of the genetic variants on the risk factor are the same in all individuals. Although these assumptions are not necessary to identify a causal parameter (weaker assumptions have been proposed (Swanson and Hernán, 2013)), alternative assumptions mean that the causal parameters identified by different instrumental variables are likely to be different. While these assumptions are restrictive, a causal estimate has an interpretation as a test statistic for the null hypothesis that the risk factor is not causal for the outcome without requiring the assumptions of linearity and homogeneity of the genetic effects on the risk factor (Burgess, Butterworth and Thompson, 2015).

We assume that summarized data are available in the form of association estimates (beta-coefficients and standard errors) with the risk factor and with the outcome for $J$ genetic variants that are instrumental variables. The association estimates with the risk factor are denoted $\hat\beta_{Xj}$ with standard error $\sigma_{Xj}$; association estimates with the outcome are denoted $\hat\beta_{Yj}$ with standard error $\sigma_{Yj}$. The genetic variants are assumed to be independently distributed (that is, not in linkage disequilibrium).

2.1 Standard inverse-variance weighted method

The ratio estimate of the causal effect of the risk factor on the outcome based on the $j$th genetic variant is $\hat\beta_{Yj} / \hat\beta_{Xj}$ (Lawlor et al., 2008). We refer to this as $\hat\beta_{IVj}$. The variance of the ratio of two random variables can be calculated using the delta method; the formula including first- and second-order terms for the variance of $\hat\beta_{IVj}$ is:

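\[
\operatorname{var}(\hat\beta_{IVj}) = \frac{\sigma_{Yj}^2}{\hat\beta_{Xj}^2} + \frac{\hat\beta_{Yj}^2 \sigma_{Xj}^2}{\hat\beta_{Xj}^4} - \frac{2 \theta \hat\beta_{Yj} \sigma_{Yj} \sigma_{Xj}}{\hat\beta_{Xj}^3} \tag{2}
\]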

where $\theta$ is the correlation between $\hat\beta_{Yj}$ and $\hat\beta_{Xj}$ (Thomas, Lawlor and Thompson, 2007). This can be rewritten in terms of the causal estimates $\hat\beta_{IVj}$ as:

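\[
\operatorname{var}(\hat\beta_{IVj}) = \frac{1}{\hat\beta_{Xj}^2} \left( \sigma_{Yj}^2 + \hat\beta_{IVj}^2 \sigma_{Xj}^2 - 2 \theta \hat\beta_{IVj} \sigma_{Yj} \sigma_{Xj} \right) \tag{3}
\]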

Assuming that the correlation between $\hat\beta_{Yj}$ and $\hat\beta_{Xj}$ is zero (this would be the case if the associations with the risk factor and with the outcome were estimated in non-overlapping datasets, known as a two-sample analysis (Pierce and Burgess, 2013)), the variance is:

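\[
\operatorname{var}(\hat\beta_{IVj}) = \frac{\sigma_{Yj}^2}{\hat\beta_{Xj}^2} + \frac{\hat\beta_{Yj}^2 \sigma_{Xj}^2}{\hat\beta_{Xj}^4}. \tag{4}
\]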

If only the first-order term from the delta formula is taken, then the variance is:

\[
\operatorname{var}(\hat\beta_{IVj}) = \frac{\sigma_{Yj}^2}{\hat\beta_{Xj}^2}. \tag{5}
\]
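The variance formulae in equations (2), (4) and (5) can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the analyses in this paper; the function name `ratio_variance` and the numerical inputs are hypothetical.

```python
def ratio_variance(beta_x, beta_y, se_x, se_y, theta=0.0, second_order=True):
    """Delta-method variance of the ratio estimate beta_y / beta_x.

    theta is the correlation between the two association estimates
    (zero in a two-sample analysis). With second_order=False only the
    first-order term of equation (5) is returned.
    """
    first = se_y**2 / beta_x**2
    if not second_order:
        return first
    second = beta_y**2 * se_x**2 / beta_x**4
    cross = 2.0 * theta * beta_y * se_y * se_x / beta_x**3
    return first + second - cross


# Hypothetical summary statistics for a single variant (illustrative only).
bx, by, sx, sy = 0.10, 0.02, 0.01, 0.005
v1 = ratio_variance(bx, by, sx, sy, second_order=False)  # equation (5)
v2 = ratio_variance(bx, by, sx, sy)                      # equation (4)
v3 = ratio_variance(bx, by, sx, sy, theta=0.5)           # equation (2)
```

With zero correlation the second-order variance can never be smaller than the first-order variance, so the second-order weight for a variant can never exceed its first-order weight.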

The inverse-variance weighted (IVW) estimate is a weighted mean of the causal estimates from each genetic variant considered individually:

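\[
\hat\beta_{IVW} = \frac{\sum_j \hat\beta_{IVj} \operatorname{var}(\hat\beta_{IVj})^{-1}}{\sum_j \operatorname{var}(\hat\beta_{IVj})^{-1}}. \tag{6}
\]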

This is equivalent to meta-analysing the causal estimates from each genetic variant using the standard inverse-variance weighted formula (hence the name “inverse-variance weighted estimate”) under a fixed-effect model (Borenstein et al., 2009). Using the first-order variance estimates (equation 5), the IVW estimate is:

\[
\hat\beta_{IVW} = \frac{\sum_j \hat\beta_{Yj} \hat\beta_{Xj} \sigma_{Yj}^{-2}}{\sum_j \hat\beta_{Xj}^2 \sigma_{Yj}^{-2}}. \tag{7}
\]

This is the same estimate as would be obtained from a weighted linear regression of the $\hat\beta_{Yj}$ coefficients on the $\hat\beta_{Xj}$ coefficients with no intercept term, using the $\sigma_{Yj}^{-2}$ as weights.

Using the first-order weights and assuming a fixed-effect model (Section 2.3), the standard error is:

\[
\operatorname{se}(\hat\beta_{IVW}) = \sqrt{\frac{1}{\sum_j \hat\beta_{Xj}^2 \sigma_{Yj}^{-2}}}. \tag{8}
\]

This is the form of the inverse-variance weighted estimate as it was initially proposed (Johnson, 2013; Ehret et al., 2011; Dastani et al., 2012).
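As an illustration, the first-order estimate and its fixed-effect standard error (equations 7 and 8) can be computed directly from the summarized data. The sketch below uses hypothetical function names and made-up association estimates; it also checks numerically that equation (7) agrees with the weighted mean of ratio estimates in equation (6) when the first-order variances of equation (5) are used.

```python
import math


def ivw_first_order(beta_x, beta_y, se_y):
    """First-order IVW estimate (eq. 7) and fixed-effect standard error (eq. 8)."""
    num = sum(bx * by / sy**2 for bx, by, sy in zip(beta_x, beta_y, se_y))
    den = sum(bx**2 / sy**2 for bx, sy in zip(beta_x, se_y))
    return num / den, math.sqrt(1.0 / den)


def ivw_from_ratios(beta_x, beta_y, se_y):
    """Equation (6): weighted mean of ratio estimates with first-order weights."""
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    weights = [bx**2 / sy**2 for bx, sy in zip(beta_x, se_y)]
    return sum(r * w for r, w in zip(ratios, weights)) / sum(weights)


# Hypothetical summarized data for three variants (illustrative only).
beta_x = [0.10, 0.08, 0.12]
beta_y = [0.020, 0.012, 0.030]
se_y = [0.005, 0.006, 0.004]

estimate, std_error = ivw_first_order(beta_x, beta_y, se_y)
estimate_check = ivw_from_ratios(beta_x, beta_y, se_y)
```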

2.2 Equivalence to two-stage least squares estimate

The inverse-variance weighted estimate using first-order weights is also equal to the estimate obtained from the two-stage least squares method that is commonly used with individual-level data (sample size $N$). If we write the risk factor as $X$ (usually an $N \times 1$ matrix, although the result can be generalized for multiple risk factors (Burgess, Dudbridge and Thompson, 2015a)), the outcome as $Y$ (an $N \times 1$ matrix), and the instrumental variables as $Z$ (an $N \times J$ matrix), then the two-stage least squares estimate of causal effects (Baum, Schaffer and Stillman, 2003) is:

\[
\hat\beta_{2SLS} = [X^T Z (Z^T Z)^{-1} Z^T X]^{-1} X^T Z (Z^T Z)^{-1} Z^T Y.
\]

This estimate can be obtained by sequential regression of the risk factor on the instrumental variables, and then the outcome on fitted values of the risk factor from the first-stage regression.

Regression of $Y$ on $Z$ gives beta-coefficients $\hat\beta_Y = (Z^T Z)^{-1} Z^T Y$ with standard errors the square roots of the diagonal elements of the matrix $\sigma^2 (Z^T Z)^{-1}$ where $\sigma$ is the residual standard error. If the instrumental variables are perfectly uncorrelated, then the off-diagonal elements of $\sigma^2 (Z^T Z)^{-1}$ are all equal to zero. Regression of $X$ on $Z$ gives beta-coefficients $\hat\beta_X = (Z^T Z)^{-1} Z^T X$. Weighted linear regression of the beta-coefficients $\hat\beta_Y$ on the beta-coefficients $\hat\beta_X$ using the inverse-variance weights $\sigma^{-2} (Z^T Z)$ gives an estimate:

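\begin{align*}
[\hat\beta_X^T (Z^T Z) \hat\beta_X]^{-1} \sigma^{-2} \hat\beta_X^T (Z^T Z) \sigma^{2} \hat\beta_Y
&= [X^T Z (Z^T Z)^{-1} (Z^T Z) (Z^T Z)^{-1} Z^T X]^{-1} X^T Z (Z^T Z)^{-1} (Z^T Z) (Z^T Z)^{-1} Z^T Y \\
&= [X^T Z (Z^T Z)^{-1} Z^T X]^{-1} X^T Z (Z^T Z)^{-1} Z^T Y \\
&= \hat\beta_{2SLS}
\end{align*}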

The assumption of uncorrelated instrumental variables ensures that the regression coefficients from univariate regressions (as in the regression-based methods) equal those from multivariable regression (as in the two-stage least squares method). In practice, the two-stage least squares and weighted regression-based estimates will differ slightly as there will be non-zero correlations between the genetic variants in finite samples, even if the variants are truly uncorrelated in the population. However, these differences are likely to be slight, and to tend to zero asymptotically (Burgess, Dudbridge and Thompson, 2015b).
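To connect the matrix expression back to equation (7), consider the idealized case in which $Z^T Z$ is exactly diagonal with entries $d_j$ and the residual variance $\sigma^2$ is common to all variants, so that $\sigma_{Yj}^2 = \sigma^2 / d_j$. The symbol $d_j$ is introduced here only for this sketch. After cancelling the scalar factors $\sigma^{-2}$ and $\sigma^{2}$, the weighted regression estimate becomes:

\[
[\hat\beta_X^T (Z^T Z) \hat\beta_X]^{-1} \hat\beta_X^T (Z^T Z) \hat\beta_Y
= \frac{\sum_j d_j \hat\beta_{Xj} \hat\beta_{Yj}}{\sum_j d_j \hat\beta_{Xj}^2}
= \frac{\sum_j \hat\beta_{Yj} \hat\beta_{Xj} \sigma_{Yj}^{-2}}{\sum_j \hat\beta_{Xj}^2 \sigma_{Yj}^{-2}},
\]

where the last step substitutes $d_j = \sigma^2 \sigma_{Yj}^{-2}$ and cancels $\sigma^2$ from numerator and denominator. This is the first-order inverse-variance weighted estimate.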

2.3 Fixed- versus random-effects

A fixed-effect meta-analysis assumes that the causal effects targeted by each genetic variant are all equal. While this would be true if all the genetic variants are valid instrumental variables, and also under the additional linearity assumptions stated above, this may not be true in practice. For instance, genetic variants may affect the exposure via different mechanisms, leading to different magnitudes of effect on the outcome. Alternatively, some variants may have direct effects on the outcome that do not pass via the risk factor, and hence not all genetic variants may be valid instrumental variables. To combat heterogeneity in the causal effects identified by each genetic variant, a random-effects meta-analysis may be preferred. We outline two ways to model this heterogeneity: an additive random-effects model, and a multiplicative random-effects model.

2.4 Additive and multiplicative random-effects models

In a fixed-effect meta-analysis, we assume that the estimates from each instrumental variable $\hat\beta_{IVj}$ can be modelled as normally distributed with common mean $\beta$ and variance $\sigma_{IVj}^2$. In a random-effects meta-analysis, the mean values $\beta_j$ are additionally assumed to vary (Higgins, Thompson and Spiegelhalter, 2009). In an additive random-effects model, the $\beta_j$ are assumed to be normally distributed with mean $\beta$ and variance $\phi_A^2$. Any additional variability beyond that predicted by the fixed-effect model ($\phi_A^2 > 0$) is interpreted as heterogeneity between the causal effects targeted by each instrumental variable. An estimate of the heterogeneity parameter $\phi_A^2$ is often obtained using the method-of-moments estimator of DerSimonian and Laird (1986).

In a multiplicative random-effects model, the estimates $\hat\beta_{IVj}$ are assumed to be normally distributed with mean $\beta$ and variance $\phi_M^2 \sigma_{IVj}^2$. This model can be fitted by linear regression of the $\hat\beta_{Yj}$ on the $\hat\beta_{Xj}$ using the $\sigma_{Yj}^{-2}$ as weights. A fixed-effect model can be fitted by setting the residual standard error in the regression model to be one; this can be achieved after fitting the regression model by dividing the standard error by the estimate of the residual standard error (Thompson and Sharp, 1999). A multiplicative random-effects model can be fitted by allowing the residual standard error (which is equivalent to the heterogeneity parameter $\phi_M$) to be estimated as part of the model. The multiplicative random-effects model is therefore equivalent to an overdispersed regression model. In case of underdispersion (that is, the estimated residual standard error is less than one), the standard errors should be fixed by setting $\phi_M = 1$, as any underdispersion is assumed to occur by chance, and not to be empirically justified.

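\begin{align*}
\hat\beta_{IVj} &\sim N(\beta, \sigma_{IVj}^2) && \text{(fixed-effect model)} \\
\hat\beta_{IVj} &\sim N(\beta_j, \sigma_{IVj}^2), \quad \beta_j \sim N(\beta, \phi_A^2) && \text{(additive random-effects model)} \\
\hat\beta_{IVj} &\sim N(\beta, \phi_M^2 \sigma_{IVj}^2) && \text{(multiplicative random-effects model)}
\end{align*}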

The point estimate from a fixed-effect meta-analysis is identical to that from a multiplicative random-effects meta-analysis (Thompson and Sharp, 1999). However, it differs from that from an additive random-effects meta-analysis when $\phi_A^2 > 0$, as the variances used to form the weights in the random-effects meta-analysis are inflated to account for heterogeneity. As heterogeneity increases, weights become more similar, which results in estimates with low weights being upweighted (relatively speaking) in an additive random-effects meta-analysis.
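The three pooling models can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code from the Appendix: the function name `pool_estimates` is hypothetical, the additive heterogeneity variance is estimated by the DerSimonian and Laird method of moments, and the multiplicative heterogeneity parameter is taken as the residual standard error of the weighted regression, floored at one.

```python
import math


def pool_estimates(estimates, variances):
    """Pool per-variant causal estimates under three models.

    estimates: ratio estimates for each variant
    variances: their variances (first- or second-order)
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sw
    se_fixed = math.sqrt(1.0 / sw)
    k = len(estimates)

    # Cochran's Q: weighted sum of squared deviations from the pooled estimate.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))

    # Additive model: method-of-moments heterogeneity variance, truncated at 0.
    c = sw - sum(wi**2 for wi in w) / sw
    phi2_a = max(0.0, (q - (k - 1)) / c)
    w_a = [1.0 / (v + phi2_a) for v in variances]
    additive = sum(wi * b for wi, b in zip(w_a, estimates)) / sum(w_a)
    se_additive = math.sqrt(1.0 / sum(w_a))

    # Multiplicative model: same point estimate as the fixed-effect model,
    # standard error scaled by the residual standard error (never below one).
    phi_m = max(1.0, math.sqrt(q / (k - 1)))

    return {
        "fixed": (fixed, se_fixed),
        "additive": (additive, se_additive),
        "multiplicative": (fixed, se_fixed * phi_m),
        "phi2_A": phi2_a,
        "phi_M": phi_m,
    }


# Hypothetical heterogeneous and homogeneous inputs (illustrative only).
het = pool_estimates([0.1, 0.3, -0.2, 0.5], [0.01, 0.01, 0.01, 0.01])
hom = pool_estimates([0.2, 0.2, 0.2], [0.01, 0.02, 0.03])
```

When the estimates show no excess variability, both heterogeneity parameters fall back to their null values and all three models return the same standard error.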

2.5 Weak instrument bias

Although instrumental variable estimates are consistent (and so they are asymptotically unbiased), they can suffer from substantial bias in finite samples (Staiger and Stock, 1997; Stock, Wright and Yogo, 2002). This bias, known as ‘weak instrument bias’, occurs when the instrumental variables explain a small proportion of variance in the risk factor (Burgess and Thompson, 2011). In a conventional Mendelian randomization analysis in which the risk factor and outcome are measured in the same participants (a one-sample analysis), weak instrument bias is in the direction of the observational association between the risk factor and the outcome (Burgess, Thompson and CRP CHD Genetics Collaboration, 2011). It can also lead to overly narrow confidence intervals and overrejection of the causal null hypothesis (Stock and Yogo, 2002). Bias from the inverse-variance weighted method using the first-order weights and a fixed-effect model has been shown to be similar to that from the two-stage least squares method in a realistic simulation study (Burgess, Butterworth and Thompson, 2013). However, bias and coverage properties have not been investigated for different choices of the weights or for random-effects models.

3 Motivating example: analysis of the causal effect of early menopause on triglycerides

This paper was motivated by a particular implementation of two versions of the inverse-variance weighted method with different choices of weights that gave substantially different answers. A Mendelian randomization analysis was performed to assess the causal effect of early menopause risk on triglycerides using 47 genetic variants. Associations of the genetic variants with early menopause (and their standard errors) were obtained from Day et al. (2015); associations represent number of years earlier menopause per additional effect allele. Associations of the genetic variants with triglycerides (and their standard errors) were obtained from the Global Lipids Genetics Consortium (2013). These associations are provided in Appendix Table A1 and displayed graphically in Appendix Figure A1. Analyses for the motivating example were performed in Microsoft Excel (Windows 2000 version) and R (version 3.1.2) (R Core Team, 2014).

Fixed-effect inverse-variance weighted methods were performed using the second-order weights (equation 4) and the first-order weights (equation 5). The weights were substantially the same in both cases; 35 out of the 47 weights differed by less than 5%, and 44 of the weights differed by less than 10%. Using the second-order weights (equation 4), the causal effect of early menopause on triglycerides was estimated as 0.0021 (standard error, 0.0037; 95% confidence interval: -0.0052, 0.0095). Using the first-order weights (equation 5), the causal effect estimate was 0.0103 (standard error, 0.0036; 95% confidence interval: 0.0032, 0.0175). These estimates represent the change in triglycerides in standard deviation units per 1 year earlier menopause. The applied implications of this analysis are not the focus of this paper, and depend on the validity of the instrumental variable assumptions for the genetic variants used in the analysis. However, the magnitude of the difference between the estimates (over twice the standard error of the estimates) is striking, and the conclusions from the two analyses would be diametrically opposite. In the first case, the causal null hypothesis that early menopause is not a causal risk factor for triglycerides would not be rejected, whereas in the second case, the causal null hypothesis would be rejected. By comparison, using the first-order weights and a multiplicative random-effects model, the standard error is 0.0103, meaning that the causal null hypothesis would not be rejected.
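The rejection decisions described above correspond to two-sided Wald tests at the 5% level. As a rough check, the tests can be recomputed from the rounded estimates and standard errors quoted in this section; because the inputs are rounded, the resulting p-values are approximate and need not match values computed from the unrounded data. The function name `wald_p_value` is hypothetical.

```python
import math


def wald_p_value(estimate, std_error):
    """Two-sided p-value for a Wald test of estimate = 0 (normal approximation)."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))


# Rounded figures quoted in the text for the motivating example.
p_second_order_fixed = wald_p_value(0.0021, 0.0037)
p_first_order_fixed = wald_p_value(0.0103, 0.0036)
p_first_order_multiplicative = wald_p_value(0.0103, 0.0103)

rejected = {
    "second-order, fixed-effect": p_second_order_fixed < 0.05,
    "first-order, fixed-effect": p_first_order_fixed < 0.05,
    "first-order, multiplicative": p_first_order_multiplicative < 0.05,
}
```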

It turns out that the genetic variant with the greatest difference between the first- and second-order weights is rs704795, the variant that also has the greatest causal estimate. The estimate from this variant is heavily downweighted in the analysis using the second-order weights compared with using the first-order weights. Omitting this variant from the analysis led to similar estimates using the second- and first-order weights (0.0000 using the second-order weights). Another interesting observation is that use of the second-order weights reduced heterogeneity between the causal estimates from each genetic variant (for example, in the multiplicative random-effects model, $\hat\phi_M$ was 1.69 using the second-order weights compared with 2.83 using the first-order weights). This suggests that, even though the second-order standard errors for the causal estimates from the individual variants will always be greater than the first-order standard errors, precision of the overall causal estimate under a random-effects model may be improved by using the second-order weights when there is heterogeneity between the causal estimates (in this example, the standard error in the multiplicative random-effects model was smaller using the second-order weights than the 0.0103 obtained using the first-order weights).

Estimates from each of the methods are summarized in Table 1.

In general, genetic variants that have large values of $\hat\beta_{Yj}$ and/or small values of $\hat\beta_{Xj}$ will be downweighted by the second-order weights. This means that genetic variants that have large and heterogeneous effects on the outcome compared with other variants and/or are weak will be downweighted. Further methodological investigation is therefore needed to investigate the impact on the bias and coverage properties of inverse-variance weighted methods for Mendelian randomization analyses, and which of the versions of the method should be preferred in applied practice.
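The downweighting can be made explicit by taking the ratio of the second-order weight to the first-order weight for a single variant, using equations (4) and (5). The notation $w_j^{(1)}$ and $w_j^{(2)}$ for the first- and second-order weights is introduced here only for this sketch:

\[
\frac{w_j^{(2)}}{w_j^{(1)}}
= \frac{\sigma_{Yj}^2 / \hat\beta_{Xj}^2}{\sigma_{Yj}^2 / \hat\beta_{Xj}^2 + \hat\beta_{Yj}^2 \sigma_{Xj}^2 / \hat\beta_{Xj}^4}
= \frac{1}{1 + \hat\beta_{Yj}^2 \sigma_{Xj}^2 / (\hat\beta_{Xj}^2 \sigma_{Yj}^2)}.
\]

The ratio is always at most one, and it moves further below one as the association with the outcome grows or the association with the risk factor shrinks, for fixed standard errors.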

4 Simulation study

In this manuscript, we consider estimates from the inverse-variance weighted method using weights from equations (4, second-order) and (5, first-order), and fixed-effect, additive random-effects, and multiplicative random-effects models for combining the estimates from different genetic variants. Code for implementing these methods is provided in the Appendix. Analyses for the simulation study were performed in R (version 3.1.2).

The data-generating model is as follows:

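\begin{align}
z_{ij} &\sim \text{Binomial}(2, 1/3) \text{ independently for } j = 1, \ldots, 20 \tag{9} \\
x_i &= \sum_{j=1}^{20} \alpha_j z_{ij} + u_i + \epsilon_{Xi} \notag \\
y_i &= \beta_X x_i + \sum_{j=1}^{20} \beta_{Zj} z_{ij} + \beta_U u_i + \epsilon_{Yi} \notag \\
u_i, \epsilon_{Xi}, \epsilon_{Yi} &\sim N(0, 1) \text{ independently} \notag \\
\alpha_j &\sim N(\alpha, 0.02^2) \text{ independently.} \notag
\end{align}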

Individuals are indexed by $i$. The 20 genetic variants $z_{ij}$, indexed by $j$, are drawn from binomial distributions, corresponding to single nucleotide polymorphisms (SNPs) with minor allele frequency $1/3$. The risk factor $x_i$ is a linear combination of the genetic variants, a confounder ($u_i$) that is assumed to be unmeasured, and an independent error term ($\epsilon_{Xi}$). The outcome $y_i$ is a linear combination of the genetic variants, the risk factor, the confounder, and a further independent error term ($\epsilon_{Yi}$). The per allele effects of the genetic variants on the risk factor ($\alpha_j$) are drawn from a normal distribution with mean $\alpha$ and variance $0.02^2$. The direct effects of the genetic variants on the outcome ($\beta_{Zj}$; these effects are not via the risk factor) are zero when the genetic variants are valid instrumental variables. The causal effect of the risk factor on the outcome, the main parameter of interest, is $\beta_X$. The effect of the confounder on the outcome is $\beta_U$.
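The data-generating model (9) and the calculation of summarized association estimates can be sketched as below. This is a standard-library Python illustration, not the R code used for the simulation study; the function names, the seed, the small sample size, and the parameter values passed in the usage lines are all hypothetical choices made for the example.

```python
import math
import random


def simulate(n, alpha, beta_x, beta_u, beta_z=None, n_variants=20, seed=1):
    """Generate individual-level data following model (9)."""
    rng = random.Random(seed)
    # Per-allele effects on the risk factor: N(alpha, 0.02^2).
    alphas = [rng.gauss(alpha, 0.02) for _ in range(n_variants)]
    if beta_z is None:
        beta_z = [0.0] * n_variants  # no direct effects: valid instruments
    z, x, y = [], [], []
    for _ in range(n):
        # Binomial(2, 1/3) genotype as the sum of two Bernoulli(1/3) draws.
        zi = [(rng.random() < 1 / 3) + (rng.random() < 1 / 3)
              for _ in range(n_variants)]
        u = rng.gauss(0.0, 1.0)
        xi = sum(a * g for a, g in zip(alphas, zi)) + u + rng.gauss(0.0, 1.0)
        yi = (beta_x * xi + sum(b * g for b, g in zip(beta_z, zi))
              + beta_u * u + rng.gauss(0.0, 1.0))
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def slope_and_se(g, v):
    """Univariable least-squares slope of v on g and its standard error."""
    n = len(g)
    mg, mv = sum(g) / n, sum(v) / n
    sgg = sum((a - mg) ** 2 for a in g)
    sgv = sum((a - mg) * (b - mv) for a, b in zip(g, v))
    slope = sgv / sgg
    intercept = mv - slope * mg
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(g, v))
    return slope, math.sqrt(rss / (n - 2) / sgg)


# Illustrative one-sample run with made-up parameter values.
z, x, y = simulate(n=500, alpha=0.1, beta_x=0.0, beta_u=1.0)
g1 = [row[0] for row in z]          # genotypes for the first variant
bx1, sx1 = slope_and_se(g1, x)      # association with the risk factor
by1, sy1 = slope_and_se(g1, y)      # association with the outcome
```

A two-sample analysis could be mimicked by estimating the risk factor associations in one half of the simulated individuals and the outcome associations in the other half.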

We consider four scenarios:

1. a one-sample analysis in which the genetic variants are all valid instrumental variables;

2. a one-sample analysis in which the genetic variants have direct effects on the outcome;

3. a two-sample analysis in which the genetic variants are all valid instrumental variables;

4. and a two-sample analysis in which the genetic variants have direct effects on the outcome.

In scenarios 1 and 2, data are generated for participants, and associations with the risk factor and with the outcome are estimated in these participants. In scenarios 3 and 4, data are generated for 10 000 participants. Associations with the risk factor are estimated in the first 5000 participants, and associations with the outcome in the second 5000 participants. Two-sample analyses are common in Mendelian randomization, particularly when the association estimates are obtained from publicly available data sources (Burgess et al., 2015). In a two-sample analysis, weak instrument bias acts in the direction of the null, and hence should not lead to misleading inferences (Pierce and Burgess, 2013). However, it is common that many participants in large genetic consortia overlap, such that even if the associations with the risk factor and with the outcome are obtained from separate consortia, they may not be estimated in separate participants. Hence, the one-sample and two-sample settings are both of interest in this paper.

In scenarios 1 and 3, the $\beta_{Zj}$ parameters are all set to zero, and the genetic variants are all valid instrumental variables. In scenarios 2 and 4, the $\beta_{Zj}$ parameters are drawn from a normal distribution with mean 0. This is a situation known as “balanced pleiotropy” (Bowden et al., 2015). Pleiotropy refers to a genetic variant having an independent effect on the outcome that is not via the risk factor (Davey Smith and Hemani, 2014). Balanced pleiotropy means that the pleiotropic effects for all strengths of instrument have mean zero. Such pleiotropic effects should induce heterogeneity between the causal estimates using different genetic variants. Simulations conducted under a multiplicative random-effects model with balanced pleiotropy have suggested that estimates may not be biased on average (Bowden, Davey Smith and Burgess, 2015). Additional simulations for the case of directional (that is, unbalanced) pleiotropy are considered in the Appendix.

Four sets of parameters are considered, combining two values of the causal effect $\beta_X$ ($\beta_X = 0$, a null causal effect, and a positive causal effect) with two values of the confounder effect $\beta_U$ (positive confounding and negative confounding). Additionally, four values of the instrument strength $\alpha$ are considered for each set of parameters. 10 000 simulated datasets are generated in each case.

4.1 Results

Scenarios 1 and 2: Results from scenario 1 (one-sample, valid instruments) and scenario 2 (one-sample, invalid instruments) are presented in Table 2. For each value of the instrument strength, set of parameters, and scenario, the mean estimate and the empirical power of the 95% confidence interval (estimate plus or minus 1.96 times the standard error) to reject the null hypothesis are given. The coverage is 100% minus the power; power under the null hypothesis should be 5%. The Monte Carlo standard error for the mean estimate is around 0.001 or less, and for the power is 0.2% under the null, and at most 0.5% otherwise. Additionally, to judge the instrument strength, the mean F statistic and the mean coefficient of determination ($R^2$ statistic) are given in each case.
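The quoted Monte Carlo standard errors for the power are consistent with the usual binomial formula for a proportion estimated from 10 000 simulated datasets. The short check below is an illustration added for the reader; the function name is hypothetical.

```python
import math


def mc_se_proportion(p, n_sims):
    """Monte Carlo standard error of an estimated proportion."""
    return math.sqrt(p * (1.0 - p) / n_sims)


n_sims = 10_000
se_null = mc_se_proportion(0.05, n_sims)  # rejection rate of 5% under the null
se_max = mc_se_proportion(0.50, n_sims)   # the formula is maximized at p = 0.5
```

Here `se_null` is approximately 0.0022 (0.2%) and `se_max` is 0.005 (0.5%), matching the figures stated above.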

With a null causal effect, the results demonstrate the well-known bias and inflated Type 1 error rate of instrumental variable estimates with weak instruments in a one-sample setting. Although bias is similar for both choices of weights (slightly less with the first-order weights), coverage rates are much worse with the first-order weights. Neither the additive nor the multiplicative random-effects models detect heterogeneity in the vast majority of cases (particularly for weaker instruments) with the second-order weights. With the first-order weights, heterogeneity is detected in a greater proportion of simulated datasets. In scenario 1, heterogeneity is not present in the underlying data-generating model, and only estimated by chance; in scenario 2, heterogeneity is expected. For the second-order weights, coverage properties are similar in scenarios 1 and 2; whereas for the first-order weights, coverage properties are worse in scenario 2 for the fixed-effect model, but improved for the random-effects models. For weaker instruments, coverage properties are best using the second-order weights, whereas for stronger instruments, estimates using the first-order weights and a random-effects model perform almost as well, and occasionally better particularly when there is heterogeneity (scenario 2). However, there is inflation of Type 1 error rates even in the best-case scenarios.

With a positive causal effect, estimates with the first-order weights generally have better power to detect a causal effect than those using the second-order weights, particularly with weaker instruments. However, in the light of the Type 1 error rate inflation, this property should not be overvalued. Making fewer Type 2 errors (fewer false negative findings) at the expense of making more Type 1 errors (more false positive findings) is not generally a desirable trade-off.

Additional results from scenarios 1 and 2 are presented in Appendix Table A3. For each value of the instrument strength, the (Monte Carlo) standard deviation and the mean standard error of estimates are presented. This helps judge whether uncertainty in the effect estimates is correctly accounted for in the standard errors.

The estimates using second-order weights are the least variable throughout, with the lowest standard deviations. The standard deviation of estimates using second-order weights was always less than the mean standard error of the estimates. In contrast, for scenario 1, the estimates using first-order weights were more variable, but generally had lower average standard errors. This was always true for the fixed-effect analyses, and usually true for the random-effects analyses. However, when there was heterogeneity in the causal estimates identified by the instrumental variables (scenario 2), mean standard errors for the random-effects analyses using first-order weights could be greater than those using second-order weights, despite the second-order standard errors for each causal estimate being uniformly greater than the first-order standard errors. In scenario 2, mean standard errors for the fixed-effect analyses were generally similar to those in scenario 1, but the standard deviations of the estimates were increased. For the random-effects analyses using the first-order weights in scenario 2, mean standard errors and standard deviations were similar in magnitude. However, mean standard errors using the second-order weights were typically slightly lower, with no loss in coverage (recall Table 2).

Under the null, standard deviations and mean standard errors are similar whether there is positive or negative confounding, whereas under the alternative, standard errors appear to be larger when confounding is in the same direction as the causal effect, and smaller when confounding is in the opposite direction. This has previously been observed (Burgess and Thompson, 2012); see Figure 3 of that reference for a potential explanation.

Scenarios 3 and 4: Results from scenario 3 (two-sample, valid instruments) and scenario 4 (two-sample, invalid instruments) are presented in Table 3 for the mean and power and in Appendix Table A4 for the standard deviation and standard error. These results demonstrate the well-known bias in the direction of the null in the two-sample setting.

With a null causal effect, no bias is observed. Coverage levels for the second-order weights are conservative, with power below the nominal 5% level. By contrast, in scenario 3, coverage levels with the first-order weights are close to nominal levels, with slight undercoverage for random-effects models. In scenario 4, there is inflation of Type 1 error rates with the first-order weights for a fixed-effect model, but coverage for both the additive and multiplicative random-effects models is close to nominal levels.

With a positive causal effect, bias is in the direction of the null. The bias is more severe using the second-order weights. Power to detect a causal effect is substantially lower using the second-order weights than using the first-order weights, particularly for weaker instruments.

For the first-order weights, mean standard errors are fairly close to the standard deviations of estimates for the fixed-effect model when there is no heterogeneity in the causal effects, and for the random-effects models when there is heterogeneity in the causal effects. In contrast, for the second-order weights, the mean standard errors are larger than the standard deviations throughout. This corresponds with the coverage properties: in a two-sample setting using first-order weights, estimates are unbiased under the null with correct rejection rates, whereas using second-order weights, rejection rates are conservative.

Choice of random-effects model: As for choosing between the additive and multiplicative random-effects models, with the second-order weights, there was little difference between the results, or even with the results for a fixed-effect model. However, as seen in the motivating example, there will be a difference if the level of heterogeneity is increased. With the first-order weights, bias was generally slightly less with the additive random-effects model. Coverage under the null was better with an additive random-effects model, and power to detect a causal effect was better with a multiplicative random-effects model. However, differences were slight. Because of the better properties under the null, we therefore prefer the additive random-effects model for the scenarios considered in this paper, although the preference is not a strong one.

Directional pleiotropy: Results with directional pleiotropy are presented in Appendix Table A5. In brief, the results echo those with no pleiotropy and with balanced pleiotropy: the importance of random-effects models, and the preference for use of second-order weights in a one-sample setting, and first-order weights in a two-sample setting.

4.2 Additional scenario: extreme outlying variants

In the motivating example, the difference between estimates seemed to be driven by a single rogue variant. In order to better evaluate bias and coverage in this scenario, we considered an additional simulation scenario 5. Rather than generating the direct effects of the genetic variants on the outcome (the parameters) from a normal distribution with mean 0 and variance , instead we generated them from a t distribution with 2 degrees of freedom, and multiplied the result by 0.02. The t distribution with a small number of degrees of freedom has much heavier tails than a normal distribution, and so extreme outliers will be more frequent. With 2 degrees of freedom, the variance of the distribution is not even defined. Simulation results are only considered in the one-sample setting and under the null, as inflated Type 1 error rates in this scenario are the primary concern.
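The contrast between the two ways of generating direct effects can be sketched as follows. This is an illustration only, not the simulation code used for the paper: it assumes the heavy-tailed distribution is a Student t distribution with 2 degrees of freedom scaled by 0.02, and the standard deviation of the normal comparison (`sd=0.02`), the number of draws and the cut-off are hypothetical choices made for the example.

```python
import math
import random


def draw_t(df, rng):
    """One draw from a Student t distribution with `df` degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    # A chi-square(df) variable is Gamma(shape=df/2, scale=2).
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(chi2 / df)


def direct_effects_heavy_tailed(n_variants, rng, df=2, scale=0.02):
    """Direct effects as in scenario 5: scaled t draws (heavy tails)."""
    return [scale * draw_t(df, rng) for _ in range(n_variants)]


def direct_effects_normal(n_variants, rng, sd=0.02):
    """Normally distributed direct effects; sd is an illustrative assumption."""
    return [rng.gauss(0.0, sd) for _ in range(n_variants)]


def tail_fraction(values, cutoff):
    """Proportion of values whose absolute size exceeds `cutoff`."""
    return sum(abs(v) > cutoff for v in values) / len(values)


rng = random.Random(2015)
heavy = direct_effects_heavy_tailed(10000, rng)
normal = direct_effects_normal(10000, rng)
print("share beyond 0.1 (t, 2 df):", tail_fraction(heavy, 0.1))
print("share beyond 0.1 (normal): ", tail_fraction(normal, 0.1))
```

Under these assumptions, a cut-off of 0.1 is five standard deviations for the normal draws, so the t draws should exceed it far more often; this is the sense in which extreme outlying variants are more frequent in scenario 5.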

In Table 4, results are given for the inverse-variance weighted methods with different choices of weights and different models for combining the estimates. With a fixed-effect model, coverage rates for the second-order weights are similar to those in scenario 2 with the normally distributed direct effects. For the first-order weights, coverage rates are substantially worse and well above the nominal 5% level even for the strongest instruments considered in this paper, although bias is similar to that in scenario 2. This corresponds to the motivating example, in which the outlying variant had a large influence on the pooled estimate using the first-order weights, but was heavily downweighted using the second-order weights. However, for a random-effects model using the second-order weights, particularly with the multiplicative random-effects model and for the additive random-effects model with weaker instruments, results were similar to those with a fixed-effect model. In contrast, for a random-effects model with the first-order weights, mean estimates were generally closer to the null (with one notable exception – scenario 5b, – that was mostly driven by a single aberrant estimate) and coverage levels were much improved. Coverage levels with a random-effects model were generally slightly better with the first-order weights than with the second-order weights, although not uniformly and the difference was slight. As observed in the motivating example, and particularly with weaker instruments, heterogeneity is more often detected using the first-order weights, as the second-order weights tend to downweight the influence of the outlying variants.

5 Discussion

Several high-profile Mendelian randomization analyses have employed summarized data and some version of an inverse-variance weighted method. These include analyses of the causal effect of blood pressure on coronary heart disease risk (Ehret et al., 2011), height on coronary heart disease risk (Nelson et al., 2015), adiponectin on type 2 diabetes risk (Dastani et al., 2012), lipids on type 2 diabetes risk (Fall et al., 2015), and telomere length on risk of various cancers (Zhang et al., 2015), amongst several others. The statistical properties of estimates from the inverse-variance weighted method are therefore of considerable interest.

In this paper, we demonstrated that Type 1 error rates for the inverse-variance weighted method as it was initially proposed (first-order weights, fixed-effect model) are likely to be inflated in a one-sample Mendelian randomization setting either when the instruments are weak, or when there is heterogeneity between the causal estimates targeted by different genetic variants. This can be resolved either by using second-order weights or a random-effects model to combine the estimates from multiple genetic variants. These approaches affect the analysis in different ways: the second-order weights tend to downweight the influence of weak and heterogeneous variants on the overall causal estimate, whereas the random-effects models tend to increase standard errors by allowing for heterogeneity between the causal estimates in the model. While both approaches can be applied simultaneously, our simulations indicate that heterogeneity is less substantial when using the second-order weights. However, there is little disadvantage in assuming a random-effects model, as in the absence of heterogeneity, the fixed-effect analysis is recovered, and in the presence of heterogeneity, the random-effects analysis is more appropriate. Our results provide slight preference for an additive random-effects model over a multiplicative random-effects model.
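The two remedies discussed above can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the implementation used in the paper: the variances are the usual delta-method expressions for a ratio with the correlation term omitted, the additive model uses a DerSimonian-Laird moment estimate of the between-variant variance, and the multiplicative model inflates the fixed-effect standard error by an overdispersion factor that is not allowed to fall below 1. The function names and the example numbers are hypothetical.

```python
import math


def ratio_variances(bx, by, sx, sy, order=1):
    """Delta-method variances of the ratio estimates by/bx (no correlation term)."""
    variances = []
    for bxj, byj, sxj, syj in zip(bx, by, sx, sy):
        v = syj ** 2 / bxj ** 2                   # first-order term
        if order == 2:
            v += byj ** 2 * sxj ** 2 / bxj ** 4   # second-order term
        variances.append(v)
    return variances


def ivw(bx, by, sx, sy, order=1, model="fixed"):
    """Inverse-variance weighted estimate and standard error."""
    estimates = [byj / bxj for bxj, byj in zip(bx, by)]
    variances = ratio_variances(bx, by, sx, sy, order)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum_w
    se = math.sqrt(1.0 / sum_w)
    # Cochran's Q statistic measures heterogeneity between the estimates.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    dof = len(estimates) - 1
    if model == "fixed":
        return pooled, se
    if model == "multiplicative":
        # Same point estimate; standard error scaled by overdispersion >= 1.
        return pooled, se * math.sqrt(max(1.0, q / dof))
    if model == "additive":
        c = sum_w - sum(w ** 2 for w in weights) / sum_w
        tau2 = max(0.0, (q - dof) / c)            # DerSimonian-Laird estimate
        weights = [1.0 / (v + tau2) for v in variances]
        sum_w = sum(weights)
        pooled = sum(w * e for w, e in zip(weights, estimates)) / sum_w
        return pooled, math.sqrt(1.0 / sum_w)
    raise ValueError("model must be 'fixed', 'multiplicative' or 'additive'")


# Hypothetical summarized data for three variants, the third being an outlier.
bx, sx = [0.1, 0.2, 0.3], [0.02, 0.02, 0.02]
by, sy = [0.05, 0.10, 0.45], [0.03, 0.03, 0.03]
for order in (1, 2):
    for model in ("fixed", "multiplicative", "additive"):
        print(order, model, ivw(bx, by, sx, sy, order, model))
```

When the variant-specific estimates agree, Cochran's Q is small, the overdispersion factor is truncated at 1 and the between-variant variance at 0, so both random-effects analyses return the fixed-effect result.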

In a two-sample Mendelian randomization setting, weak instruments do not lead to inflated Type 1 error rates but rather to attenuation of estimates towards the null. The use of second-order weights was demonstrated to lead to conservative inference, whereas first-order weights gave correct coverage rates under the null. When there was heterogeneity in the causal estimates from different genetic variants, which was simulated to arise due to genetic variants having pleiotropic effects, a fixed-effect model with first-order weights was shown to lead to undercoverage, although this was corrected by use of a random-effects model.

A conclusion from this paper is the need to assess heterogeneity between the causal estimates from different genetic variants prior to performing a Mendelian randomization analysis based on multiple genetic variants, for example by a scatter plot of the gene–risk factor and gene–outcome associations (Appendix Figure A1). The presence of heterogeneous variants is likely to indicate violation of the instrumental variable assumptions for some of the variants, and can lead to misleading estimates as observed in the motivating example. Assessment for heterogeneity is also relevant when performing an analysis using individual-level data, for example using a two-stage least squares or allele score method.

5.1 Limitation of simulation studies

Our conclusions are limited as they are based on simulation studies. This is by necessity, as the properties of the estimators that we want to assess are finite-sample properties, not asymptotic properties. Our findings may have differed if we had considered a different data-generating mechanism, or more substantial heterogeneity between estimates from genetic variants. However, the findings are in line with theoretical considerations, and we believe the scenarios that we have chosen to be representative of a typical Mendelian randomization investigation in practice.

5.2 Unbalanced pleiotropy and robust methods (Egger regression, median-based approaches)

In particular, we mostly considered scenarios in this paper corresponding to balanced pleiotropy. In the case of unbalanced (or directional) pleiotropy, causal estimates from inverse-variance weighted methods are biased and Type 1 error rates are inflated in all settings, even in the asymptotic limit (Bowden et al., 2015). This can be resolved in a number of ways. In Egger regression, we perform a weighted linear regression of the gene–outcome association estimates on the gene–risk factor association estimates in the same way as in an inverse-variance weighted method, except that an intercept term is included in the regression model. This intercept term represents the average direct effect of the genetic variants on the outcome. (It is additionally required that all genetic variants are orientated such that the gene–risk factor association estimates are all positive, or are all negative.) The causal estimate from Egger regression is the slope parameter from this regression model. It is a consistent estimate of the causal effect under the alternative assumption that the direct effects of the genetic variants are uncorrelated with the instrument strength; this is known as the InSIDE (instrument strength independent of direct effect) assumption. In terms of the data-generating model of equation (9), the parameters for the direct effects of the variants on the outcome must be uncorrelated with the parameters for the effects of the variants on the risk factor; in the balanced pleiotropy examples of this paper, these parameters are drawn from independent distributions. This is a weaker assumption than the standard instrumental variable assumptions (the direct-effect parameters all equal zero) or the assumption of balanced pleiotropy (the direct-effect parameters have mean zero).
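The difference between the two regressions can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes first-order weights (the reciprocal of the squared standard error of each gene–outcome association), returns point estimates only, and uses hypothetical function names and made-up summarized data in which every variant has the same direct effect on the outcome.

```python
def orient(bx, by):
    """Flip signs so that every gene-risk factor association is positive."""
    signs = [1.0 if b >= 0 else -1.0 for b in bx]
    return [s * b for s, b in zip(signs, bx)], [s * b for s, b in zip(signs, by)]


def ivw_slope(bx, by, sy):
    """Weighted regression of by on bx through the origin (no intercept)."""
    w = [1.0 / s ** 2 for s in sy]
    num = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    den = sum(wi * x * x for wi, x in zip(w, bx))
    return num / den


def egger_regression(bx, by, sy):
    """Weighted regression of by on bx with an intercept; returns (intercept, slope)."""
    bx, by = orient(bx, by)
    w = [1.0 / s ** 2 for s in sy]
    sum_w = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, bx)) / sum_w
    mean_y = sum(wi * y for wi, y in zip(w, by)) / sum_w
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, bx))
    sxy = sum(wi * (x - mean_x) * (y - mean_y) for wi, x, y in zip(w, bx, by))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data: causal effect 0.3 plus a direct effect of 0.05 for every
# variant once oriented (directional pleiotropy, InSIDE satisfied trivially).
bx = [0.1, 0.2, -0.3, 0.4]
by = [0.08, 0.11, -0.14, 0.17]
sy = [0.02, 0.02, 0.02, 0.02]
print("IVW slope:  ", ivw_slope(bx, by, sy))
print("Egger (intercept, slope):", egger_regression(bx, by, sy))
```

In this noise-free example the intercept absorbs the common direct effect, so the Egger slope returns the causal effect used to construct the data, whereas the regression through the origin is pulled away from it.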

Similar considerations as to the choice of weights in Egger regression could be considered; the original proposal was equivalent to using the first-order weights. Informal simulations (not presented) have suggested that the same conclusions from this paper also hold for Egger regression (particularly the use of random-effects models). However, a full investigation would require simulating data with unbalanced pleiotropy (potentially both when the InSIDE assumption is satisfied and when it is violated); this is considered to be beyond the scope of this paper.

One notable difference about Egger regression is that if the genetic variants are allowed to have direct effects on the outcome, then heterogeneity in the causal estimates from individual variants is expected. Therefore, while heterogeneity in an inverse-variance weighted analysis is unwelcome and a potential sign that the assumptions are not satisfied, heterogeneity in the Egger method is a natural consequence of weakening the instrumental variable assumptions and does not necessarily invalidate the analysis.

Another approach for dealing with unbalanced pleiotropy is a median-based approach. The median of the causal estimates from each of the genetic variants taken individually is a consistent estimate of the causal effect under the assumption that at least 50% of the genetic variants are valid instrumental variables (Han, 2008). This is a different assumption to the InSIDE assumption, and neither assumption includes all cases of the other. Confidence intervals for the median can be obtained by bootstrapping; we suggest estimating a bootstrap standard error and forming confidence intervals from the standard error (Bowden et al., 2015). A weighted median estimator can also be obtained using inverse-variance weights in a weighted median function (Bowden et al., 2015). This method may have better asymptotic properties than an inverse-variance weighted method in a number of cases, as outlying estimates do not influence the median of the distribution. Simulations performed using second- and first-order weights from the delta method suggested that weighted median estimates were not sensitive to the particular choice of weighting function. In a median-based approach, the choice of weights influences not only the bias and variability of estimates, but also the identification condition, as the consistency criterion for a weighted median estimator is that 50% of the weight in the analysis corresponds to valid instrumental variables. Hence, in some cases, the simple (unweighted) median estimator may be preferred even if it is less efficient.
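A median-based analysis can be sketched as follows. This is an illustrative sketch under assumptions that go beyond the text: the weighted median is located by linear interpolation of the cumulative standardized weights, the weights are first-order inverse-variance weights, and the bootstrap is parametric (association estimates are redrawn from normal distributions centred on the observed values, keeping the original weights). The function names and example values are hypothetical.

```python
import random
import statistics


def weighted_median(estimates, weights):
    """Weighted median by interpolating the cumulative standardized weights."""
    pairs = sorted(zip(estimates, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    positions = []
    for _, w in pairs:
        cumulative += w
        positions.append((cumulative - 0.5 * w) / total)
    below = max(i for i, p in enumerate(positions) if p < 0.5)
    lower, upper = pairs[below][0], pairs[below + 1][0]
    frac = (0.5 - positions[below]) / (positions[below + 1] - positions[below])
    return lower + (upper - lower) * frac


def bootstrap_se(bx, by, sx, sy, n_boot=1000, seed=1):
    """Parametric bootstrap standard error of the weighted median estimate."""
    rng = random.Random(seed)
    weights = [(x / s) ** 2 for x, s in zip(bx, sy)]   # first-order weights
    draws = []
    for _ in range(n_boot):
        bx_b = [rng.gauss(b, s) for b, s in zip(bx, sx)]
        by_b = [rng.gauss(b, s) for b, s in zip(by, sy)]
        ratios = [y / x for x, y in zip(bx_b, by_b)]
        draws.append(weighted_median(ratios, weights))
    return statistics.stdev(draws)


# Made-up data: four variants agree on a ratio of 0.5, one is an outlier.
bx, sx = [0.10, 0.12, 0.15, 0.20, 0.11], [0.01] * 5
by, sy = [0.05, 0.06, 0.075, 0.10, 0.33], [0.02] * 5
ratios = [y / x for x, y in zip(bx, by)]
weights = [(x / s) ** 2 for x, s in zip(bx, sy)]
print("simple median:  ", statistics.median(ratios))
print("weighted median:", weighted_median(ratios, weights))
print("bootstrap SE:   ", bootstrap_se(bx, by, sx, sy))
```

A confidence interval would then be formed as the estimate plus or minus 1.96 bootstrap standard errors, in line with the suggestion above. With equal weights the interpolated weighted median coincides with the simple median.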

5.3 Overlap between the samples in a ‘two-sample’ analysis

In practice, before following the recommendation to use first-order weights in a two-sample Mendelian randomization setting, it is advisable to check whether the samples used to estimate the gene–risk factor and the gene–outcome associations truly do not overlap. In the motivating example of the paper, genetic associations with early menopause are obtained from a consortium of 33 studies, and genetic associations with triglycerides from a consortium of 23 studies. Although the consortia appear to be different, in fact, at least 17 of the studies are included in both consortia, meaning that the analysis is not a true two-sample analysis. It is not clear exactly the extent of the overlap without having the individual-level data, but it is likely to be substantial.

Although the full second-order expression for the variance of a causal estimate (equation 2) includes a correlation term that depends on the overlap between the two datasets, in this paper we have set this term to zero even in a one-sample setting. This was undertaken for computational simplicity in the simulation study setting. If the individual-level data were available, an estimate of the correlation could be obtained by bootstrapping the samples, and calculating the correlation between the bootstrapped distributions of the gene–risk factor and gene–outcome association estimates for each variant. However, this was infeasible in the simulation study. Additionally, if the individual-level data are not available, it is unclear how to estimate the correlation. A sensitivity analysis can be performed for the value of the correlation; results for the motivating example of this paper are shown in Appendix Table A2. We see that different choices of the correlation lead to similar causal estimates and 95% confidence intervals for each of the inverse-variance weighted methods.
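A sensitivity analysis of this kind can be sketched as follows. The variance used here is the standard second-order delta-method expression for a ratio of two estimates, which we assume corresponds to equation 2; the name `theta` for the correlation term, the grid of values and the summarized data are all hypothetical choices for illustration.

```python
import math


def second_order_variance(bx, by, sx, sy, theta=0.0):
    """Delta-method variance of by/bx, with correlation theta between the two estimates."""
    return (sy ** 2 / bx ** 2
            + by ** 2 * sx ** 2 / bx ** 4
            - 2.0 * theta * by * sx * sy / bx ** 3)


def ivw_fixed(bx, by, sx, sy, theta=0.0):
    """Fixed-effect inverse-variance weighted estimate using second-order weights."""
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [1.0 / second_order_variance(x, y, a, b, theta)
               for x, y, a, b in zip(bx, by, sx, sy)]
    sum_w = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum_w
    return estimate, math.sqrt(1.0 / sum_w)


# Made-up summarized data for three variants.
bx, sx = [0.1, 0.2, 0.3], [0.02, 0.02, 0.02]
by, sy = [0.05, 0.12, 0.14], [0.03, 0.03, 0.03]
for theta in (-0.2, -0.1, 0.0, 0.1, 0.2):
    estimate, se = ivw_fixed(bx, by, sx, sy, theta)
    low, high = estimate - 1.96 * se, estimate + 1.96 * se
    print(f"theta={theta:+.1f}  estimate={estimate:.3f}  95% CI=({low:.3f}, {high:.3f})")
```

Setting `theta` to zero recovers the second-order weights without the correlation term. When both associations have the same sign, a positive correlation reduces each variance, so reading across the grid shows how much the interval depends on the assumed overlap.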

5.4 Interpretation of a random-effects estimate

A theoretical concern in recommending the use of random-effects models for Mendelian randomization is the interpretation of the random-effects estimate. Under the assumptions of linearity and no effect modification, and in particular under the stable unit treatment value assumption (SUTVA (Cox, 1958); this states that the effect on the outcome of modifying the risk factor should be the same for all possible interventions on the risk factor, also expressed as “no multiple versions of treatment” (VanderWeele and Hernán, 2013)), the causal estimates from different instrumental variables should target the same causal parameter. However, in reality, taking the context of the motivating example, different interventions on age at menopause (such as oophorectomy, hysterectomy, and hormone therapy) may have different effects on triglyceride levels; similar heterogeneity is expected for genetic variants that affect age at menopause via different biological pathways. By allowing for heterogeneity in causal estimates from different genetic variants, the notion of a single causal effect of the risk factor on the outcome is lost, and it is not clear what intervention on the risk factor the causal estimate is targeting. Additionally, if the choice of genetic variants changes, then the causal parameter also changes, as the random-effects distribution is taken across a different set of variants. The random-effects estimate is correctly interpreted not as targeting a common causal effect, but as targeting the average value of the distribution of causal effects identified by the different variants (Riley et al., 2011). This subtlety is not unique to causal estimation; rather, it is relevant in meta-analysis more widely (Higgins, Thompson and Spiegelhalter, 2009). However, heterogeneity is more forgivable in meta-analysis; it could be argued that any deviation from homogeneity should be interpreted as evidence that the instrumental variable assumptions are violated for at least one of the genetic variants, and so a causal estimate based on all the genetic variants should not be presented.

We take a practical approach, and view these theoretical concerns as secondary to the primary concern of obtaining reliable causal inferences (Burgess, Butterworth and Thompson, 2015). Our view is that a literal interpretation of causal effect estimates from Mendelian randomization is rarely justified, due to differences between the way in which genetic variants influence the risk factor and any potential clinical intervention on the risk factor in practice (Burgess et al., 2012). However, if there is substantial heterogeneity, or if there are individual genetic variants that are clear outliers, then the overall causal estimate is likely to be unreliable even as a test of causality, and the instrumental variable assumptions should be examined carefully, particularly for the outlying variants.

5.5 Conclusion

In conclusion, in a Mendelian randomization analysis using summarized data in a (strict) two-sample setting (that is, when there is no overlap between the datasets in which associations with the risk factor and with the outcome are estimated), the inverse-variance weighted method with first-order weights may be preferred, although a random-effects model for combining the causal effects from the individual genetic variants should be used. In a one-sample setting, or if there is any overlap between the datasets, then a random-effects model using the second-order weights should be preferred to avoid false-positive findings. If the overlap is not substantial, then an analysis using the first-order weights may be presented as a sensitivity analysis, as it may have increased power to detect a causal effect.

References

• Ahmad et al. (2015) Ahmad, O. S., Morris, J. A., Mujammami, M., Forgetta, V., Leong, A., Li, R., Turgeon, M., Greenwood, C. M., Thanassoulis, G., Meigs, J. B. et al. (2015). A Mendelian randomization study of the effect of type-2 diabetes on coronary heart disease. Nature Communications 6, 7060. doi:10.1038/ncomms8060
• Baum, Schaffer and Stillman (2003) Baum, C., Schaffer, M. and Stillman, S. (2003). Instrumental variables and GMM: Estimation and testing. Stata Journal 3, 1–31.
• Borenstein et al. (2009) Borenstein, M., Hedges, L. V., Higgins, J. P. T. and Rothstein, H. R. (2009). Introduction to meta-analysis. Chapter 34: Generality of the basic inverse-variance method. Wiley.
• Bowden, Davey Smith and Burgess (2015) Bowden, J., Davey Smith, G. and Burgess, S. (2015). Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. International Journal of Epidemiology 44, 512–525.
• Bowden et al. (2015) Bowden, J., Davey Smith, G., Haycock, P. C. and Burgess, S. (2015). Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Available at https://www.academia.edu/15479132/Consistent.
• Burgess, Butterworth and Thompson (2013) Burgess, S., Butterworth, A. and Thompson, S. G. (2013). Mendelian randomization analysis with multiple genetic variants using summarized data. Genetic Epidemiology 37, 658–665. doi:10.1002/gepi.21758
• Burgess, Butterworth and Thompson (2015) Burgess, S., Butterworth, A. S. and Thompson, J. R. (2015). Beyond Mendelian randomization: how to interpret evidence of shared genetic predictors. Journal of Clinical Epidemiology. doi:10.1016/j.jclinepi.2015.08.001
• Burgess, Dudbridge and Thompson (2015a) Burgess, S., Dudbridge, F. and Thompson, S. G. (2015a). Re: “Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects”. American Journal of Epidemiology 181, 290–291.
• Burgess, Dudbridge and Thompson (2015b) Burgess, S., Dudbridge, F. and Thompson, S. (2015b). Combining information on multiple instrumental variables in Mendelian randomization: comparison of allele score and summarized data methods. Available at https://www.academia.edu/15479109/Combining.
• Burgess and Thompson (2011) Burgess, S. and Thompson, S. G. (2011). Bias in causal estimates from Mendelian randomization studies with weak instruments. Statistics in Medicine 30, 1312–1323. doi:10.1002/sim.4197
• Burgess, Thompson and CRP CHD Genetics Collaboration (2011) Burgess, S., Thompson, S. G. and CRP CHD Genetics Collaboration (2011). Avoiding bias from weak instruments in Mendelian randomization studies. International Journal of Epidemiology 40, 755–764. doi:10.1093/ije/dyr036
• Burgess and Thompson (2012) Burgess, S. and Thompson, S. G. (2012). Improvement of bias and coverage in instrumental variable analysis with weak instruments for continuous and binary outcomes. Statistics in Medicine 31, 1582–1600. doi:10.1002/sim.4498
• Burgess and Thompson (2015) Burgess, S. and Thompson, S. G. (2015). Mendelian randomization: methods for using genetic variants in causal estimation. Chapman & Hall.
• Burgess et al. (2012) Burgess, S., Butterworth, A., Malarstig, A. and Thompson, S. (2012). Use of Mendelian randomisation to assess potential benefit of clinical intervention. British Medical Journal 345, e7325. doi:10.1136/bmj.e7325
• Burgess et al. (2015) Burgess, S., Scott, R., Timpson, N., Davey Smith, G., Thompson, S. and EPIC-InterAct Consortium (2015). Using published data in Mendelian randomization: a blueprint for efficient identification of causal risk factors. European Journal of Epidemiology 30, 543–552. doi:10.1007/s10654-015-0011-z
• The Global Lipids Genetics Consortium (2013) The Global Lipids Genetics Consortium (2013). Discovery and refinement of loci associated with lipid levels. Nature Genetics 45, 1274–1283. doi:10.1038/ng.2797
• Cox (1958) Cox, D. R. (1958). Planning of experiments. Section 2: Some key assumptions. Wiley.
• Dastani et al. (2012) Dastani, Z., Hivert, M.-F., Timpson, N., Perry, J. R., Yuan, X., Scott, R. A., Henneman, P., Heid, I. M., Kizer, J. R., Lyytikäinen, L.-P. et al. (2012). Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: A multi-ethnic meta-analysis of 45,891 individuals. PLOS Genetics 8, e1002607. doi:10.1371/journal.pgen.1002607
• Davey Smith and Ebrahim (2003) Davey Smith, G. and Ebrahim, S. (2003). ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology 32, 1–22. doi:10.1093/ije/dyg070
• Davey Smith and Hemani (2014) Davey Smith, G. and Hemani, G. (2014). Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Human Molecular Genetics 23, R89–98. doi:10.1093/hmg/ddu328
• Day et al. (2015) Day, F. et al. (2015). Large-scale genomic analyses link reproductive aging to hypothalamic signaling, breast cancer susceptibility and BRCA1-mediated DNA repair. Nature Genetics. doi:10.1038/ng.3412
• DerSimonian and Laird (1986) DerSimonian, R. and Laird, N. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials 7, 177–188. doi:10.1016/0197-2456(86)90046-2
• Didelez, Meng and Sheehan (2010) Didelez, V., Meng, S. and Sheehan, N. A. (2010). Assumptions of IV methods for observational epidemiology. Statistical Science 25, 22–40. doi:10.1214/09-sts316
• Didelez and Sheehan (2007) Didelez, V. and Sheehan, N. (2007). Mendelian randomization as an instrumental variable approach to causal inference. Statistical Methods in Medical Research 16, 309–330. doi:10.1177/0962280206077743
• Fall et al. (2015) Fall, T., Xie, W., Poon, W., Yaghootkar, H., Mägi, R., Knowles, J. W., Lyssenko, V., Weedon, M., Frayling, T. M. and Ingelsson, E. (2015). Using genetic variants to assess the relationship between circulating lipids and type 2 diabetes. Diabetes. doi:10.2337/db14-1710
• The International Consortium for Blood Pressure Genome-Wide Association Studies (2011) The International Consortium for Blood Pressure Genome-Wide Association Studies (2011). Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478, 103–109. doi:10.1038/nature10405
• Greenland (2000) Greenland, S. (2000). An introduction to instrumental variables for epidemiologists. International Journal of Epidemiology 29, 722–729. doi:10.1093/ije/29.4.722
• Han (2008) Han, C. (2008). Detecting invalid instruments using L1-GMM. Economics Letters 101, 285–287.
• Higgins, Thompson and Spiegelhalter (2009) Higgins, J., Thompson, S. G. and Spiegelhalter, D. J. (2009). A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society: Series A (Statistics in Society) 172, 137–159. doi:10.1111/j.1467-985x.2008.00552.x
• Johnson (2013) Johnson, T. (2013). Efficient calculation for multi-SNP genetic risk scores. Technical Report, The Comprehensive R Archive Network. Available at http://cran.r-project.org/web/packages/gtx/vignettes/ashg2012.pdf [last accessed 2014/11/19].
• Lawlor et al. (2008) Lawlor, D., Harbord, R., Sterne, J., Timpson, N. and Davey Smith, G. (2008). Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Statistics in Medicine 27, 1133–1163. doi:10.1002/sim.3034
• Martens et al. (2006) Martens, E. P., Pestman, W. R., de Boer, A., Belitser, S. V. and Klungel, O. H. (2006). Instrumental variables: application and limitations. Epidemiology 17, 260–267. doi:10.1097/01.ede.0000215160.88317.cb
• Nelson et al. (2015) Nelson, C. P., Hamby, S. E., Saleheen, D., Hopewell, J. C., Zeng, L., Assimes, T. L., Kanoni, S., Willenborg, C., Burgess, S., Amouyel, P., Anand, S., Blankenberg, S., Boehm, B. O., Clarke, R. J., Collins, R., Dedoussis, G., Farrall, M., Franks, P. W., Groop, L., Hall, A. S., Hamsten, A., Hengstenberg, C., Hovingh, G. K., Ingelsson, E., Kathiresan, S., Kee, F., König, I. R., Kooner, J., Lehtimäki, T., März, W., McPherson, R., Metspalu, A., Nieminen, M. S., O’Donnell, C. J., Palmer, C. N. A., Peters, A., Perola, M., Reilly, M. P., Ripatti, S., Roberts, R., Salomaa, V., Shah, S. H., Schreiber, S., Siegbahn, A., Thorsteinsdottir, U., Veronesi, G., Wareham, N., Willer, C. J., Zalloua, P. A., Erdmann, J., Deloukas, P., Watkins, H., Schunkert, H., Danesh, J., Thompson, J. R. and Samani, N. J. (2015). Genetically determined height and coronary artery disease. New England Journal of Medicine 372, 1608–1618. doi:10.1056/NEJMoa1404881
• Nitsch et al. (2006) {barticle}[author] \bauthor\bsnmNitsch, \bfnmD.\binitsD., \bauthor\bsnmMolokhia, \bfnmM.\binitsM., \bauthor\bsnmSmeeth, \bfnmL.\binitsL., \bauthor\bsnmDeStavola, \bfnmB. L.\binitsB. L., \bauthor\bsnmWhittaker, \bfnmJ. C.\binitsJ. C. \AND\bauthor\bsnmLeon, \bfnmD. A.\binitsD. A. (\byear2006). \btitleLimits to causal inference based on Mendelian randomization: a comparison with randomized controlled trials. \bjournalAmerican Journal of Epidemiology \bvolume163 \bpages397–403. \bdoi10.1093/aje/kwj062 \endbibitem
• Pearl (2000) {bbook}[author] \bauthor\bsnmPearl, \bfnmJ.\binitsJ. (\byear2000). \btitleCausality: models, reasoning, and inference. \bpublisherCambridge University Press. \endbibitem
• Pierce and Burgess (2013) {barticle}[author] \bauthor\bsnmPierce, \bfnmB.\binitsB. \AND\bauthor\bsnmBurgess, \bfnmS\binitsS. (\byear2013). \btitleEfficient design for Mendelian randomization studies: subsample and two-sample instrumental variable estimators. \bjournalAmerican Journal of Epidemiology \bvolume178 \bpages1177–1184. \bdoi10.1093/aje/kwt084 \endbibitem
• Riley et al. (2011) {barticle}[author] \bauthor\bsnmRiley, \bfnmRichard D\binitsR. D., \bauthor\bsnmHiggins, \bfnmJulian PT\binitsJ. P., \bauthor\bsnmDeeks, \bfnmJonathan J\binitsJ. J. \betalet al. (\byear2011). \btitleInterpretation of random effects meta-analyses. \bjournalBritish Medical Journal \bvolume342 \bpagesd549. \bdoi10.1136/bmj.d549 \endbibitem
• Schatzkin et al. (2009) {barticle}[author] \bauthor\bsnmSchatzkin, \bfnmA.\binitsA., \bauthor\bsnmAbnet, \bfnmC. C.\binitsC. C., \bauthor\bsnmCross, \bfnmA. J.\binitsA. J., \bauthor\bsnmGunter, \bfnmM.\binitsM., \bauthor\bsnmPfeiffer, \bfnmR.\binitsR., \bauthor\bsnmGail, \bfnmM.\binitsM., \bauthor\bsnmLim, \bfnmU.\binitsU. \AND\bauthor\bsnmDavey Smith, \bfnmG.\binitsG. (\byear2009). \btitleMendelian randomization: how it can – and cannot – help confirm causal relations between nutrition and cancer. \bjournalCancer Prevention Research \bvolume2 \bpages104–113. \bdoi10.1158/1940-6207.capr-08-0070 \endbibitem
• Shen and Zhan (2015) {barticle}[author] \bauthor\bsnmShen, \bfnmXia\binitsX. \AND\bauthor\bsnmZhan, \bfnmYiqiang\binitsY. (\byear2015). \btitleRe: The effect on melanoma risk of genes previously associated with telomere length. \bjournalJournal of the National Cancer Institute \bvolume107 \bpagesdjv237. \bdoi10.1093/jnci/djv237 \endbibitem
• Staiger and Stock (1997) {barticle}[author] \bauthor\bsnmStaiger, \bfnmD\binitsD. \AND\bauthor\bsnmStock, \bfnmJH\binitsJ. (\byear1997). \btitleInstrumental variables regression with weak instruments. \bjournalEconometrica \bvolume65 \bpages557–586. \endbibitem
• Stock, Wright and Yogo (2002) {barticle}[author] \bauthor\bsnmStock, \bfnmJH\binitsJ., \bauthor\bsnmWright, \bfnmJH\binitsJ. \AND\bauthor\bsnmYogo, \bfnmM\binitsM. (\byear2002). \btitleA survey of weak instruments and weak identification in generalized method of moments. \bjournalJournal of Business and Economic Statistics \bvolume20 \bpages518–529. \bdoi10.1198/073500102288618658 \endbibitem
• Stock and Yogo (2002) {barticle}[author] \bauthor\bsnmStock, \bfnmJH\binitsJ. \AND\bauthor\bsnmYogo, \bfnmM\binitsM. (\byear2002). \btitleTesting for weak instruments in linear IV regression. \bjournalSSRN eLibrary \bvolume11 \bpagesT0284. \endbibitem
• Swanson and Hernán (2013) {barticle}[author] \bauthor\bsnmSwanson, \bfnmSonja\binitsS. \AND\bauthor\bsnmHernán, \bfnmMiguel\binitsM. (\byear2013). \btitleCommentary: how to report instrumental variable analyses (suggestions welcome). \bjournalEpidemiology \bvolume24 \bpages370–374. \bdoi10.1097/ede.0b013e31828d0590 \endbibitem
• R Core Team (2014) {bmanual}[author] \bauthor\bsnmR Core Team (\byear2014). \btitleR: A Language and Environment for Statistical Computing. Version 3.1.2 (Pumpkin Helmet) \bpublisherR Foundation for Statistical Computing, \baddressVienna, Austria. \endbibitem
• Thomas, Lawlor and Thompson (2007) {barticle}[author] \bauthor\bsnmThomas, \bfnmD. C.\binitsD. C., \bauthor\bsnmLawlor, \bfnmD. A.\binitsD. A. \AND\bauthor\bsnmThompson, \bfnmJ. R.\binitsJ. R. (\byear2007). \btitleRe: Estimation of bias in nongenetic observational studies using “Mendelian triangulation” by Bautista et al. \bjournalAnnals of Epidemiology \bvolume17 \bpages511–513. \bdoi10.1016/j.annepidem.2006.12.005 \endbibitem
• Thompson and Sharp (1999) {barticle}[author] \bauthor\bsnmThompson, \bfnmS. G.\binitsS. G. \AND\bauthor\bsnmSharp, \bfnmS. J.\binitsS. J. (\byear1999). \btitleExplaining heterogeneity in meta-analysis: a comparison of methods. \bjournalStatistics in Medicine \bvolume18 \bpages2693–2708. \endbibitem
• VanderWeele and Hernán (2013) {barticle}[author] \bauthor\bsnmVanderWeele, \bfnmTJ\binitsT. \AND\bauthor\bsnmHernán, \bfnmMA\binitsM. (\byear2013). \btitleCausal inference under multiple versions of treatment. \bjournalJournal of Causal Inference \bvolume1 \bpages1–20. \bdoi10.1515/jci-2012-0002 \endbibitem
• VanderWeele et al. (2014) {barticle}[author] \bauthor\bsnmVanderWeele, \bfnmTyler\binitsT., \bauthor\bsnmTchetgen Tchetgen, \bfnmEric\binitsE., \bauthor\bsnmCornelis, \bfnmMarilyn\binitsM. \AND\bauthor\bsnmKraft, \bfnmPeter\binitsP. (\byear2014). \btitleMethodological challenges in Mendelian randomization. \bjournalEpidemiology \bvolume25 \bpages427–435. \bdoi10.1097/ede.0000000000000081 \endbibitem
• Zhang et al. (2015) {barticle}[author] \bauthor\bsnmZhang, \bfnmChenan\binitsC., \bauthor\bsnmDoherty, \bfnmJennifer A\binitsJ. A., \bauthor\bsnmBurgess, \bfnmStephen\binitsS., \bauthor\bsnmHung, \bfnmRayjean J\binitsR. J., \bauthor\bsnmLindström, \bfnmSara\binitsS., \bauthor\bsnmKraft, \bfnmPeter\binitsP., \bauthor\bsnmGong, \bfnmJian\binitsJ., \bauthor\bsnmAmos, \bfnmChristopher I\binitsC. I., \bauthor\bsnmSellers, \bfnmThomas A\binitsT. A., \bauthor\bsnmMonteiro, \bfnmAlvaro NA\binitsA. N. \betalet al. (\byear2015). \btitleGenetic determinants of telomere length and risk of common cancers: a Mendelian randomization study. \bjournalHuman Molecular Genetics. \bdoi10.1093/hmg/ddv252 \endbibitem

Appendix

a.1 Data for motivating example: causal effect of early menopause on triglycerides

Information on the genetic variants included in the motivating analysis is presented in Appendix Table A1: for each variant, we provide the rsid, nearest gene(s), effect allele, other allele, association with early menopause (expressed as number of years earlier menopause) and standard error, and association with triglycerides (in standard deviation units) and standard error. Associations are also displayed visually as a scatter plot in Appendix Figure A1. Associations with early menopause are obtained from Day et al. (Day et al., 2015); larger numbers indicate that individuals with copies of the effect allele have earlier menopause on average compared with carriers of the other allele. These association estimates are available for download as part of the Supplementary Material to Day et al. (Supplementary Table 3). Associations with triglycerides are obtained from the Global Lipids Genetics Consortium (The Global Lipids Genetics Consortium, 2013), and can be downloaded from http://csg.sph.umich.edu//abecasis/public/lipids2013/.

a.2 Code for implementing methods used in simulation study

R code for performing the methods used in the simulation study is provided below. The metagen function used for the additive random-effects models is from the meta package.

library(meta)                             # provides metagen() for additive random-effects meta-analysis

# Inputs assumed to exist: g (matrix of genetic variants, one column per variant),
# x (risk factor), y (outcome), vars (number of genetic variants)

alpx=NULL; alpxsd=NULL                    # genetic associations with risk factor and standard errors
alpy=NULL; alpysd=NULL                    # genetic associations with outcome and standard errors

for (j in 1:vars) {
  alpx[j]   = lm(x~g[,j])$coef[2]
  alpy[j]   = lm(y~g[,j])$coef[2]
  alpxsd[j] = summary(lm(x~g[,j]))$coef[2,2]
  alpysd[j] = summary(lm(y~g[,j]))$coef[2,2]
}

# First-order weights
reg.first = summary(lm(alpy~alpx-1, weights=alpysd^-2))

betafirst.fixed  = reg.first$coef[1]      # estimate using first-order weights, fixed-effect model
betafirst.mulran = reg.first$coef[1]      # estimate using first-order weights, multiplicative random-effects model
sefirst.fixed    = reg.first$coef[1,2]/reg.first$sigma
                                          # standard error using first-order weights, fixed-effect model
sefirst.mulran   = reg.first$coef[1,2]/min(reg.first$sigma,1)
                                          # standard error using first-order weights, multiplicative random-effects model
meta.first       = metagen(alpy/alpx, abs(alpysd/alpx))
betafirst.addran = meta.first$TE.random   # estimate using first-order weights, additive random-effects model
sefirst.addran   = meta.first$seTE.random

# Second-order weights
reg.second = summary(lm(alpy~alpx-1, weights=(alpysd^2+alpy^2*alpxsd^2/alpx^2)^-1))

betasecond.fixed  = reg.second$coef[1]    # estimate using second-order weights, fixed-effect model
betasecond.mulran = reg.second$coef[1]
sesecond.fixed    = reg.second$coef[1,2]/reg.second$sigma
sesecond.mulran   = reg.second$coef[1,2]/min(reg.second$sigma,1)
meta.second       = metagen(alpy/alpx, sqrt(alpysd^2/alpx^2+alpy^2*alpxsd^2/alpx^4))
betasecond.addran = meta.second$TE.random
sesecond.addran   = meta.second$seTE.random

# Second-order weights including the correlation term
theta = 0.1                             # correlation term from equation (1)

reg.second.theta = summary(lm(alpy~alpx-1,
  weights=(alpysd^2+alpy^2*alpxsd^2/alpx^2-2*theta*alpy*alpxsd*alpysd/alpx)^-1))

betasecond.theta.fixed  = reg.second.theta$coef[1]
                                          # estimate using second-order weights with correlation, fixed-effect model
betasecond.theta.mulran = reg.second.theta$coef[1]
sesecond.theta.fixed    = reg.second.theta$coef[1,2]/reg.second.theta$sigma
sesecond.theta.mulran   = reg.second.theta$coef[1,2]/min(reg.second.theta$sigma,1)
meta.second.theta       = metagen(alpy/alpx,
  sqrt(alpysd^2/alpx^2+alpy^2*alpxsd^2/alpx^4-2*theta*alpy*alpxsd*alpysd/alpx^3))
betasecond.theta.addran = meta.second.theta$TE.random
sesecond.theta.addran   = meta.second.theta$seTE.random
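When only summarized data are available, the fixed-effect and multiplicative random-effects calculations above can be applied directly to the association estimates. The following is a minimal sketch in base R, not part of the original code: the function name ivw_summary and the numerical values are hypothetical and chosen purely for illustration.

```r
# Hypothetical helper (an illustration, not the authors' code): inverse-variance
# weighted estimate from summarized data with first- or second-order weights.
ivw_summary <- function(alpx, alpxsd, alpy, alpysd, second.order = TRUE) {
  v <- alpysd^2                               # first-order variance term
  if (second.order) {
    v <- v + alpy^2 * alpxsd^2 / alpx^2       # second-order delta-method term
  }
  reg <- summary(lm(alpy ~ alpx - 1, weights = 1 / v))
  est <- unname(reg$coefficients[1, 1])
  se  <- unname(reg$coefficients[1, 2])
  c(estimate  = est,
    se.fixed  = se / reg$sigma,               # residual standard error fixed at 1
    se.mulran = se / min(reg$sigma, 1))       # overdispersion allowed, never below fixed-effect
}

# Made-up association estimates for five variants (illustrative values only)
alpx   <- c(0.10, 0.15, 0.08, 0.20, 0.12)
alpxsd <- c(0.02, 0.02, 0.02, 0.03, 0.02)
alpy   <- c(0.05, 0.06, 0.05, 0.09, 0.07)
alpysd <- c(0.02, 0.02, 0.02, 0.02, 0.02)

first  <- ivw_summary(alpx, alpxsd, alpy, alpysd, second.order = FALSE)
second <- ivw_summary(alpx, alpxsd, alpy, alpysd, second.order = TRUE)
print(rbind(first, second))
```

With first-order weights, the estimate coincides with the closed-form weighted average sum(alpx*alpy/alpysd^2)/sum(alpx^2/alpysd^2), which gives a quick check on the regression-based calculation.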



a.3 Sensitivity analysis for value of θ in motivating example

As stated in Section 5.3, in this paper we have assumed that the correlation parameter θ in the second-order expression for the variance of a causal estimate from the delta method (equation 2) is zero. While computational and practical considerations (the running time of the simulation study, and the difficulty of estimating the parameter using summarized data only) preclude an investigation into the impact of this term in the simulation study, we can conduct a sensitivity analysis to consider the impact of the value of θ on estimates from the motivating example.

We conduct inverse-variance weighted analyses using weights derived from equation (2) and fixed-effect, additive random-effects, and multiplicative random-effects models for a range of values of θ. The causal estimates and 95% confidence intervals from each analysis are presented in Appendix Table A2. We see that the estimates and confidence intervals do not change substantially despite the wide range of values of θ considered.

The true value of θ should be zero if the associations with the risk factor and outcome are estimated in non-overlapping samples, and similar to the correlation between the risk factor and the outcome if the associations are estimated in the same individuals. With partial overlap, the value of θ will be between these two values.
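A sensitivity analysis of this kind can be sketched in base R as follows, here for the multiplicative random-effects model only. This is an illustration under stated assumptions: the function name ivw_theta, the grid of θ values, the association estimates, and the use of a normal quantile for the confidence interval are all hypothetical choices, not the values or data used for Appendix Table A2.

```r
# Hypothetical helper (an illustration only): IVW estimate with second-order
# weights including the correlation term theta, multiplicative random-effects.
ivw_theta <- function(alpx, alpxsd, alpy, alpysd, theta) {
  v <- alpysd^2 + alpy^2 * alpxsd^2 / alpx^2 -
    2 * theta * alpy * alpxsd * alpysd / alpx
  reg <- summary(lm(alpy ~ alpx - 1, weights = 1 / v))
  est <- unname(reg$coefficients[1, 1])
  se  <- unname(reg$coefficients[1, 2]) / min(reg$sigma, 1)
  z   <- qnorm(0.975)                     # assumption: normal-based 95% interval
  c(theta = theta, estimate = est, lower = est - z * se, upper = est + z * se)
}

# Made-up association estimates for five variants (illustrative values only)
alpx   <- c(0.10, 0.15, 0.08, 0.20, 0.12)
alpxsd <- c(0.02, 0.02, 0.02, 0.03, 0.02)
alpy   <- c(0.05, 0.06, 0.05, 0.09, 0.07)
alpysd <- c(0.02, 0.02, 0.02, 0.02, 0.02)

thetas <- c(-0.4, -0.2, 0, 0.2, 0.4)      # hypothetical grid of correlation values
sens <- t(sapply(thetas, function(th) ivw_theta(alpx, alpxsd, alpy, alpysd, th)))
print(round(sens, 3))
```

Each row of sens gives the estimate and interval for one value of θ, so the stability of the conclusions can be read off by comparing rows.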

a.4 Additional results from simulation study

Additional results from scenarios 1 and 2 are presented in Appendix Table A3, and from scenarios 3 and 4 in Appendix Table A4. For each value of the instrument strength, the (Monte Carlo) standard deviation and the mean standard error of estimates are presented. For second-order weights, only results from the fixed-effect analyses are presented: heterogeneity was not detected in the vast majority of datasets, so the results from the random-effects analyses were the same up to 3 decimal places in almost all cases.

a.5 Additional simulation with directional pleiotropy

To provide some guidance as to the performance of the inverse-variance weighted method when there is directional pleiotropy, we perform a further simulation under this scenario. The parameters and scenarios are taken to be the same as those in the main body of the paper, except that rather than drawing the genetic effects on the risk factor ($\alpha_j$) and the direct effects of the genetic variants on the outcome ($\beta_{Zj}$) from independent normal distributions as in Scenarios 2 and 4, we draw them from a bivariate normal distribution. The univariate distributions of these parameters are the same (the $\alpha_j$ parameters have mean $\alpha$ and variance $0.02^2$; the $\beta_{Zj}$ parameters have mean 0 and variance $0.02^2$), but the correlation between the distributions is set to 0.4.

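\[
\begin{pmatrix} \alpha_j \\ \beta_{Zj} \end{pmatrix}
\sim \mathcal{N}_2\left(
\begin{pmatrix} \alpha \\ 0 \end{pmatrix},
\begin{pmatrix} 0.02^2 & 0.4 \times 0.02^2 \\ 0.4 \times 0.02^2 & 0.02^2 \end{pmatrix}
\right)
\]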

This correlation means that the direct effects of genetic variants on the outcome are greater for those variants that have stronger effects on the risk factor, and so for those variants that receive more weight in the analysis. Hence, although the pleiotropic effects have mean zero overall, the pleiotropic effects of weak instruments and of strong instruments considered separately do not have mean zero. We refer to the one-sample setting with directional pleiotropy as Scenario 6, and the two-sample setting with directional pleiotropy as Scenario 7.
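Correlated draws of this kind can be generated in base R by building the second parameter from the first plus independent noise. The following is a minimal sketch, not the authors' simulation code: the value of alpha and the number of draws are hypothetical, and the variable names are chosen for illustration only.

```r
set.seed(1)

n.draws <- 10000     # hypothetical: large number of draws, to check the moments
alpha   <- 0.1       # hypothetical mean effect of the variants on the risk factor
sdev    <- 0.02      # standard deviation of both parameters
rho     <- 0.4       # correlation between the two parameters

# Two independent standard normal draws per variant
z1 <- rnorm(n.draws)
z2 <- rnorm(n.draws)

# alp has mean alpha and variance sdev^2; betaZ has mean 0 and variance sdev^2;
# the shared component z1 induces correlation rho between them
alp   <- alpha + sdev * z1
betaZ <- sdev * (rho * z1 + sqrt(1 - rho^2) * z2)

print(c(mean.alp = mean(alp), sd.alp = sd(alp),
        mean.betaZ = mean(betaZ), sd.betaZ = sd(betaZ),
        correlation = cor(alp, betaZ)))
```

The empirical moments should be close to the target values, and variants with larger values of alp tend to have larger values of betaZ, which is the source of the directional pleiotropy in Scenarios 6 and 7.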

Results for the mean estimate and empirical power to detect a causal effect are given in Appendix Table A5. In the one-sample setting (scenario 6), there is bias in the direction of confounding in all cases. While Type 1 error rates under the null are inflated throughout, there is a clear preference for the use of second-order weights and random-effects models, as well as a slight preference for the additive random-effects model (based on slightly more conservative coverage properties with first-order weights). This mirrors the advice in the main paper. In the two-sample setting (scenario 7), bias under the null is in the positive direction, whereas bias under the alternative is towards the null. Type 1 error rates under the null with random-effects models are close to nominal levels, with conservative coverage for second-order weights, and slightly anti-conservative coverage for first-order weights. However, the advice from the main paper to use first-order weights in a two-sample setting would not lead to overly misleading inferences, as Type 1 error rates with first-order weights are close to the nominal 5% level. Power to detect a causal effect is greater using first-order weights in this case. Hence, on the basis of these simulations, the advice in the main body of the paper also holds with directional pleiotropy.

We reiterate that, in practice, estimates from the inverse-variance weighted method will typically be biased if the genetic variants are not valid instruments (and the example of directional pleiotropy considered here is far from extreme). We therefore recommend the use of robust methods (such as the Egger method and median-based methods introduced in the discussion of the paper) as sensitivity analyses for applied Mendelian randomization investigations.