Improving the accuracy of likelihood-based inference in meta-analysis and meta-regression

Improving the accuracy of likelihood-based inference in meta-analysis and meta-regression

I. Kosmidis
Department of Statistical Science
University College London
London, WC1E 6BT, U.K

A. Guolo
Department of Statistical Sciences
University of Padova
Via Cesare Battisti, 241, 35121 Padova, Italy

C. Varin
Department of Environmental Sciences, Informatics and Statistics,
Ca’ Foscari University of Venice
Via Torino 150, 30170 Venezia Mestre, Italy

Random-effects models are frequently used to synthesise information from different studies in meta-analysis. While likelihood-based inference is attractive both in terms of limiting properties and of implementation, its application in random-effects meta-analysis may result in misleading conclusions, especially when the number of studies is small to moderate. The current paper shows how methodology that reduces the asymptotic bias of the maximum likelihood estimator of the variance component can also substantially improve inference about the mean effect size. The results are derived for the more general framework of random-effects meta-regression, which allows the mean effect size to vary with study-specific covariates.

Keywords: Bias reduction; Heterogeneity; Meta-analysis; Penalized likelihood; Random effect; Restricted maximum likelihood



1 Introduction

Meta-analysis is a widely applicable approach to combining information from different studies about a common effect of interest. A popular framework for accounting for the heterogeneity between studies is the random-effects specification in DerSimonian & Laird (1986). There is ample evidence that frequentist inference for this specification can result in misleading conclusions, especially if inference is carried out by relying on first-order asymptotic arguments in the common setting of small or moderate number of studies (e.g., van Houwelingen et al., 2002; Guolo & Varin, 2015). The same considerations apply to the random-effects meta-regression model, which is a direct extension of random-effects meta-analysis allowing for study-specific covariates. Proposals presented to account for the finite number of studies include modification of the limiting distribution of test statistics (Knapp & Hartung, 2003), restricted maximum likelihood (Viechtbauer, 2005) and second-order asymptotics (Guolo, 2012). Recently, Zeng & Lin (2015) suggested a double resampling approach that outperforms several alternatives in terms of empirical coverage probability of confidence intervals for the mean effect size.

The current paper studies the extent of the bias of the maximum likelihood estimator of the random-effect variance and introduces a bias-reducing penalized likelihood that yields a substantial improvement in the estimation of the random-effect variance. The bias-reducing penalized likelihood is related to the approximate conditional likelihood of Cox & Reid (1987) and the restricted maximum likelihood for inference about the random-effects variance. The order of the penalty function allows the derivation of a approximation of the distribution of the logarithm of the penalized likelihood ratio statistic, which can be used for inference about the fixed-effect parameters. Real-data examples and two simulation studies illustrate the improvement in finite-sample performance against alternatives from the recent literature.

2 Random-effects meta-regression and meta-analysis

Suppose there are studies about a common effect of interest, each of them providing pairs of summary measures , where is the study-specific estimate of the effect, and is the associated estimation variance . In some situations, the pairs may be accompanied by study-specific covariates , which describe the heterogeneity across studies. In the meta-analysis literature, it is usually assumed that the within-study variances are estimated well enough to be considered as known and equal to the values reported in each study. Under this assumption, the random-effects meta-regression model postulates that are realizations of random variables , respectively, which are independent conditionally on independent random effects , and the conditional distribution of given is , where is an unknown -vector of effects. The random effect is typically assumed to be distributed according to , where accounts for the between-study heterogeneity.

In matrix notation, and conditionally on , the random-effects meta-regression model is


where , is the model matrix of dimension with in its th row, and is a vector of independent errors each with a distribution. Under this specification, the marginal distribution of is multivariate normal with mean and variance , where is the identity matrix and . The random-effects meta-analysis model is a meta-regression model where is a column of ones.

The random-effects meta-regression model is used here as a working model for theoretical development. In light of the recent criticisms of the assumption of known within-study variances (see, for example Hoaglin, 2015), § 4 and the Supplementary Material illustrate the good performance of the derived procedures under more realistic scenarios, where the estimation variances are directly related to the estimates of the summary measure.

The parameter is naturally estimated by weighted least squares as


with . Then, inference about can be based on the fact that under model (1), has an asymptotic normal distribution with mean and variance . In this case, the reliability of the associated inferential procedures critically depends on the availability of an accurate estimate of the between-study variance . A popular choice is the DerSimonian & Laird (1986) estimator , where is the Cochran statistic, with and . Viechtbauer (2005) presents evidence of the loss of efficiency of , which can impact inference; see also Guolo (2012).

Inference about can alternatively be based on the likelihood function. The log-likelihood function for in model (1) is


where denotes the determinant of and . A calculation of the gradient of shows that the maximum likelihood estimator for results from solving the equations


where denotes a -dimensional vector of zeros, and and , so that . As observed in Guolo (2012) and Zeng & Lin (2015), inferential procedures that rely on first-order approximations of the log-likelihood, e.g., likelihood-ratio and Wald statistics, perform poorly when the number of studies is small to moderate.

3 Bias reduction

3.1 Bias-reducing penalized likelihood

From the results in Kosmidis & Firth (2009, 2010), the first term in the expansion of the bias function of the maximum likelihood estimator is found to be , where


with . A sketch derivation for (5) is given in the Appendix. In what follows is called the first-order bias.

The non-zero entries of and the diagonal entries of are all necessarily positive, so the maximum likelihood estimator of is subject to downward bias, which, as also noted in Viechtbauer (2005), affects inference about , by over-estimating the non-zero entries of , and hence over-estimating the information matrix


This over-estimation of can result in hypothesis tests with large Type I error and confidence intervals or regions with actual coverage appreciably lower than the nominal level.

An estimator that corrects for the first-order bias of results from solving the adjusted score equations (Firth, 1993; Kosmidis & Firth, 2009). Substituting (4), (5) and (6) in the expression for gives that the adjusted score functions for and are and


respectively. The expression for the differential of the log-determinant can be used to show that and are the derivatives of the penalized log-likelihood function


where is as in (3), is the -block of the information matrix , and denotes determinant, so the solution of the adjusted score equations is the maximum penalized likelihood estimator .

For , expression (8) reduces to both the logarithm of the approximate conditional likelihood of Cox & Reid (1987) for inference about , when is treated as a nuisance component, and to the restricted log-likelihood function of Harville (1977). Hence, maximising the bias-reducing penalized log-likelihood (8) is equivalent to calculating the maximum restricted likelihood estimator for . The latter estimator was originally constructed to reduce underestimation of variance components in finite samples as a consequence of failing to account for the degrees of freedom that are involved in the estimation of the fixed effects . Smyth & Verbyla (1996) and Stern & Welsh (2000) have shown the equivalence of the restricted log-likelihood with approximate conditional likelihood in the more general context of inference about variance components in normal linear mixed models.

3.2 Estimation

Given a starting value for , the following iterative process has a stationary point that maximizes (8). At the th iteration , a new candidate value for is obtained as the weighted least squares estimator (2) at ; a candidate value for is then computed by a line search for solving the adjusted score equation (7) evaluated at . The iteration is repeated until either the candidate values do not change across iterations or the adjusted score functions are sufficiently close to zero.

3.3 Penalized likelihood inference

The profile penalized likelihood function can be used to construct confidence intervals and regions, and carry out hypothesis tests for . If , and and are the estimators of and , respectively, from maximising (8) for fixed , then the logarithm of the penalized likelihood ratio statistic has the usual limiting distribution, where . To derive this limiting result, note that the adjustment to the scores in (4) is additive and , so the extra terms depending on it and its derivatives in the asymptotic expansion of the penalized likelihood disappear as information increases.

The impact of using the penalized likelihood for estimation and inference in random-effects meta-analysis and meta-regression is more profound for a small to moderate number of studies. As the number of studies increases, the log-likelihood derivatives dominate the bias-reducing adjustment in (7) in terms of asymptotic order. As a result, inference based on the penalized likelihood becomes indistinguishable from likelihood inference.

In § 4, the performance of penalized likelihood inference is compared with that of alternative methods under the random-effects meta-analysis model (1), and under a more realistic model for individual-within-study data.

4 Simulation studies

4.1 Random-effects meta-analysis

The simulation studies under the random-effects meta-analysis model (1) are performed using the design in Brockwell & Gordon (2001). Specifically, the study-specific effects are simulated from the random-effect meta-analysis with true effect and variance , where are independently generated from a distribution multiplied by and then restricted to the interval . The between-study variance ranges from to and the number of studies from to . For each combination of and considered, data sets are simulated using the same initial state for the random number generator.

Figure 1: Empirical coverage probabilities of two-sided confidence intervals for for increasing , when (a) and (b) , and for increasing (in log scale) when (c) and (d) . The curves correspond to profile penalized likelihood (solid), DerSimonian & Laird method (dashed), Zeng & Lin double resampling (dotted; available only for ), and Skovgaard’s statistic (dotted-dashed). The grey horizontal line is the target 95% nominal level.

Zeng & Lin (2015, Section 5) show that their double resampling approach outperforms several existing methods in terms of the empirical coverage probabilities of confidence intervals for at nominal level 95%. The methods considered in Zeng & Lin (2015) include profile likelihood (Hardy & Thompson, 1996), modified DerSimonian & Laird (see Sidik & Jonkman, 2002; Knapp & Hartung, 2003; Copas, 2003), quantile approximation (Jackson & Bowden, 2009) and the approach described in Henmi & Copas (2010). The present simulation study takes advantage of these previous simulation results, and Figure 1 compares the performance of double resampling with that of the profile penalized likelihood. In order to avoid long computing times, empirical coverage for double-resampling has been calculated only for .

The profile penalized likelihood confidence interval has empirical coverage that is appreciably closer to the nominal level than double resampling.

Figure 1 also includes results for two alternative confidence intervals. The first uses the classical DerSimonian & Laird estimator and its estimated variance . Not surprisingly, the empirical coverage of this confidence interval is grossly smaller than the nominal confidence level. The second interval is used for reference and results from the numerical inversion of Skovgaard’s statistic, which is designed to produce second-order accurate p-values for tests on the mean effect size (Guolo, 2012; Guolo & Varin, 2012). The profile penalized likelihood interval has comparable performance to that based on Skovgaard’s statistic, with the latter having empirical coverage slightly closer to the nominal level for a wider range of values for . In general, though, the numerical inversion of Skovgaard’s statistic can be unstable due to the discontinuity of the statistic around the maximum likelihood estimator. In contrast, the calculation of profile penalized likelihood intervals is not prone to such instabilities. The penalized likelihood also results in a bias-reduced estimator of , whose reliable estimation is often of interest in medical studies (Veroniki et al., 2016).

4.2 Standardized mean differences from two-arm studies

Figure 2: Empirical coverage probabilities of two-sided confidence intervals for for increasing , when (a) and (b) , and for increasing (in log scale) when (c) and (d) . The curves correspond to profile penalized likelihood (solid), DerSimonian & Laird method (dashed), Zeng & Lin double resampling (dotted), and Skovgaard’s statistic (dotted-dashed). The grey horizontal line is the target 95% nominal level.

The profile penalized likelihood and all other methods in Figure 1 have been developed under the validity of the random-effects meta-analysis model. This assumption may be unrealistic, especially in settings where the estimation variances are directly related to the summary measure (Hoaglin, 2015). Here, we examine the performance of the methods under an alternative specification of the data generating process, where the study-specific effects and their variances are calculated by simulating individual-within-study data. Specifically, we assume that the th study consists of two arms with individuals each, and that are independent uniform draws from the integers . Then, conditionally on a random effect , we assume that the observation for the th individual in the th arm is the realisation of a random variable, where and . The difference between the marginal variances of the arms increases with . The true effects are set to , and . The study-specific effect of interest is estimated using the standardized mean difference , where is the pooled variance from the two arms of the th study, and is the Hedges correction (see, e.g., Borenstein, 2009, Chapter 4). The corresponding estimated variance for is , which is a quadratic function of . The between-study variance ranges from to and the number of studies from to . For each combination of and considered, data sets are simulated using the same initial state for the random number generator.

Figure 2 shows the empirical coverage of the confidence intervals for based on the methods that were examined in Figure 1. Empirical coverage for double-resampling has again been calculated only for . The good performance of the profile penalized likelihood interval and the interval based on the Skovgaard’s statistic persists for small and moderate number of studies, even under the alternative data generating process. The performance of the intervals based on the DerSimonian & Laird estimator and double resampling is, again, poor.

Figure 2 also illustrates the effect of increasing the number of studies under the alternative specification of the data generating process. As the number of studies increases, the inadequacy of the assumptions of the working random-effects meta-regression model becomes more notable. Model mis-specification will eventually result in loss of coverage for all methods examined here, including the intervals based on profile penalized likelihood and the Skovgaard’s statistic.

The Supplementary Material provides the full results from this study and two other simulation studies, where the summary measures are log-odds-ratios from a case-control study.

5 Case study: meat consumption data

Larsson & Orsini (2014) investigate the association between meat consumption and relative risk of all-cause mortality. The data include prospective studies, eight of which are about unprocessed red meat consumption and eight about processed meat consumption. We consider meta-regression with a covariate taking value for processed red mean and 0 for unprocessed. The DerSimonian & Laird estimate of is , the maximum likelihood estimate is and the maximum penalized likelihood estimate is the largest with . The estimates of are , and , where the first element in each vector corresponds to the intercept and the second to meat consumption.

The DerSimonian & Laird method indicates some evidence for a higher risk associated to the consumption of red processed meat with a p-value of . In contrast, the penalized likelihood ratio and Skovgaard’s statistic suggest that there is rather weak evidence for higher risk, with p-values of and , respectively.

The Supplementary Material contains a simulation study under the maximum likelihood fit that illustrates that the maximum likelihood estimator of is negatively biased. The other estimators almost fully compensate for that bias, but is appreciably more efficient than . The simulation study therein is also used to illustrate the good performance of the penalized likelihood ratio test in terms of size.

6 Supplementary material

The Supplementary Material provides R (R Core Team, 2016) code to reproduce the case study in § 5, and another analysis. The full results of the simulations are provided including the performance of confidence intervals based on alternative methods.


This work was partially supported by grants ‘IRIDE’, Ca’ Foscari University of Venice and ‘Progetti di Ricerca di Ateneo 2015’, University of Padova. The authors thank David Firth for discussions and feedback on an early version of the paper. The authors are also grateful to two anonymous Referees, the Associate Editor and the Editor, for helpful comments and suggestions, and thankful to Sophia Kyriakou for bringing to their attention some inconsequential typos that have been fixed in the current version. In particular, expression (3) has instead of , and the middle expressions in the equations for and in (4) and (7), respectively, have been multiplied by a factor of .

Appendix A: derivation of the expression for the first-order bias

The first-order bias of has the form (Kosmidis & Firth, 2010), where has components

There, and .

The model assumptions imply that is if is odd and if is even, where , and denotes the double factorial of . Direct matrix calculations give

where is the zero matrix. So, for and

Inserting the expressions for the components of into the expression for gives , where is as in (5).


  • Borenstein (2009) Borenstein, M., Hedges, L. V., Higgins, J. P. T. & Rothstein, H. R. (2009). Introduction to Meta-Analysis. Chichester, West Sussex: Wiley.
  • Brockwell & Gordon (2001) Brockwell, S. E. & Gordon, I. N. (2001). A comparison of statistical methods for meta-analysis. Stat. Med. 20, 825–40.
  • Copas (2003) Copas, J. (2003) Letters to the editor: a simple confidence interval for meta-analysis. Sidik K., Jonkman J. N. (2002) Statistics in Medicine 21, 3153–3159. Stat. Med. 22, 2667–68.
  • Cox & Reid (1987) Cox, D. R. & Reid, N. (1987). Parameter orthogonality and approximate conditional inference (with discussion). J. R. Statist. Soc. B 49, 1–39.
  • DerSimonian & Laird (1986) DerSimonian, R. & Laird, N. (1986). Meta-analysis in clinical trials. Control. Clin. Trials 7, 177–88.
  • Firth (1993) Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38.
  • Guolo (2012) Guolo, A. (2012). Higher-order likelihood inference in meta-analysis and meta-regression. Stat. Med. 31, 313–27.
  • Guolo & Varin (2012) Guolo, A. & Varin, C. (2012). The R package metaLik for likelihood inference in meta-analysis. J. Stat. Softw. 50, 1–14.
  • Guolo & Varin (2015) Guolo, A. & Varin, C. (2015). Random-effects meta-analysis: The number of studies matters. Stat. Methods Med. Res., to appear.
  • Hardy & Thompson (1996) Hardy, R. J. & Thompson, S. G. (1996). A likelihood approach to meta-analysis with random effects. Stat. Med. 15, 619–29.
  • Harville (1977) Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. J. Amer. Stat. Ass. 72, 320–38.
  • Henmi & Copas (2010) Henmi, M. & Copas, J. B. (2010). Confidence intervals for random effects meta-analysis and robustness to publication bias. Stat. Med. 29, 2969–83.
  • Hoaglin (2015) Hoaglin, D. C. (2015). We know less than we should about methods of meta-analysis. Res. Syn. Meth. 6, 287–9.
  • van Houwelingen et al. (2002) van Houwelingen, H. C., Arends, L. R. & Stijnen, T. (2002). Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat. Med. 21, 589–624.
  • Jackson & Bowden (2009) Jackson, D. & Bowden, J. (2009). A re-evaluation of the ‘quantile approximation method’ for random effects meta-analysis. Stat. Med. 28, 338–48.
  • Knapp & Hartung (2003) Knapp, G. & Hartung, J. (2003). Improved tests for a random effects meta-regression with a single covariate. Stat. Med. 22, 2693–710.
  • Kosmidis & Firth (2009) Kosmidis, I. & Firth, D. (2009). Bias reduction in exponential family nonlinear models. Biometrika 96, 793–804.
  • Kosmidis & Firth (2010) Kosmidis, I. & Firth, D. (2010). A generic algorithm for reducing bias in parametric estimation. Electron. J. Stat. 4, 1097–112.
  • Larsson & Orsini (2014) Larsson, S. C. & Orsini, N. (2014) Red meat and processed meat consumption and all-cause mortality: A meta-analysis. Am. J. Epidemiol. 179, 282–89.
  • R Core Team (2016) R Core Team (2016). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
  • Smyth & Verbyla (1996) Smyth, G. K. & Verbyla, A. P. (1996). A conditional likelihood approach to residual maximum likelihood estimation in generalized linear models. J. R. Statist. Soc. B 58, 565–72.
  • Stern & Welsh (2000) Stern, S. E. & Welsh, A. H. (2000). Likelihood inference for small variance components. Can. J. Stat. 83, 319–30.
  • Sidik & Jonkman (2002) Sidik, K. & Jonkman, J. N. (2002). A simple confidence interval for meta-analysis. Stat. Med. 21, 3153–9.
  • Veroniki et al. (2016) Veroniki, A. A., Jackson, D., Viechtbauer, W., Bender, R., Bowden, J., Knapp, G., Kuss, O., Higgins, J. P. T., Langan, D., and Salanti, G. (2016) Methods to estimate the between-study variance and its uncertainty in meta-analysis. Res. Syn. Meth. 7, 55–79.
  • Viechtbauer (2005) Viechtbauer, W. (2005). Bias and efficiency of meta-analytic variance estimators in the random-effects model. J. Educ. Behav. Stat. 30, 261–93.
  • Zeng & Lin (2015) Zeng, D. & Lin, D. Y. (2015). On random-effects meta-analysis. Biometrika 102, 281–94.

See pages - of examples.pdf

Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description