On composite likelihood in bivariate meta-analysis of diagnostic test accuracy studies
The composite likelihood (CL) is amongst the computational methods used for estimation of the generalized linear mixed model (GLMM) in the context of bivariate meta-analysis of diagnostic test accuracy studies. Its advantage is that the likelihood can be derived conveniently under the assumption of independence between the random effects, but there has not been a clear analysis of the merit or necessity of this method.
For synthesis of diagnostic test accuracy studies, a copula mixed model has been proposed in the biostatistics literature.
This general model includes the GLMM
as a special case and can also allow for flexible dependence modelling, different from assuming simple linear correlation structures, normality and tail independence in the joint tails. A maximum likelihood (ML) method, which is based on evaluating the bi-dimensional integrals of the likelihood with quadrature methods has been proposed, and in fact it eases any computational difficulty that might be caused by the double integral in the likelihood function.
Both methods are thoroughly examined with extensive simulations and illustrated with data of a published meta-analysis. It is shown that the ML method has no non-convergence issues or computational difficulties and at the same time allows estimation of the dependence between study-specific sensitivity and specificity and thus prediction via summary receiver operating curves.
Keywords: Copula mixed model; diagnostic odds ratio; generalized linear mixed model; sensitivity/specificity; SROC.
2pt plus 4pt minus 2pt2pt plus 2pt minus 2pt
Synthesis of diagnostic test accuracy studies is the most common medical application of multivariate meta-analysis; we refer the interested reader to the surveys by Jackson et al. (2011); Mavridis and Salanti (2013); Ma et al. (2016). These data have two important properties. The first is that the estimated sensitivities and specificities are typically negatively associated across studies, because studies that adopt less stringent criterion for declaring a test positive invoke higher sensitivities and lower specificities (Jackson et al., 2011). The second important property of the data is the substantial between-study heterogeneity in sensitivities and specificities (Chu et al., 2012).
Nikoloulopoulos (2015a), to deal with the aforementioned properties, proposed a copula mixed model for bivariate meta-analysis of diagnostic test accuracy studies and made the argument for moving to copula random effects models. This general model includes the generalized linear mixed model (Chu and Cole, 2006; Arends et al., 2008) as a special case and can also operate on the original scale of sensitivity and specificity.
Chen et al. (2014, 2016b) proposed a composite likelihood (CL) method for estimation of the the generalized linear mixed model (hereafter GLMM) and the Sarmanov beta-binomial model (Chu et al., 2012). Note in passing that both models are special cases of a copula mixed model (Nikoloulopoulos, 2015a). The composite likelihood can be derived conveniently under the assumption of independence between the random effects. The CL method has been recommended by Chen et al. (2014, 2016b) to overcome practical ‘issues’ in the joint likelihood inference such as computational difficulty caused by a double integral in the joint likelihood function, and restriction to bivariate normality.
GLMM can only be unstable if there are too many parameters in the covariance matrix of the random effects or too many random effects for a small sample, which is not the case in this application domain. Furthermore, Chen et al. (2014, 2016b) restrict themselves to SAS PROC NLMIXED which is a general routine for random effect models and thus gives limited capacity. The CL method is well established as a surrogate alternative of maximum likelihood when the joint likelihood is too difficult to compute (Varin et al., 2011), which is apparently not the case in the synthesis of diagnostic test accuracy studies. The general model in Nikoloulopoulos (2015a) includes the GLMM as a special case and its numerical evaluation has been implemented in the package CopulaREMADA (Nikoloulopoulos, 2016) within the open source statistical environment R (R Core Team, 2015).
The random effects distribution of a copula mixed model can be expressed via other copulas (other than the bivariate normal) that allow for flexible dependence modelling, different from assuming simple linear correlation structures, normality and tail independence.
The contribution of this paper is to examine the merit of the CL method in the context of diagnostic test accuracy studies and compare it to the ML method in Nikoloulopoulos (2015a). The remainder of the paper proceeds as follows. Section 2 summarizes the copula mixed model for diagnostic test accuracy studies. Section 3 discusses both maximum and composite likelihood for estimation of the model parameters. Section 4 contains small-sample efficiency calculations to compare the two methods. Section 5 presents applications of the likelihood estimation methods to several data frames with diagnostic studies. We conclude with some discussion in Section 6.
2 The copula mixed model
We first introduce the notation used in this paper. The focus is on two-level (within-study and between-studies) cluster data. The data are are , where is an index for the within study measurements and is an index for the individual studies. The data, for study , can be summarized in a table with the number of true positives (), true negatives (), false negatives (), and false positives ().
The within-study model assumes that the number of true positives and true negatives are conditionally independent and binomially distributed given , where denotes the bivariate latent pair of (transformed) sensitivity and specificity. That is
where is a link function.
The stochastic representation of the between studies model takes the form
where is a parametric family of copulas with dependence parameter and is the cdf of the univariate distribution of the random effect. The copula parameter is a parameter of the random effects model and it is separated from the univariate parameters, the univariate parameters and are the meta-analytic parameters for the sensitivity and specificity, and and express the between-study variabilities. The models in (2) and (2) together specify a copula mixed model (Nikoloulopoulos, 2015a) with joint likelihood
where is the copula density and is the binomial probability mass function (pmf). The choices of the and are given in Table 1.
|logit, probit, cloglog|
3 Estimation methods
3.1 Maximum likelihood method
Estimation of the model parameters can be approached by the standard maximum likelihood (ML) method, by maximizing the logarithm of the joint likelihood in (3). For mixed models of the form with joint likelihood as in (3), numerical evaluation of the joint pmf is easily done with the following steps (Nikoloulopoulos, 2015a):
Convert from independent uniform random variables and to dependent uniform random variables and that have distribution . The inverse of the conditional distribution corresponding to the copula is used to achieve this.
Numerically evaluate the joint pmf
in a double sum:
The inverse conditional copula cdfs are given in Table 2 for the sufficient list of parametric families of copulas for meta-analysis of diagnostic test accuracy studies in Nikoloulopoulos (2015a, b). Since the copula parameter of each family has different range, in the sequel we re-parametrize them via their Kendallâs ; that is comparable across families.
|Clayton by 90|
|Clayton by 180|
|Clayton by 270|
3.2 Composite likelihood method
The composite likelihood method assumes independence between the random effects. Hence, it is identical for any copula mixed model, since all the parametric families of copulas in Table 2 contain the independence copula as a special case. This subsection summarizes the composite likelihood estimating equations and the asymptotic covariance matrix for the estimator that solves them in the context of diagnostic test accuracy studies.
Composite likelihood estimator
Chen et al. (2014) and Chen et al. (2016b) proposed the composite likelihood method for estimation of the copula mixed model with normal and beta margins, respectively. Composite likelihood is a surrogate likelihood which leads asymptotically to unbiased estimating equations obtained by the derivatives of the composite log-likelihoods. Estimation of the model parameters can be approached by solving the marginal estimating equations or equivalently by maximizing the sum of composite (univariate) likelihoods.
By using composite likelihood the authors are assuming between-study independence in sensitivities and specificities and thus the joint likelihood in (3) reduces to:
where , since under the independence assumption the copula density is equal to 1. Note that the joint likelihood reduces to the product of two univariate likelihoods and the evaluation of univariate integrals, thus the computational effort (if any) is subsided. Essentially, for beta margins the univariate likelihoods result in a closed form since
is the pmf of a Beta-Binomial() distribution.
Composite likelihood estimates can be obtained by maximizing the logarithm of the joint likelihood in (4) over the univariate parameters. The efficiency of the composite likelihood estimates has been studied and shown in a series of a papers (Varin, 2008; Varin et al., 2011). However, CL ignores the dependence at the estimation of the univariate marginal parameters, thus it is expected to be worse as the dependence increases.
Asymptotic covariance matrix–Inverse Godambe
Let . The asymptotic covariance matrix for the CL estimator , also known as the inverse Godambe information matrix (Godambe, 1991), is
4 Small- and moderate-sample efficiency–misspecification of the univariate distribution of the random effect
In this section an extensive simulation study with two different scenarios is conducted (a) to assess the performance of the CL and ML methods, and (b) to investigate in detail the effect of the misspecification of the parametric margin of the random effects distribution. The CL method assumes the independence copula and its focus is on marginal parameters and apparently not the choice of the copula. Hence in the simulations we only investigate the effect of the misspecification of the parametric margin of the random effects distribution. We refer the interested reader to Nikoloulopoulos (2015a) for a study on the misspecification of the parametric family of copulas of the random effects distribution.
Simulate the study size from a shifted gamma distribution, i.e., and round off to the nearest integer.
Simulate from a parametric family of copulas ; is converted to the dependence parameter via the relations in Table 2.
Convert to beta or normal realizations via for .
Draw the number of diseased from a distribution.
Set and generate from a for .
In the first scenario the simulated data are generated from the BVN copula mixed model with normal margins, logit link (the resulting model is the same with the GLMM) and true marginal parameters , while in the second scenario the simulated data are generated from the BVN copula mixed model with beta margins and true marginal parameters . The number of studies is set to and to represent a relatively small and moderate meta-analysis, and the Kendall’s association between study-specific sensitivity and specificity is set to and to represent moderate and strong negative dependence.
The data are generated from the BVN copula mixed model with normal margins (GLMM) with true marginal parameters for different number of studies and different values of Kendall’s association between study-specific sensitivity and specificity. All entries are multiplied by 100.
The data are generated from the BVN copula mixed model with beta margins with true marginal parameters for different number of studies and different values of Kendall’s association between study-specific sensitivity and specificity. All entries are multiplied by 100.
As stated in Chen et al. (2014, 2016b) one advantage of the CL method is that the problem of non-convergence is avoided, so we also report on the non-convergence of different methods in Table 3. To summarize the simulated data, we report the resultant biases, root mean square errors (RMSE), and standard deviations (SD), along with average theoretical variances for the ML and CL estimates of the univariate parameters under different marginal choices based on iterations in which all four competing approaches converged in Table 4 and Table 5. Following Chen et al. (2016b) we also summarize the diagnostic odds ratio, that is dOR. Clearly, this is a function of the univariate parameters; its value ranges from zero to infinity, with a higher value indicating better discriminatory power.
Conclusions from the values in the table are the following:
The CL method is nearly as efficient as the ‘gold standard’ ML method.
The meta-analytic ML and CL estimates and SDs are not robust to the margin misspecification.
The ML method has negligible non-convergence issues.
The CL method in Chen et al. (2014) has a non-convergence rate between 10% to 16%.
The CL method in Chen et al. (2016b) has no non-convergence issues at all as expected since the has a closed form.
The simulation results indicate that for both methods the effect of misspecifying the marginal choice can be seen as substantial for both the univariate parameters and the parameters that are functions of them, such as the dOR. This is in line with Nikoloulopoulos (2015a, b) for the ML method. Here we also show that the CL method is not robust to the misspecification of the margin. This agrees with the conclusions of Xu and Reid (2011) and Ogden (2016) who argue that if the marginal distribution of the random effects is misspecified then the CL estimator no longer retains robustness. This has not been revealed before, since Chen et al. (2014) (Chen et al., 2016b) focused solely on a beta (normal) margin and didn’t study the effect of misspecification of the marginal random effect distribution. The focus in the CL method is on marginal parameters and their functions (e.g., dOR). Since these are univariate inferences, all that matters, as regard as to the bias, is the univariate model.
5 Tumor markers for bladder cancer
In this section we illustrate the methods with data of the published meta-analyses in Glas et al. (2003b); also analysed in Chen et al. (2016b). This meta-analyses deal with the most common urological cancer, that is bladder cancer. Several diagnostic markers are assessed including the cytology () which is the classical marker for detecting bladder cancer since 1945 and is not expensive compared with the reference standard (that is cystoscopy procedure), but lacks the diagnostic sensitivity. The other markers under investigation to give a better sensitivity are NMP22 (), BTA (), BTASTAT(), telomerase (), and BTATRAK ().
For all the meta-analyses, we fit the copula mixed model for all different choices of parametric families of copulas and margins. Sufficient choices of copulas are BVN, Frank, Clayton, and the rotated versions of the latter (Table 2). These families have different strengths of tail behaviour; for more details see Nikoloulopoulos (2015a, b). We use the log-likelihood at estimates as a rough diagnostic measure for goodness of fit between the models and summarize the choice of the copula and margin with the largest log-likelihood, along with the GLMM (BVN copula mixed model with normal margins) as a benchmark. We also estimate the model parameters with the CL method under the assumption of both normal (CL-norm) and beta (CL-beta) margins. In Table 6 we report the resulting maximized ML and CL log-likelihoods, estimates, and standard errors.
Cln-norm and Cln-beta denotes a Clayton rotated by degrees copula mixed with normal and beta margins, respectively.
The CL-norm estimate of the between study variance was approximately zero, thus for this case the standard errors are unreliable as the between-study variance parameter estimate is on the boundary of the parameter space.
The log-likelihoods show that a copula mixed model with rotated by 270 degrees Clayton copula and beta margins provides the best fit and the estimates of sensitivity and specificity are smaller under this assumption. The CL method performs well since the estimated is weak and not significantly different from zero.
The log-likelihoods show that a copula mixed model with rotated by 180 degrees Clayton copula and normal margins provides the best fit. Chen et al. (2016b) previously restricted to beta margins thus the sensitivity and dOR were overestimated (CL-beta).
The log-likelihoods show that a BVN copula mixed model with beta margins provides the best fit and the estimates of specificity and dOR are smaller and larger, respectively, under this assumption.
Nikoloulopoulos (2015a) has previously analysed these data to illustrate the copula mixed model when there exists negative perfect dependence, and thus there is only one copula: the countermonotonic copula. This is a limiting case for all the parametric families of copulas, when the dependence parameter is fixed to the left boundary of its parameter space. Both models agree on the estimated sensitivity but the estimate of specificity is larger under the standard GLMM. The log-likelihood is for normal margins and for beta margins, and thus a normal margin seems to be a better fit for the data. In this example the CL method overestimates the dOR, since it ignores the perfect negative dependence at the estimation of the parameters.
The log-likelihoods show that a copula mixed model with rotated by 270 degrees Clayton copula and beta margins provides the best fit. Note that the CL-norm estimate of the between study variance was approximately zero, thus for this case the standard errors are unreliable as the between-study variance parameter estimate is on the boundary of the parameter space.
The log-likelihoods show that a copula mixed model with rotated by 90 degrees Clayton copula and beta margins provides the best fit. All models agree on the estimated sensitivity , but the estimated specificity and dOR are smaller when beta margins are assumed. The CL method performs well on the estimation of the univariate parameters and their functions since the estimated is weak and not significantly different from zero.
In this paper we have challenged claims made in Chen et al. (2014, 2016b) about the advantages of using a composite likelihood in meta-analysis of diagnostic test accuracy studies, in terms of convergence and robustness to model misspecification. The usual reason for using a composite likelihood does not apply here, because the full likelihood is straightforward to compute. We have demonstrated that the copula mixed model in Nikoloulopoulos (2015a) does not suffer for computational problems or convergence issues. Nikoloulopoulos (2015a) proposed a numerically stable ML estimation technique based on Gauss-Legendre quadrature; the crucial step is to convert from independent to dependent quadrature points. Furthermore, it has been shown the secondary motivation of robustness of the CL method is not retained in this context if the marginal distributions are misspecified. Hence it is a digression to use the CL methods in Chen et al. (2014, 2016b) for estimation in meta-analysis of diagnostic test accuracy studies as apparently there is neither computationally difficulty in the calculation of the bivariate log-likelihood nor robustness in the misspecification of the marginal distribution of the random effects. These conclusions hold to any context where clinical trials or observational studies report more than a single binary outcome.
Furtermore, in Chen et al. (2014, 2016b) the main inference is univariate such as the overall sensitivity or specificity or their functions as a single measure of diagnostic accuracy, e.g., the diagnostic odds ratio (dOR). The dOR for many cases is not useful since cannot distinguish the ability to detect individuals with disease from the ability to identify healthy individuals (Chen et al., 2014). Whenever the balance between false negative and false positive rates is of immediate importance, both the prevalence and the conditional error rates of the test have to be taken into consideration to make a balanced decision; hence the dOR is less useful, as it does not distinguish between the two types of diagnostic mistake (Glas et al., 2003a).
In fact, if the interest is only to overall sensitivity, and specificity, then the overall test accuracy across studies will not be clearly defined. Different studies use different thresholds for a positive test result, thus the overall sensitivity and specificity do not make sense. Instead, some form of the summary receiver operating characteristic (SROC) curve makes much more sense and will help decision makers to assess the actual diagnostic accuracy of a diagnostic test. In an era of evidence-based medicine, decision makers need high-quality procedures such as the SROC curves to support decisions about whether or not to use a diagnostic test in a specific clinical situation and, if so, which test.
An SROC curve is deduced for the copula mixed model in Nikoloulopoulos (2015a) through a median regression curve of on . For the copula mixed model, the model parameters (including dependence parameters), the choice of the copula, and the choice of the margin affect the shape of the SROC curve (Nikoloulopoulos, 2015a). However, there is no priori reason to regress on instead of the other way around, so Nikoloulopoulos (2015a) also provides a median regression curve of on . Apparently, while there is a unique definition of the ROC curve within a study with fixed accuracy, there is no unique definition of SROC curve across multiple studies with different accuracies (Rücker and Schumacher, 2010). As Arends et al. (2008) have pointed out, none of the SROC curves proposed in the literature can be interpreted as an average ROC. Rücker and Schumacher (2009) stated that instead of summarizing data using an SROC, it might be preferable to give confidence regions. Hence, in addition to using just median regression curves, Nikoloulopoulos (2015a) proposed quantile regression curves with a focus on high ( = 0.99) and low quantiles ( = 0.01), which are strongly associated with the upper and lower tail dependence imposed from each parametric family of copulas. These can been seen as confidence regions of the median regression SROC curve. Among the parametric families of copulas in Table 2 the tail dependence varies, and is a property to consider when choosing amongst different families of copulas as affects the shape of SROC curves (Nikoloulopoulos, 2015a). Finally, Nikoloulopoulos (2015a) to reserve the nature of a bivariate response instead of a univariate response along with a covariate, proposed to plot the estimated contour of the random effects distribution. The contour plot can be seen as the predictive region of the estimated pair of sensitivity and specificity. The prediction region of the copula mixed model does not depend on the assumption of bivariate normality of the random effects and has non-elliptical shape.
Figure 1 demonstrates these curves and summary operating points (a pair of average sensitivity and specificity) with a confidence and a predictive region from the best fitted copula mixed model for all the meta-analyses in Section 5. Both CL methods in Chen et al. (2014, 2016b) cannot be used to produce the SROC curves, since the dependence parameters affect the shape of the SROC curve and these are set to independence by definition. Note in passing that the CL method in Chen et al. (2014) can provide a confidence region but this is restricted to the elliptical shape.
Nevertheless, the additional feature of having to estimate the association among the random effects in ML estimation has been found to require larger sample sizes than in CL estimation where this parameter is set to independence. The application example includes cases with an adequate number of individual studies. For meta-analyses with fewer studies the CL methods in Chen et al. (2014, 2016b) can be recommended if a bivariate copula mixed model is near non-identifiable (or has a flat log-likelihood) and the estimation of an average operating point (summary sensitivity and specificity) is of interest instead of a SROC curve.
The R package CopulaREMADA (Nikoloulopoulos, 2016) has been used to produce the ML estimates (along with their SE) of the parameters from the copula mixed models and plot the SROC curves and summary operating points (a pair of average sensitivity and specificity) with a confidence and a predictive region. The R package xmeta (Chen et al., 2016a) has been used to produce the CL estimates (along with their SE) of the parameters from both methods in Chen et al. (2014, 2016b).
The simulations presented in this paper were carried out on the High Performance Computing Cluster supported by the Research and Specialist Computing Support service at the University of East Anglia.
- A.Nikoloulopoulos@uea.ac.uk, School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK
- Arends, L. R., Hamza, T. H., van Houwelingen, J. C., Heijenbrok-Kal, M. H., Hunink, M. G. M., and Stijnen, T. (2008). Bivariate random effects meta-analysis of ROC curves. Medical Decision Making, 28(5):621–638.
- Chen, Y., Hong, C., Chu, H., and Liu, Y. (2016a). xmeta: A Toolbox for Multivariate Meta-Analysis. R package version 1.1-2.
- Chen, Y., Hong, C., Ning, Y., and Su, X. (2016b). Meta-analysis of studies with bivariate binary outcomes: a marginal beta-binomial model approach. Statistics in Medicine, 1:21–40.
- Chen, Y., Liu, Y., Ning, J., Nie, L., Zhu, H., and Chu, H. (2014). A composite likelihood method for bivariate meta-analysis in diagnostic systematic reviews. Statistical Methods in Medical Research. DOI:10.1177/0962280214562146.
- Chu, H. and Cole, S. R. (2006). Bivariate meta-analysis of sensitivity and specificity with sparse data: a generalized linear mixed model approach. Journal of Clinical Epidemiology, 59(12):1331–1332.
- Chu, H., Nie, L., Chen, Y., Huang, Y., and Sun, W. (2012). Bivariate random effects models for meta-analysis of comparative studies with binary outcomes: Methods for the absolute risk difference and relative risk. Statistical Methods in Medical Research, 21(6):621–633.
- Glas, A., Lijmer, J., Prins, M., Bonsel, G., and Bossuyt, P. (2003a). The diagnostic odds ratio: A single indicator of test performance. Journal of Clinical Epidemiology, 56(11):1129–1135.
- Glas, A., Roos, D., Deutekom, M., Zwinderman, A., Bossuyt, P., and Kurth, K. (2003b). Tumor markers in the diagnosis of primary bladder cancer. a systematic review. The Journal of Urology, 169(6):1975–1982.
- Godambe, V. P. (1991). Estimating Functions. Oxford University Press, Oxford.
- Jackson, D., Riley, R., and White, I. R. (2011). Multivariate meta-analysis: Potential and promise. Statistics in Medicine, 30(20):2481–2498.
- Ma, X., Nie, L., Cole, S. R., and Chu, H. (2016). Statistical methods for multivariate meta-analysis of diagnostic tests: An overview and tutorial. Statistical Methods in Medical Research, 25:1596–1619.
- Mavridis, D. and Salanti, G. (2013). A practical introduction to multivariate meta-analysis. Statistical Methods in Medical Research, 22(2):133–158.
- Nikoloulopoulos, A. K. (2015a). A mixed effect model for bivariate meta-analysis of diagnostic test accuracy studies using a copula representation of the random effects distribution. Statistics in Medicine, 34:3842–3865.
- Nikoloulopoulos, A. K. (2015b). A vine copula mixed effect model for trivariate meta-analysis of diagnostic test accuracy studies accounting for disease prevalence. Statistical Methods in Medical Research. DOI:10.1177/0962280215596769.
- Nikoloulopoulos, A. K. (2016). CopulaREMADA: Copula mixed effect models for bivariate and trivariate meta-analysis of diagnostic test accuracy studies. R package version 1.0.
- Ogden, H. (2016). A caveat on the robustness of composite likelihood estimators: The case of a mis-specified random effect distribution. Statistica Sinica, 26(2):639–651.
- R Core Team (2015). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
- Rücker, G. and Schumacher, M. (2009). Letter to the editor. Biostatistics, 10(4):806–807.
- Rücker, G. and Schumacher, M. (2010). Summary ROC curve based on a weighted Youden index for selecting an optimal cutpoint in meta-analysis of diagnostic accuracy. Statistics in Medicine, 29:3069–3078.
- Stroud, A. H. and Secrest, D. (1966). Gaussian Quadrature Formulas. Prentice-Hall, Englewood Cliffs, NJ.
- Varin, C. (2008). On composite marginal likelihoods. Advances in Statistical Analysis, 92:1–28.
- Varin, C., Reid, N., and Firth, D. (2011). An overview of composite likelihood methods. Statistica Sinica, 21:5–42.
- Xu, X. and Reid, N. (2011). On the robustness of maximum composite likelihood estimate. Journal of Statistical Planning and Inference, 141(9):3047–3054.