Longitudinal quantile regression in presence of informative drop-out through longitudinal-survival joint modeling
We propose a joint model for a time-to-event outcome and a quantile of a continuous response repeatedly measured over time. The quantile and survival processes are associated via shared latent and manifest variables. Our joint model provides a flexible approach to handle informative drop-out in quantile regression. A general Monte Carlo Expectation Maximization strategy based on importance sampling is proposed, which is directly applicable under any distributional assumption for the longitudinal outcome and random effects, and parametric and non-parametric assumptions for the baseline hazard. Model properties are illustrated through a simulation study and an application to an original data set about dilated cardiomyopathies.
Key Words: Quantile Regression; Longitudinal Regression; Joint Models; Shared-parameter models.
In longitudinal studies subjects may be lost to follow-up due to events, like death, which are associated with the outcome of interest. Failure to model drop-out may lead to biased estimates in such cases. From the reverse perspective, the time trend of a longitudinal measurement may predict the risk of an event (e.g., a steadily decreasing CD4 count is predictive of adverse events in HIV patients). A general account of related longitudinal and survival processes can be found in follmann:95. Participation in a study can in general be described by a survival model of time to drop-out. The simplest way of taking informative drop-out into account is through pattern-mixture models (e.g., wu:bail:88; wu:bail:89; litt:wang:96), where the outcome distribution is specified conditionally on the time to drop-out. Selection models, on the other hand, condition the drop-out mechanism on unobserved responses directly (digg:kenw:94) or indirectly. A simple and effective indirect link between drop-out and unobserved outcomes is obtained by assuming that these are independent conditionally on unobserved shared random effects, as in wu:carrol:88. One can similarly assume that the risk of an event at time is influenced by the expected value of the longitudinal response, as in riz:10 and rizop:ghosh:11. In the resulting Joint Model (JM), the hazard of drop-out is a function of the predicted longitudinal outcome, that is, of shared random and fixed effects, and related covariates. See also litt:95, hend:et:al:00, tsia:davi:04, and references therein.
Our concern in this paper is that the expected value of the longitudinal outcome may not always be the summary of interest. Further, in some cases it might be difficult to find a suitable transformation to normality for the outcome, or some resistance to outliers may be desired. An effective solution to these issues is given by modeling conditional quantiles of the longitudinal outcome. Quantile regression models (koen:05) are robust with respect to outliers, so that one can simply model the median rather than the mean. Furthermore, in many biomedical applications interest lies in at least one of the tails, and covariates may have different effects on different quantiles of a distribution. Examples include longitudinal fetal growth studies, which are usually focused on low and high quantiles of key anthropometric measurements. Finally, quantiles are equivariant to monotone transformations, so there is never a need to transform the outcome. Longitudinal quantile regression models are proposed, among others, in koen:04, who maximizes a penalized version of the likelihood; and gera:bott:07, who introduce a random intercept and assume the outcome follows the asymmetric Laplace distribution (ALD). yuan:bott:09 extend the random intercept model to a general linear mixed quantile regression model. See also gera:bott:13. The ALD assumption is also used in farc:12, where random effects are time-varying and follow a discrete distribution. Informative missing data are ubiquitous in statistical applications, especially in longitudinal studies, but there are very few approaches to quantile regression with informative drop-out. In lips:et:al:97 and yi:he:09 estimating equations are weighted proportionally to the inverse of the probability of drop-out. yuan:yin:10, in a Bayesian framework, model missingness as a binary time series sharing a random effect with the quantile regression process.
A common limitation of these approaches is that drop-out can occur only at one of the observation times of the longitudinal process. This does not hold in general (e.g., when measurements are scheduled according to a protocol, and death occurs between two visits). Moreover, none of the approaches proposed so far directly models the strength of association between the longitudinal and time-to-event processes. The latter is summarized by non-ignorability parameter(s) in a JM such as the one we propose. A longitudinal quantile regression model with ignorable missingness is outlined in gera:13.
In this paper we propose a joint model for a right-censored time-to-event outcome and the quantile of a continuous response repeatedly measured over time. Drop-out is formally defined as a monotone missing pattern, that is, when the outcome is not measured for a subject, no further measurements take place for that subject until the end of the study.
In our approach the quantile and survival processes are associated not only via shared latent variables or the predicted longitudinal outcome. It is in fact assumed that the time-to-event outcome depends on a function, which for simplicity we assume linear, of both the latent variables and the predicted quantile of the longitudinal outcome. Two non-ignorability parameters are introduced, one for the fixed and the other for the random part of the linear predictor. Our joint model is therefore a flexible approach to handle informative drop-out in longitudinal quantile regression. Maximum likelihood estimates are efficiently derived by setting up a Monte Carlo Expectation-Maximization (MCEM) algorithm based on importance sampling (douc:et:al:01; levi:case:01). There are two clear advantages of using importance sampling: first, the resulting MCEM is completely general and straightforward to use with any distributional assumption for the longitudinal observations or the random effects. Second, it is computationally efficient, given that we evaluate the posterior distribution only once for each sample. The MCEM approach, unlike commonly used quadrature methods, is amenable also to moderate-dimensional random effects. The use of two non-ignorability parameters also allows us to increase the flexibility of both the shared parameter model of wu:carrol:88, by conditioning the drop-out process also on the residuals between the predicted longitudinal outcome and the random effects, and the JM of WULF:TSIAT:97 and riz:10, by allowing a residual dependence on the random effects. In the first case, we can say that the two processes are not only linked by unmeasured heterogeneity, but that their dependence can also be in part explained through shared observed heterogeneity. In the second case, we can say that the two processes are linked not only via a quantile of the longitudinal outcome, but also by a residual unmeasured heterogeneity.
The rest of the paper is organized as follows: in the next section we describe the proposed model. In Section 3 we outline inference, in Section 4 we illustrate the approach through simulations, and in Section 5 we apply the method to an original data set about patients with cardiomyopathy. Finally, in Section 6 we conclude with a brief discussion.
2 Joint longitudinal quantile and survival regressions
Let denote the observed failure time for the th individual, , taken as the minimum between the true event time and the censoring time . Further, let be the corresponding event indicator defined by , where is the indicator function. The continuous outcome is repeatedly observed at times before , and is missing for . The longitudinal outcome at observation times is collected in . We assume that the longitudinal process is associated with , i.e with the true event time, but, as customary in survival analysis, is independent of the censoring time .
We let denote a vector of predictors used to model only the longitudinal outcome, a vector of shared predictors, and a vector of (time fixed) predictors used to model only the survival process. These are associated with fixed effects , and , respectively. We then have a vector of predictors associated with dimensional random effects , and two non-ignorability parameters (associated with fixed effects) and (associated with random effects). Our model can be expressed by a set of two equations, one for the longitudinal and the other one for the survival outcome:
where the first equation gives the longitudinal model and the second the time-to-event model, and is a baseline function. Specifically, the model for the longitudinal outcome is formulated along the usual lines for mixed effects models (verb:mole:00) and the model for the time-to-event outcome is based on the subject-specific hazard function (cox:72; and:82). The risk of drop-out is conditional on , i.e. the error-free longitudinal process history up to time . The model is completed by a distributional assumption for the shared latent distribution, that is, by specifying . A few options are discussed in Section 2.3.
The degree of dependence between the longitudinal and the survival processes is measured by the association parameters and , which are introduced to assess potential non-ignorability of the missing data mechanism. In doing so, we admit two sources of non-ignorability: a part that can be explained through observed heterogeneity in (but not ) and a part that is due to unobserved heterogeneity. The log-hazard ratios associated with can be estimated as , while those associated with are directly estimated as . All parameters are identifiable even though some of them appear multiplied together in the survival model equation.
It is worth noting how our proposed model (1) generalizes previous work. If we fix , , and assume a Gaussian distribution for the error, we obtain a usual formulation for the JM. Otherwise, with an ALD error distribution (see below), we obtain a joint quantile regression model. Here, the link between the time-to-event and longitudinal processes is summarized by the term in the model for the hazard function. If and ,
that is, we generalize JM models by allowing for a residual dependence on unobserved heterogeneity as summarized by the shared random effects. When , we further assume that some predictors may be related only to the longitudinal outcome but may not contribute to explain non-ignorability. Similarly, a shared parameter model is obtained if we fix . Specifically, if and we have Gaussian error terms and random effects, we exactly obtain the classical model in WULF:TSIAT:97. When we condition the survival process also on the difference between the predicted longitudinal outcome and the random effects. It is straightforward to fully generalize (1) by including also random effects that are not shared in each part of the model, and by letting be time-dependent. In this way we would have, in each model, separate and shared covariates both for random and fixed effects. We do not pursue this explicitly to keep notation simple, but we note that our MCEM strategy can be directly adapted to this completely general case.
We start by describing each part of the model separately; we then outline how the parts are linked by deriving the observed likelihood.
2.1 The longitudinal model
The parametric assumption on the error distribution of the longitudinal outcome drives our target for inference. If we assume that follows a zero-centered Gaussian, we work with the conditional expectation of the outcome. On the other hand, if we assume an ALD, does not represent the conditional mean of anymore, but its conditional quantile. Note that is pre-specified and fixed. The resulting density of , conditionally on covariates and random effects, is given by
where is the quantile loss function and is a scale parameter. When , we obtain a random intercept model (gera:bott:07). The ALD is justifiable since maximum likelihood under this assumption is exactly equivalent to minimization of the quantile loss function when no parametric assumptions are made on (yu:moye:01). Further, simulation studies show that it leads to good estimates even when the residuals are not ALD distributed (see for instance yuan:bott:09, farc:12).
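As a rough sketch (ours, not part of the paper's implementation), the quantile check function and the corresponding ALD density can be written in a few lines; the names `check_loss` and `ald_density` are our own:

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check (loss) function: rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def ald_density(y, mu, sigma, tau):
    """Asymmetric Laplace density centred so that mu is the tau-th
    conditional quantile, with scale parameter sigma."""
    return tau * (1 - tau) / sigma * np.exp(-check_loss((y - mu) / sigma, tau))
```

Maximizing the ALD log-likelihood in `mu` is equivalent to minimizing the sum of `check_loss` terms, which is why the ALD is a natural working model for quantile regression.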
2.2 The survival model
The time-varying baseline risk function in (1) can be seen as the risk obtained when all covariates and random effects are exactly zero. We may wish to specify a flexible parametric form for (e.g., , leading to a Weibull model), as for instance in rizop:ghosh:11, riz:12, vivi:alfo:rizo:13, or we may leave it unspecified as in follmann:95. The inferential strategy for obtaining the MLE is slightly different in the two cases, as we discuss in the next section.
The time-to-event distribution can in both cases be written as
where denotes the survival function, i.e.
while is given by the second equation in (1).
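The hazard-survival relation above can be illustrated numerically: the survival function is the exponential of minus the cumulative hazard. The Weibull baseline and the time-fixed linear predictor `eta` below are illustrative assumptions of ours, not the paper's fitted model:

```python
import numpy as np
from scipy.integrate import quad

def weibull_baseline(t, lam, p):
    """Weibull baseline hazard lambda0(t) = lam * p * t**(p - 1)."""
    return lam * p * t ** (p - 1)

def survival(t, eta, lam, p):
    """S(t) = exp(-int_0^t lambda0(s) exp(eta) ds), with a time-fixed
    linear predictor eta, computed by numerical integration."""
    cum_haz, _ = quad(lambda s: weibull_baseline(s, lam, p) * np.exp(eta), 0.0, t)
    return np.exp(-cum_haz)
```

With `p = 1` this reduces to the exponential model, a convenient check of the quadrature.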
2.3 The random effects model
A commonly used distribution for random effects is the multivariate normal. This is convenient to work with when a Gaussian assumption is formulated also for the longitudinal outcome. The multivariate normal may not be satisfactory, however, when the number of occasions is small (rizopoulos:08b; hsie:06) and/or the dimensionality of is large. Further, we may often expect a slower convergence of the posterior distribution of the random effects to a multivariate normal when modeling lower or upper quantiles. Two valid alternatives for the random effects distribution are given by a multivariate Student t with degrees of freedom:
which can be used to capture fat tails of the random effects; and a multivariate ALD, which is often suggested in the quantile regression framework (yuan:bott:09):
where and is the modified Bessel function of the third kind. The most appropriate random effects distribution may be chosen for instance using the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC), see for instance verblesaf:96.
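As an aside of ours, multivariate t random effects are easy to simulate via the usual normal/chi-square scale-mixture representation (the multivariate ALD admits a similar mixture representation); the function name `rmvt` is our own:

```python
import numpy as np

def rmvt(n, Sigma, df, rng=None):
    """Draw n samples from a centred multivariate t with scale matrix Sigma
    and df degrees of freedom: a multivariate normal divided by the square
    root of an independent chi-square over its degrees of freedom."""
    rng = np.random.default_rng(rng)
    d = Sigma.shape[0]
    z = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    w = rng.chisquare(df, size=n) / df
    return z / np.sqrt(w)[:, None]
```

For `df > 2` the covariance of the draws is `Sigma * df / (df - 2)`, so the heavier tails inflate the covariance relative to the Gaussian case.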
2.4 Observed likelihood
In what follows, is a short-hand notation for model parameters, that is, , , , , , , and any parameter associated with baseline hazard . The joint likelihood contribution of the longitudinal and survival processes for the -th subject is obtained integrating the conditional distributions in (1) over the random effects space:
where is given in equation (3) and
while is given in equation (2). The observed data log-likelihood for the joint quantile regression model is then
The integrals involved in (4) are usually tackled in the JM context through quadrature methods (see for instance riz:12, vivi:alfo:rizo:13). While effective in one or two dimensions, quadrature methods tend to become too slow or less precise as the dimensionality of the random effect distribution grows. Furthermore, quadrature methods would have to be tailored to the random effects distribution (e.g., a Gauss-Hermite quadrature would be best for Gaussian random effects, while a Gauss-Laguerre quadrature may be better under other assumptions). In the next section we propose a Monte Carlo strategy which is completely general, and allows us to set up an algorithm which is easily adapted to any assumption on the random effects and to any functional form for the two parts of the JM.
3 Estimation of the proposed model
We propose an MCEM algorithm for fitting the proposed model. This involves alternating two steps until convergence: (i) sampling from the posterior distribution of the random effects, given the data and current values of the parameters (Monte Carlo step), and obtaining the conditional expected value of the complete data log-likelihood (E-step), (ii) maximizing the latter (M-step). The algorithm is guaranteed to converge to a local optimum. In order to increase the odds of obtaining the global optimum, we use a multistart strategy. The first run starts from estimates obtained from separate longitudinal and survival models, which are readily available. Other runs are initialized by randomly perturbing the deterministic initial solution. A second remark concerns how to obtain standard errors and confidence intervals. When performing quantile regression on the longitudinal outcome, we use a non-parametric bootstrap strategy (buch:95, andr:buch:00). We preserve the dependency structure in the data by resampling subjects rather than separately resampling the outcomes (and related predictors). Under the usual regularity conditions, tests on the regression parameters may then be simply performed by using Wald statistics based on the standard errors.
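The subject-level resampling scheme can be sketched as follows; the function name and the relabelling of repeated subjects are our own illustrative choices:

```python
import numpy as np

def bootstrap_subjects(ids, rng=None):
    """Resample subject identifiers with replacement; each drawn subject
    keeps all of its longitudinal records, preserving within-subject
    dependence. Returns row indices into the original data and new
    subject labels (copies of a subject are relabelled as distinct)."""
    rng = np.random.default_rng(rng)
    unique = np.unique(ids)
    drawn = rng.choice(unique, size=unique.size, replace=True)
    rows, new_id = [], []
    for k, s in enumerate(drawn):
        idx = np.flatnonzero(ids == s)  # all records of subject s
        rows.extend(idx)
        new_id.extend([k] * idx.size)
    return np.array(rows), np.array(new_id)
```

Refitting the model on each such resample and taking the empirical spread of the estimates yields the bootstrap standard errors.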
The MCEM algorithm is based on the complete likelihood, that is, the likelihood we would have if we could observe the random effects. The individual contribution to the complete data log-likelihood can be obtained as
The MCEM algorithm is completely general and can be simply adapted to any distributional assumption for the longitudinal error and random effects distribution. Assuming an ALD for the error distribution of the longitudinal measurements, the complete data log-likelihood is as follows:
3.1 Monte Carlo E-step
The conditional expected value of (6) for the -th subject at the -th iteration of the algorithm is expressed as
where denotes the current value of the parameters. The posterior distribution of the random effects is
Straightforward algebra can be used to see that (9) simplifies to
In order to work with (8), we need to marginalize the joint distribution with respect to the multivariate random effect posterior distribution. The resulting integral is conveniently approximated through Importance Sampling (IS). Importance sampling proceeds by obtaining a random sample from a proposal distribution . The IS identity
In this work we proceed as in levi:case:01: we sample from the posterior distribution at the initial parameter estimates using adaptive rejection Metropolis sampling (gilks:95), and then update the weights at each iteration. The MC sample size is increased to control the MC error (levi:case:01; eichoff:04). Finally, the observed likelihood (5) can be directly approximated as . The latter is used to check convergence of the MCEM and, after convergence, for testing and computation of information criteria. In summary, the E-step consists of updating , , and evaluating the likelihood.
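The self-normalized importance-sampling approximation of a posterior expectation can be sketched as below. This is a generic illustration with our own names, verified on a conjugate normal-normal toy model rather than on the paper's full joint model:

```python
import numpy as np

def importance_posterior_mean(log_joint, proposal_sample, proposal_logpdf, n, rng=None):
    """Self-normalized importance sampling estimate of E[b | data]:
    draw b_m from a proposal, weight each draw by joint(b_m, data)
    over proposal(b_m), normalize, and average."""
    rng = np.random.default_rng(rng)
    b = proposal_sample(rng, n)
    log_w = log_joint(b) - proposal_logpdf(b)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()                     # normalizing constants cancel here
    return np.sum(w * b)
```

Because the weights are self-normalized, the unknown normalizing constant of the posterior never needs to be computed, which is what makes the E-step tractable under any distributional assumption.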
We outline here the M-step, which consists in maximizing the approximated conditional expected value of the complete likelihood with respect to .
When is left completely unspecified, we obtain a Nelson-Aalen (nels:72; aale:78) type estimator as
by setting to zero the score equations for (WULF:TSIAT:97). See also niel:gill:ande:sore:92 and gill:92. The expression (11) is not an explicit solution, as it depends on other parameters. Nevertheless, it can be plugged into the conditional expected value of the complete likelihood, thus obtaining a profile complete likelihood (complete with respect to the random effects, and profiled with respect to the non-parametric baseline along the lines of cox:72). If instead we specify a parametric form for , e.g., , we can plug in this expression and update any parameter involved in within the rest of the M-step.
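In the simplest case of a single sample with no shared effects, a Nelson-Aalen-type estimator reduces to the classical cumulative-hazard estimator, which can be sketched as follows (assuming distinct event times; the names are ours):

```python
import numpy as np

def nelson_aalen(times, events):
    """Nelson-Aalen estimate of the cumulative hazard: at each observed
    event time, add 1 / (number still at risk). Assumes distinct times."""
    order = np.argsort(times)
    times = np.asarray(times, dtype=float)[order]
    events = np.asarray(events)[order]
    n = len(times)
    cum, out_t, out_h = 0.0, [], []
    for j in range(n):
        if events[j]:
            at_risk = n - j  # after sorting, subjects j..n-1 are still at risk
            cum += 1.0 / at_risk
            out_t.append(times[j])
            out_h.append(cum)
    return np.array(out_t), np.array(out_h)
```

In the joint model the risk-set contributions are additionally weighted by the subject-specific hazard terms involving the sampled random effects, but the step structure is the same.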
When a Gaussian distribution is assumed for the longitudinal measurements, regression coefficients and hazard ratios are updated through a one-step Newton-Raphson algorithm (which is easily adapted from WULF:TSIAT:97), while the variance of the random term in (1) is estimated through the usual closed form expression.
When we assume an ALD for the longitudinal measurements, the M-step is complicated by the presence of the quantile check function. An estimator of , dependent on the other parameters, can be explicitly obtained as:
The vector of regression parameters and the dispersion parameter for the ALD are block updated using one step of the neld:mead:65 numerical optimization algorithm, majorizing (3.1) after plug-in of (12) and (11) or the parametric formula of . The resulting expected complete likelihood is
The integrals involved in the expression above (and similarly in any expression where appears, also at the E-step) reduce to sums when a non-parametric baseline is used, and can otherwise be approximated using one-dimensional Gauss-Kronrod quadrature (e.g., kaha:mole:nash:89) when a parametric assumption is formulated for .
The M-step is concluded by maximizing the approximated conditional expected value of the complete likelihood with respect to parameters involved in the distribution of the random effects. This is readily accomplished under any of the assumptions we have proposed in Section 2.3 using the method of moments. In all cases in fact can be updated as the weighted empirical covariance matrix of the random effects sampled at the E-step.
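The method-of-moments update of the random effects covariance matrix is simply a weighted empirical covariance of the effects sampled at the E-step; a minimal sketch, with our own naming:

```python
import numpy as np

def weighted_cov(b, w):
    """Weighted empirical covariance of sampled random effects b (n x q)
    with importance weights w: sum_i w_i (b_i - mean)(b_i - mean)',
    with the weights normalized to sum to one."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    mean = w @ b                       # weighted mean, shape (q,)
    centred = b - mean
    return (centred * w[:, None]).T @ centred
```

With uniform weights this coincides with the (biased) empirical covariance, as one would expect.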
In this section we illustrate our approach through a simulation study. We evaluate the bias and standard deviation of the estimates for our proposed model for data Missing Not At Random (MNAR), and compare with a model which ignores informative drop-out (Missing At Random - MAR model).
For , , , we fix and . We assume random effects arise from a centered bivariate normal distribution with standard deviations equal to 0.3 and correlation 0.16. We let , , ; with , , , and generated from independent standard normals. By also fixing it is possible to exactly obtain the survival distribution as
when or and
when . The expression above can be inverted to obtain after generating random variates uniformly distributed on the unit interval. We then let the censoring time be distributed according to a Beta with parameters 4 and 1, in order to obtain a censoring proportion around 25%. We allow for a maximum of six observation times for each subject, at . Longitudinal observations before drop-out are independently generated from an ALD for the -th quantile, centered on
and with dispersion parameter . We fit our joint model with parametric baseline distribution and two separate models (one for the longitudinal process and one for the time-to-event, therefore obtaining MAR estimates), based on each generated data set.
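The inversion step used to generate the event times can be sketched generically: given any decreasing survival function, solve S(t) = u for a uniform draw u by root finding. The names and the finite horizon `t_max` are our own illustrative choices:

```python
import numpy as np
from scipy.optimize import brentq

def sample_event_time(surv, u, t_max=100.0):
    """Invert a survival function: solve surv(t) = u for t by root
    finding, where u is a Uniform(0, 1) draw. If surv(t_max) > u the
    event falls beyond the horizon and t_max is returned."""
    if surv(t_max) > u:
        return t_max
    return brentq(lambda t: surv(t) - u, 1e-12, t_max)
```

For an exponential survival function the inversion recovers the familiar closed form t = -log(u) / rate, a convenient correctness check.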
For each setting we report the bias and standard deviation of the estimates averaged over replicates, and further averaged over groups of parameters for , and . Results are shown in Table 1, where it can be seen that the MNAR model has a very low bias and standard deviation of the estimates for all values of and , with very few exceptions which are likely due to random fluctuation. Results are consistent across all quantiles, with a slightly larger MSE for quantiles distant from the median, as expected.
In Table 2 we report the ratio of the bias and the variance of the estimates of the MAR over the MNAR model, based on the average bias for all parameters, separately for the longitudinal and survival parameters. These ratios are close to unity when , with our model generally performing slightly better given that the MNAR model assumes parameters are equal in the longitudinal and survival parts. When or are non-zero, the ratios of the variances of the estimates are still close to unity, but the ratios of biases increase substantially. The bias of the MAR model may be up to 30 times the bias of our joint model. The effect of is often stronger than the effect of , but this is likely only due to the fact that in all simulated settings there is a larger heterogeneity due to the shared covariates with respect to the unobserved heterogeneity due to random effects. As could be expected, the ratios are generally increasing with , given that the MSE of the joint model is negligible.
5 Application to dilated cardiomyopathy data
In this section we briefly illustrate the proposed approach on an original data set about patients with dilated cardiomyopathy. Data refer to consecutive patients who began treatment for dilated cardiomyopathy in the cardiovascular department of “Ospedali Riuniti” in Trieste, Italy. Patients were enrolled at first treatment and scheduled for follow-up after 6 months, 1, 2, 3, 4, 6 and 10 years, with only 25% of the patients having complete records. Maximum follow-up time before cardiovascular death or censoring due to loss to follow-up or transplant is 25 years, with a total of 212 events (32%).
Dilation of the left ventricle is known to lead to heart failure, and in many cases an etiological basis cannot be identified (merlo:11). The goal of this study is to compare patients with mild dilation of the left ventricle (Mildly Dilated CardioMyopathy, or MDCM) with patients with a general dilation of unrecognized etiology (Idiopathic Dilated CardioMyopathy, or IDCM). MDCM patients are generally believed to be at a slightly lower risk (e.g., kere:90), but the physiological reasons are still unknown.
Our longitudinal outcome of interest is the left ventricular ejection fraction (LVEF), that is, the volumetric fraction of blood pumped out of the ventricle with each heartbeat. Note that LVEF is a bounded outcome, but all measurements are far from the boundaries, so that, unlike bott:cai:mcke:10, we can avoid any transformation. Drop-out occurs due to cardiovascular death, and the two processes can easily be expected to be related, as patients with a lower ejection fraction are at higher risk of death. For example, a univariate Cox model for the baseline LVEF gives a hazard ratio (HR) of 0.95 for each percentage point, with . Furthermore, LVEF is skewed and its skewness seems to change over time. This leads to two issues: first, using a classical joint model after transformation of the LVEF would be awkward, as the optimal transformation is different at each time point. Second, modeling the mean of the transformed LVEF would not be as meaningful from a clinical perspective as directly modeling quantiles, which are also straightforward to interpret. We are mostly interested in low quantiles (such as the 10th or the 15th), and in tertiles or quartiles of the outcome. See for instance sand:03, Clements01062005, ndre:etal:07, Cowie03102012. Consequently, we explore the ejection fraction distribution by evaluating .
We model the longitudinal outcome conditionally on an intercept and age at baseline ( matrix), on the indicator of MDCM and its interaction with time ( matrix). Besides covariates in the matrix, we let the hazard of death depend on gender (1 for males) and an indicator of New York Heart Association (NYHA) functional classes I or II at baseline ( matrix). For more details on NYHA functional classes see for instance merlo:11 and references therein. We also include a shared subject-specific random intercept and a random slope, that is, . Given that the number of follow-up times is fairly large, we use a normality assumption for . We also compared with a multivariate Student t and a multivariate Laplace distribution, obtaining analogous results which we do not report for reasons of space. We only mention that the multivariate normal distribution is preferred by both the AIC and BIC criteria. We estimate our proposed model both with a parametric Weibull baseline and with a non-parametric baseline. Likelihood, AIC, and BIC at each quantile are reported in Table 3.
Based on those results we select the non-parametric baseline for all quantiles. We compare the estimates with those obtained under a MAR model in Table 4.
The effect of age on LVEF becomes stronger and stronger as increases, while the effect of MDCM is roughly constant across , with a negative interaction with time. Males are at slightly higher risk of death, and patients in lower functional NYHA classes are at a lower risk. After adjusting for these covariates, the significant and negative estimates for lead us to conclude that MDCM is an independent predictor of a slightly lower risk of death, even after considering its effect on LVEF. Negative estimates for and are expected, as the longitudinal measurements and the survival time are positively dependent.
Ignoring drop-out may lead to substantial bias. First of all, the intercepts estimated with the MAR models are slightly larger than those obtained with the MNAR models for all , except . This is in line with the expected consequences of drop-out in the low quantiles of the longitudinal outcome distribution. Secondly, under the MAR models a significant positive interaction between MDCM and time is estimated. This may be due to the fact that subjects with higher LVEF tend to drop out later and to be in the MDCM class more often, resulting in a positive bias when ignoring the informative drop-out. Given the results in Table 4, we conclude that there is some sensitivity to drop-out for the data at hand within the proposed class of models. As clearly noted in mole:et:al:08, one can never test the MAR assumption.
Informative drop-out, if ignored, may bias parameter estimates in both mean and quantile regression. As our data example suggests, the problem may be stronger for quantiles corresponding to a higher rate of events. In our example, for instance, we have checked that sensitivity to drop-out is milder in high quantiles than in low quantiles. Moreover, the brief simulation study reported confirms that, when the informative drop-out process is ignored, the bias of the parameter estimates may be substantial and, more importantly, may not decrease with the sample size.
The proposed approach allows us to simultaneously model the quantile of a longitudinal outcome and the hazard of drop-out, allowing them to share part of the observed and unobserved heterogeneity. Our model can be applied with right-censored event times occurring between two scheduled visits, when drop-out times coincide with observation times for the longitudinal process, and also when the observation times are not scheduled in advance. We have generalized shared-parameter and joint models in different directions: first of all, we have proposed an alternative parametric assumption for the longitudinal error, the ALD, which allows us to fit quantile regression models. Secondly, we have proposed two alternative random effects distributions. A general and efficient MCEM strategy has been used to fit our model under any of these assumptions.
In our example we have specified different values for the quantile of interest for illustration. It can be argued that under conditional independence assumptions the total likelihood is the sum of the likelihoods based on each quantile; hence this approach is equivalent to simultaneously fitting the model for different values of , with -specific parameters. When more than one quantile is of interest in applications, one could also allow dependence (e.g., over the random effects at each quantile) or -homogeneity (e.g., for the variance of the random effects). Model estimation under these assumptions is currently the subject of further work.
The authors are grateful to the Cardiovascular department of “Ospedali Riuniti”, Trieste, Italy, and in particular to Giulia Barbati, for permission to use the dilated cardiomyopathy data.