Flexible Bayesian additive joint models with an application to type 1 diabetes research

Meike Köhler^1 (corresponding author: meike.koehler@helmholtz-muenchen.de, Phone: +49-(0)89-3068-2917, Fax: +49-(0)89-3187-3144), Nikolaus Umlauf^2, Andreas Beyerlein^1, Christiane Winkler^1, Anette-Gabriele Ziegler^1,3 and Sonja Greven^4

^1 Institute of Diabetes Research, Helmholtz Zentrum München, and Forschergruppe Diabetes, Klinikum rechts der Isar, Technische Universität München, Germany
^2 Department of Statistics, Faculty of Economics and Statistics, Universität Innsbruck, Austria
^3 Forschergruppe Diabetes e.V. at the Helmholtz Zentrum München, Germany
^4 Department of Statistics, Ludwig-Maximilians-Universität München, Germany
Abstract

The joint modeling of longitudinal and time-to-event data is an important tool of growing popularity for gaining insights into the association between a biomarker and an event process. We develop a general framework of flexible additive joint models that allows the specification of a variety of effects, such as smooth nonlinear, time-varying and random effects, in the longitudinal and survival parts of the model. Our extensions are motivated by the investigation of the relationship between fluctuating disease-specific markers, in this case autoantibodies, and the progression to the autoimmune disease type 1 diabetes. By making use of Bayesian P-splines we are in particular able to capture highly nonlinear subject-specific marker trajectories as well as a time-varying association between the marker and the event process, allowing new insights into disease progression. The model is estimated within a Bayesian framework and implemented in the R-package bamlss.

Keywords: Anisotropic smoothing; Biomarkers; Longitudinal data; P-splines; Time-to-event data

1 Introduction

The joint modeling of longitudinal biomarkers and the time to disease onset or death offers unique insights into disease progression in various medical domains (Taylor et al., 2013; Gras et al., 2013; Daher Abdi et al., 2013). Depending on the disease and the respective biomarker, different challenges have to be faced in joint modeling. In the following, a general framework for the flexible joint modeling of longitudinal and time-to-event data is presented, which was motivated by unique cohort data from studies exploring the development of type 1 diabetes (T1D). Research on T1D underwent a paradigm shift in the past decade, when disease-specific autoantibodies were shown to be diagnostic for the disease before the onset of clinical symptoms, thus paving the way for a pre-clinical diagnosis of T1D (Ziegler et al., 2013; Bonifacio, 2015; Insel et al., 2015). Prior to the onset of clinical symptoms, i.e. the need for insulin substitution, the disease is already progressing and insulin-producing beta cells in the pancreas are gradually destroyed by the body's own immune system. This immune process, leading to an onset of clinical symptoms within months up to a decade, can be diagnosed by the emergence of T1D-specific autoantibodies. However, it remains an open question whether the longitudinal patterns of these autoantibodies are associated with the rate of progression to T1D.

In recent years joint models have gained popularity for the modeling of associations between time-varying biomarkers and time-to-event outcomes. By estimating a submodel for a longitudinal biomarker, usually a mixed model, jointly with the survival submodel of a time-to-event process, one can account for the informative censoring and the within-subject errors in the longitudinal model, and can incorporate the longitudinal information, observed only at person-specific discrete time points, as a continuous-time covariate in the survival model. Comprehensive overviews of the topic are given in Tsiatis and Davidian (2004), Rizopoulos (2012) and Gould et al. (2015). In our work we focus on extensions of so-called shared parameter models. These assume that a set of parameters influences both the longitudinal and the survival model, and that there is conditional independence given those parameters.

In T1D research little is known about typical trajectories of autoantibodies as biomarkers. At the same time, the observed trajectories show highly nonlinear patterns over time and differ strongly between subjects, see Figure 3a. In consequence, a flexible specification of individual trajectories in the longitudinal model is needed in our application.

Figure 3: Longitudinal marker values for five randomly selected subjects in the BABYDIAB/BABYDIET study. (a) Observed values (points) and linear interpolation (lines); (b) observed values (points) and estimated trajectories (lines).

Much work on joint models has focused on simple parametric longitudinal trajectories, while only few approaches allow for more flexible, potentially non-parametric longitudinal models. Ding and Wang (2008) model mean trajectories by B-splines and allow for one multiplicative random effect per subject. For our application, however, it remains questionable whether such a model is flexible enough to capture the highly different trajectories. Spline-based approaches that also allow the random effects to be nonlinear functions of time are mentioned by Song and Wang (2008) and were employed by Rizopoulos and Ghosh (2011) and Rizopoulos et al. (2014) as well as Brown et al. (2005) and Brown (2009). While allowing for flexibility, a disadvantage of all these approaches is the need to find an optimal number of knots for the flexible longitudinal model, e.g. by AIC or DIC. As the number of random effects increases with the number of knots, this number is limited in practice. We avoid the explicit choice of knots and number of basis functions by using a penalized spline approach, where a larger number of knots is specified and smoothness penalties are employed (Lang and Brezger, 2004). Tang and Tang (2015) also make use of P-splines in modeling longitudinal trajectories, but do so only for the mean function, whereas we model also the individual trajectories as smooth functions of time. This is similar in spirit to the specification of individual trajectories in Jiang et al. (2015); however, we do not assume an underlying class membership for the random effects.

The estimation of joint models with complex subject-specific trajectories poses a challenge to frequentist estimation approaches due to the necessary integration over potentially high-dimensional random effects distributions. Because of this drawback, and because of further advantages of the Bayesian approach in joint modeling, such as straightforward model assessment and the potential integration of previous knowledge via priors (Gould et al., 2015), many complex joint models, such as the aforementioned ones, are specified within a Bayesian framework. The most widely used sampling approach for the parameter distributions in Bayesian joint models is Gibbs sampling, e.g. Faucett and Thomas (1996), Guo and Carlin (2004) and Brown and Ibrahim (2003), also in conjunction with Metropolis-Hastings algorithms (Tang and Tang, 2015). In addition, the well-established R-package JMbayes (Rizopoulos, 2016a, b), implementing Rizopoulos et al. (2014), employs a random walk Metropolis-Hastings algorithm. Our Bayesian estimation approach is different, as we employ a derivative-based Metropolis-Hastings algorithm in which we draw samples from approximations of the full conditionals using score vectors and Hessians of the parameters. Although computationally demanding, this algorithm shows a high stability in the model estimation, as we also show in our simulations.

In addition to the need for a flexible longitudinal model, a further generalization of existing joint models seems necessary in our application, namely a time-varying association between the biomarker and the time-to-event. Here, the biomarker indicates an ongoing immune process eventually leading to the destruction of the insulin-producing beta cells. As the activity of the immune system is constantly regulated, it is plausible that the association between a biomarker and the hazard of T1D varies over time. For example, a recent paper by Meyer et al. (2016) indicated that patients with an autoimmune disease can also present unique disease-ameliorating autoantibodies. Such a time-varying association has rarely been studied in the context of joint models. Using a discretized time scale and a probit model for the discrete hazard function, Barrett et al. (2015) allow the association to vary over the discrete time points in their model. However, this flexible specification is not considered in their simulations, the applied examples or the code provided to fit the models. A time-varying coefficient associating the marker and the event process is the focus of the conditional score estimation approach in Song and Wang (2008). This approach can be seen as a weighted local partial likelihood without any assumptions on the distribution of the random effects. While it accounts for measurement error and short-term biological fluctuations in the longitudinal marker when modeling the hazard, it only permits inference on the survival parameters and not on the longitudinal model.

In order to allow for these two extensions, the flexible longitudinal trajectories and a potentially nonlinear time-varying association, both modeled by penalized splines, we develop and implement a highly flexible framework for joint models available within the R-package bamlss. As we represent all parts of this flexible joint model as structured additive predictors, which can include linear, parametric but also nonparametric penalized terms, we are able to allow potentially nonlinear, smooth, random, and time-varying effects in both submodels. In consequence, the possibilities of this implementation go well beyond the two extensions that originally triggered the development. By applying this flexible model to the combined data set from two German high-risk T1D birth cohorts, we aim to shed further light on the complex relationship between T1D-associated autoantibodies and the onset of clinical disease.

The remainder of this paper is structured as follows: The general model structure and potential extensions are outlined in Section 2. In Section 3, details on the Bayesian estimation procedure are given. A thorough testing of the model estimation through simulations is presented in Section 4 and the application to our T1D research question in Section 5. Concluding remarks are given in Section 6 and technical details as well as additional figures can be found in the Appendix. The presented model is implemented in the R-package bamlss (Umlauf et al., 2016). Source code to reproduce the simulation results is available in the ancillary material.

2 Methods

In the following, the general setup for additive joint models is presented, with a special focus on the two extensions of existing approaches made in the present work: the flexible specification of longitudinal trajectories as well as the time-varying association between longitudinal marker and event. An overview of potential further model specifications illustrates the flexibility of the presented model family.

2.1 General Setup

For every subject $i = 1, \ldots, n$ we observe a potentially right-censored follow-up time $T_i$ and the event indicator $\delta_i$ (1 if subject $i$ experiences the event, 0 if it is censored). We model the hazard of an event at time $t$ as

$h_i(t) = \exp\{\eta_i(t)\} = \exp\{\eta_{\lambda i}(t) + \eta_{\gamma i} + \eta_{\alpha i}(t)\, \eta_{\mu i}(t)\},$    (1)

including in the full predictor $\eta_i(t)$ a predictor $\eta_{\lambda i}(t)$ for all survival covariates that are time-varying or have a time-varying coefficient, including the log baseline hazard, a predictor $\eta_{\gamma i}$ for baseline survival covariates, as well as a predictor $\eta_{\alpha i}(t)$ representing the potentially time-varying association between the longitudinal marker $\eta_{\mu i}(t)$ and the hazard.
We also observe a longitudinal response $y_{ij}$ at the potentially subject-specific ordered time points $t_{ij}$, $j = 1, \ldots, n_i$, with $t_{i1} < \cdots < t_{i n_i} \leq T_i$. Let $\boldsymbol{t}$ denote the vector of the $N = \sum_{i=1}^{n} n_i$ longitudinal measurement time points of all subjects. The longitudinal response at $t_{ij}$ is modeled as

$y_{ij} = \eta_{\mu i}(t_{ij}) + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N\left(0, \exp\{\eta_{\sigma i}(t_{ij})\}^2\right),$    (2)

with independent errors $\varepsilon_{ij}$, where the predictor $\eta_{\sigma i}$ allows to also model the error variance. Thus $\eta_{\mu i}(t_{ij})$ represents the longitudinally observed marker value without error at time point $t_{ij}$. This "true" marker value serves as a continuous-time covariate in the hazard in (1) and links the two model equations.

Each predictor $\eta_{ki}$ with $k \in \{\lambda, \gamma, \alpha, \mu, \sigma\}$ is a structured additive predictor, i.e. a sum of functions of covariates $\boldsymbol{x}_i$,

$\eta_{ki} = \sum_{m=1}^{M_k} f_{km}(\boldsymbol{x}_i).$

Different subsets of $\boldsymbol{x}_i$ can serve as covariates for the different predictors, with each function $f_{km}$ typically depending on one or two covariates. For time-varying predictors the functions can also depend on time $t$, with a potentially time-varying covariate vector, i.e. $f_{km}(\boldsymbol{x}_i(t), t)$. We express the vector of predictors for all subjects as $\boldsymbol{\eta}_k$. These vectors are of length $n$ for the survival part of the model (1), where writing $\boldsymbol{\eta}_k(t)$ for $k \in \{\lambda, \alpha\}$ denotes that the predictors are evaluated at time $t$. In the longitudinal part of the model (2), the vector $\boldsymbol{\eta}_k$ for $k \in \{\mu, \sigma\}$ is of length $N$, containing entries for all $t_{ij}$, $j = 1, \ldots, n_i$, $i = 1, \ldots, n$, i.e. evaluations at all observed time points for the corresponding subjects.

The functions $f_{km}$ can model a variety of effects, such as smooth, spatial, time-varying or random effects terms, which can be expressed in a straightforward notation for every term $m$ of predictor $k$ by using suitable basis function expansions and corresponding penalties. In a generic setup we let

$\boldsymbol{f}_{km} = \boldsymbol{X}_{km} \boldsymbol{\beta}_{km}, \qquad \mathrm{pen}(\boldsymbol{\beta}_{km}) = \frac{1}{\tau^2_{km}}\, \boldsymbol{\beta}_{km}^\top \boldsymbol{K}_{km} \boldsymbol{\beta}_{km},$    (3)

with the vector of function evaluations $\boldsymbol{f}_{km}$ stacked for each subject, the design matrix $\boldsymbol{X}_{km}$, the coefficient vector $\boldsymbol{\beta}_{km}$, the penalty matrix $\boldsymbol{K}_{km}$ and the variance parameter $\tau^2_{km}$ that controls the amount of penalization of the respective term. In the Bayesian setting the penalization is imposed by specifying an appropriate prior distribution for the parameters, $\boldsymbol{\beta}_{km} \sim N(\boldsymbol{0}, \tau^2_{km} \boldsymbol{K}_{km}^{-})$, with $\boldsymbol{K}_{km}^{-}$ denoting the generalized inverse of $\boldsymbol{K}_{km}$, as presented in more detail in Section 3.2. Note that these basic penalties can be extended further, as shown in more detail in the next subsection.
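Since design and penalty matrices are constructed internally via mgcv (see Section 3.4), a generic term as in (3) can be made concrete in a few lines. The following sketch, with an arbitrary toy covariate, extracts the design matrix $\boldsymbol{X}$ and penalty matrix $\boldsymbol{K}$ of a P-spline term and evaluates the generic quadratic penalty; it is an illustration of the representation, not code from the package:

library(mgcv)

## Toy data: one covariate "time" on a grid (made up for illustration)
d <- data.frame(time = seq(0, 10, length.out = 100))

## Cubic P-spline term with second-order difference penalty;
## absorb.cons = TRUE absorbs the sum-to-zero constraint into the basis
sm <- smoothCon(s(time, bs = "ps", m = c(2, 2), k = 8),
                data = d, absorb.cons = TRUE)[[1]]

X <- sm$X       # design matrix X_km (here 100 x 7 after the constraint)
K <- sm$S[[1]]  # penalty matrix K_km

## Generic quadratic penalty from (3) for a given variance parameter tau2
beta <- rnorm(ncol(X))
tau2 <- 0.5
pen  <- c(t(beta) %*% K %*% beta) / tau2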

In analogy to the differences in form between the generic vectors of predictors, i.e. $\boldsymbol{\eta}_\gamma$, $\boldsymbol{\eta}_\lambda(t)$, $\boldsymbol{\eta}_\alpha(t)$ and $\boldsymbol{\eta}_\mu$, $\boldsymbol{\eta}_\sigma$, the form of the generic vectors of function evaluations $\boldsymbol{f}_{km}$ and the generic design matrices $\boldsymbol{X}_{km}$ also differs between predictors and submodels. For ease of notation we drop the subscript $m$ for the different terms per predictor in this illustration. For the time-constant survival predictor $\boldsymbol{\eta}_\gamma$, we observe a vector of covariates for every subject and stack these in the design matrix $\boldsymbol{X}_\gamma$ with $n$ rows, resulting in the vector of function evaluations $\boldsymbol{f}_\gamma = \boldsymbol{X}_\gamma \boldsymbol{\beta}_\gamma$ of length $n$. For the time-varying predictors of the survival part, i.e. $k \in \{\lambda, \alpha\}$, the subject covariate vector, including basis evaluations for nonlinear effects over time at time $t$, forms the rows of the design matrix $\boldsymbol{X}_k(t)$, resulting in the vector $\boldsymbol{f}_k(t)$ of length $n$ for each $t$. Finally, for predictors in the longitudinal submodel, i.e. $k \in \{\mu, \sigma\}$, we observe the covariate matrix for every subject at the subject-specific time points, resulting in the stacked design matrix $\boldsymbol{X}_k$ with $N$ rows for all subjects at all time points, with the vector $\boldsymbol{f}_k$ of length $N$.

To illustrate how different effects are subsumed under this notation by the specification of the respective design matrices, we formulate a standard shared parameter joint model within this framework. Note that we drop the index $m$ for predictors which consist of only one term. We specify the log baseline hazard as a smooth function of time by P-splines with a B-spline basis, $\boldsymbol{f}_\lambda(t) = \boldsymbol{X}_\lambda(t) \boldsymbol{\beta}_\lambda$, and the penalty matrix $\boldsymbol{K}_\lambda = \boldsymbol{D}_r^\top \boldsymbol{D}_r$ with $\boldsymbol{D}_r$ as the $r$-th difference matrix of appropriate dimension (Eilers and Marx, 1996). In the Bayesian setting, using Bayesian P-splines, smoothing is induced by appropriate prior specification, where the difference penalties are replaced by their stochastic analogues, i.e. random walks (Lang and Brezger, 2004). Here each row of $\boldsymbol{X}_\lambda(t)$ contains the evaluations of the B-spline basis functions at time $t$; as the baseline hazard is not subject-specific, $\boldsymbol{X}_\lambda(t)$ contains $n$ stacked replications of this row. Parametric effects of baseline survival covariates are modeled as $\boldsymbol{f}_\gamma = \boldsymbol{X}_\gamma \boldsymbol{\beta}_\gamma$, where each row of $\boldsymbol{X}_\gamma$ contains the subject-specific covariate vector and $\boldsymbol{K}_\gamma$ is taken as the zero matrix, i.e. no penalization. The usual time-constant association between longitudinal and survival model is implemented as $\boldsymbol{f}_\alpha = \boldsymbol{1}_n \beta_\alpha$, where $\boldsymbol{1}_n$ is a vector of ones of length $n$ and $\boldsymbol{K}_\alpha = 0$. The predictor vector for the longitudinal part with a random intercept in the linear mixed effects model can be specified as $\boldsymbol{\eta}_\mu = \boldsymbol{X}_{\mu 1} \boldsymbol{\beta}_{\mu 1} + \boldsymbol{X}_{\mu 2} \boldsymbol{\beta}_{\mu 2}$, where $\boldsymbol{X}_{\mu 1}$ and $\boldsymbol{\beta}_{\mu 1}$ are the design matrix and coefficient vector of the fixed effects, potentially including a parametric effect of time, with $\boldsymbol{K}_{\mu 1} = \boldsymbol{0}$, and $\boldsymbol{X}_{\mu 2}$ is a basis matrix representation of a random intercept. In more detail, $\boldsymbol{X}_{\mu 2}$ is an $N \times n$ indicator matrix, where the $i$-th column indicates which longitudinal measurements belong to subject $i$, $\boldsymbol{\beta}_{\mu 2}$ denotes the corresponding coefficient vector and an identity matrix $\boldsymbol{K}_{\mu 2} = \boldsymbol{I}_n$ as penalty ensures $\beta_{\mu 2 i} \sim N(0, \tau^2_{\mu 2})$ independently. Finally, the error variance is modeled as constant using $\boldsymbol{\eta}_\sigma = \boldsymbol{1}_N \beta_\sigma$ with $\boldsymbol{K}_\sigma = 0$.
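In the software, such a standard shared parameter joint model can be written down compactly. The following sketch assumes the "jm" family interface of bamlss as described in the package documentation (column names such as survtime, obstime, id and x1 follow the simulated data returned by simJM; names and arguments may differ between package versions):

library(bamlss)

## Simulated joint model data from the package (long format with columns
## id, obstime, y, survtime, event, x1, ...), used only for illustration
set.seed(1)
d <- simJM(nsub = 150)

f <- list(
  Surv2(survtime, event, obs = y) ~ s(survtime),  # lambda: log baseline hazard
  gamma ~ x1,                                     # baseline survival covariate
  mu ~ obstime + s(id, bs = "re"),                # linear time + random intercept
  sigma ~ 1,                                      # constant error variance
  alpha ~ 1,                                      # time-constant association
  dalpha ~ -1                                     # no effect of the derivative
)

## Posterior mode optimization (for starting values) followed by MCMC sampling
b <- bamlss(f, family = "jm", data = d, timevar = "obstime", idvar = "id")
summary(b)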

2.2 Important extensions of current models

A special focus of our joint model approach lies on the flexibility of the longitudinal predictor $\eta_{\mu i}(t)$. We model the trajectory for every subject as the sum of fixed covariate effects, a smooth function of time, a random intercept as well as smooth subject-specific deviations from this function over time,

$\eta_{\mu i}(t) = \boldsymbol{x}_{\mu i}^\top \boldsymbol{\beta}_{\mu} + f_1(t) + \beta_{0i} + f_{2i}(t).$    (4)

In this parameterization $f_1(t)$ is a smooth effect of time, constructed like the log baseline hazard term in the previous subsection, and $\beta_{0i}$ is a random intercept as illustrated there as well. The term $f_{2i}(t)$ denotes the smooth subject-specific deviations from the global time effect using functional random intercepts (Scheipl et al., 2015). Additionally, linear or parametric effects, including a global intercept, as well as further smooth effects of covariates can be represented by the term $\boldsymbol{x}_{\mu i}^\top \boldsymbol{\beta}_{\mu}$ in (4). The basis for the functional random intercepts can be specified within the basis function approach as row tensor products of the marginal basis of a random intercept, marked by the subscript $s$ (for subjects), and the marginal basis for a smooth effect of time, marked by the subscript $t$. We denote the vector of function evaluations at every observed longitudinal time point in $\boldsymbol{t}$ for the corresponding subjects as

$\boldsymbol{f}_{2} = \left(\boldsymbol{X}_{s} \odot \boldsymbol{X}_{t}\right) \boldsymbol{\beta},$    (5)

where $\boldsymbol{X}_{s}$ is an $N \times n$ indicator matrix as the basis for a random intercept, as specified for $\boldsymbol{X}_{\mu 2}$ in the previous subsection, $\boldsymbol{X}_{t}$ is an $N \times d$ matrix of evaluations of a marginal spline basis with $d$ basis functions at the time points in $\boldsymbol{t}$, and $\boldsymbol{X}_{s} \odot \boldsymbol{X}_{t}$ is the $N \times nd$ basis matrix resulting from the row tensor product. The row tensor product of an $N \times a$ matrix $\boldsymbol{A}$ and an $N \times b$ matrix $\boldsymbol{B}$ is defined as the $N \times ab$ matrix $\boldsymbol{A} \odot \boldsymbol{B} = (\boldsymbol{A} \otimes \boldsymbol{1}_b^\top) \cdot (\boldsymbol{1}_a^\top \otimes \boldsymbol{B})$, with $\cdot$ denoting element-wise multiplication and $\boldsymbol{1}_b$ a vector of ones of length $b$.

The corresponding penalty term is constructed from the marginal penalty matrices:

$\boldsymbol{K}\left(\tau^2_{s}, \tau^2_{t}\right) = \frac{1}{\tau^2_{s}}\left(\boldsymbol{K}_{s} \otimes \boldsymbol{I}_{d}\right) + \frac{1}{\tau^2_{t}}\left(\boldsymbol{I}_{n} \otimes \boldsymbol{K}_{t}\right),$    (6)

where $\otimes$ denotes the Kronecker product, $\boldsymbol{K}_{s} = \boldsymbol{I}_{n}$ is the penalty matrix for the random effect and $\boldsymbol{K}_{t}$ is an appropriate penalty matrix for the smooth effect of time, such as a difference penalty for B-splines. The enlarged penalty matrices $\boldsymbol{K}_{s} \otimes \boldsymbol{I}_{d}$ and $\boldsymbol{I}_{n} \otimes \boldsymbol{K}_{t}$ yield a penalization for every subject, resulting in a random effects structure and a smoothness penalization across time for each subject. Note that by specifying two variance parameters, $\tau^2_{s}$ and $\tau^2_{t}$, the amount of penalization can differ in the direction of time and across subjects, resulting in an anisotropic penalty. This specification allows for a highly flexible modeling of individual trajectories over time.

Given the specification of a separate global intercept and subject-specific random intercepts, the constraints $\sum_j f_1(t_j) = 0$ over all observed time points and $\sum_{j=1}^{n_i} f_{2i}(t_{ij}) = 0$ for every subject $i$ are set in order to ensure identifiability. The necessary linear constraint is implemented for B-splines by transforming the marginal basis $\boldsymbol{X}_{t}$ into an $N \times (d-1)$ matrix $\tilde{\boldsymbol{X}}_{t}$ for which the sum-to-zero constraint holds, as shown in Wood (2006, chapter 1.8), and adjusting the penalty accordingly. Transforming the marginal basis and constructing the row tensor product using the transformed basis matrix $\tilde{\boldsymbol{X}}_{t}$ with correspondingly adjusted marginal penalty ensures that the identification constraint for every subject $i$ is also fulfilled.
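As a concrete illustration of (5) and (6), the following sketch, with arbitrarily chosen dimensions, builds the row tensor product basis of a functional random intercept and the anisotropic penalty from its marginal parts:

## Row tensor product of an N x a matrix A and an N x b matrix B, see (5)
row_tensor <- function(A, B) {
  a <- ncol(A); b <- ncol(B)
  A[, rep(seq_len(a), each = b), drop = FALSE] *
    B[, rep(seq_len(b), times = a), drop = FALSE]
}

set.seed(1)
n    <- 5                                  # number of subjects
N    <- 40                                 # total longitudinal observations
id   <- sample(seq_len(n), N, replace = TRUE)
t_ij <- runif(N, 0, 10)                    # toy measurement time points

Xs <- model.matrix(~ 0 + factor(id))       # N x n random intercept indicator
Xt <- splines::bs(t_ij, df = 8)            # N x d marginal spline basis, d = 8
X  <- row_tensor(Xs, Xt)                   # N x (n * d) basis of f_2i(t)

## Anisotropic penalty (6) built from the marginal penalties
d  <- ncol(Xt)
D  <- diff(diag(d), differences = 2)       # second-order difference matrix
Kt <- crossprod(D)                         # marginal smoothness penalty
Ks <- diag(n)                              # marginal random effects penalty
K  <- function(tau2_s, tau2_t)
  (Ks %x% diag(d)) / tau2_s + (diag(n) %x% Kt) / tau2_t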

As a second extension to existing shared parameter models, we also specify the association between longitudinal and survival model as a structured additive predictor $\eta_{\alpha i}(t)$. In consequence, this predictor can be modeled as a function of time and/or other covariates. Motivated by our applied research question, we model $\eta_{\alpha}(t)$ as a smooth function of time using penalized splines, as specified for the baseline hazard. This allows us to find patterns beyond the standard joint model specification to explain the relationship between longitudinal marker and survival process. These patterns could, for example, be critical time windows in which a non-zero effect of the marker $\eta_{\mu i}(t)$ is present, or a potential change in the direction of the association over time.

2.3 Further potential specifications

The presented general framework of structured additive joint models allows for a variety of different effect specifications by making use of the flexibility of Bayesian structured additive regression models (Fahrmeir et al., 2004) as well as adding functional extensions (Scheipl et al., 2015). Besides the presented smooth, time-varying, random effects and functional random intercept terms, a variety of further effects can be incorporated. Table 1 gives an overview of possible terms. All these terms can be specified by formulating the desired effect in a basis function representation with an appropriate penalty term.

covariate (subset of $\boldsymbol{x}_i$)    effect constant over $t$    effect varying over $t$
no covariate                                 scalar intercept            smooth effect of time
scalar covariate                             linear effect               linear effect varying over time
                                             smooth effect               smooth effect over time
spatial covariate(s)                         spatial effect              spatial effect over time
grouping variable                            random intercept            functional random intercept
scalar and grouping variable                 random slope                functional random slope
vector of scalars                            linear interaction          linear interaction over time
                                             varying coefficient         varying coefficient over time
                                             smooth effect               smooth effect over time

Table 1: Effects that can be specified within a predictor in structured additive joint models; modified from a similar table in Scheipl et al. (2015).

For details on the specification of such effects please refer to Fahrmeir et al. (2004), Scheipl et al. (2015) and Wood (2006). Further details on the practical aspects of our implementation are given in Section 3.4.

3 Estimation

We estimate the model in a Bayesian framework using Newton-Raphson and Markov chain Monte Carlo (MCMC) algorithms.

3.1 Likelihood

Under the assumption of conditional independence of the survival outcomes $(T_i, \delta_i)$ and the longitudinal outcomes $y_{ij}$, given the random effects, the likelihood of the specified joint model is the product of the two submodel likelihoods $L_{\mathrm{surv}}$ and $L_{\mathrm{long}}$ for the survival and the longitudinal model,

$L(\boldsymbol{\theta}; \boldsymbol{T}, \boldsymbol{\delta}, \boldsymbol{y}) = L_{\mathrm{surv}}(\boldsymbol{\theta}; \boldsymbol{T}, \boldsymbol{\delta}) \cdot L_{\mathrm{long}}(\boldsymbol{\theta}; \boldsymbol{y}),$

where $\boldsymbol{\theta}$ is the vector of all parameters in the model and $\boldsymbol{T}$, $\boldsymbol{\delta}$ and $\boldsymbol{y}$ are the response vectors. The additive predictors implicitly also depend on covariates and model parameters. The log-likelihood of the survival part is

$\ell_{\mathrm{surv}}(\boldsymbol{\theta}; \boldsymbol{T}, \boldsymbol{\delta}) = \boldsymbol{\delta}^\top \boldsymbol{\eta}(\boldsymbol{T}) - \boldsymbol{1}^\top \boldsymbol{\Lambda}(\boldsymbol{T}),$    (7)

where $\boldsymbol{\Lambda}(\boldsymbol{T}) = \left[\int_0^{T_i} \exp\{\eta_i(s)\}\, ds\right]_{i=1,\ldots,n}$ is the vector of the cumulative hazard rates and $\boldsymbol{\eta}(\boldsymbol{T}) = \left[\eta_i(T_i)\right]_{i=1,\ldots,n}$ denotes the vector of the full predictors evaluated at the subject-specific survival times. The log-likelihood of the longitudinal part of the model is

$\ell_{\mathrm{long}}(\boldsymbol{\theta}; \boldsymbol{y}) = -\frac{N}{2}\log(2\pi) - \frac{1}{2}\log\left|\boldsymbol{R}\right| - \frac{1}{2}\,(\boldsymbol{y} - \boldsymbol{\eta}_{\mu})^\top \boldsymbol{R}^{-1} (\boldsymbol{y} - \boldsymbol{\eta}_{\mu}),$    (8)

where $\boldsymbol{y}$ and $\boldsymbol{\eta}_{\mu}$ are the response and predictor vectors of length $N$ and the covariance matrix $\boldsymbol{R}$ can reflect the error structure of interest. In our case, we assume independent errors $\varepsilon_{ij} \sim N(0, \exp\{\eta_{\sigma i}(t_{ij})\}^2)$, so that $\boldsymbol{R}$ reduces to a diagonal matrix with diagonal entries $\exp\{\eta_{\sigma i}(t_{ij})\}^2$.

3.2 Priors and Posterior

In the general framework above, a variety of terms (cf. Table 1) can be specified using corresponding priors. For linear or parametric terms we use vague normal priors on the vectors of the regression coefficients, approximately corresponding to the zero penalty matrices explained above. Smooth and random effect terms are regularized by placing suitable multivariate normal priors on the coefficients,

$p\left(\boldsymbol{\beta}_{km} \mid \tau^2_{km}\right) \propto \exp\left(-\frac{1}{2\tau^2_{km}}\, \boldsymbol{\beta}_{km}^\top \boldsymbol{K}_{km} \boldsymbol{\beta}_{km}\right),$

with precision matrix $\frac{1}{\tau^2_{km}} \boldsymbol{K}_{km}$ as specified in the penalty (3). We use independent inverse Gamma hyperpriors to obtain an inverse Gamma full conditional for the variance parameters $\tau^2_{km}$. In addition to the inverse Gamma distribution, different priors are possible for the variance parameters in our implementation, such as half-Cauchy and half-normal distributions. The variance parameters control the trade-off between flexibility and smoothness in the nonlinear modeling of effects. As such they can be interpreted analogously to inverse smoothing parameters in a frequentist approach.

For anisotropic smooths, where multiple variance parameters are involved as in (6), we use the prior

$p\left(\boldsymbol{\beta}_{km} \mid \tau^2_{s}, \tau^2_{t}\right) \propto \exp\left(-\frac{1}{2}\, \boldsymbol{\beta}_{km}^\top \boldsymbol{K}\left(\tau^2_{s}, \tau^2_{t}\right) \boldsymbol{\beta}_{km}\right).$    (9)

The resulting posterior of the model is

$p(\boldsymbol{\theta} \mid \boldsymbol{T}, \boldsymbol{\delta}, \boldsymbol{y}) \propto L_{\mathrm{surv}}(\boldsymbol{\theta}; \boldsymbol{T}, \boldsymbol{\delta})\, L_{\mathrm{long}}(\boldsymbol{\theta}; \boldsymbol{y}) \prod_{k} \prod_{m} p\left(\boldsymbol{\beta}_{km} \mid \boldsymbol{\tau}^2_{km}\right) p\left(\boldsymbol{\tau}^2_{km}\right),$

where $p(\boldsymbol{\beta}_{km} \mid \boldsymbol{\tau}^2_{km})$ are the priors of the vectors of regression parameters and $p(\boldsymbol{\tau}^2_{km})$ are the priors of the (potentially vector-valued) variance parameters for each term $m$ and predictor $k$.

3.3 Bayesian Estimation

Point estimates of $\boldsymbol{\theta}$ can be obtained by posterior mode and posterior mean estimation. We estimate the posterior mode by maximizing the log-posterior of the model using a Newton-Raphson procedure; the posterior mean is obtained via derivative-based Metropolis-Hastings sampling and is thus computationally demanding. We therefore recommend using the posterior mode estimates for a first quick assessment of the model and in order to obtain starting values for the posterior mean sampling.

In the maximization of the log-posterior to obtain the posterior mode, we update each term $m$ of predictor $k$ blockwise in each iteration $l$ as

$\boldsymbol{\beta}^{(l+1)}_{km} = \boldsymbol{\beta}^{(l)}_{km} - \nu\, \boldsymbol{H}\left(\boldsymbol{\beta}^{(l)}_{km}\right)^{-1} \boldsymbol{s}\left(\boldsymbol{\beta}^{(l)}_{km}\right),$

with potentially varying step length $\nu$ and with the score vector $\boldsymbol{s}(\boldsymbol{\beta}_{km})$ and the Hessian $\boldsymbol{H}(\boldsymbol{\beta}_{km})$, which can be found in Appendix A. We optimize the variance parameters $\tau^2_{km}$ in each updating step to minimize the corrected AIC (AICc; Hurvich et al., 1998), which showed good performance in smoothing parameter estimation in Belitz and Lang (2008). Additionally, we optimize the step length $\nu$ in each step to maximize the log-posterior. We assume the coefficients to have an approximately normal posterior distribution and derive credibility intervals from $N\left(\hat{\boldsymbol{\beta}}_{km}, \left(-\boldsymbol{H}(\hat{\boldsymbol{\beta}}_{km})\right)^{-1}\right)$ for quick approximate inference.

For the posterior mean sampling we construct approximate full conditionals based on a second order Taylor expansion of the log-posterior centered at the last state $\boldsymbol{\beta}^{(l)}_{km}$, similar to Fahrmeir et al. (2004), Klein et al. (2015a) and Klein et al. (2015b). The proposal density resulting from this approximate full conditional is proportional to a multivariate normal distribution with precision matrix $\boldsymbol{\Sigma}^{-1}_{km} = -\boldsymbol{H}(\boldsymbol{\beta}^{(l)}_{km})$ and mean $\boldsymbol{m}_{km} = \boldsymbol{\beta}^{(l)}_{km} - \boldsymbol{H}(\boldsymbol{\beta}^{(l)}_{km})^{-1} \boldsymbol{s}(\boldsymbol{\beta}^{(l)}_{km})$. In each iteration of the sampler and for each updating block, a candidate $\boldsymbol{\beta}^{*}_{km}$ is drawn from the proposal density

$q\left(\boldsymbol{\beta}^{*}_{km} \mid \boldsymbol{\beta}^{(l)}_{km}\right) = N\left(\boldsymbol{m}_{km}, \boldsymbol{\Sigma}_{km}\right)$

and is accepted with probability

$\alpha\left(\boldsymbol{\beta}^{*}_{km} \mid \boldsymbol{\beta}^{(l)}_{km}\right) = \min\left\{\frac{p\left(\boldsymbol{\beta}^{*}_{km} \mid \cdot\right) q\left(\boldsymbol{\beta}^{(l)}_{km} \mid \boldsymbol{\beta}^{*}_{km}\right)}{p\left(\boldsymbol{\beta}^{(l)}_{km} \mid \cdot\right) q\left(\boldsymbol{\beta}^{*}_{km} \mid \boldsymbol{\beta}^{(l)}_{km}\right)},\, 1\right\},$

where $p(\boldsymbol{\beta}^{*}_{km} \mid \cdot)$ is the full conditional for the candidate and $p(\boldsymbol{\beta}^{(l)}_{km} \mid \cdot)$ is the full conditional for the current iterate. By drawing candidates from a close approximation of the full conditional, using the log-posterior centered at the previous state, we approximate a Gibbs sampler and achieve high acceptance rates and good mixing.
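A minimal sketch of one such derivative-based update for a single coefficient block may help to fix ideas; logpost, score and hessian below are hypothetical stand-ins for the internal bamlss routines that evaluate the log-posterior and its derivatives:

## One derivative-based MH update for a coefficient block (sketch)
mh_update <- function(beta, logpost, score, hessian) {
  ## Proposal: second-order Taylor approximation at the current state
  H     <- hessian(beta)
  Sigma <- solve(-H)                          # proposal covariance
  m     <- beta + Sigma %*% score(beta)       # proposal mean (one Newton step)
  beta_star <- as.numeric(m + t(chol(Sigma)) %*% rnorm(length(beta)))

  ## Reverse proposal: Taylor approximation centered at the candidate
  H_star     <- hessian(beta_star)
  Sigma_star <- solve(-H_star)
  m_star     <- beta_star + Sigma_star %*% score(beta_star)

  log_q <- function(x, m, S)                  # log proposal density
    mvtnorm::dmvnorm(x, mean = as.numeric(m), sigma = S, log = TRUE)

  ## Metropolis-Hastings acceptance on the log scale
  log_acc <- logpost(beta_star) - logpost(beta) +
    log_q(beta, m_star, Sigma_star) - log_q(beta_star, m, Sigma)
  if (log(runif(1)) <= log_acc) beta_star else beta
}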

For the sampling of the variance parameters, Gibbs sampling is employed where the full conditionals follow an inverse Gamma distribution, i.e. when inverse Gamma hyperpriors are used. Slice sampling is employed when no simple closed-form full conditional can be obtained, as for example in the sampling of the variance parameters of anisotropic smooths (9) or for other hyperpriors.

3.4 Implementation details

The model estimation is implemented within R (R Core Team, 2016) in the package bamlss (Umlauf et al., 2016), which allows the Bayesian estimation of a variety of models within the framework of Bayesian Additive Models for Location, Scale and Shape. The specification of appropriate design matrices and penalties for the desired effects is conducted internally via the R-package mgcv (Wood, 2011). In consequence, the full range of implemented smoothing approaches, such as P-splines, thin plate regression splines, random effects, and Markov random fields, can be used within our implementation. We refer to Wood (2006) and Wood et al. (2016) for further information on model terms, bases and penalties. In our model specification in the simulations and the application we make use of Bayesian P-splines (Lang and Brezger, 2004) to model smooth effects. As the integrals in the survival likelihood as well as in the respective scores and Hessians have no analytical solution, they are approximated numerically using the trapezoidal rule and a fixed number of 25 integration points. Starting values for the posterior mean sampling are obtained by estimating the posterior mode of the joint model. The posterior mean sampling is implemented to potentially run in parallel on a number of specified cores on Linux systems. More details can be found in the documentation of the bamlss R-package.
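To make the numerical integration concrete, the following sketch approximates the cumulative hazard $\Lambda_i(T_i) = \int_0^{T_i} \exp\{\eta_i(s)\}\, ds$ from (7) by the trapezoidal rule on 25 points; the log-hazard function eta is made up for illustration:

## Trapezoidal approximation of the cumulative hazard
## Lambda(Ti) = int_0^Ti exp(eta(s)) ds on a fixed grid of 25 points
cum_hazard <- function(eta, Ti, n_grid = 25) {
  s <- seq(0, Ti, length.out = n_grid)
  h <- exp(eta(s))                       # hazard evaluations on the grid
  sum(diff(s) * (head(h, -1) + tail(h, -1)) / 2)
}

## Made-up log-hazard, increasing over time, for illustration
eta <- function(s) -2 + 0.3 * s
cum_hazard(eta, Ti = 4)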

4 Simulation

We assess the estimation of our model by means of a simulation study, focusing on two aspects: First, we compare our results with the established joint model implementation in JMbayes (Rizopoulos, 2016a) for models with a time-constant association $\eta_\alpha$. Second, we assess the ability to model highly complex longitudinal trajectories as well as a time-varying effect $\eta_\alpha(t)$, the two important new extensions within our framework. With this simulation we also aim to gain insights into the estimation quality of the model when applied to real data sets of the T1D cohorts that motivated our methods development. We therefore simulate two differing data situations mimicking real cohort data. The first simulated data setting, corresponding to the cohort data presented in the application in Section 5, has fewer subjects, observed at more variably spaced time points but with a longer follow-up, than the second. Finally, we aim to assess how well the posterior mode estimation can approximate the effects in comparison with the posterior mean estimates.

4.1 Simulation design

For every setting we generate longitudinal measurements for all subjects at a fixed grid of time points, based on a true longitudinal model as specified in (4) with a smooth time effect $f_1(t)$, normally distributed random intercepts $\beta_{0i}$, smooth functional random intercepts $f_{2i}(t)$, as well as a global intercept and a linear covariate effect. We simulate the functional random intercepts flexibly by P-splines, drawing the true vector of spline coefficients for all subjects from the multivariate normal distribution implied by the anisotropic penalty (6). The hazard for every subject is calculated according to (1) using true survival predictor functions for the log baseline hazard, the baseline covariate effect and the association, with the association $\eta_\alpha$ differing between the two simulation settings. Based on the hazard, survival times are generated for every subject as described in Bender et al. (2005) and Crowther and Lambert (2013). Every subject still under observation at the end of the follow-up period is censored and we additionally apply uniform censoring to the survival times. In order to mimic missing measurements in the real data, a fixed percentage of the remaining longitudinal data is randomly set to missing after censoring in line with the survival times. Longitudinal observations $y_{ij}$ are obtained from $\eta_{\mu i}(t_{ij})$ by adding independent errors as in (2).
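Survival times under a given hazard can be generated by inversion of the cumulative hazard (Bender et al., 2005; Crowther and Lambert, 2013): draw $U \sim U(0, 1)$ and solve $\Lambda_i(T) = -\log(U)$ for $T$ numerically. A sketch with a made-up log-hazard:

## Simulate one survival time by inverting the cumulative hazard:
## solve Lambda(T) = -log(U) with U ~ U(0, 1) numerically on (0, t_max]
sim_surv_time <- function(eta, t_max = 10) {
  target <- -log(runif(1))
  Lambda <- function(t) integrate(function(s) exp(eta(s)), 0, t)$value
  if (Lambda(t_max) < target)            # hazard never accumulates enough:
    return(c(time = t_max, event = 0))   # administrative censoring at t_max
  root <- uniroot(function(t) Lambda(t) - target, c(1e-8, t_max))$root
  c(time = root, event = 1)
}

set.seed(1)
eta <- function(s) -2 + 0.3 * s          # made-up log-hazard for illustration
t(replicate(5, sim_surv_time(eta)))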

The influence of different data structures on the estimation is assessed by simulating two different data settings in each of the two simulation settings. In the smaller data setting, a, observations for $n = 150$ subjects are generated, where a given percentage of the longitudinal measurements is set to missing and on average 108 (72%) events occur; in the larger data setting, b, observations for $n = 300$ subjects are generated on a different grid of time points, with a given percentage of missings and on average 165 (55%) events.

In each data and simulation setting we simulate repeated data sets (replications). To ensure convergence, we run the model estimation with 23000 samples, a burn-in of 3000 and a thinning of 20, yielding 1000 samples, as assessed in preliminary simulations. For each estimated model within a simulation setting we assess bias, mean squared error (MSE) and frequentist coverage of the 95% credibility intervals, defined by the 2.5th and the 97.5th percentiles of the MCMC samples for the posterior mean and by the approximate normal intervals for the posterior mode. We evaluate bias, MSE and coverage both averaged over all time points and averaged per time point. For the predictors in the longitudinal model, i.e. $k \in \{\mu, \sigma\}$, the average bias in each replication is $\frac{1}{N} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \left\{\hat{\eta}_{ki}(t_{ij}) - \eta_{ki}(t_{ij})\right\}$, where $\hat{\eta}_{ki}$ denotes the estimate. To assess the model fit over time we also evaluate the bias per time point for all time points in the grid. The computations for MSE and coverage are analogous. For the survival predictors, i.e. $k \in \{\lambda, \gamma, \alpha\}$, the average bias is computed using evaluations at the subjects' event times. The bias of the time-varying survival predictor $\eta_\lambda(t)$, and for setting 2 also $\eta_\alpha(t)$, is additionally evaluated at a fixed grid of time points, with MSE and coverage computed accordingly. These error measures are then averaged over all replications per setting.

For the comparison with the joint model implementation in JMbayes in settings 1a and 1b, data is generated with a time-constant association $\eta_\alpha$. In our implementation we model the longitudinal submodel by P-splines with cubic B-splines, a second order difference penalty and 12 knots (4 internal knots), for both the overall mean and the individual trajectories; after application of the constraints this yields 7 basis functions. For the time-varying effect of the baseline hazard, $\eta_\lambda(t)$, as well as the nonlinear covariate effect in the survival part, we use 10 knots (2 internal knots), resulting in 5 basis functions per effect after application of the constraints. In order to achieve a comparable model in the package JMbayes, we model nonlinear effects in the longitudinal submodel and survival covariate effects by B-splines and determine the number of knots to minimize the DIC in preliminary simulations. Details on the inclusion of nonlinear effects in both submodels can be found in the source code of the ancillary material. As a result, we model the longitudinal part by cubic B-splines for both the fixed and random effects, with 1 internal knot for the larger data setting and without internal knots for the smaller data setting, resulting in 4 and 3 basis functions for the fixed and random effects of time, respectively. As prior simulations had shown convergence issues when the covariance matrix of the random effects was estimated as unrestricted, we restrict it to be diagonal, resulting in independent random effects. Also based on the DIC from preliminary simulations, we specify the nonlinear covariate effect in the survival part with cubic B-splines with 3 internal knots, using 5 basis functions. We model the baseline hazard with P-splines using the default settings of JMbayes, i.e. a cubic B-spline basis with 17 basis functions and a second order difference penalty. For the MCMC procedure we also use the default settings of 20000 iterations, including a burn-in of 3000 and a thinning such that 2000 samples are kept.

In our second simulation, i.e. settings 2a and 2b, we specify the longitudinal trajectories as before but generate data using a time-varying association predictor $\eta_\alpha(t)$, scaled to the differing time scales of the two data settings in order to achieve a similar shape. We fit the model using the same specifications as in setting 1. Additionally, $\eta_\alpha(t)$ is modeled as a P-spline with 10 knots (2 internal knots), resulting in 5 basis functions after application of the constraints.

4.2 Simulation results

The focus of the first simulation is the comparison with the package JMbayes regarding the accuracy of the modeling of the longitudinal trajectories and the time-constant association parameter in settings 1a and 1b. Table 2 shows the MSE, bias and coverage for the estimation of the model predictors.

                                         MSE            bias           coverage
                                         a      b       a      b       a      b
$\eta_\alpha$                bamlss
                             JMbayes
$\eta_\lambda + \eta_\gamma$ bamlss
                             JMbayes
$\eta_\mu$                   bamlss
                             JMbayes •
$\sigma$                     bamlss
                             JMbayes •

  • No credibility intervals and thus no coverage could be calculated for these predictors.

Table 2: Posterior mean simulation results from bamlss and results from JMbayes from setting 1 (time-constant $\eta_\alpha$) for small (a) and large (b) data sets.

For both methods the association $\eta_\alpha$ is estimated more precisely and with a higher coverage in the larger data setting b compared to a. In both data settings bamlss achieves a lower MSE, less bias and a higher coverage in the estimation of the association compared to JMbayes. For JMbayes the coverage for $\eta_\alpha$ is not satisfactory in both settings (0.840 and 0.890). The further survival predictors, $\eta_\lambda$ and $\eta_\gamma$, are parameterized differently in the two estimation methods with regard to the intercept term and sum-to-zero constraints. Therefore we assess only the prediction quality of the sum $\eta_\lambda + \eta_\gamma$. We observe that JMbayes shows a higher bias in the estimation of the sum of these two predictors.

Regarding the longitudinal submodel, both methods are fairly equal with respect to the average MSE of $\eta_\mu$ in the larger data setting (bamlss: 0.028 vs. JMbayes: 0.029), but our approach seems to be more precise in the smaller data setting (bamlss: 0.022 vs. JMbayes: 0.031). To further understand the cause of this difference, we look at the bias in the estimation of $\eta_\mu$ over the whole observed time course for the smaller data setting. As shown in Figure 4, JMbayes seems to underestimate some nonlinearity of the true predictor. Both methods show higher uncertainty at later time points when, due to censoring and the occurrence of events, less information is available.

Figure 4: Comparison of the bias over time for the estimates of $\eta_\mu$ from bamlss and JMbayes in setting 1a.

Finally, the estimation of the error variance is more precise in bamlss. For the longitudinal predictors we were not able to calculate credibility intervals in JMbayes.

There are large runtime differences: JMbayes models took on average 4 minutes and 7 minutes for data settings a and b, respectively, whereas the implementation in bamlss, due to the more flexible functional random effects specification, took on average 6 hours and 39 hours to run on a single core of a 2.60 GHz Intel Xeon E5-2650 processor. Through parallel computation, e.g. on 10 cores of a Linux system, the run times would reduce to 1.3 and 8.5 hours, respectively.

The aim of the second simulation is to shed light on the precision of the estimation of all predictors in the model, with a special focus on the estimation of $\eta_\alpha(t)$, which is nonlinear in time. Additionally, we compare the precision of the posterior mode to the posterior mean estimation. Table 3 gives an overview of the estimation precision for all predictors.

                         MSE            bias           coverage
                         a      b       a      b       a      b
$\eta_\lambda$   mean
                 mode
$\eta_\gamma$    mean
                 mode
$\eta_\alpha$    mean
                 mode
$\eta_\mu$       mean
                 mode
$\eta_\sigma$    mean
                 mode

Table 3: Posterior mode and posterior mean simulation results for setting 2 (time-varying $\eta_\alpha(t)$) for small (a) and large (b) data sets.

Similarly to setting 1, we observe an effect of sample size: all survival predictors ($\eta_\lambda$, $\eta_\gamma$, $\eta_\alpha$) show a smaller MSE for data setting b compared to a, probably due to the higher number of events. In contrast, the MSE for the estimation of $\eta_\mu$ is smaller in data setting a compared to b, potentially due to the longer follow-up and a slightly higher number of longitudinal observations per subject. Whereas the precision of the point estimates is overall similar or only slightly worse for the posterior mode compared to the posterior mean estimation, the coverage is not acceptable for the posterior mode but close to 95% for the posterior mean. The only exception is the estimation of $\eta_\sigma$, where the coverage is somewhat lower for the posterior mean. As $\eta_\sigma$ is estimated very precisely and formal inference is usually not of interest for this predictor, we do not rate this under-coverage as too problematic.

In order to illustrate the precision of the time-varying effect estimates and to assess the cause of differences in MSE, Figure 5 displays the true and estimated predictors $\eta_\lambda(t)$ and $\eta_\alpha(t)$. Overall the estimated predictors match the true functions quite well. For the smaller data sets there is more uncertainty in the estimation, especially at later time points, when fewer subjects are still under observation.

Figure 5: True (black) and estimated (coloured) predictors from posterior mean and posterior mode estimation for small (a) and large (b) data sets in simulation setting 2. Left: $\eta_\lambda(t)$; right: $\eta_\alpha(t)$.

With on average only 15 and 22 minutes for data settings a and b, respectively, the posterior mode estimation has clear advantages in computation time over the more precise posterior mean estimation, which took 7 and 43 hours on average in this setting. Again, through parallel computation on 10 cores, the time for the posterior mean estimation would reduce to 1.5 and 9.3 hours, respectively.

In conclusion, our simulations show that the estimation of models with a constant association between marker and event performs well, even outperforming the implementation in JMbayes in some aspects. The estimation of the more flexible models that are newly covered by our approach, i.e. with a time-varying association parameter and flexible individual trajectories, is equally satisfactory. While the more precise posterior mean estimation is time-consuming, the posterior mode offers a computationally efficient way to quickly assess point estimates for a given model specification, even though its credibility bands are only approximate.

5 Application

In order to gain insights into our motivating research question, we apply the model to a combined data set of two ongoing German T1D risk cohorts to investigate whether longitudinal trajectories of insulin autoantibodies (IAA) are associated with the rate of progression to T1D. Whereas several different autoantibodies are diagnostic for a preclinical stage of the disease, our focus lies on the analysis of the levels of IAA as a marker from the time when it first exceeds a specific threshold, called seroconversion, to the onset of T1D or loss to follow-up. IAA is most often the first autoantibody to appear (Ziegler et al., 1993, 1999; Hummel et al., 2004a). Both its initial value at seroconversion as well as its mean over time have been shown to be positively associated with the emergence of T1D and negatively related to the age at T1D diagnosis (Steck et al., 2011, 2015).

The BABYDIAB and BABYDIET studies, both prospective birth cohorts with a joint study protocol, aim to investigate the natural history of T1D development. In these studies, children with an increased familial risk of T1D were followed from birth to the development of T1D or loss to follow-up for up to 21 years (Ziegler et al., 1993, 1999; Hummel et al., 2004b, 2011). In both studies, autoantibody measurements were taken at age 9 months and 2, 5, 8, 11, 14 and 17 years, and additionally every 6 months after positive islet autoantibodies had emerged. The exact age at the emergence of clinical T1D was also assessed between study visits.

In our joint model we use data of the children who developed IAA during follow-up, of whom 69 (54%) progressed to T1D. The subjects' progression times are censored at 15 years after seroconversion due to the extremely low sample size at later time points. All longitudinal measurements of IAA after seroconversion were used and log-transformed for the analysis. We model the subjects' transformed autoantibody levels using functional random intercepts and two further covariates: First, the age at seroconversion is included as a linear effect, and second, a binary variable indicates whether IAA was among the first autoantibodies to appear. We model the association between marker and event, $\eta_\alpha(t)$, as a nonlinear function of time. Further, we allow the covariates in the longitudinal model to also influence the survival process directly and expect a positive association between the age at seroconversion and the time to T1D (Steck et al., 2011; Ziegler et al., 2013). In our Bayesian model estimation we sample for 33000 iterations with a burn-in of 3000 and a thinning of 30 to obtain 1000 samples, with starting values for the posterior mean estimation obtained from the posterior mode estimates. Convergence is assessed by the inspection of traceplots, of which a subset is presented in Appendix B. In order to assess the sensitivity of the results to the number of knots, we specify three models with differing numbers of knots. We specify two models using either 12 (i.e. 4 internal) knots or 20 (i.e. 8 internal) knots for the overall mean as well as the individual trajectories in the functional random intercepts, and 10 (i.e. 2 internal) knots in the survival submodel. Additionally, we specify a model with 20 (i.e. 8 internal) knots for the nonlinear terms in both the longitudinal and the survival submodel.
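In bamlss syntax, a model of this form might be specified roughly as follows (a sketch under the same interface assumptions as in Section 2.1; iaa, age.sero, first.ab, id and obstime are hypothetical column names for the application data and nsub denotes the number of subjects):

## Sketch of the application model (hypothetical column names)
f <- list(
  Surv2(survtime, event, obs = iaa) ~ s(survtime, k = 10),   # lambda
  gamma ~ age.sero + first.ab,                               # direct effects
  mu ~ s(obstime, k = 12) + age.sero + first.ab +            # global smooth
    s(id, bs = "re") +                                       # random intercept
    ti(id, obstime, bs = c("re", "ps"), k = c(nsub, 8)),     # functional RI
  sigma ~ 1,                                                 # constant error sd
  alpha ~ s(survtime, k = 10),                               # time-varying assoc.
  dalpha ~ -1
)
b <- bamlss(f, family = "jm", data = d, timevar = "obstime", idvar = "id",
            n.iter = 33000, burnin = 3000, thin = 30)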

The results from the three specified models in our sensitivity analysis are highly similar for all predictors with regard to mean estimates and credibility intervals. However, we observe a lower DIC for the models with more knots in the functional random intercepts, along with a closer fit of the individual trajectories and narrower credibility intervals for the estimated association (cf. Figure A3 in Appendix B). Using more knots in the survival submodel results in better mixing in the traceplots but a slightly higher DIC. Hence we assume the results to be robust regarding the exact number of knots and present results of the model with the lowest DIC in the following.

As shown in Figure 3b for five randomly selected subjects, we are able to closely approximate the individual nonlinear trajectories of IAA. The association $\eta_\alpha(t)$ between the marker and the onset of clinical T1D is estimated as stable over time, with an average slope of -0.01 [95% credibility interval: -0.08, 0.06].

Figure 6: Estimated posterior mean of the association $\eta_\alpha(t)$ with 95% pointwise credibility bands (shaded area), observed event times (rugs, bottom) and censoring times (rugs, top), and the number of subjects at risk per time point (bottom).

The average slope was defined as the mean of the first derivative of the association $\eta_\alpha(t)$ evaluated at all observed event and follow-up times $T_i$; its posterior distribution can easily be obtained by numerically differentiating $\eta_\alpha(t)$ in every MCMC sample. The credibility interval for the estimated association is above 0 from 0.5 to 6.5 years after seroconversion, and there is more uncertainty when less information is available, i.e. when fewer event and follow-up times are observed and fewer subjects remain in the risk set, as indicated by the credibility intervals (Figure 6). In the longitudinal submodel we observe that trajectories have a lower level if subjects seroconverted at an older age (per year of age; 95% credibility interval: [-0.14, -0.01]) and a higher level if IAA was amongst the first markers to appear (95% credibility interval: [0.24, 1.55]). In the survival submodel the log-hazard is decreased if IAA was amongst the first markers to appear (95% credibility interval: [-1.69, -0.14]). In sum, if IAA is amongst the first markers to appear, the log-hazard is reduced by 0.71; this net effect can be derived as the sum of the direct effect in $\eta_\gamma$ and the indirect effect of the higher marker level in $\eta_\mu$ multiplied by the average association. Additionally, we do not observe a direct effect of the age at seroconversion (95% credibility interval: [-0.20, 0.01]).
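The average slope and its credibility interval can be computed directly from the MCMC output. A sketch, where alpha_samples (posterior samples of $\eta_\alpha(t)$ evaluated on a fine grid, one sample per row) and times are hypothetical stand-ins for quantities extracted from the fitted model:

## Average slope of alpha(t): mean first derivative over the observed event
## and follow-up times, computed per MCMC sample
avg_slope <- function(alpha_samples, grid, times) {
  apply(alpha_samples, 1, function(a) {
    d    <- diff(a) / diff(grid)                 # numerical first derivative
    mids <- (head(grid, -1) + tail(grid, -1)) / 2
    mean(approx(mids, d, xout = times, rule = 2)$y)
  })
}

## Posterior mean and 95% credibility interval of the average slope:
## slopes <- avg_slope(alpha_samples, grid, times)
## c(mean(slopes), quantile(slopes, c(0.025, 0.975)))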

In line with previous findings (Steck et al., 2011, 2015), these results indicate that the quantitative levels of the marker IAA are informative for the rate of progression to T1D in the first years after seroconversion, with higher levels increasing the hazard of T1D. A direct relationship between the hazard and the baseline covariate age at seroconversion is not supported by the model, suggesting that the previously established influence of this covariate on T1D progression may be mediated by the marker levels, i.e. its effect on the log-hazard is reduced once the marker levels over time are taken into account, as in our flexible parameterization. We do not observe a time-varying association between IAA and the hazard of T1D. There is considerable uncertainty around the nonlinear time-varying estimate of the association $\eta_\alpha(t)$, potentially as a result of the flexibility of the estimation in combination with the limited amount of data in the survival part.

6 Discussion and Outlook

We presented a flexible joint model that allows fitting a broad range of joint model specifications using structured additive predictors for all model components. The approach is fully implemented in the R-package bamlss. While the framework is very flexible, as illustrated by Table 1, the focus of this work lies on the flexible modeling of individual trajectories and the specification of a time-varying association between marker and event.

The proposed model shows satisfactory performance in various simulation settings and has the potential to offer new insights into complex relationships between biomarkers and time-to-event processes. Our methods development was motivated by a specific research question from T1D studies and two corresponding data sets. We saw that, even when combining the two cohorts, the sample size of the data set considered in the application in Section 5 is at the lower limit for the complexity of our model, as indicated by our simulation study and by the width of the credibility intervals in the application results. Nevertheless, we found a positive association between a disease-related biomarker and the occurrence of clinical T1D. Although our model allows for a time-varying association between the biomarker and the event process, at least in this small data set the association was estimated to be roughly constant. In consequence, our flexible model can also be used to check the modeling assumptions of simpler, commonly used models. We aim to further explore the relationship between T1D-specific autoantibodies and the progression to T1D in a larger data set from a different, multinational T1D cohort (with a sample size exceeding the larger data setting in our simulations), as in Steck et al. (2015).

Due to the complexity of the model and its estimation, computation speed is still a drawback of our implementation. Hence we are continually working on speeding up the computations further. As shown in simulation 2, the posterior mode estimation offers a computationally efficient way to obtain point estimates from a flexible joint model before starting the full MCMC sampling. These posterior mode estimates show a precision similar to that of the posterior mean estimates. However, the credibility intervals obtained from the posterior mode are not wide enough, potentially because the uncertainty about the variance parameters is not included in these intervals. In consequence, only the credibility intervals of the posterior mean estimates should be used for inference.

As is well known in the survival context, the number of potential parameters in the model is limited by the number of observed events (Harrell et al., 1996). This also holds in our approach for the predictors in the survival part of the model, i.e. $\eta_\lambda$, $\eta_\gamma$ and $\eta_\alpha$. We alleviate this issue to some extent by the penalized approach, which decreases the effective number of degrees of freedom and thus allows for a richer model than would be possible without a penalty. Still, we recommend modeling only those effects as nonlinear functions for which there is a strong indication of nonlinearity.

Within the framework of the presented additive joint model, several further extensions are possible. As a next step we aim to extend the model by including the derivative of the longitudinal trajectories in the model of the event process, similar to Ye et al. (2008), Brown (2009) and Rizopoulos et al. (2014), allowing us to model the potentially time-varying association between changes in the marker and the hazard. Further, functional historical effects of the trajectories, incorporating information on the history of the marker (Malfait and Ramsay, 2003; Gellar et al., 2014), could potentially offer additional insights into complex relationships between markers and event processes.

Acknowledgements

We thank Lorenz Lachmann, Claudia Matzke, Joanna Stock, Stephanie Krause, Annette Knopff, Florian Haupt, Maren Pflüger, Marlon Scholz and Anita Gavrisan (all: Institute for Diabetes Research, Helmholtz Zentrum München) for data collection and expert technical assistance, Ramona Puff (Institute for Diabetes Research, Helmholtz Zentrum München) for laboratory management, and Peter Achenbach (Institute for Diabetes Research, Helmholtz Zentrum München) and Ezio Bonifacio (Center for Regenerative Therapies Dresden and Paul Langerhans Institute Dresden, Technische Universität Dresden) for overseeing antibody measurement and for fruitful discussions and advice on modeling T1D-specific autoantibodies. We also thank all pediatricians and family doctors in Germany for participating in the BABYDIAB Study. Furthermore we thank Fabian Scheipl (Ludwig-Maximilians-Universität München) for advice on modeling functional random intercepts. The work was supported by the JDRF (JDRF-2-SRA-2015-13-Q-R) and by grants from the German Federal Ministry of Education and Research (BMBF) to the German Center for Diabetes Research (DZD e.V.). Meike Köhler’s work was supported by a grant from the Helmholtz International Research Group (HIRG-0018) and Sonja Greven acknowledges funding from the German research foundation (DFG) through Emmy Noether grant GR 3793/1-1.

Conflict of Interest

The authors have declared no conflict of interest.

Appendix A

We derive score vectors and Hessians for the regression coefficients of every predictor. We introduce some further notation to formulate these derivatives. For the time-varying predictors of the survival part, the design matrix $\boldsymbol{X}_k(\boldsymbol{T})$ denotes the matrix of evaluations at the vector of survival times $\boldsymbol{T} = (T_1, \ldots, T_n)^\top$. For the time-varying predictors of the longitudinal part, the design matrix $\boldsymbol{X}_k$ contains the evaluations at all observed subject-specific time points $\boldsymbol{t}$. Let $\ell(\boldsymbol{\theta})$ denote the log-likelihood, i.e. the sum of the contributions of the longitudinal and survival submodels defined in (7) and (8). In more detail, the full log-likelihood is

$\ell(\boldsymbol{\theta}) = \boldsymbol{\delta}^\top \boldsymbol{\eta}(\boldsymbol{T}) - \boldsymbol{1}^\top \boldsymbol{\Lambda}(\boldsymbol{T}) - \frac{N}{2}\log(2\pi) - \boldsymbol{1}^\top \boldsymbol{\eta}_{\sigma} - \frac{1}{2}\,(\boldsymbol{y} - \boldsymbol{\eta}_{\mu})^\top \boldsymbol{R}^{-1} (\boldsymbol{y} - \boldsymbol{\eta}_{\mu}),$

with $\boldsymbol{R} = \mathrm{diag}\left(\exp\{2\,\boldsymbol{\eta}_{\sigma}\}\right)$.

Score Vectors

Hessian