[
Abstract
The joint modeling of longitudinal and timetoevent data is an important tool of growing popularity to gain insights into the association between a biomarker and an event process. We develop a general framework of flexible additive joint models that allows the specification of a variety of effects, such as smooth nonlinear, timevarying and random effects, in the longitudinal and survival parts of the models. Our extensions are motivated by the investigation of the relationship between fluctuating diseasespecific markers, in this case autoantibodies, and the progression to the autoimmune disease type 1 diabetes. By making use of Bayesian Psplines we are in particular able to capture highly nonlinear subjectspecific marker trajectories as well as a timevarying association between the marker and the event process allowing new insights into disease progression. The model is estimated within a Bayesian framework and implemented in the Rpackage bamlss.
1
Flexible additive joint models]Flexible Bayesian additive joint models with an application to type 1 diabetes research Meike Köhler et al.]Meike Köhler ^{1}^{1}1Corresponding author: email: meike.koehler@helmholtzmuenchen.de, Phone: +49(0)8930682917, Fax: +49(0)8931873144 ]Nikolaus Umlauf ]Andreas Beyerlein ]Christiane Winkler ]AnetteGabriele Ziegler ]Sonja Greven
1 Introduction
The joint modeling of longitudinal biomarkers and the time to disease onset or death offers unique insights into disease progression in various medical domains (Taylor et al., 2013; Gras et al., 2013; Daher Abdi et al., 2013). Depending on the disease and the respective biomarker different challenges have to be faced in joint modeling. In the following, a general framework for the flexible joint modeling of longitudinal data and timetoevent is presented, which was motivated by unique cohort data from studies exploring the development of type 1 diabetes (T1D). The research on T1D underwent a paradigm shift in the past decade, when diseasespecific autoantibodies where shown to be diagnostic for the disease before the onset of clinical symptoms and thus paving the way for a preclinical diagnosis of T1D (Ziegler et al., 2013; Bonifacio, 2015; Insel et al., 2015). Prior to the onset of clinical symptoms, i.e. the need of insulin substitution, the disease is already progressing and insulinproducing betacells in the pancreas are gradually destroyed by the body’s own immune system. This immune process, leading to an onset of clinical symptoms within months up to a decade, can be diagnosed by the emergence of T1Dspecific autoantibodies. However, it remains an open question whether the longitudinal patterns of these autoantibodies might be associated with the rate of progression to T1D.
In recent years joint models gained larger popularity in the modeling of associations between timevarying biomarkers and timetoevent. By estimating a submodel for a longitudinal biomarker, usually a mixed model, jointly with the survival submodel of a timetoevent process, one can account for the informative censoring and the withinsubject errors in the longitudinal model and can incorporate the longitudinal information, observed only at personspecific discrete timepoints, as a continuoustime covariate in the survival model. Comprehensive overviews on the topic are given in Tsiatis and Davidian (2004), Rizopoulos (2012) and Gould et al. (2015). In our work we focus on extensions of socalled shared parameter models. These assume that a set of parameters influences both the longitudinal and the survival model, and that there is conditional independence given those parameters.
In T1D research little is known concerning typical trajectories of autoantibodies as biomarkers. At the same time the observed trajectories show highly nonlinear patterns over time and differ strongly between subjects, see Figure (a)a. In consequence, a flexible specification of individual trajectories in the longitudinal model is needed in our application.
Much work on joint models has focused on simple parametric longitudinal trajectories, while only few approaches allow for more flexible, potentially nonparametric longitudinal models. Ding and Wang (2008) model mean trajectories by Bsplines and allow for one multiplicative random effect per subject. For our application however it remains questionable if such a model is flexible enough to capture the highly different trajectories. Spline based approaches, that allow also the random effects to be nonlinear functions in time, are mentioned by Song and Wang (2008) and were employed by Rizopoulos and Ghosh (2011) and Rizopoulos et al. (2014) as well as Brown et al. (2005) and Brown (2009). While allowing for flexibility, a disadvantage of all these approaches is finding an optimal number of knots to specify the flexible longitudinal model, e.g. by AIC or DIC. As the number of random effects increases with the number of knots, this number is limited in practice. We aim to avoid the explicit choice of knots and number of basis functions by using a penalized spline approach, where a larger number of knots is specified and smoothness penalties are employed (Lang and Brezger, 2004). Tang and Tang (2015) also make use of PSplines in modeling longitudinal trajectories, but do so only in estimating the mean function, whereas we model also the individual trajectories as smooth functions of time. This is similar in spirit to the specification of individual trajectories in Jiang et al. (2015), however we do not assume an underlying class membership for the random effects.
The estimation of joint models with complex subjectspecific trajectories poses a challenge to frequentist estimation approaches due to the necessary integration over potentially highdimensional random effects distributions. Due to this drawback and further advantages of the Bayesian approach in joint modeling, such as straightforward model assessment and the potential integration of previous knowledge via priors (Gould et al., 2015), many complex joint models, like e.g. the aforementioned models, are specified within a Bayesian framework. The most widely used sampling approach for the parameter distributions in Bayesian joint models is Gibbs Sampling, e.g. Faucett and Thomas (1996); Guo and Carlin (2004); Brown and Ibrahim (2003), also in conjunction with MetropolisHastings algorithms (Tang and Tang, 2015). In addition, the well established Rpackage JMbayes (Rizopoulos, 2016a, b) implementing Rizopoulos et al. (2014) employs a random walk MetropolisHastings algorithm. Our Bayesian estimation approach is different as we employ a derivativebased MetropolisHastings algorithm, where we draw samples from approximations of the full conditionals using score vectors and Hessians of the parameters. Despite being computationally demanding this algorithm shows a high stability in the model estimation, as we also show in our simulations.
In addition to the need for a flexible longitudinal model, a further generalization of existing joint models seems necessary in our application, namely a timevarying association between the biomarker and the timetoevent. Here, the biomarker indicates an ongoing immune process eventually leading to the destruction of the insulinproducing beta cells. As the activity of the immune system is constantly regulated, it is plausible that the association between a biomarker and the hazard of T1D varies over time. For example a recent paper by Meyer et al. (2016) indicated that patients with an autoimmune disease can also present unique diseaseameliorating autoantibodies. Such a timevarying association has rarely been studied in the context of joint models. Using a discretized timescale and a probit model for the discrete hazard function, Barrett et al. (2015) allow for the association to vary over the discrete time points in their model. However this flexible specification is not considered in their simulations, the applied examples or the code provided to fit the models. A timevarying coefficient to associate the marker and the event process is the focus of the conditional score estimation approach in Song and Wang (2008). This approach can be seen as a weighted local partial likelihood without any assumptions on the distribution of the random effects. While this approach accounts for measurement error and shortterm biological fluctuations in the longitudinal marker when modeling the hazard, it only permits inference on the survival parameters and not on the longitudinal model.
In order to allow for these two extensions, the flexible longitudinal trajectories and a potentially nonlinear timevarying association, both modeled by penalized splines, we develop and implement a highly flexible framework for joint models available within the Rpackage bamlss. As we represent all parts of this flexible joint model as structured additive predictors, which can include linear, parametric but also nonparametric penalized terms, we are able to allow potentially nonlinear, smooth, random, and timevarying effects in both submodels. In consequence the possibilities of this implementation go way beyond the two extensions that originally triggered the development. By applying this flexible model to the combined data set from two German highrisk T1D birth cohorts we aim to shed further light on the complex relationship between T1Dassociated autoantibodies and the onset of clinical disease.
The remainder of this paper is structured as follows: The general model structure and potential extensions are outlined in Section 2. In Section 3, details on the Bayesian estimation procedure are given. A thorough testing of the model estimation through simulations is presented in Section 4 and the application to our T1D research question in Section 5. Concluding remarks are given in Section 6 and technical details as well as additional figures can be found in the Appendix. The presented model is implemented in the Rpackage bamlss (Umlauf et al., 2016). Source code to reproduce the simulation results is available in the ancillary material.
2 Methods
In the following, the general setup for additive joint models is presented with a special focus on two extensions in the present work compared to existing approaches: the flexible specification of longitudinal trajectories as well as the timevarying association between longitudinal marker and event. An overview of potential further model specifications illustrates the flexibility of the presented model family.
2.1 General Setup
For every subject we observe a potentially rightcensored followup time and the event indicator (1 if subject experiences the event, 0 if it is censored). We model the hazard of an event at time as
(1) 
including in the full predictor a predictor for all survival covariates that are timevarying or have a timevarying coefficient including the log baseline hazard, a predictor for baseline survival covariates as well as a predictor representing the potentially timevarying association between the longitudinal marker and the hazard.
We also observe a longitudinal response at the potentially subjectspecific ordered time points with . denotes the vector of the longitudinal measurement time points of all subjects. The longitudinal response at with is modeled as
(2) 
with independent errors allowing to also model the error variance. Thus represents the longitudinally observed marker value without error at timepoint . This “true” marker value serves as a continuoustime covariate in the hazard in (1) and links the two model equations.
Each predictor with is a structured additive predictor, i.e. a sum of functions of covariates ,
Different subsets of can serve as covariates for the different predictors, with each typically depending on one or two covariates. For timevarying predictors the functions can also dependent on time with a potentially timevarying covariate vector . We express the vector of predictors for all subjects as . These vectors are of length for the survival part of the model (1), where for denotes that predictors are evaluated at time . In the longitudinal part of the model (2) the vector for is of length , containing entries for all , , i.e. evaluations at all observed time points for the corresponding subjects.
The functions can model a variety of effects, such as smooth, spatial, timevarying or random effects terms which can be expressed in a straightforward notation for every term of predictor by using suitable basis function expansions and corresponding penalties . In a generic setup we let
(3) 
with the vector of function evaluations stacked for each subject, the design matrix , the coefficient vector , the penalty matrix and the variance parameter that controls the amount of penalization of the respective term. In the Bayesian setting a penalization is imposed by specifying an appropriate prior distribution for the parameters, with denoting the generalized inverse of , as presented in more detail in section 3.3. Note that these basic penalties can be extended further as shown in more detail in the next subsection.
In analogy to the differences in form in the generic vector of predictors, i.e. , and , the form of the generic vectors of function evaluations and the generic design matrices also differs between predictors and submodels. For ease of notation we drop the subscript for the different terms per predictor in this illustration. For the timeconstant survival predictor , we observe a vector of covariates for every subject and stack these in the design matrix of size resulting in the vector of function evaluations . For the timevarying predictors of the survival part, i.e. , denotes the subject covariate vector, including basis evaluations for nonlinear effects over time at time , resulting in the design matrix of evaluations of size and the vector of length for each . Finally for predictors in the longitudinal submodel, i.e. , we observe the covariate matrix for every subject at the subjectspecific time points, resulting in the stacked design matrix for all subjects at all timepoints with the vector .
To illustrate how different effects are subsumed under this notation by the specification of the respective design matrices, we formulate a standard shared parameter joint model within this framework. Note that we drop the index for predictors which consist of only one term. We specify the logbaseline hazard as a smooth function in time by Psplines with a Bspline basis, , and the penalty matrix with as the th difference matrix of appropriate dimension (Eilers and Marx, 1996). In the Bayesian setting, using Bayesian PSplines, smoothing is induced by appropriate prior specification, where the difference penalties are replaced by their stochastic analogues, i.e. random walks (Lang and Brezger, 2004). Here contains the evaluation of the Bspline basis functions at time . As the baseline hazard is not subjectspecific, contains stacked replications of . Parametric effects of baseline survival covariates are modeled as , where each row of contains the subjectspecific covariate vector and is taken as the zeromatrix . The usual timeconstant association between longitudinal and survival model is implemented as , where is a vector of ones of length and . The predictor vector for the longitudinal part with a random intercept in the linear mixed effects model can be specified as , where are the design matrix and coefficient vector of the fixed effects potentially including a parametric effect of time with , and is a basis matrix representation of a random intercept. In more detail is an indicator matrix, where the th column indicates which longitudinal measurements belong to subject , denotes the coefficient vector and an identity matrix as penalty ensures independently. Finally the error variance is modeled as constant using with .
2.2 Important extensions of current models
A special focus in our joint model approach lies on the flexibility of the longitudinal predictor . We model the trajectory for every subject as the sum of fixed covariate effects, a smooth function of time, a random intercept as well as smooth subjectspecific deviations from this function over time,
(4) 
In this parameterization is a smooth effect of time, constructed like , and is a random intercept as illustrated in the previous subsection. The term denotes the smooth subjectspecific deviations from the global time effect using functional random intercepts (Scheipl et al., 2015). Additionally linear or parametric effects, including a global intercept, as well as further smooth effects of covariates can be represented by an extra term in . The basis for the functional random intercepts can be specified within the basis function approach as row tensor products of the marginal basis of a random intercept, marked by the subscript , and the marginal basis for a smooth effect of time, marked by the subscript . We denote the vector of function evaluations at every observed longitudinal time point in for the corresponding subjects in as
(5) 
where is an indicator matrix as the basis for a random intercept as specified for in the previous subsection, is an matrix of evaluations of a marginal spline basis at and is the basis matrix resulting from the row tensor product. The row tensor product of a matrix and a matrix is defined as the matrix with denoting elementwise multiplication.
The corresponding penalty term is constructed from the marginal penalty matrices:
(6) 
where denotes the Kronecker product, is the penalty matrix for the random effect and is an appropriate penalty matrix for the smooth effect of time such as a difference penalty for Bsplines. The enlarged penalty matrices and yield a penalization for every subject, resulting in a random effects structure and a smoothness penalization across time for each subject. Note that by specifying two variance parameters, and , the amount of penalization can differ in the direction of time and across subjects, resulting in an anisotropic penalty. This specification allows for a highly flexible modeling of individual trajectories over time.
Given the specification of a separate global intercept and subjectspecific random intercepts, the constraints and for every are set in order to ensure identifiability. The necessary linear constraint is implemented for Bsplines by transforming the marginal basis into an matrix for which it holds that as shown in Wood (2006, chapter 1.8), and adjusting the penalty accordingly. Transforming the marginal basis and constructing the row tensor product using the transformed basis matrix with correspondingly adjusted marginal penalty ensures that the identification constraint for every is also fulfilled.
As a second extension to existing sharedparameter models we also specify the association between longitudinal and survival model as a structured additive predictor . In consequence, this predictor can be modeled as a function of time and/or other covariates. Motivated by our applied research questions we model as a smooth function of time by using penalized splines, as specified for the baseline hazard. This allows us to find patterns beyond the standard joint model specification to explain the relationship between longitudinal marker and survival process. These patterns could for example be critical time windows in which a nonzero effect of is present or a potential change in the direction of the association over time.
2.3 Further potential specifications
The presented general framework of structured additive joint models allows for a variety of different effect specifications by making use of the flexibility of Bayesian structured additive regression models (Fahrmeir et al., 2004) as well as adding functional extensions (Scheipl et al., 2015). Besides the presented smooth, timevarying, random effects and functional random intercept terms, a variety of further effects can be incorporated. Table 1 gives an overview of possible terms. All these terms can be specified by formulating the desired effect in a basis function representation with an appropriate penalty term.
covariate (subset of )  constant over  varying over 

no covariate  scalar intercept  smooth effect of time 
scalar covariate  linear effect  linear effect varying over time 
smooth effect  smooth effect over time  
spatial covariate(s)  spatial effect  spatial effect over time 
grouping variable 
random intercept  functional random intercept 
scalar and grouping variable  random slope  functional random slope 
vector of scalars  linear interaction  linear interaction over time 
varying coefficient  
smooth effect 
3 Estimation
We estimate the model in a Bayesian framework using NewtonRaphson and Markov chain Monte Carlo (MCMC) algorithms.
3.1 Likelihood
Under the assumption of conditional independence of the survival outcomes and the longitudinal outcome , given the random effects, the likelihood of the specified joint model is the product of the two submodel likelihoods and for the survival and the longitudinal model
where is the vector of all parameters in the model and , , and are the response vectors. The additive predictors implicitely also depend on covariates and model parameters. The loglikelihood of the survival part is
(7) 
where is the vector of the cumulative hazard rates
and denotes the vector of the full predictors evaluated at the subjectspecific survival times. The loglikelihood of the longitudinal part of the model is
(8) 
and are the predictor vectors of length corresponding to the longitudinal response and , where can reflect the error structure of interest. In our case, we assume so that reduces to a diagonal matrix.
3.2 Priors and Posterior
In this general framework above, a variety of terms (cf. Table 1) can be specified using corresponding priors. For linear or parametric terms we use vague normal priors on the vectors of the regression coefficients, e.g. , approximately corresponding to the precision matrices as explained above. Smooth and random effect terms are regularized by placing suitable multivariate normal priors on the coefficients
with precision matrix as specified in the penalty (3). We use independent inverse Gamma hyperpriors to obtain an inverse Gamma full conditional for the variance parameters. In addition to the inverse gamma distribution, different priors are possible for the variance parameters in our implementation, such aus HalfCauchy and Halfnormal distributions. The variance parameters control the tradeoff between flexibility and smoothness in the nonlinear modeling of effects. As such they can be interpreted analogous to inverse smoothing parameters in a frequentist approach.
For anisotropic smooths, when multiple variance parameters are involved as in (6), we use the prior
(9) 
The resulting posterior of the model is
where are the priors of the vectors of regression parameters and are the priors of the variance parameters for each term and predictor .
3.3 Bayesian Estimation
Point estimates of can be obtained by posterior mode and posterior mean estimation. We estimate the posterior mode by maximizing the logposterior of the model using a NewtonRaphson procedure, the posterior mean is obtained via derivativebased MetropolisHastings sampling and thus computationally demanding. We therefore recommend to use posterior mode estimates for a first quick assessment of the model and in order to obtain starting values for the posterior mean sampling.
In the maximization of the logposterior to obtain the posterior mode, we update blockwise each term of predictor in each iteration as
with potentially varying steplength and with the score vector and the Hessian , which can be found in Appendix A. We optimize the variance parameters in each updating step to minimize the corrected AIC (AICc, Hurvich et al., 1998), which showed good performance in smoothing parameter estimation in Belitz and Lang (2008). Additionally we optimize the steplength over in each step to maximize the logposterior. We assume the coefficients to have an approximately normal posterior distribution and derive credibility intervals from for quick approximate inference.
For the posterior mean sampling we construct approximate full conditionals based on a second order Taylor expansion of the logposterior centered at the last state , similar to Fahrmeir et al. (2004), Klein et al. (2015a) and Klein et al. (2015b). The proposal density from this approximate full conditional is proportional to a multivariate normal distribution with the precision matrix and the mean . In each iteration of the sampler and for updating block a candidate is drawn from the proposal density
and is accepted with the probability
where is the full conditional for the candidate and is the full conditional for the current iterate. By drawing candidates from a close approximation of the full conditional, using the logposterior centered at the previous state, we approximate a Gibbs sampler and achieve high acceptance rates and good mixing.
For the sampling of the variance parameters Gibbs sampling is employed, as the full conditionals follow an inverse Gamma distribution, if inverse Gamma hyperpriors are used. Slice sampling is employed when no simple closedform full conditional can be obtained as for example in the sampling of variance parameters for anisotropic smooths (9) or for other hyperpriors.
3.4 Implementation details
The model estimation is implemented within R (R Core Team, 2016) in the package bamlss (Umlauf et al., 2016) that allows the Bayesian estimation of a variety of models within the framework of Bayesian Additive Models for Location, Scale and Shape. The specification of appropriate design matrices and penalties for the desired effects is conducted internally via the Rpackage mgcv (Wood, 2011). In consequence the full range of implemented smoothing approaches, such as Psplines, thinsplate regression splines, random effects, and Markov Random Fields, can be used within our implementation. We refer to Wood (2006) and Wood et al. (2016) for further information on model terms, bases and penalities. In our model specification in the simulations and the application we make use of Bayesian Psplines (Lang and Brezger, 2004) to model smooth effects. As the integrals in the survival likelihood as well as in the respective scores and Hessians have no analytical solution, they are approximated numerically using the trapezoidal rule and a fixed number of 25 integration points. Starting values for the posterior mean sampling are obtained by estimating the posterior mode of the joint model. The posterior mean sampling is implememented to potentially run in parallel on a number of specified cores on Linux systems. More details can be found in the documentation of the bamlss Rpackage.
4 Simulation
We assess the estimation of our model by means of a simulation study with focus on two aspects: First, comparing our results with the established joint model implementation in JMbayes (Rizopoulos, 2016a) for models with timeconstant . Second, we want to assess the ability to model highly complex longitudinal trajectories as well as a timevarying effect of , the two important new extensions within our framework. With this simulation we also aim to gain insights into the estimation quality of the model when applied to real data sets of T1D cohorts that motivated our methods development. Therefore we simulate two differing data situations, mimicking real cohort data. The first simulated data setting, corresponding to the cohort data presented in the Application Section, has less subjects, at more variably spaced time points but with a longer follow up, than the other. Finally we aim to assess how well the posterior mode estimation can approximate the effects in comparison with the posterior mean estimates.
4.1 Simulation design
For every setting we generate longitudinal measurements for subjects at a fixed grid of time points based on a true longitudinal model as specified in (4) with the time effect , the random intercepts where , the functional random intercepts , and the global intercept and covariate effect and with . We simulate the functional random intercepts flexibly by PSplines where we draw the true vector of splinecoefficients for all subjects from as in (6) with , and . The hazard for every subject is calculated according to (1) using the true survival predictor functions , , with and varying for the two simulation settings. Based on , survival times are generated for every subject as described in Bender et al. (2005) and Crowther and Lambert (2013). Every subject is censored after and we additionally apply uniform censoring to the survival times. In order to mimic missing measurements in the real data, of the remaining longitudinal data are randomly set to missing after censoring in line with the survival times. Longitudinal obervations are obtained from by adding independent errors for each in .
The influence of different data structures on the estimation is assessed by simulating two different data settings in each of the two simulations settings. In the smaller data setting, , observations for subjects are generated at the measurements points where % of the longitudinal measurements are missing and on average 108 (72 %) events occur, compared to subjects at the time points with % missings and 165 (55 %) events in the larger data setting, .
In each data and simulation setting we draw samples. To ensure convergence, we run the model estimation with 23000 samples, a burnin of 3000 and a thinning of 20, yielding 1000 samples, as assessed in preliminary simulations. For each estimated model within a simulation setting we assess bias, meansquared error (MSE) and frequentist coverage of the 95% credibility intervals, defined by the 2.5th and the 97.5th percentiles of the MCMC samples for the posterior mean and the approximate normal intervals for the posterior mode. We evaluate bias, MSE and coverage both averaged over all time points and averaged per time point. For the predictors in the longitudinal model, i.e. , the average bias in each sample is where denotes the estimate. To assess the model fit over time we also evaluate the bias per timepoint for all in . The computations for MSE and coverage are analoguous. For the survival predictors, i.e. , the average bias is using evaluations at the subject’s event times. The bias of the timevarying survival predictor , and for setting 2 also , is additionally evaluated at the fixed grid of time points in as with MSE and coverage computed accordingly. These error measures are then averaged over all samples per setting.
For the comparison with the joint model implementation in JMbayes in settings 1a and 1b, data is generated with as timeconstant. In our implementation we model the longitudinal submodel by Psplines with cubic Bsplines, a second order difference penalty and 12 knots (4 internal knots), for both the overall mean as well as the individual trajectories. After application of the constraints this yields basis functions. For the timevarying effect of the baseline hazard, , as well as the nonlinear effect in we use 10 knots (2 internal knots) resulting in 5 basis functions per effect after application of the constraints. In order to achieve a comparable model in the package JMbayes we model nonlinear effects in the longitudinal submodel and survival covariate effects by Bsplines and determine the number of knots to minimize the DIC in preliminary simulations. Details on the inclusion of nonlinear effects in both submodels can be found in the source code of the ancillary material. As a result we model the longitudinal part by cubic Bsplines for both the fixed and random effects with 1 internal knot for the larger data setting and without internal knots for the smaller data setting, resulting in 4 and 3 basis functions for both the fixed and random effects of time, respectively. As prior simulations had shown convergence issues when the covariance matrix of the random effects was estimated as unrestricted, we restrict it to be diagonal, resulting in independent random effects. Also based on DIC from preliminary simulations we specify the effect in in the survival part with cubic Bsplines with 3 internal knots using 5 basis functions. We model the baseline hazard with Psplines using the default settings from JMbayes, i.e. a cubic Bspline basis with 17 basis functions and a second order difference penalty. For the MCMC procedure we also use the default settings of 20000 iterations, including a burnin of 3000 and a thinning such that 2000 samples are kept.
In our second simulation, i.e. settings 2a and 2b, we specify the longitudinal trajectories as before but generate data using a timevarying association predictor for data in and for in order to achieve a similar shape despite a differing time scale. We fit the model using the same specification as in setting 1. Additionally is modeled as a Pspline with 10 knots (2 internal knots) resulting in 5 basis functions after application of the constraints.
4.2 Simulation results
The focus of the first simulation is the comparison with the package JMbayes regarding the accuracy of the modeling of the longitudinal trajectories and the timeconstant association parameter in settings 1a and 1b. Table 2 shows the MSE, bias and coverage for the estimation of .
MSE  bias  coverage  

a  b  a  b  a  b  
bamlss  

JMbayes  

bamlss  

JMbayes  

bamlss  

JMbayes  

bamlss  

JMbayes  


No credibilty intervals and thus no coverage could be calculated for these predictors.
For both methods is estimated more precisely and with a higher coverage in the larger data setting compared to . In both data settings bamlss achieves lower MSE, less bias and a higher coverage in the estimation of the association compared to JMbayes. For JMbayes the coverage for is not satisfactory in both settings (0.840 and 0.890). The further survival predictors, and , are parameterized differently in the two estimation methods with regard to the intercept term and sumtozero constraints. Therefore we assess only the prediction quality of . We observe that JMbayes shows a higher bias in the estimation of the sum of these two predictors.
Regarding the longitudinal submodel for both methods are fairly equal regarding the average MSE over the larger data setting (bamlss: 0.028 vs. JMbayes: 0.029), but our approach seems to be more precise in the smaller data setting (bamlss: 0.022 vs. JMbayes: 0.031). To further understand the cause of this difference we look at the bias in the estimation of over the whole observed time course for the smaller data setting. As shown in Figure 4, JMbayes seems to underestimate some nonlinearity of the true predictor. Both methods show higher uncertainty for later time points when, due to censoring and the occurrence of events, less information is available.
Finally, the estimation of the error variance is more precise in bamlss. For the longitudinal predictors we did not achieve to calculate credibility intervals in JMbayes.
There are large runtime differences where JMbayes models took on average 4 minutes and 7 minutes for data setting and , respectively, and the implementation bamlss, due to the more flexible functional random effects specification, took on average 6 hours and 39 hours to run on a single core of a 2.60 GHz Intel Xeon Processor E52650. Through parallel computating, e.g. on 10 cores of a Linux system, the run times would reduce to 1.3 and 8.5 hours, respectively.
The aim of the second simulation setting is to shed light on the precision of the estimation of all predictors in the model with a special focus on the estimation of , which is nonlinear in time. Additionally, we also compare the precision of the posterior mode to the posterior mean estimation. Table 3 gives an overview of the estimation precision of all predictors.
MSE  bias  coverage  

a  b  a  b  a  b  
mean  
mode  
mean  
mode  
mean  
mode  
mean  
mode  
mean  
mode 
Similarly to setting 1 we observe an effect of sample size: All survival predictors () show a smaller MSE for data setting compared to probably due to the higher number of events. In contrast, the MSE is smaller for the estimation of in data setting compared to potentially due to the longer followup and a slightly higher number of longitudinal observations per subject. Whereas the precision of the point estimates is overall similar or only slightly worse for the posterior mode compared to the posterior mean estimation, the coverage is not acceptable for the posterior mode but close to 95% for the posterior mean. The only exception is the estimation of , where the coverage is somewhat lower for the posterior mean. As is very precisely estimated and formal inference is usually not of interest for this predictor, we do not rate this undercoverage as too problematic.
In order to illustrate the precision in the timevarying effect estimates and to assess the cause of differences in MSE, Figure 5 displays the true and estimated predictors and . Overall the estimated predictors match the true functions quite well. For the smaller data sets there is more uncertainty in the estimation, especially at later time points, when less subjects are still observed.
With on average only 15 and 22 minutes to run for data setting and respectively, the posterior mode estimation has clear advantages in computation time over the more precise posterior mean estimation with 7 and 43 hours on average in this setting. Again through parallel computating on 10 cores, the time for the posterior mean estimation would reduce to 1.5 and 9.3 hours, respectively.
In conclusion, our simulations show that the estimation of models with constant associations between marker and event performs well, even outperforming the implementation in JMbayes in some aspects. The estimation of more flexible models that are newly covered by our approach in contrast to existing implementations, i.e. with a timevarying assocation parameter and the specification of flexible trajectories, is equally satisfactory. While the more precise posterior mean estimation is timeconsuming, the posterior mode offers a computationally efficient way to quickly assess the point estimates in a given model specification, even though credibility bands are only approximate.
5 Application
In order to gain insights into our motivating research question we apply the model to a combined data set of two ongoing German T1D risk cohorts to investigate whether longitudinal trajectories of insulin autoantibodies (IAA) are associated with the rate of progression to T1D. Whereas different autoantibodies are diagnostic for a preclinical stage of the disease, our focus lies on the analysis of the levels of IAA as a marker from the time when it first exceeded a specific threshold, called seroconversion, to the onset of T1D or loss to followup. The marker IAA is most often the first autoantibody to appear (Ziegler et al., 1993, 1999; Hummel et al., 2004a). Both its initial value at seroconversion as well as its mean over time have been shown to be positively associated with the emergence of T1D and negatively related to the age at T1D diagnosis (Steck et al., 2011, 2015).
The BABYDIAB and BABYDIET studies, both propective birth cohorts with a joint study protocol, aim to investigate the natural history of T1D development. In these studies children with familial increased risk of T1D were followed from birth to the development of T1D or loss to followup for up to 21 years (Ziegler et al., 1993, 1999; Hummel et al., 2004b, 2011). In both studies, autoantibody measurements were taken at age 9 months and 2, 5, 8, 11, 14 and 17 years and additionally every 6 months after positive islet autoantibodies had emerged. The exact age at the emergence of clinical T1D was assessed also between study visits.
In our joint model we use data of children who developed IAA during followup of which 69 (54%) progressed to T1D. The subject’s progression times are censored at 15 years after seroconversion due to the extremely low sample size at later time points. In total longitudinal measurements of IAA after seroconversion were used and logtransformed for the analysis. We model subject’s transformed autoantibody levels using functional random intercepts and two further covariates. First, the age at seroconversion is included as a linear effect and second a binary variable indicates whether the autoantibody was among the first autoantibodies to appear. We model the association between marker and event, , to be a nonlinear function of time. Further we allow the covariates in the longitudinal model to also influence the survival process directly and expect a positive association between the age at seroconversion and the time to T1D (Steck et al., 2011; Ziegler et al., 2013). In our Bayesian model estimation we sample for 33000 iterations with a burnin of 3000 and thinning of 30 to obtain 1000 samples, with starting values for the posterior mean estimation obtained from the posterior mode estimates. Convergence is assessed by the inspection of traceplots, of which a subset is presented in Appendix B. In order to assess the sensitivity of the results to the number of knots we specify three models with differing numbers of knots. We specify two models using either 12 (i.e. 4 internal) knots or 20 (i.e. 8 internal) knots for the overall mean as well as the individual trajectories in the functional random intercepts and 10 (i.e. 2 internal) knots in the survival submodel. Additionally we specify a model with 20 (i.e. 8 internal) knots for nonlinear terms in both, the longitudinal and the survival submodels.
The results from the three specified models in our sensitivity analysis are highly similar for all predictors with regard to mean estimates and the credibility intervals. However we observe lower DIC for the models with more knots in the functional random intercepts along with a closer fit of the individual trajectories and more narrow credibility intervals for the estimated association (cf. Figure A3 in Appendix B). Using more knots in the survival submodel results in a better mixing in the traceplots but a slightly higher DIC. Hence we assume the results to be robust regarding the exact number of knots and present results of the model with the lowest DIC in the following.
As shown in Figure (b)b for 5 randomly selected subjects, we are able to closely approximate the individual nonlinear trajectories of IAA. The association between the marker and the onset of clinical T1D is estimated as stable over time with an average slope of 0.01 [95% credibility interval: 0.08, 0.06].
The average slope was defined as the mean over the first derivative of the association evaluated at all observed event and followup times , and its posterior distribution can be easily obtained by numerically deriving in every sample. The credibility interval for the estimated association is above 0 from 0.5 to 6.5 years after seroconversion and there is more uncertainty when less information is available, i.e. when less event and followup times are observed and when less subjects remain in the risk set, as indicated by the credibility intervals (Figure 6). In the longitudinal submodel we observe that trajectories have a lower level, if subjects seroconverted at an older age (in years, ; 95% credibility interval: [0.14, 0.01]) and a higher level if IAA was amongst the first markers to appear (; [0.24, 1.55]). In the survival submodel the loghazard is decreased if IAA was amongst the first markers to appear (; [1.69, 0.14]). In sum if IAA is amongst the first markers to appear the loghazard is reduced by 0.71. This net effect can be derived as the sum of the direct effect in and the indirect effect in with an average association of . Additionally we do not observe a direct effect of the age at seroconversion (; [0.20, 0.01]).
In line with previous findings (Steck et al., 2011, 2015) these results indicate that the quantitative levels of the marker IAA are informative for the rate of progression to T1D in the first years after seroconversion with higher levels increasing the hazard of T1D. The direct relationship between the hazard and the baseline covariate age at seroconversion is not supported by the model, suggesting that the previously established influence of this covariate on T1D progression may be mediated by the marker levels, i.e. the effect in the respective loghazard is reduced if the marker levels over time are taken into account as in our flexible parameterization. We do not observe a timevarying association between IAA and the hazard of T1D over time. There is much uncertainty around the nonlinear timevarying estimate of the association , potentially as a result of the flexibility in the estimation in combination with the amount of data in the survival part.
6 Discussion and Outlook
We presented a flexible joint model that allows to fit a broad range of joint model specifications using structured additive predictors for all model components. The approach is fully implemented in the Rpackage bamlss. While the framework is very flexible, as illustrated by Table 1, the focus in this work lies on the flexible modeling of individual trajectories and the specification of a timevarying association between marker and event.
The proposed model shows satisfactory performance in various simulation settings and has the potential to offer new insights into complex relationships between biomarkers and timetoevent processes. Our methods development was motivated by a specific research question from T1D studies and two corresponding data sets. We saw that even by combining the two cohorts, the sample size of the data set considered in the application in Section 5 is at the lower limit for the complexity of our model, as indicated by our simulation study and by the width of the credibility intervals in the applied results. Nevertheless we found a positive association between a diseaserelated biomarker and the occurrence of clinical T1D. Although our model allows for a timevarying association between the biomarker and the event process at least in this small data set it was estimated to be roughly constant. In consequence our flexible model can also be used to check the modeling assumptions of simpler models that are commonly used. We aim to further explore the relationship between T1Dspecific autoantibodies and the progression to T1D in a larger data set from a different, multinational T1D cohort (with sample size exceeding data setting in our simulations) as in Steck et al. (2015).
Due to the complexity of the model and its estimation, the computation speed is still a drawback in our implementation. Hence we are constantly working on speeding up the computations further. As shown in simulation 2, the posterior mode estimation offers a computationally efficient way to obtain point estimates from a flexible joint model before starting the full MCMC sampling. These posterior mode estimates show a precision similar to that of the posterior mean estimates. However, the credibility intervals obtained from posterior modes are not wide enough, potentially due to the fact that the uncertainty around the variance parameters is not included in the credibility intervals. In consequence, only the credibility intervals of the posterior mean estimates should be used for inference.
As is well known in the survival context, the number of potential parameters in the model is limited by the number of observed events (Harrell et al., 1996). This also holds in our approach for the predictors in the survival part of the model, , , and . We achieve to alleviate this issue to some extent by the penalized approach, which decreases the effective number of degrees of freedom and thus allows for a richer model than would be possible without a penalty. Still, we recommend to model only those effects as nonlinear functions, where a strong indication for nonlinearity is given.
Within the framework of the presented additive joint model several further extensions are possible. As a next step we aim to extend the model by including the derivative of the longitudinal trajectories to model the event process similar to Ye et al. (2008), Brown (2009) and Rizopoulos et al. (2014), allowing to model the potentially timevarying association between changes in the marker and the hazard. Further, functional historical effects of the trajectories, including information on the history of the marker (Malfait and Ramsay, 2003; Gellar et al., 2014), could potentially offer additional insights into complex relationships between markers and event processes.
We thank Lorenz Lachmann, Claudia Matzke, Joanna Stock, Stephanie Krause, Annette Knopff, Florian Haupt, Maren Pflüger, Marlon Scholz and Anita Gavrisan (all: Institute for Diabetes Research, Helmholtz Zentrum München) for data collection and expert technical assistance, Ramona Puff (Institute for Diabetes Research, Helmholtz Zentrum München) for laboratory management, and Peter Achenbach (Institute for Diabetes Research, Helmholtz Zentrum München) and Ezio Bonifacio (Center for Regenerative Therapies Dresden and Paul Langerhans Institute Dresden, Technische Universität Dresden) for overseeing antibody measurement and for fruitful discussions and advice on modeling T1Dspecific autoantibodies. We also thank all pediatricians and family doctors in Germany for participating in the BABYDIAB Study. Furthermore we thank Fabian Scheipl (LudwigMaximiliansUniversität München) for advice on modeling functional random intercepts. The work was supported by the JDRF (JDRF2SRA201513QR) and by grants from the German Federal Ministry of Education and Research (BMBF) to the German Center for Diabetes Research (DZD e.V.). Meike Köhlerâs work was supported by a grant from the Helmholtz International Research Group (HIRG0018) and Sonja Greven acknowledges funding from the German research foundation (DFG) through Emmy Noether grant GR 3793/11.
Conflict of Interest
The authors have declared no conflict of interest.
Appendix A
We derive score vectors and Hessians for the regression coefficients of every predictor. We introduce some further notation to formulate these derivatives. For the timevarying predictors of the survival part the design matrix denotes the matrix of evaluations at the vector of survival times . For the timevarying predictors of the longitudinal part the design matrix contains the evaluations at all observed subjectspecific timepoints . Let denote the loglikelihood, i.e. the sum of the contributions of the longitudinal and survival submodels defined in (7) and (8). In more detail, the full likelihood is