Disease Knowledge Transfer across Neurodegenerative Diseases
Abstract
We introduce Disease Knowledge Transfer (DKT), a novel technique for transferring biomarker information between related neurodegenerative diseases. DKT infers robust multimodal biomarker trajectories in rare neurodegenerative diseases even when only limited, unimodal data is available, by transferring information from larger multimodal datasets of common neurodegenerative diseases. DKT is a joint-disease generative model of biomarker progressions that exploits biomarker relationships shared across diseases. Our proposed method allows, for the first time, the estimation of plausible multimodal biomarker trajectories in Posterior Cortical Atrophy (PCA), a rare neurodegenerative disease for which only unimodal Magnetic Resonance Imaging (MRI) data is available. For this we train DKT on a combined dataset containing subjects with two distinct diseases and different amounts of available data: 1) a larger, multimodal typical Alzheimer’s disease (tAD) dataset from the TADPOLE Challenge, and 2) a smaller, unimodal PCA dataset from the Dementia Research Centre (DRC), for which only a limited number of MRI scans are available. Although validation is challenging due to the lack of data in PCA, we validate DKT on synthetic data and two patient datasets (TADPOLE and PCA cohorts), showing that it can estimate the ground-truth parameters in the simulation and predict unseen biomarkers on the two patient datasets. While we demonstrated DKT on Alzheimer’s variants, we note that DKT is generalisable to other forms of related neurodegenerative diseases. Source code for DKT is available online: https://github.com/mrazvan22/dkt.
Keywords:
Disease Progression Modelling, Transfer Learning, Manifold Learning, Alzheimer’s Disease, Posterior Cortical Atrophy
1 Introduction
The estimation of accurate biomarker signatures in Alzheimer’s disease (AD) and related neurodegenerative diseases is crucial for understanding underlying disease mechanisms, predicting subjects’ progressions, and enriching clinical trials. Recently, data-driven disease progression models were proposed to reconstruct long-term biomarker signatures from collections of short-term individual measurements [1, 2]. When applied to large datasets of typical AD, disease progression models have helped to understand the earliest events in the AD cascade [1] and to quantify biomarker heterogeneity [3], and they have shown improved predictions over standard approaches [1]. However, by necessity these models require large datasets, which should in addition be both multimodal and longitudinal. Such data is not always available in rare neurodegenerative diseases. In particular, most datasets for rare neurodegenerative diseases come from local clinical centres, are unimodal (e.g. MRI only) and are limited both cross-sectionally and longitudinally, which makes the application of disease progression models extremely difficult. Moreover, a model estimated from a common disease such as typical AD may not generalise to specific variants. For example, in Posterior Cortical Atrophy (PCA), a neurodegenerative syndrome causing visual disruption, posterior regions such as the occipital lobe are affected early, instead of the hippocampus and temporal regions as in typical AD.
The problem of limited data in medical imaging has so far been addressed through transfer learning methods. These were successfully used to improve the accuracy of AD diagnosis [4] and the prediction of conversion from mild cognitive impairment (MCI) [5], but they have two key limitations. First, they use deep learning or other machine learning methods that are not easily interpretable and do not allow us to understand the underlying disease mechanisms, whether these are specific to rare diseases or shared across related diseases. Second, these models cannot be used to forecast the future evolution of subjects at risk of disease, which is important for selecting the right subjects for clinical trials.
We propose Disease Knowledge Transfer (DKT), a generative model that estimates continuous multimodal biomarker progressions for multiple diseases simultaneously, including rare neurodegenerative diseases, and which inherently performs transfer learning between the modelled phenotypes. This is achieved by exploiting biomarker relationships that are shared across diseases, whilst accounting for differences in the spatial distribution of brain pathology. DKT is interpretable, which allows us to understand underlying disease mechanisms, and can also predict the future evolution of subjects at risk of disease. We apply DKT to Alzheimer’s variants and demonstrate its ability to predict non-MRI trajectories for patients with Posterior Cortical Atrophy in the absence of such data. This is done by fitting DKT to two datasets simultaneously: (1) the TADPOLE Challenge [6] dataset, containing subjects from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) with MRI, FDG-PET, DTI, AV45 and AV1451 scans, and (2) MRI scans from patients with Posterior Cortical Atrophy from the Dementia Research Centre (DRC), UK. Finally, we validate DKT on three datasets: 1) simulated data with known ground truth, 2) TADPOLE subpopulations with different progressions and 3) 20 DTI scans from controls and PCA patients from our clinical centre.
2 Method
Fig. 1 shows the diagram of the DKT framework. We assume that the progression of each disease can be modelled as a unique evolution of dysfunction trajectories representing region-specific multimodal pathology, which is further modelled as the progression of several biomarkers within that same region, acquired using different modalities (Fig. 1, bottom). Each group of biomarkers in the bottom row will be called a disease-agnostic unit, or simply agnostic unit, because the biomarker dynamics within it are assumed to be shared across all diseases modelled.
The assumption that the dynamics of some biomarkers are disease-agnostic (i.e. shared across diseases) is key to DKT. We can make this assumption for two reasons. First, pathology in many related neurodegenerative diseases (e.g. Alzheimer’s variants) is hypothesised to share the same underlying mechanisms (e.g. amyloid and tau accumulation), and within one region such mechanisms lead to similar pathology dynamics across all the disease variants modelled [7]; the key difference is that distinct brain regions are affected at different times and with different pathology rates and extent, likely caused by the selective vulnerability of networks within these regions [8]. Second, even if the diseases have different upstream mechanisms (e.g. amyloid vs tau accumulation), downstream biomarkers measuring hypometabolism, white matter degradation and atrophy are likely to follow the same pathological cascade and to have similar dynamics.
We now model the biomarker dynamics that are specific to each disease, by mapping the subjects’ disease stages to dysfunction scores. We assume that each subject $i$ at each visit $j$ has an underlying disease stage $s_{ij} = t_{ij} + \beta_i$, where $t_{ij}$ represents the months since the baseline visit for subject $i$ at visit $j$ and $\beta_i$ represents the time shift of subject $i$. We then assume that each subject $i$ at visit $j$ has a dysfunction score $\tilde{y}^l_{ij}$ corresponding to multimodal pathology in brain region $l$, which is a function of its disease stage:
$$\tilde{y}^l_{ij} = f\left(s_{ij}; \lambda^l_{d_i}\right) \tag{1}$$
where $f(\cdot\,; \lambda^l_{d_i})$ is a smooth monotonic function mapping each disease stage to a dysfunction score, with parameters $\lambda^l_{d_i}$ corresponding to agnostic unit $l \in \mathcal{L}$, where $\mathcal{L}$ is the set of all agnostic units. Moreover, $d_i \in \mathcal{D}$ represents the index of the disease corresponding to subject $i$, where $\mathcal{D}$ is the set of all diseases modelled. For example, MCI and tAD subjects from ADNI as well as tAD subjects from the DRC cohort can all be assigned $d_i = \text{tAD}$, while PCA subjects can be assigned $d_i = \text{PCA}$. We implement $f$ as a parametric sigmoidal curve similar to [2], both to enable robust optimisation and because this family accounts for the floor and ceiling effects present in AD biomarkers; its monotonicity also suits many neurodegenerative diseases, whose progression is irreversible.
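To make Eq. (1) concrete, the following Python sketch computes a subject's disease stage and a dysfunction score using one common four-parameter sigmoid. The parametrisation (lower asymptote, height, slope, centre), the agnostic-unit name `temporal` and all parameter values are illustrative assumptions, not the functional form or values fitted by DKT.

```python
import numpy as np


def disease_stage(t_months, beta):
    """Disease stage s_ij = t_ij + beta_i: months since baseline plus the subject's time shift."""
    return np.asarray(t_months, dtype=float) + beta


def sigmoid(s, lam):
    """Assumed four-parameter sigmoid f(s; lam) with lam = (lower, height, slope, centre).

    Monotonic in s, with floor `lower` and ceiling `lower + height` (for height > 0)."""
    lower, height, slope, centre = lam
    return lower + height / (1.0 + np.exp(-slope * (s - centre)))


# Hypothetical dysfunction parameters lambda^l_d, keyed by (disease d, agnostic unit l).
LAMBDA = {
    ("tAD", "temporal"): (0.0, 1.0, 0.15, 20.0),
    ("PCA", "temporal"): (0.0, 1.0, 0.15, 50.0),
}


def dysfunction_score(t_months, beta, disease, unit):
    """Eq. (1): dysfunction score of agnostic unit `unit` for a subject with disease `disease`."""
    return sigmoid(disease_stage(t_months, beta), LAMBDA[(disease, unit)])


# Same visit times and time shift, but the temporal unit becomes abnormal later in PCA.
tad = dysfunction_score([0, 12, 24], beta=10.0, disease="tAD", unit="temporal")
pca = dysfunction_score([0, 12, 24], beta=10.0, disease="PCA", unit="temporal")
print(np.round(tad, 3), np.round(pca, 3))
```

Because only the disease-specific parameters differ, the same stage maps to different dysfunction scores in the two diseases, which is how the sketch mirrors the region-specific timing described above.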
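We further model the biomarker dynamics that are disease-agnostic, by constructing the mapping from the dysfunction scores to the biomarker measurements. We assume a set of given biomarker measurements $Y = \{y_{ijk} \mid (i,j,k) \in \Omega\}$ for subject $i$ at visit $j$ in biomarker $k$, where $\Omega$ is the set of available biomarker measurements. We further denote by $\theta_k$ the trajectory parameters for biomarker $k$ within its agnostic unit $\psi(k)$, where $\psi: \{1, \ldots, K\} \rightarrow \mathcal{L}$ maps each biomarker to a unique agnostic unit $l \in \mathcal{L}$. These definitions allow us to formulate the likelihood for a single measurement as follows:
$$p\left(y_{ijk} \mid \theta_k, \lambda^{\psi(k)}_{d_i}, \beta_i, \epsilon_k\right) = \mathcal{N}\left(y_{ijk} \mid g\left(\tilde{y}^{\psi(k)}_{ij}; \theta_k\right), \epsilon_k\right) \tag{2}$$
where $g(\cdot\,; \theta_k)$ represents the trajectory of biomarker $k$ within agnostic unit $\psi(k)$, with parameters $\theta_k$, and is again implemented as a sigmoidal function for the reasons outlined above. Parameters $\lambda^{\psi(k)}_{d_i}$ are used to define $\tilde{y}^{\psi(k)}_{ij}$ based on Eq. 1, where agnostic unit $l$ is now referred to as $\psi(k)$ to clarify that this is the unit to which biomarker $k$ has been allocated. Variable $\epsilon_k$ denotes the variance of measurements for biomarker $k$.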
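We extend the above model to multiple subjects, visits and biomarkers to get the full model likelihood:
$$p(y \mid \theta, \lambda, \beta, \epsilon) = \prod_{(i,j,k) \in \Omega} \mathcal{N}\left(y_{ijk} \mid g\left(f\left(t_{ij} + \beta_i; \lambda^{\psi(k)}_{d_i}\right); \theta_k\right), \epsilon_k\right) \tag{3}$$
where $y$ is the vector of all biomarker measurements, $\theta = [\theta_1, \ldots, \theta_K]$ represents the stacked parameters for the trajectories of biomarkers in agnostic units, $\lambda = [\lambda^l_d]_{l \in \mathcal{L}, d \in \mathcal{D}}$ are the parameters of the dysfunction trajectories within the disease models, $\beta = [\beta_1, \ldots, \beta_S]$ are the time shifts of all $S$ subjects and $\epsilon = [\epsilon_1, \ldots, \epsilon_K]$ estimates measurement noise.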
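A direct transcription of Eq. (3) as a log-likelihood is sketched below in Python. It assumes the same four-parameter sigmoid for both $f$ and $g$; the dictionary-based data layout, the biomarker names and all numbers are hypothetical. Missing measurements (here, FDG for the PCA subject) are simply absent from $\Omega$ rather than imputed.

```python
import numpy as np


def sigmoid(x, p):
    """Assumed sigmoid: p = (lower, height, slope, centre)."""
    lower, height, slope, centre = p
    return lower + height / (1.0 + np.exp(-slope * (x - centre)))


def predicted_mean(s, d, k, psi, theta, lam):
    """g(f(s; lambda^{psi(k)}_d); theta_k): expected value of biomarker k at disease stage s."""
    return sigmoid(sigmoid(s, lam[(d, psi[k])]), theta[k])


def log_likelihood(omega, y, t, beta, disease, psi, theta, lam, eps):
    """Log of Eq. (3): sum over observed (i, j, k) in omega of log N(y_ijk | mean, eps_k).

    y: dict (i, j, k) -> measurement; t: dict (i, j) -> months since baseline;
    beta: dict i -> time shift; disease: dict i -> d_i; psi: dict k -> agnostic unit;
    theta: dict k -> params of g; lam: dict (d, l) -> params of f; eps: dict k -> variance."""
    total = 0.0
    for (i, j, k) in omega:
        mu = predicted_mean(t[(i, j)] + beta[i], disease[i], k, psi, theta, lam)
        var = eps[k]
        total += -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (y[(i, j, k)] - mu) ** 2 / var
    return total


# Hypothetical example: subject 0 (tAD) has MRI and FDG, subject 1 (PCA) has MRI only.
psi = {"mri_temporal": "temporal", "fdg_temporal": "temporal"}
theta = {"mri_temporal": (1.0, -0.6, 6.0, 0.5), "fdg_temporal": (1.2, -0.5, 6.0, 0.4)}
lam = {("tAD", "temporal"): (0.0, 1.0, 0.1, 30.0), ("PCA", "temporal"): (0.0, 1.0, 0.1, 60.0)}
eps = {"mri_temporal": 0.01, "fdg_temporal": 0.02}
disease = {0: "tAD", 1: "PCA"}
beta = {0: 20.0, 1: 40.0}
t = {(0, 0): 0.0, (0, 1): 12.0, (1, 0): 0.0, (1, 1): 12.0}
omega = [(0, 0, "mri_temporal"), (0, 0, "fdg_temporal"), (0, 1, "mri_temporal"),
         (0, 1, "fdg_temporal"), (1, 0, "mri_temporal"), (1, 1, "mri_temporal")]

# Noise-free measurements generated from the model itself.
y = {(i, j, k): predicted_mean(t[(i, j)] + beta[i], disease[i], k, psi, theta, lam)
     for (i, j, k) in omega}
ll_true = log_likelihood(omega, y, t, beta, disease, psi, theta, lam, eps)
ll_wrong = log_likelihood(omega, y, t, {0: 20.0, 1: 0.0}, disease, psi, theta, lam, eps)
```

With noise-free data the true time shifts give a higher log-likelihood than a misplaced shift for the PCA subject, as expected from the product over $\Omega$.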
We estimate the model parameters using loopy belief propagation; see the algorithm in the supplementary material. One key advantage of DKT is that a subject’s time shift can be estimated from only a subset of the subject’s data (e.g. MRI); the model can then infer the missing modalities (e.g. non-MRI) using Eq. 3.
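The idea of estimating a time shift from MRI alone and then reading off a non-MRI biomarker can be sketched as follows. This assumes the trajectories have already been estimated, uses a simple grid search over $\beta_i$ as a stand-in for DKT's actual optimisation (which this sketch does not reproduce), and uses hypothetical parameter values and the same assumed sigmoid form.

```python
import numpy as np


def sigmoid(x, p):
    """Assumed sigmoid: p = (lower, height, slope, centre)."""
    lower, height, slope, centre = p
    return lower + height / (1.0 + np.exp(-slope * (x - centre)))


def predict(s, lam_unit, theta_k):
    """Expected biomarker value g(f(s; lambda); theta_k) at disease stage s."""
    return sigmoid(sigmoid(s, lam_unit), theta_k)


def fit_time_shift(t, y_obs, lam_unit, theta_obs, grid=np.arange(-120.0, 240.5, 0.5)):
    """Grid-search estimate of a subject's time shift from the observed biomarkers only.

    t: (J,) months since baseline; y_obs: (J, K_obs) observed values;
    theta_obs: list of K_obs trajectory parameter tuples. Assumes equal noise variance,
    so maximising the Gaussian likelihood reduces to minimising the sum of squared errors."""
    sse = []
    for b in grid:
        pred = np.stack([predict(t + b, lam_unit, th) for th in theta_obs], axis=1)
        sse.append(np.sum((y_obs - pred) ** 2))
    return float(grid[int(np.argmin(sse))])


# Hypothetical, already-estimated trajectories for one PCA agnostic unit.
lam_pca = (0.0, 1.0, 0.1, 60.0)
theta_mri, theta_fdg = (1.0, -0.6, 6.0, 0.5), (1.2, -0.5, 6.0, 0.4)

t = np.array([0.0, 12.0, 24.0])
y_mri = predict(t + 35.0, lam_pca, theta_mri)[:, None]   # MRI only; true shift 35 months
beta_hat = fit_time_shift(t, y_mri, lam_pca, [theta_mri])
fdg_pred = predict(t + beta_hat, lam_pca, theta_fdg)     # inferred non-MRI values
```

The FDG trajectory parameters never touch the PCA subject's data here; they would come from subjects with FDG scans, which is the transfer step in miniature.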
2.1 Generating Synthetic Data
We first test DKT on synthetic data to assess its performance against known ground truth. More precisely, we generate data that follows the DKT model exactly and test DKT’s ability to recover biomarker trajectories and subject time shifts. We generate synthetic data from two diseases (50 subjects with “synthetic PCA” and 100 subjects with “synthetic AD”) using the parameters from the bottom-left table in Fig. 2, emulating the TADPOLE and DRC cohorts; see the supplementary material for full details. The six biomarkers were allocated a priori to two agnostic units. To simulate the lack of multimodal data in the synthetic PCA subjects, we discarded the data from a subset of the biomarkers for all these subjects.
2.2 Data Acquisition and Preprocessing
We trained DKT on ADNI data from the TADPOLE Challenge [6], since it contains a large number of multimodal biomarkers already preprocessed and aggregated into one table. From the TADPOLE dataset we selected a subset of 230 subjects who had an MRI scan and at least one FDG-PET, AV45, AV1451 or DTI scan. In order to model another disease, we further included MRI scans from 76 PCA subjects from the DRC cohort, along with scans from 67 tAD patients and 87 age-matched controls.
For both datasets, we computed multimodal biomarker measurements corresponding to each brain lobe: MRI volumes using the FreeSurfer software, FDG, AV45 and AV1451 PET standardised uptake value ratios (SUVR) extracted with the standard ADNI pipeline, and DTI fractional anisotropy (FA) measures from adjacent white-matter regions. For every lobe, we regressed out the following covariates: age, gender, total intracranial volume (TIV) and dataset (ADNI vs DRC). Finally, biomarkers were normalised to the [0,1] range.
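A minimal NumPy sketch of the covariate correction and normalisation steps is shown below. It assumes ordinary least-squares regression with an intercept fitted on all subjects, and per-biomarker min-max scaling; the text does not specify these details, and the data are randomly generated for illustration only.

```python
import numpy as np


def regress_out(y, covariates):
    """Remove linear covariate effects from one biomarker via ordinary least squares.

    y: (N,) values; covariates: (N, C), e.g. age, gender, TIV, dataset indicator.
    Returns the intercept plus residuals, i.e. y minus the fitted covariate effects."""
    X = np.column_stack([np.ones(len(y)), covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X[:, 1:] @ coef[1:]


def minmax(y):
    """Rescale one biomarker to the [0, 1] range."""
    return (y - y.min()) / (y.max() - y.min())


# Randomly generated illustration (not real data).
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(55, 85, n)
gender = rng.integers(0, 2, n)
tiv = rng.normal(1500, 150, n)
dataset = rng.integers(0, 2, n)          # 0 = ADNI, 1 = DRC
volume = 5000 - 20 * age + 2 * tiv + 300 * dataset + rng.normal(0, 100, n)

cov = np.column_stack([age, gender, tiv, dataset])
volume_norm = minmax(regress_out(volume, cov))
```

Including the dataset indicator as a covariate is what removes the constant ADNI vs DRC offset; after correction the volume is uncorrelated with every covariate in the fit.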
3 Results on Synthetic and Patient Datasets
Results on synthetic data with known ground truth (Fig. 2) suggest that DKT can robustly estimate the trajectory parameters (MAE = 0.058) as well as the subject-specific time shifts (agreement of 0.98 with the ground truth). While some errors in trajectory estimation are visible, these are due to the informed priors placed on the model parameters to ensure identifiability and convergence.
Fig. 2 (bottom-left table): ground-truth biomarker allocation to the two agnostic units and dysfunction trajectory parameters for synthetic AD and synthetic PCA.
We then apply DKT to real patient data, with the aim of transferring multimodal biomarker trajectories from tAD to PCA. The inferred PCA trajectories, shown in Fig. 3, recapitulate known patterns in PCA [9], where posterior regions such as the occipital and parietal lobes are predominantly affected in later stages. In contrast to typical AD, we find that the hippocampus is affected later on, further suggesting that the model did not transfer excessive tAD-specific information. Here, we demonstrate the possibility of inferring plausible non-MRI biomarkers in a rare neurodegenerative disease in the absence of such data for these subjects. As far as we are aware, this is the first time a continuous signature of non-MRI biomarkers has been estimated for PCA, which had so far been prevented by the disease’s rarity and the lack of data.
3.1 Validation on DTI Data in tAD and PCA
We further validated DKT by predicting unseen DTI data from two patient datasets: 1) TADPOLE subjects with a different progression from the training subjects, and 2) a separate test set of 20 DTI scans from controls and PCA patients from the DRC; full demographics are given in the supplementary material. To split TADPOLE into subgroups with different progressions, we used the SuStaIn model [3], which resulted in three subgroups (hippocampal, cortical and subcortical) with prominent early atrophy in the hippocampus, cortical and subcortical regions, respectively. To evaluate prediction accuracy, we computed the rank correlation between the DKT-predicted biomarker values and the measured values in the test data. We use rank correlation instead of mean squared error because it is insensitive to systematic biases (e.g. a constant offset) of the models when predicting “unseen data” in a given disease.
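The choice of rank correlation can be illustrated with a toy example: subtracting a constant offset from the predictions leaves Spearman's rank correlation unchanged but inflates the mean squared error. We assume here that "rank correlation" means Spearman's coefficient; the values are made up, and the simple ranking helper assumes no ties (`scipy.stats.spearmanr` handles ties).

```python
import numpy as np


def ranks(x):
    """Ranks 0..n-1 of the entries of x (assumes no ties)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])


def mse(a, b):
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))


# Made-up DTI FA values for five test subjects.
measured = np.array([0.42, 0.55, 0.38, 0.61, 0.47])
predicted = np.array([0.40, 0.57, 0.35, 0.60, 0.49])
biased = predicted - 0.15        # same predictions with a constant systematic offset

print(spearman(measured, predicted), spearman(measured, biased))   # identical
print(mse(measured, predicted), mse(measured, biased))             # MSE grows with the offset
```

More generally, Spearman's coefficient is unchanged by any strictly increasing transformation of the predictions, so it rewards getting the ordering of subjects right even when a model's absolute scale is off.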
Validation results are shown in Table 1 for the hippocampal-to-cortical TADPOLE subgroups (other pairs of subgroups are not shown due to lack of space), as well as for PCA subjects. When predicting missing DTI markers of the TADPOLE cortical subgroup and of PCA subjects from the DRC cohort (Table 1), the DKT correlations are generally high for the cingulate, hippocampus and parietal lobe, and lower for the frontal lobe. DKT also shows favourable performance compared to four other models: the latent-stage model from [2], a multivariate Gaussian Process model with an RBF kernel that predicts a DTI ROI marker from multiple MRI markers, and cubic spline and linear models that predict a regional DTI biomarker directly from its corresponding MRI marker. In particular, for predicting DTI FA in the parietal and temporal lobes, DKT has significantly better predictions than almost all methods tested.
Table 1: Rank correlation between predicted and measured DTI FA in each region.
Model         Cingulate    Frontal      Hippocam.    Occipital    Parietal     Temporal
TADPOLE: Hippocampal subgroup to Cortical subgroup
DKT (ours)    0.56 ± 0.23  0.35 ± 0.17  0.58 ± 0.14  0.10 ± 0.29  0.71 ± 0.11  0.34 ± 0.26
Latent stage  0.44 ± 0.25  0.34 ± 0.21  0.34 ± 0.24* 0.07 ± 0.22  0.64 ± 0.16  0.08 ± 0.24*
Multivariate  0.60 ± 0.18  0.11 ± 0.22* 0.12 ± 0.29* 0.22 ± 0.22  0.44 ± 0.14* 0.32 ± 0.29*
Spline        0.24 ± 0.25* 0.06 ± 0.27* 0.58 ± 0.17  0.16 ± 0.27  0.23 ± 0.25* 0.10 ± 0.25*
Linear        0.24 ± 0.25* 0.20 ± 0.25* 0.58 ± 0.17  0.16 ± 0.27  0.23 ± 0.25* 0.13 ± 0.23*
Typical Alzheimer’s to Posterior Cortical Atrophy
DKT (ours)    0.77 ± 0.11  0.39 ± 0.26  0.75 ± 0.09  0.60 ± 0.14  0.55 ± 0.24  0.35 ± 0.22
Latent stage  0.80 ± 0.09  0.53 ± 0.17  0.80 ± 0.12  0.56 ± 0.18  0.50 ± 0.21  0.32 ± 0.24
Multivariate  0.73 ± 0.09  0.45 ± 0.22  0.71 ± 0.08  0.28 ± 0.21* 0.53 ± 0.22  0.25 ± 0.23*
Spline        0.52 ± 0.20* 0.03 ± 0.35* 0.66 ± 0.11* 0.09 ± 0.25* 0.53 ± 0.20  0.30 ± 0.21*
Linear        0.52 ± 0.20* 0.34 ± 0.27  0.66 ± 0.11* 0.64 ± 0.17  0.54 ± 0.22  0.30 ± 0.21*
4 Discussion
In this work we took initial steps towards the challenging problem of transfer learning between different neurodegenerative diseases. Our proposed DKT method enabled the estimation of quantitative non-MRI trajectories in a rare disease (PCA) for which very limited data was available. To our knowledge, this is the first time a multimodal continuous signature has been derived for PCA, as the only other longitudinal study of PCA computed only atrophy measures from MRI scans [10]. However, our work has several limitations, which can be addressed in future research: 1) to account for population heterogeneity, DKT can be easily extended to include subject-specific effects; 2) improved schemes for allocating biomarkers to agnostic units could take connectivity into account, or derive the allocation from the data automatically; 3) DKT can be further validated on more complex synthetic experiments, with a range of datasets generated from different parameters.
5 Acknowledgements
This work was supported by the EPSRC Centre For Doctoral Training in Medical Imaging with grant EP/L016478/1 and in part by the Neuroimaging Analysis Center through NIH grant NIH NIBIB NAC P41EB015902. Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). The Dementia Research Centre is an ARUK coordination centre.
References
[1] Oxtoby, N.P., Young, A.L., Cash, D.M., Benzinger, T.L., Fagan, A.M., Morris, J.C., Bateman, R.J., Fox, N.C., Schott, J.M. and Alexander, D.C., 2018. Data-driven models of dominantly-inherited Alzheimer’s disease progression. Brain, 141(5), pp.1529-1544.
[2] Jedynak, B.M., Lang, A., Liu, B., Katz, E., Zhang, Y., Wyman, B.T., Raunig, D., Jedynak, C.P., Caffo, B., Prince, J.L. and ADNI, 2012. A computational neurodegenerative disease progression score: method and results with the Alzheimer’s Disease Neuroimaging Initiative cohort. NeuroImage, 63(3), pp.1478-1486.
[3] Young, A.L., Marinescu, R.V., Oxtoby, N.P., Bocchetta, M., Yong, K., Firth, N.C., Cash, D.M., Thomas, D.L., Dick, K.M., Cardoso, J. and van Swieten, J., 2018. Uncovering the heterogeneity and temporal complexity of neurodegenerative diseases with Subtype and Stage Inference. Nature Communications, 9(1), p.4273.
[4] Hon, M. and Khan, N., 2017. Towards Alzheimer’s Disease Classification through Transfer Learning. arXiv preprint arXiv:1711.11117.
[5] Cheng, B., Liu, M., Zhang, D., Munsell, B.C. and Shen, D., 2015. Domain transfer learning for MCI conversion prediction. IEEE Transactions on Biomedical Engineering, 62(7), pp.1805-1817.
[6] Marinescu, R.V., Oxtoby, N.P., Young, A.L., Bron, E.E., Toga, A.W., Weiner, M.W., Barkhof, F., Fox, N.C., Klein, S. and Alexander, D.C., 2018. TADPOLE Challenge: Prediction of Longitudinal Evolution in Alzheimer’s Disease. arXiv preprint arXiv:1805.03909.
[7] Jack Jr, C.R., Knopman, D.S., Jagust, W.J., Shaw, L.M., Aisen, P.S., Weiner, M.W., Petersen, R.C. and Trojanowski, J.Q., 2010. Hypothetical model of dynamic biomarkers of the Alzheimer’s pathological cascade. The Lancet Neurology, 9(1), pp.119-128.
[8] Seeley, W.W., Crawford, R.K., Zhou, J., Miller, B.L. and Greicius, M.D., 2009. Neurodegenerative diseases target large-scale human brain networks. Neuron, 62(1), pp.42-52.
[9] Crutch, S.J., Lehmann, M., Schott, J.M., Rabinovici, G.D., Rossor, M.N. and Fox, N.C., 2012. Posterior cortical atrophy. The Lancet Neurology, 11(2), pp.170-178.
[10] Lehmann, M., Crutch, S.J., Ridgway, G.R., Ridha, B.H., Barnes, J., Warrington, E.K., Rossor, M.N. and Fox, N.C., 2011. Cortical thickness and voxel-based morphometry in posterior cortical atrophy and typical Alzheimer’s disease. Neurobiology of Aging, 32(8), pp.1466-1476.
Appendix A Supplementary material
A.1 Parameter Estimation
We estimate the model parameters using a two-stage approach. In the first stage, we perform belief propagation within each agnostic unit and then within each disease model. In the second stage, we jointly optimise across all agnostic units and disease models using loopy belief propagation. An overview of the algorithm is given in Figure 4. Given the initial parameters estimated in the first stage (line 1), the algorithm iteratively updates the biomarker trajectories within the agnostic units (lines 4-5), the dysfunction trajectories (line 8) and the subject-specific time shifts (line 10) until convergence. The cost function is nearly identical for all parameters; the main difference lies in which measurements, over subjects $i$, visits $j$ and biomarkers $k$, are selected for computing the measurement error. For estimating the trajectory $\theta_k$ of biomarker $k$ within agnostic unit $\psi(k)$, the measurements are taken from the subset of $\Omega$ containing all measurements of biomarker $k$ from all subjects and visits. For estimating the dysfunction trajectories $\lambda^l_d$, the subset contains the measurements from all subjects with disease $d$ (i.e. $d_i = d$) and all biomarkers that belong to agnostic unit $l$ (i.e. $\psi(k) = l$). Finally, for the time shift $\beta_i$ (line 10), the subset contains all measurements from subject $i$, for all biomarkers and visits.
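The measurement-selection step described above can be sketched in plain Python: each observed index triple $(i, j, k)$ is routed to every parameter block it informs. This only illustrates the bookkeeping; the loopy belief propagation updates themselves are not shown, and the dictionary-based layout and biomarker names are hypothetical.

```python
def measurement_subsets(omega, disease, psi):
    """Route each observed measurement (i, j, k) to the parameter blocks it informs.

    omega: iterable of (subject i, visit j, biomarker k); disease: dict i -> d_i;
    psi: dict k -> agnostic unit. Returns three dicts of index lists keyed by
    k (for theta_k), (d, l) (for lambda^l_d) and i (for beta_i)."""
    by_biomarker, by_disease_unit, by_subject = {}, {}, {}
    for (i, j, k) in omega:
        idx = (i, j, k)
        by_biomarker.setdefault(k, []).append(idx)                        # all subjects and visits of biomarker k
        by_disease_unit.setdefault((disease[i], psi[k]), []).append(idx)  # d_i = d and psi(k) = l
        by_subject.setdefault(i, []).append(idx)                          # all visits and biomarkers of subject i
    return by_biomarker, by_disease_unit, by_subject


# Hypothetical example: a tAD subject with MRI and FDG, a PCA subject with MRI only.
omega = [(0, 0, "mri_parietal"), (0, 0, "fdg_parietal"), (0, 1, "mri_parietal"),
         (1, 0, "mri_parietal"), (1, 0, "mri_occipital")]
disease = {0: "tAD", 1: "PCA"}
psi = {"mri_parietal": "parietal", "fdg_parietal": "parietal", "mri_occipital": "occipital"}

by_k, by_dl, by_i = measurement_subsets(omega, disease, psi)
```

In this toy example the PCA parietal dysfunction trajectory is informed by MRI alone, while the FDG trajectory within the parietal unit is informed only by the tAD subject; sharing that FDG trajectory across diseases is what lets information flow to PCA.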
A.2 Generation of the Synthetic Dataset
We tested DKT on synthetic data to assess its performance against known ground truth. More precisely, we generated data that follows the DKT model exactly, and tested DKT’s ability to recover biomarker trajectories and subject time shifts.
We generated the synthetic data as follows, using parameters from Table 2:
- We simulate two synthetic diseases, “synthetic PCA” and “synthetic AD”.
- We define 6 biomarkers, which we allocate to two agnostic units (Table 2, top).
- Within each agnostic unit, we define the parameters $\theta_k$ of the trajectory of each biomarker $k$ allocated to that unit.
- For each disease, we define the parameters $\lambda^l_d$ of the trajectories of dysfunction scores.
- We then sample data from 100 synthetic AD subjects and 50 synthetic PCA subjects, with the subject parameters given in Table 2 (bottom), using the model likelihood (Eq. 2 in the main paper). For each subject, we generate data for 4 visits, each 1 year apart.
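The generation procedure above can be sketched with NumPy as follows. All values below are hypothetical placeholders rather than the parameters of Table 2: the biomarker-to-unit allocation, the parameters of an assumed four-parameter sigmoid, the time-shift range, the noise level and the biomarkers kept for synthetic PCA.

```python
import numpy as np


def sigmoid(x, p):
    """Assumed sigmoid: p = (lower, height, slope, centre)."""
    lower, height, slope, centre = p
    return lower + height / (1.0 + np.exp(-slope * (x - centre)))


rng = np.random.default_rng(42)

# Hypothetical placeholders, not the values of Table 2.
PSI = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}                        # biomarker k -> agnostic unit
THETA = {k: (1.0, -1.0, 8.0, 0.3 + 0.2 * (k % 3)) for k in PSI}    # parameters of g per biomarker
LAM = {("AD", 0): (0.0, 1.0, 0.5, 0.0), ("AD", 1): (0.0, 1.0, 0.5, 5.0),
       ("PCA", 0): (0.0, 1.0, 0.5, 5.0), ("PCA", 1): (0.0, 1.0, 0.5, 0.0)}  # parameters of f
NOISE_SD = 0.05
KEEP_FOR_PCA = {0, 3}      # biomarkers retained for synthetic PCA; the rest are discarded


def simulate(n_subjects, disease, first_id):
    """Sample 4 yearly visits per subject from the Gaussian model likelihood (Eq. 2)."""
    rows = []
    for i in range(first_id, first_id + n_subjects):
        beta = rng.uniform(-10.0, 10.0)         # time shift in years
        for j in range(4):                      # visits at 0, 1, 2 and 3 years
            s = j + beta                        # disease stage
            for k, unit in PSI.items():
                if disease == "PCA" and k not in KEEP_FOR_PCA:
                    continue
                mu = sigmoid(sigmoid(s, LAM[(disease, unit)]), THETA[k])
                rows.append((i, j, k, disease, mu + rng.normal(0.0, NOISE_SD)))
    return rows


data = simulate(100, "AD", 0) + simulate(50, "PCA", 100)
```

Each row is (subject, visit, biomarker, disease, value); the synthetic PCA subjects only contribute the retained biomarkers, mimicking the unimodal real PCA data.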
Table 2: Parameters used to generate the synthetic dataset.
Trajectory parameters
Biomarker allocation
Agnostic unit
Agnostic unit
“Synthetic AD”
“Synthetic PCA”
Subject parameters
Number of subjects    100 (synthetic AD) and 50 (synthetic PCA)
Time shifts (years)
Diagnosis
Data generation       4 visits/subject, 1 year apart
A.3 Demographics of Test Sets
The cohort from the Dementia Research Centre, UK, used for validation consisted of 10 subjects diagnosed with Posterior Cortical Atrophy (mean age 59.4, 40% female) and 10 age-matched controls (mean age 59.3, 50% female).
For the validation on TADPOLE subgroups, we applied the SuStaIn model to TADPOLE to split the population into three subgroups with different progressions: hippocampal, cortical and subcortical subtypes, with prominent atrophy in the hippocampus, cortical and subcortical areas, respectively. The resulting subgroups had the following demographics:
Cohort                  Nr. subjects  Nr. visits  Age (baseline)  Gender (%F)
Controls (hippocampal)  31            2.3 ± 1.8   74.4 ± 6.9      38%
AD (hippocampal)        21            1.5 ± 0.8   74.5 ± 5.5      42%
Controls (cortical)     21            2.3 ± 1.3   70.9 ± 5.4      42%
AD (cortical)           35            1.7 ± 0.9   72.8 ± 7.4      28%
Controls (subcortical)  28            3.0 ± 1.5   73.7 ± 6.5      42%
AD (subcortical)        27            1.6 ± 0.9   73.7 ± 7.5      33%