Fast Predictive Simple Geodesic Regression
Abstract
Deformable image registration and regression are important tasks in medical image analysis. However, they are computationally expensive, especially when analyzing largescale datasets that contain thousands of images. Hence, cluster computing is typically used, making the approaches dependent on such computational infrastructure. Even larger computational resources are required as study sizes increase. This limits the use of deformable image registration and regression for clinical applications and as component algorithms for other image analysis approaches. We therefore propose using a fast predictive approach to perform image registrations. In particular, we employ these fast registration predictions to approximate a simplified geodesic regression model to capture longitudinal brain changes. The resulting method is orders of magnitude faster than the standard optimizationbased regression model and hence facilitates largescale analysis on a single graphics processing unit (GPU). We evaluate our results on 3D brain magnetic resonance images (MRI) from the ADNI datasets.
1Introduction
Longitudinal image data provides us with a wealth of information to study aging processes, brain development and disease progression. Such studies, for example ADNI [?] and the Rotterdam study [?], involve analyzing thousands of images. In fact, even larger studies will be available in the near future. For example, the UK Biobank [?] targets on the order of 100,000 images once completed. With the number of images increasing, largescale image analysis typically resorts to using compute clusters for parallel processing. While this is, in principle, a viable solution, increasingly larger compute clusters will become necessary for such studies. Alternatively, more efficient algorithms can reduce computational requirements, which then facilitates computations on individual computers or much smaller compute clusters, interactive (e.g., clinical) applications, efficient algorithm development, and use of these efficient algorithms as components in more sophisticated analysis approaches (which may use them as part of iterative processes).
Image registration is a key task in medical image analysis to study deformations between images. Building on image registration approaches, image regression models [?] have been developed to analyze deformation trends in longitudinal imaging studies. One such approach is geodesic regression (GR) [?] which (for images) build on the large displacement diffeomorphic metric mapping model (LDDMM) [?]. In general, GR generalizes linear regression to Riemannian manifolds. When applied to longitudinal image data, it can compactly express spatial image transformations over time. However, the solution to the underlying optimization problem is computationally expensive. Hence, a simplified, approximate, GR approach has been proposed [?] (SGR) to decouple the computation of the regression geodesic into pairwise image registrations. However, even such a simplified GR approach would require months of computation time on a single graphics processing unit (GPU) to process thousands of 3D image registrations for largescale imaging studies such as ADNI [?]. The primary reason computational bottleneck for SGR are the optimization required to compute pairwise registrations.
Recently, efficient approaches have been proposed for deformable image registration [?]. In particular, for LDDMM, which is the basis of GR approaches for images, registrations can be dramatically sped up, by either working with finitedimensional Lie algebras [?] and frequency diffeomorphisms [?], or by fast predictive image registration (FPIR) [?]. FPIR predicts the initial conditions (specifically, the initial momentum) of LDDMM, which fully characterize the geodesic and the spatial transformation using a learned a patchbased deep regression model. Because numerical optimization of standard LDDMM registration is replaced by a single prediction step, followed by optional correction steps [?], FPIR is dramatically faster than optimizationbased LDDMM without compromising registration accuracy, as measured on several registration benchmarks [?].
Besides FPIR, other predictive image registration approaches have been proposed. Dosovitskiy et al. [?] use a convolutional neural network (CNN) to directly predict optical flow. Liu et al. [?] use an encoderdecoder network to synthesize video frames. Schuster et al. [?] investigate strategies to improve optical flow prediction via a CNN. Cao et al. [?] use a sampling strategy and CNN regression to directly learn the mapping from moving and target image pairs to the final deformation field. Miao et al. [?] use CNN regression for 2D/3D rigid registration. Sokooti et al.[?] use CNNs to directly predict a 3D displacement vector field from input image pairs. An unsupervised approach for image registration was proposed by de Vos et al. [?]; here, the loss function is the image similarity measure between images themselves and a deformation is parameterized via a spatial transformer (which essentially amounts to a parameterized model of deformation in image registration) which generates the soughtfor displacement vector field. In [?], Hong et al. employ a lowdimensional bandlimited representation of velocity fields in Fourier space [?] to speed up SGR [?] for populationbased image analysis.
In this work, we will build on FPIR, as it is a desirable approach for brain image registration for the following reasons: First, FPIR predicts the initial momentum of LDDMM and therefore inherits the theoretical properties of LDDMM. Consequently, FPIR results in diffeomorphic transformations, even though predictions are computed in a patchbypatch manner; this can not be guaranteed by most other prediction methods. Second, patchwise prediction allows for training of the prediction models based on a very small number of images, containing a large number of patches. Third, by using a patchwise approach, even highresolution image volumes can be processed without running into memory issues on a GPU. Fourth, none of the existing predictive methods address longitudinal data. However, as both FPIR and SGR are based on LDDMM, they naturally integrate and hence result in our proposed fast predictive simple geodesic regression (FPSGR) approach.
Our contributions can be summarized as follows:
 Predictive geodesic regression
We use a fast predictive registration approach for image geodesic regression. Different to [?], we specifically validate that our approach can indeed capture the frequently subtle deformation trends of longitudinal image data.
 Largescale dataset capability
Our predictive regression approach facilitates largescale image regression within a short amount of time on a single GPU, instead of requiring months of computation time for standard optimizationbased methods on a single computer, or on a compute cluster.
 Accuracy
We assess the accuracy of FPSGR by (1) studying linear models of atrophy scores (which are derived from the nonlinear SGR model) over time, as well as (2) correlations between atrophy scores and various diagnostic groups.
 Validation
We demonstrate the performance of FPSGR by analyzing images of the
ADNI1
/ADNI2
datasets. For comparison, we also perform SGR using numerical optimization for the registrations, again on the completeADNI1
/ADNI2
datasets.
This work is an extension of a recent conference paper [?]. In particular, all our experiments are now in 3D. We also added significantly more results to further explore the behavior of FPSGR in comparison to optimizationbased SGR. In particular, we added (a) an example to visualize the performance of regression models and associated quantitative comparisons (Sec. Section 4.1); (b) an analysis of local atrophy score correlated with clinical variables (Sec. Section 4.3); (c) correlations within diagnostic groups (Sec. Section 4.3); (d) a comparison with pairwise registration (Sec. Section 4.4); (e) and experiments on extrapolation on unseen data (Sec. Section 4.5, Section 4.6).
Organization. The remainder of this article is organized as follows: Section 2 describes FPSGR, Section 3 discusses the experimental setup and the training of the prediction models. In Section 4, we present experimental results for 3D MR brain images. The paper concludes with a summary and an outlook on future work.
2Fast predictive simple geodesic regression
Our fast predictive simple geodesic regression approach is a combination of two methods: First, fast predictive image registration (FPIR) and, second, integration of FPIR with simple geodesic regression (SGR). Both FPIR and SGR are based on the shooting formulation of LDDMM [?]; Fig. ? illustrates our overall approach. The individual components are described in the following.
2.1Lddmm
Shootingbased LDDMM and geodesic regression minimize
where is the initial image (known for imagetoimage registration and to be determined for geodesic regression), is the initial momentum, is a smoothing operator that connects velocity and momentum as and with , is a weight, is the measured image at time (there will be only one such image for imagetoimage registration at ), and denotes the image similarity measure between and (for example or geodesic distance); is the dual of the negative JacobiLie bracket of vector fields: and denotes the Jacobian. The deformation of the source image can be computed by solving , where denotes the identity map.
2.2Fpir
Fast predictive image registration [?] aims at predicting the initial momentum, , between a source and a target image patchbypatch. Specifically, we use a deep encoderdecoder network to predict the patchwise momentum. As shown in Fig. ?, in 3D the inputs are two layers of image patches ( in 2D), where the two layers are from the source and target images respectively. Two patches are taken at the same position by two parallel encoders, which learn features independently. The output is the predicted initial momentum in the , and directions (obtained by numerical optimization on the training samples). Basically, the network is split into an encoder and a decoder part. An encoder consists of 2 blocks of three 3 3 3 convolutional layers with PReLU activations, followed by another 2 2 2 convolution+PReLU with a stride of two, serving as a “pooling” operation. The number of features in the first convolutional layer is 64 and increases to 128 in the second. In the decoder, three parallel decoders share the same input generated from the encoder. Each decoder is the inverse of the encoder except for using 3D transposed convolution layers with a stride of two to perform “unpooling”, and no nonlinearity at the end. To speed up computations, we use patch pruning (i.e., for brain imaging, e.g., patches outside the brain are not predicted as the momentum is expected to be zero there) and a large pixel stride (e.g., 14 for patches) for the sliding window of the predicted patches.
2.3Correction network
We follow [?] and use a twostep approach to improve overall prediction accuracy. An additional correction step, i.e., a correction network, corrects the prediction of the initial prediction network. Figure 1 illustrates this twostep approach graphically. The correction network has the same structure as the prediction network. Only the inputs and outputs differ. For the prediction network, the inputs are the original moving image and the original target image; output is the predicted initial momentum. For the correction network, the inputs are the original moving image and the warped target image; the output is the momentum difference.
Data Percentile 
0.3%  5%  25%  50%  75%  95%  99.7% 
Longitudinal Training  0.0156  0.0407  0.0761  0.1098  0.1559  0.2681  0.3238 
Crosssectional Training  0.0544  0.1424  0.2641  0.3723  0.5067  0.7502  0.8425 
Data Percentile 
0.3%  5%  25%  50%  75%  95%  99.7% 
Longitudinal Training  0.1694  0.4802  1.0765  1.7649  2.7630  4.8060  5.6826 
Crosssectional Training  0.1123  0.3024  0.5863  0.8737  1.2743  2.2659  2.7836 
2.4Sgr
Determining the initial image, , and the initial momentum, , of Eq. is computationally costly. However, in simple geodesic regression, the initial image is fixed to the first image of a subject’s longitudinal image set (leftmost part of Fig. ?). Furthermore, the similarity measure is chosen as the geodesic distance between images and approximated so that the geodesic regression problem can be solved by computing pairwise image registrations with respect to the first image. Specifically, we define the quadratic distance between two images and as
Assume we have an image at time as well as two images and . Further, assume that the spatial transformation maps to and maps to . Then and . Furthermore, assume that maps to , i.e., . Then . Assuming that the geodesic between and is parameterized by the initial velocity and between and by the initial velocity and that we travel between and in time (and similarly for ) we can rewrite the map between and based on the exponential map as
which can be approximated to first order as
Hence, the squared geodesic distance between the two images can be approximated as
where and . Hence, Eq. becomes
where is the soughtfor initial momentum of the regression geodesic and are the initial momenta corresponding to the geodesic connecting (the starting image of the geodesic) and the measurements in time . Differentiating Eq. w.r.t. results in
Thus,
In practice, is very small and can thus be omitted. Furthermore, is obtained by either registering to in unit time or, as in our FPSGR approach, by predicting the momenta via FPIR, denoted as . As Equation 4 was derived assuming that images are transformed into each other in time instead of unit time, the obtained unittime predicted momenta correspond in fact to the approximation . Finally, we obtain the approximated optimal of the energy functional in Eq. , for a fixed as
3Setup / Training
All our experiments use 3D images from the ADNI
datasetADNI1
contains 3479 images from 833 subjects and ADNI2
contains 2992 images from 823 subjects. Images belong to various types of diagnostic categories which we will discuss later.
We perform the following two types of studies:
 Registration
We assess our hypothesis that training FPIR on longitudinal data for longitudinal registrations is preferred over training using crosssectional data. Vice versa, training FPIR on crosssectional data for crosssectional registrations is preferred over training using longitudinal data. Comparisons are with respect to registration results obtained by numerical optimization (i.e., LDDMM).
 Regression
As for regression, we compare linear models fitted to atrophy scores over time, where scores are either obtained from FPSGR or optimizationbased SGR. Additionally, we study correlations between atrophy scores and diagnostic groups. Our hypothesis is that FPSGR is accurate enough to achieve comparable performance to optimizationbased SGR, at much lower computational cost, in both situations.
3.1Training of the prediction models
We use a randomly selected set of 120 patients’ MRI images from ADNI
for training the prediction models and to test the performance of FPIR. We use all of the ADNI
data for our regression experiments.
Training for registration. We randomly selected 120 subjects from ADNI1
and registered their baseline images to their 24 month followup images. We used the first 100 subjects for training and the remaining 20 subjects for testing. For longitudinal training, we registered the baseline image of a subject to the subject’s 24month image. For crosssectional training, we registered a subject’s baseline image to another subject’s 24month image. To assess the performance of prediction models trained on these two types of paired data, we (1) perform the same type of registrations on the heldout 20 subjects and (2) compare the 2norm of the deformation error computed from the output of the prediction models with respect to the result obtained by numerical optimization of LDDMMADNI
data, we only train our 3D models using longitudinal registrations in the following.
Training for regression. The ADNI1
dataset contains 228 normal controls, 257 subjects with mild cognitive impairment (MCI), 149 with late mild cognitive impairment (LMCI), as well as 199 subjects suffering from Alzheimer’s disease (AD). We randomly picked roughly 1/6 of patients from each diagnostic category to form a set of 139 subjects for training in ADNI1
, i.e., 38 normal controls, 43 MCI, 25 LMCI, as well as 33 AD subjects; this results in 139 subjects overall. The baseline images of each subject were registered to all the later timepoints within the same subject. To maintain the diagnostic ratio, we picked (out of all registrations) 45 registrations from the normal group, 50 registrations from the MCI group, 30 registrations from the LMCI group, and 40 registrations from the AD group, resulting in 165 longitudinal registration cases for training.
The same strategy was applied to ADNI2
. In detail, ADNI2
contains 200 normal controls, 111 subjects with significant memory complaint (SMC), 182 subjects with early mild cognitive impairment (EMCI), 175 with late mild cognitive impairment (LMCI), and 155 subjects with Alzheimer’s disease (AD). We picked 150 subjects and 140 longitudinal registrations, consisting of 35 registrations from the control group, 20 registrations from the SMC group, 30 registrations from the EMCI group, 30 registrations from the LMCI group, and 25 registrations from the AD group. Note that there are fewer registrations than subjects (140 vs. 150) in this setup, as our priority is to maintain the overall diagnostic ratio.
For both, ADNI1
and ADNI2
, the remaining 5/6 of the data is used for testing. We trained four prediction models and their four corresponding correction models, leading to eight prediction models in total, listed in Table ?. We also note that the training sets within ADNI1
and ADNI2
, resp., were not overlapping.
6mo  12mo  18mo  24mo  36mo  48mo  
NC  182  172  8  151  128  38 
MCI  274  221  165  122  80  11 
AD  153  173  66  163  69  20 
Total 
609  566  239  436  277  69 
6mo  12mo  18mo  24mo  36mo  48mo  
NC  182  168  9  144  119  33 
MCI  272  224  169  124  70  10 
AD  152  168  64  160  67  22 
Total 
606  560  242  428  256  65 
3mo  6mo  12mo  24mo  36mo  
NC  173  141  153  119  3 
MCI  256  232  207  142  4 
AD  93  95  105  66  1 
Total 
522  468  465  327  8 
3mo  6mo  12mo  24mo  36mo  
NC  172  142  159  122  3 
MCI  257  230  202  149  4 
AD  94  98  101  52  1 
Total 
523  470  462  323  8 
3.2Parameter selection
We use the regularization kernel
with set to . The parameter , from equation , is set to . We train our network (using ADAM
[?]) over 10 epochs with a learning rate of .
3.3Efficiency
Once trained, the prediction models allow fast computations of registrations. We use a TITAN X (Pascal) GPU and PyTorch
ADNI1
dataset ( MR images), FPSGR took about one day to predict 2646 pairwise registrations (i.e., 25 [s]/prediction) and to compute the regression result. Optimizationbased LDDMMADNI2
is identical to ADNI1
as the images have the same spatial dimension.
Compared to thestateofart fast geodesic regression model [?], FPSGR is also at least twice as fast. The model in [?] achieves times speedup compared with SGR [?] for the same setting (parallel computing with the same number of cores). In our case, we achieve a more than 40 times speedup compared with SGR for the same setting (a single GPU).
4Experimental results for 3D ADNI data
For our experiments, we created 10 different (dataset, registration approach) combinations, each combination specifically designed to assess certain properties of our proposed strategy. These combinations are described next.
All subjects from the
ADNI1
dataset in combination with optimizationbased LDDMM.Two subgroups of
ADNI1
(i.e., different training data portions) in combination with FPSGR without a correction network.The same two subgroups as in 2), but in combination with FPSGR with a correction network.
The same five groups of 13, but for
ADNI2
.
Our general hypothesis is that the prediction models (for ADNI1/2
) show similar performance to optimizationbased LDDMM and that using the correction network for the predictions improves results. To assess differences, we compare differences in deformations. Specifically, for every deformation produced by the different approaches, we compute its Jacobian determinant (JD). The JDs are then warped to a common coordinate system for the entire ADNI
dataset using existing nonlinear deformations from [?]. Each such spatially normalized JD is then averaged within a region where the rate of atrophy is significantly associated with Alzheimer’s disease (AD), i.e., within a statistical region of interest (statROI) (see Figure 2). Specifically, we quantify atrophy as
where denotes the determinant and the cardinality/size of a set; is an area in the temporal lobes which was determined in prior studies [?] to be significantly associated with accelerated atrophy in Alzheimer’s disease. The resulting scalar value is an estimate of the relative volume change experienced by that region between the baseline and a followup image. Hence, its sign is positive when the region has lost volume over time and is negative if the region has gained volume over time.
We limited our experiments to the applications in [?], wherein nonlinear registration/regression is used to quantify atrophy within regions known to be associated to varying degrees with AD (), mild cognitive impairment (MCI) () (including LMCIADNI1
. For ADNI2
, there are also 3 diagnostic categories
Specifically, we investigate the following six questions:
Can the prediction models for regression qualitatively capture similar trends to the regression model obtained by numerical optimization? (Sec. Section 4.1)
Are atrophy measurements derived from FPSGR biased to overestimate or underestimate volume changes? (Sec. Section 4.2)
Are FPSGR atrophy measurements consistent with those derived from deformations via numerical optimization (LDDMM) which produced the training dataset? (Sec. Section 4.3)
Are regression results more stable and hence capture trends better than pairwise registrations? (Sec. Section 4.4)
Is the predictive power of the regression models strong enough to forecast deformations for unseen future timepoints (Sec. Section 4.5)
Do the prediction results capture expected trends in deformation? (Sec. Section 4.6)
If these experiments resolve favorably, then the substantially improved computational efficiency of FPSGR justifies its use for largescale imaging studies. Tables Table 2 and Table 3 show the distributions of the prediction cases per timepoint and the diagnostic groups in ADNI1
and ADNI2
, respectively.
4.1Regression results
Table 1 indicates that FPIR can predict deformation fields similar to the ones obtained using optimizationbased LDDMM, even for the subtle changes seen in longitudinal imaging data. However, it remains to be seen how a predictive model performs for image regression. Figure 3 shows an exemplary regression result. In this specific case, large changes can be observed around the ventricles. To illustrate differences between the methods, Figure 3 shows regression results based on optimizationbased LDDMM, for FPSGR without a correction network, and for FPSGR with a correction network. All three methods successfully capture the expanding ventricles and generally capture the image changes. Both FPSGR methods show results that are highly similar to SGR using optimizationbased LDDMM. Hence, FPSGR is useful for longitudinal image regression. To further quantify the regression accuracy, we compute the overlay error between measured images and the images on the geodesic as
where is the brain area, is the regressed image at time and is the measured image at time . Table 4 shows the overlay error for the population of 100 subjects which includes all diagnostic groups in ADNI1
. Both FPSGR methods obtain results comparable with optimizationbased LDDMM. This justifies the use of the proposed methods. The correction network generally increases the prediction accuracy over using the prediction network only.
Measured Images 

Original  0.0770 0.0212  0.0764 0.0207  0.0890 0.0220  0.0810 0.0223  0.0899 0.0341  0.0940 0.0415 
LDDMM  0.0750 0.0194  0.0686 0.0176  0.0734 0.0190  0.0609 0.0168  0.0628 0.0177  0.0663 0.0221 
Pred1  0.0754 0.0213  0.0694 0.0182  0.0742 0.0195  0.0621 0.0188  0.0654 0.0184  0.0698 0.0238 
Pred+Corr1  0.0754 0.0211  0.0691 0.0182  0.0734 0.0192  0.0615 0.0166  0.0642 0.0188  0.0688 0.0235 
4.2Bias
Slope  Intercept  #data  

NCNC 
LDDMM1  [0.62, 0.70, 0.78]  [0.25,0.08, 0.09]  154 
Pred1  [0.37, 0.44, 0.50]  [0.21, 0.08, 0.05]  
Pred+Corr1  [0.61, 0.68, 0.75]  [0.15, 0.01, 0.13]  
LDDMM2  [0.57, 0.66, 0.75]  [0.21, 0.04, 0.14]  156 

Pred2  [0.43, 0.50, 0.57]  [0.16, 0.02, 0.11]  
Pred+Corr2  [0.51, 0.58, 0.65]  [0.12, 0.01, 0.15]  
NCMCI 
LDDMM1  [0.72, 0.94, 1.16]  [0.45, 0.03, 0.39]  24 
Pred1  [0.39, 0.58, 0.78]  [0.43, 0.05, 0.33]  
Pred+Corr1  [0.71, 0.90, 1.10]  [0.40, 0.01, 0.37]  
LDDMM2  [0.88, 1.19, 1.50]  [0.65, 0.05, 0.55]  22 

Pred2  [0.72, 0.99, 1.26]  [0.68, 0.16, 0.36]  
Pred+Corr2  [0.80, 1.07, 1.34]  [0.66, 0.14, 0.38]  
MCIMCI 
LDDMM1  [0.97, 1.17, 1.38]  [0.28, 0.05, 0.39]  146 
Pred1  [0.65, 0.80, 0.96]  [0.29, 0.03, 0.22]  
Pred+Corr1  [0.92, 1.09, 1.26]  [0.14, 0.14, 0.42]  
LDDMM2  [0.83, 1.00, 1.17]  [0.21, 0.06, 0.33]  148 

Pred2  [0.69, 0.82, 0.96]  [0.20, 0.02, 0.24]  
Pred+Corr2  [0.77, 0.90, 1.04]  [0.15, 0.07, 0.29]  
MCINC 
LDDMM1  [0.48, 0.72, 0.96]  [0.85, 0.42, 0.01]  16 
Pred1  [0.26, 0.44, 0.62]  [0.61, 0.29, 0.03]  
Pred+Corr1  [0.51, 0.68, 0.86]  [0.52, 0.20, 0.13]  
LDDMM2  [0.54, 0.79, 1.03]  [0.79, 0.36, 0.07]  17 

Pred2  [0.40, 0.61, 0.83]  [0.62, 0.24, 0.14]  
Pred+Corr2  [0.49, 0.70, 0.91]  [0.59, 0.21, 0.17]  
MCIAD 
LDDMM1  [1.94, 2.10, 2.27]  [0.28, 0.02, 0.31]  148 
Pred1  [1.28, 1.40, 1.53]  [0.24, 0.02, 0.20]  
Pred+Corr1  [1.70, 1.84, 1.98]  [0.17, 0.08, 0.33]  
LDDMM2  [1.75, 1.92, 2.09]  [0.16, 0.14, 0.44]  147 

Pred2  [1.42, 1.56, 1.70]  [0.11, 0.14, 0.39]  
Pred+Corr2  [1.49, 1.64, 1.78]  [0.08, 0.17, 0.43]  
ADAD 
LDDMM1  [1.97, 2.33, 2.69]  [0.17, 0.27, 0.70]  143 
Pred1  [1.23, 1.50, 1.77]  [0.13, 0.21, 0.54]  
Pred+Corr1  [1.74, 2.05, 2.35]  [0.04, 0.33, 0.70]  
LDDMM2  [1.92, 2.28, 2.65]  [0.20, 0.24, 0.68]  140 

Pred2  [1.56, 1.85, 2.15]  [0.13, 0.22, 0.57]  
Pred+Corr2  [1.65, 1.95, 2.24]  [0.10, 0.25, 0.60]  
Slope  Intercept  
NCNC 
LDDMM1  [0.55, 0.65, 0.75]  [0.08, 0.03, 0.13]  170 
Pred1  [0.41, 0.48, 0.55]  [0.03, 0.04, 0.12]  
Pred+Corr1  [0.50, 0.57, 0.65]  [0.04, 0.05, 0.13]  
LDDMM2  [0.51, 0.62, 0.72]  [0.10, 0.01, 0.12]  175  
Pred2  [0.47, 0.55, 0.62]  [0.03, 0.05, 0.13]  
Pred+Corr2  [0.35, 0.44, 0.52]  [0.09, 0.00, 0.08]  
NCMCI 
LDDMM1  [0.56, 0.79, 1.02]  [0.22, 0.01, 0.25]  16 
Pred1  [0.53, 0.68, 0.82]  [0.14, 0.01, 0.16]  
Pred+Corr1  [0.63, 0.80, 0.97]  [0.16, 0.02, 0.19]  
LDDMM2  [0.62, 0.90, 1.18]  [0.32, 0.02, 0.28]  17  
Pred2  [0.58, 0.77, 0.97]  [0.19, 0.01, 0.22]  
Pred+Corr2  [0.46, 0.68, 0.91]  [0.25, 0.02, 0.22]  
MCIMCI 
LDDMM1  [0.71, 0.83, 0.94]  [0.13, 0.00, 0.12]  184 
Pred1  [0.53, 0.61, 0.68]  [0.06, 0.02, 0.10]  
Pred+Corr1  [0.64, 0.73, 0.82]  [0.08, 0.02, 0.11]  
LDDMM2  [0.71, 0.82, 0.92]  [0.14, 0.02, 0.09]  183  
Pred2  [0.58, 0.66, 0.73]  [0.05, 0.03, 0.12]  
Pred+Corr2  [0.50, 0.59, 0.67]  [0.12, 0.02, 0.07]  
MCINC 
LDDMM1  [0.03, 0.39, 0.74]  [0.38, 0.05, 0.47]  16 
Pred1  [0.05, 0.29, 0.52]  [0.24, 0.05, 0.33]  
Pred+Corr1  [0.08, 0.36, 0.64]  [0.28, 0.05, 0.38]  
LDDMM2  [0.14, 0.40, 0.67]  [0.28, 0.04, 0.35]  21  
Pred2  [0.24, 0.42, 0.61]  [0.17, 0.05, 0.28]  
Pred+Corr2  [0.05, 0.26, 0.48]  [0.22, 0.03, 0.29]  
MCIAD 
LDDMM1  [1.65, 1.95, 2.25]  [0.21, 0.13, 0.47]  70 
Pred1  [1.09, 1.27, 1.46]  [0.12, 0.09, 0.30]  
Pred+Corr1  [1.39, 1.62, 1.85]  [0.15, 0.11, 0.37]  
LDDMM2  [1.59, 1.91, 2.23]  [0.16, 0.19, 0.53]  65  
Pred2  [1.15, 1.35, 1.56]  [0.09, 0.14, 0.36]  
Pred+Corr2  [1.20, 1.45, 1.69]  [0.13, 0.14, 0.41]  
ADAD 
LDDMM1  [2.49, 2.76, 3.04]  [0.15, 0.07, 0.30]  101 
Pred1  [1.74, 1.90, 2.07]  [0.09, 0.04, 0.18]  
Pred+Corr1  [2.14, 2.34, 2.54]  [0.09, 0.08, 0.24]  
LDDMM2  [2.72, 2.99, 3.27]  [0.15, 0.07, 0.29]  103  
Pred2  [1.97, 2.14, 2.31]  [0.07, 0.07, 0.21]  
Pred+Corr2  [2.16, 2.36, 2.56]  [0.15, 0.02, 0.18]  
Estimates of atrophy are susceptible to bias [?]. To quantitatively assess this potential bias, we separately considered different diagnostic groups. Specifically, we considered six diagnostic change groups in our experiments: (1) NC for all time points (NCNC), (2) starting with NC and changing to MCI or AD at a later time point (NCMCI), (3) MCI for all time points (MCIMCI), (4) starting with MCI and reversing to NC at later time points (MCINC), (5) starting with MCI and changing to AD at later time points (MCIAD), and (6) AD for all the time points (ADAD)ADNI1
and ADNI2
, respectively. LDDMM1 and LDDMM2 denote the optimizationbased results split into the same testing groups used for Pred1 and Pred2 to allow for a direct comparison. All of the results show intercepts that are near zero relative to the range of changes observed and all prediction intercept confidence intervals contain zero. For all diagnostic change groups the prediction and prediction+correction models exhibit more stable results than the optimizationbased LDDMM method as indicated by the tighter confidence intervals. Furthermore, all slopes are positive, indicating average volume loss over time. This is consistent with expectations for an aging and neurodegenerative population. The slopes capture increasing atrophy with disease severity. In ADNI1
/ADNI2
, we expect and all six experimental groups (i.e. LDDMM1, Pred1, Pred+Corr1, LDDMM2, Pred2, and Pred+Corr2) are generally consistent with this expectation. Exceptions happen in ADNI2
for the NCMCI and MCINC cases. As the number of subjects involved is relatively small, i.e., fewer than 20, compared with the other cases (roughly 100), one may speculate that this observation is caused by the limited number of data points for NCMCI and MCINC as shown in the #data column of Table 5. However, the behavior within each starting diagnostic category, is consistent, i.e., for NC and for MCI . Hence, all six groups’ slope results in ADNI1/ADNI2
are generally consistent with our expectation (and also consistent with results in [?]). The slope estimated from the prediction+correction results is larger than the slope estimated from the prediction model results and closer to the slope obtained from the optimizationbased LDDMM results. This indicates that the correction network can improve prediction accuracy. Figure 5 shows linear regression results for the estimated atrophy scores in ADNI1/2
for the Pred+Corr1 model. Both the data points themselves (i.e., the atrophy scores), as well as kernel density estimates for the linear trends for each subject are shown. These results are consistent with the results of Table 5 discussed above. We conclude that (1) neither LDDMM optimization nor FPSGR produced deformations with significant bias to overestimate or underestimate volume change; (2) a linear model of atrophy scores generated by FPSGR can capture intrinsic volume change (i.e., slope) among different diagnostic change groups. Note that our LDDMM optimization results and the prediction results show the same trends. Further, they are directly comparable as the results are based on the same test images (also for the atrophy measurements).
4.3Atrophy
MMSE  value  DX  value  #data  

6mo 
LDDMM1  0.4957  5.17e39  0.5140  2.66e42  608 
Pred1  0.4642  8.09e34  0.4754  1.30e35  
Pred+Corr1  0.5104  1.22e41  0.5259  1.53e44  
LDDMM2  0.4667  4.17e34  0.4814  1.75e36  606 

Pred2  0.4711  8.48e35  0.4849  4.58e37  
Pred+Corr2  0.4734  3.54e35  0.4890  9.67e38  
12mo 
LDDMM1  0.5749  5.23e51  0.5313  1.81e42  565 
Pred1  0.5328  9.46e43  0.4898  1.97e35  
Pred+Corr1  0.5799  4.39e52  0.5406  3.44e44  
LDDMM2  0.5301  6.81e42  0.5055  1.17e37  560 

Pred2  0.5351  9.79e43  0.5120  1.11e38  
Pred+Corr2  0.5374  3.73e43  0.5155  2.89e39  
18mo 
LDDMM1  0.4939  4.86e16  0.4776  5.76e15  238 
Pred1  0.4659  3.18e14  0.4313  3.37e12  
Pred+Corr1  0.4924  6.16e16  0.4643  3.98e14  
LDDMM2  0.4385  9.50e13  0.4000  1.12e10  241 

Pred2  0.4389  9.06e13  0.3818  8.80e10  
Pred+Corr2  0.4384  9.75e13  0.3790  1.19e9  
24mo 
LDDMM1  0.6064  5.01e45  0.5978  1.69e43  435 
Pred1  0.5664  2.83e38  0.5607  2.18e37  
Pred+Corr1  0.6001  6.55e44  0.5943  6.82e43  
LDDMM2  0.5822  4.11e40  0.5534  1.24e35  427 

Pred2  0.5911  1.41e41  0.5714  2.26e38  
Pred+Corr2  0.5898  2.28e41  0.5709  2.65e38  
36mo 
LDDMM1  0.5142  4.29e20  0.5300  1.81e21  277 
Pred1  0.4731  7.38e17  0.4926  2.42e18  
Pred+Corr1  0.5069  1.71e19  0.5296  1.99e21  
LDDMM2  0.4334  3.79e13  0.4815  2.93e16  256 

Pred2  0.4425  1.07e13  0.4894  7.99e17  
Pred+Corr2  0.4393  1.67e13  0.4863  1.34e16  
48mo 
LDDMM1  0.7456  2.01e13  0.6635  5.20e10  69 
Pred1  0.7294  1.18e12  0.6458  2.08e9  
Pred+Corr1  0.7443  2.30e13  0.6575  8.43e10  
LDDMM2  0.6889  2.25e10  0.5927  1.98e7  65  
Pred2  0.6995  9.08e11  0.6048  9.49e8  
Pred+Corr2  0.7005  8.31e11  0.6067  8.49e8  
MMSE  value  DX  value  #data  
3mo 
LDDMM1  N/A  N/A  0.4254  2.34e24  522 
Pred1  N/A  N/A  0.4142  4.72e23  
Pred+Corr1  N/A  N/A  0.4353  1.52e25  
LDDMM2  N/A  N/A  0.4409  2.77e26  523 

Pred2  N/A  N/A  0.4280  1.05e24  
Pred+Corr2  N/A  N/A  0.4445  9.64e27  
6mo 
LDDMM1  0.4989  8.01e31  0.4688  6.09e27  468 
Pred1  0.4768  6.22e28  0.4625  3.47e26  
Pred+Corr1  0.5128  9.64e33  0.4846  6.19e29  
LDDMM2  0.5072  4.29e32  0.4883  1.58e29  470 

Pred2  0.4718  2.02e27  0.4742  9.96e28  
Pred+Corr2  0.5066  5.25e32  0.4913  6.33e30  
12mo 
LDDMM1  0.4756  1.43e27  0.4859  7.22e29  464 
Pred1  0.4530  7.32e25  0.4771  9.39e28  
Pred+Corr1  0.4908  1.67e29  0.5064  1.37e31  
LDDMM2  0.4937  1.07e29  0.5026  7.05e31  461 

Pred2  0.4626  7.94e26  0.4913  2.21e29  
Pred+Corr2  0.4987  2.35e30  0.5149  1.44e32  
24mo 
LDDMM1  0.4120  9.53e15  0.4476  2.06e17  325 
Pred1  0.3670  8.51e12  0.4331  2.71e16  
Pred+Corr1  0.4109  1.15e14  0.4632  1.09e18  
LDDMM2  0.4095  2.09e14  0.4375  1.93e16  321 

Pred2  0.3411  3.46e10  0.3940  2.29e13  
Pred+Corr2  0.3943  2.20e13  0.4336  3.79e16  
36mo 
LDDMM1  0.2474  0.55  0.2869  0.49  8 
Pred1  0.2474  0.55  0.2869  0.49  
Pred+Corr1  0.2474  0.55  0.2869  0.49  
LDDMM2  0.0935  0.83  0.1695  0.69  8 

Pred2  0.0935  0.83  0.1695  0.69  
Pred+Corr2  0.0935  0.83  0.1695  0.69  
MMSE  LDDMM  Pred  Pred+Corr 
LDDMM  N/A  0.1507  0.5361 
Pred  0.1507  N/A  0.0183 
Pred+Corr  0.5361  0.0183  N/A 
MMSE  LDDMM  Pred  Pred+Corr 
LDDMM  N/A  0.0005484  0.09469173 
Pred  0.9994516  N/A  0.9999718 
Pred+Corr  0.0530827  0.0000282  N/A 
DX  LDDMM  Pred  Pred+Corr 
LDDMM  N/A  0.1963  0.2356 
Pred  0.1963  N/A  0.3208 
Pred+Corr  0.2356  0.3208  N/A 
DX  LDDMM  Pred  Pred+Corr 
LDDMM  N/A  0.0010944  0.9813582 
Pred  0.9989056  N/A  0.9999869 
Pred+Corr  0.0186418  0.0000131  N/A 
Atrophy estimates have also been shown to correlate with clinical variables [?]. To quantify this effect, we computed the Spearman rankorder correlationADNI1/2
, we randomly selected 200
To further explore the correlations of atrophy with MMSE scores, we visualize them separated by diagnostic group where diagnosis did not change (i.e., NCNC, MCIMCI, ADAD) in Figure 7. For the ADNI1
dataset, we observe (as expected) very low correlations for the normal diagnostic group (with no clear trend), and much stronger correlations for the MCI and AD groups. MCI and AD also exhibit increasingly stronger correlations with time. In case of ADNI2
, the MCI group shows modest correlations, which remain consistent across time. Correlations are relatively low for the normal groups. The AD groups show increasingly strong correlations over time. In contrast to ADNI1
, ADNI2
focuses mainly on earlier stages of the diagnostic groups [?]. Hence, the deformations in ADNI2
are generally smaller than in ADNI1
. This may explain why the NC and MCI diagnostic groups show consistent correlation values over time (instead of stronger correlations as for AD in ADNI2
or the MCI and AD groups in ADNI1
).
Slope  Intercept  

NCNC 
SGR Pred1  [0.37, 0.44, 0.50]  [0.21, 0.08, 0.05] 
Pairwise Pred1  [0.44, 0.52, 0.60]  [0.46, 0.30, 0.14]  
SGR Pred2  [0.43, 0.50, 0.57]  [0.16, 0.02, 0.11]  
Pairwise Pred2  [0.48, 0.57, 0.65]  [0.34, 0.18, 0.01]  
NCMCI 
SGR Pred1  [0.39, 0.58, 0.78]  [0.43, 0.05, 0.33] 
Pairwise Pred1  [0.39, 0.63, 0.87]  [0.63, 0.16, 0.30]  
SGR Pred2  [0.72, 0.99, 1.26]  [0.68, 0.16, 0.36]  
Pairwise Pred2  [0.65, 0.96, 1.27]  [0.69, 0.10, 0.50]  
MCIMCI 
SGR Pred1  [0.65, 0.80, 0.96]  [0.29, 0.03, 0.22] 
Pairwise Pred1  [0.69, 0.86, 1.03]  [0.43, 0.15, 0.12]  
SGR Pred2  [0.69, 0.82, 0.96]  [0.20, 0.02, 0.24]  
Pairwise Pred2  [0.70, 0.85, 1.01]  [0.29, 0.04, 0.21]  
MCINC 
SGR Pred1  [0.26, 0.44, 0.62]  [0.61, 0.29, 0.03] 
Pairwise Pred1  [0.21, 0.45, 0.68]  [0.74, 0.31, 0.12]  
SGR Pred2  [0.40, 0.61, 0.83]  [0.62, 0.24, 0.14]  
Pairwise Pred2  [0.29, 0.56, 0.83]  [0.61, 0.14, 0.34]  
MCIAD 
SGR Pred1  [1.28, 1.40, 1.53]  [0.24, 0.02, 0.20] 
Pairwise Pred1  [1.28, 1.42, 1.56]  [0.31, 0.06, 0.19]  
SGR Pred2  [1.42, 1.56, 1.70]  [0.11, 0.14, 0.39]  
Pairwise Pred2  [1.44, 1.60, 1.75]  [0.22, 0.06, 0.33]  
ADAD 
SGR Pred1  [1.23, 1.50, 1.77]  [0.13, 0.21, 0.54] 
Pairwise Pred1  [1.25, 1.55, 1.85]  [0.23, 0.13, 0.49]  
SGR Pred2  [1.56, 1.85, 2.15]  [0.13, 0.22, 0.57]  
Pairwise Pred2  [1.53, 1.85, 2.16]  [0.15, 0.23, 0.60]  
Slope  Intercept  
NCNC 
SGR Pred1  [0.41, 0.48, 0.55]  [0.03, 0.04, 0.12] 
Pairwise Pred1  [0.25, 0.33, 0.41]  [0.15, 0.24, 0.33]  
SGR Pred2  [0.47, 0.55, 0.62]  [0.03, 0.05, 0.13]  
Pairwise Pred2  [0.26, 0.35, 0.44]  [0.22, 0.32, 0.43]  
NCMCI 
SGR Pred1  [0.53, 0.68, 0.82]  [0.14, 0.01, 0.16] 
Pairwise Pred1  [0.37, 0.57, 0.77]  [0.06, 0.14, 0.33]  
SGR Pred2  [0.58, 0.77, 0.97]  [0.19, 0.01, 0.22]  
Pairwise Pred2  [0.42, 0.65, 0.88]  [0.07, 0.18, 0.42]  
MCIMCI 
SGR Pred1  [0.53, 0.61, 0.68]  [0.06, 0.02, 0.10] 
Pairwise Pred1  [0.43, 0.52, 0.61]  [0.04, 0.14, 0.23]  
SGR Pred2  [0.58, 0.66, 0.73]  [0.05, 0.03, 0.12]  
Pairwise Pred2  [0.45, 0.54, 0.63]  [0.09, 0.19, 0.29]  
MCINC 
SGR Pred1  [0.05, 0.29, 0.52]  [0.24, 0.05, 0.33] 
Pairwise Pred1  [0.10, 0.17, 0.45]  [0.12, 0.21, 0.53]  
SGR Pred2  [0.24, 0.42, 0.61]  [0.17, 0.05, 0.28]  
Pairwise Pred2  [0.03, 0.26, 0.49]  [0.02, 0.29, 0.57]  
MCIAD 
SGR Pred1  [1.09, 1.27, 1.46]  [0.12, 0.09, 0.30] 
Pairwise Pred1  [0.88, 1.10, 1.32]  [0.08, 0.33, 0.58]  
SGR Pred2  [1.15, 1.35, 1.56]  [0.09, 0.14, 0.36]  
Pairwise Pred2  [0.89, 1.13, 1.37]  [0.18, 0.44, 0.70]  
ADAD 
SGR Pred1  [1.74, 1.90, 2.07]  [0.09, 0.04, 0.18] 
Pairwise Pred1  [1.57, 1.77, 1.96]  [0.01, 0.17, 0.34]  
SGR Pred2  [1.97, 2.14, 2.31]  [0.07, 0.07, 0.21]  
Pairwise Pred2  [1.79, 1.99, 2.19]  [0.05, 0.21, 0.37]  
MMSE  value  DX  value  #data  

6mo 
SGR Pred1  0.4642  8.09e34  0.4754  1.30e35  608 
Pairwise Pred1  0.3138  2.31e15  0.3369  1.32e17  
SGR Pred2  0.4711  8.48e35  0.4849  4.58e37  606 

Pairwise Pred2  0.3431  3.51e18  0.3680  7.24e21  
12mo 
SGR Pred1  0.5328  9.46e43  0.4898  1.97e35  565 
Pairwise Pred1  0.4393  4.67e28  0.3996  4.51e23  
SGR Pred2  0.5351  9.79e43  0.5120  1.11e38  560 

Pairwise Pred2  0.4465  9.61e29  0.4154  1.00e24  
18mo 
SGR Pred1  0.4659  3.18e14  0.4313  3.37e12  238 
Pairwise Pred1  0.4164  2.12e11  0.3882  5.56e10  
SGR Pred2  0.4389  9.06e13  0.3818  8.80e10  241 

Pairwise Pred2  0.4078  4.52e11  0.3356  9.38e8  
24mo 
SGR Pred1  0.5664  2.83e38  0.5607  2.18e37  435 
Pairwise Pred1  0.5805  1.51e40  0.5791  2.55e40  
SGR Pred2  0.5911  1.41e41  0.5714  2.26e38  427 

Pairwise Pred2  0.5927  7.34e42  0.5811  6.26e40  
36mo 
SGR Pred1  0.4731  7.38e17  0.4926  2.42e18  277 
Pairwise Pred1  0.4470  5.20e15  0.4798  2.36e17  
SGR Pred2  0.4425  1.07e13  0.4894  7.99e17  256 

Pairwise Pred2  0.4538  2.08e14  0.4990  1.59e17  
48mo 
SGR Pred1  0.7294  1.18e12  0.6458  2.08e9  69 
Pairwise Pred1  0.7100  8.43e12  0.6168  1.67e8  
SGR Pred2  0.6995  9.08e11  0.6048  9.49e8  65  
Pairwise Pred2  0.6709  9.65e10  0.5924  2.01e7  
MMSE  pvalue  DX  pvalue  #data  
3mo 
SGR Pred1  N/A  N/A  0.4142  4.72e23  522 
Pairwise Pred1  N/A  N/A  0.1744  6.17e5  
SGR Pred2  N/A  N/A  0.4280  1.05e24  523 

Pairwise Pred2  N/A  N/A  0.1503  5.64e4  
6mo 
SGR Pred1  0.4768  6.22e28  0.4625  3.47e26  468 
Pairwise Pred1  0.3378  5.93e14  0.2633  7.29e9  
SGR Pred2  0.4718  2.02e27  0.4742  9.96e28  470 

Pairwise Pred2  0.3312  1.70e13  0.2849  3.14e10  
12mo 
SGR Pred1  0.4530  7.32e25  0.4771  9.39e28  464 
Pairwise Pred1  0.4305  2.34e22  0.4472  3.40e24  
SGR Pred2  0.4626  7.94e26  0.4913  2.21e29  461 

Pairwise Pred2  0.4223  2.30e21  0.4374  5.72e23  
24mo 
SGR Pred1  0.3670  8.51e12  0.4331  2.71e16  325 
Pairwise Pred1  0.3772  3.06e12  0.4515  9.99e18  
SGR Pred2  0.3411  3.46e10  0.3940  2.29e13  321 

Pairwise Pred2  0.3517  8.89e11  0.4239  1.99e15  
36mo 
SGR Pred1  0.2474  0.55  0.4536  0.26  8 
Pairwise Pred1  0.1650  0.70  0.2869  0.49  
SGR Pred2  0.0935  0.83  0.1695  0.69  8 

Pairwise Pred2  0.0935  0.83  0.2608  0.53  
ShapiroWilk normality test  Wilcoxon signedrank test  

MMSE  0.01943  0.0005226 
DX  0.03286  0.0005083 
To address the question how statROI specific measures behave over time, we explore how atrophy locally (i.e., voxelbyvoxel) correlates with MMSE. The local atrophy is defined as
I.e., each voxel in a statROI has an associated atrophy score. Figure 8 shows kernel density estimates of the highest 10% local correlations in a violin plot. For the ADNI1
MCI and AD groups, a clear shift toward stronger correlations can be observed over time, consistent with the boxplots of Figure 7. This indicates the progression of the disease. Correlations for the normal groups in ADNI 1/2
are mostly centered around a modest correlation (as expected). In ADNI2
, only the AD diagnostic group shows a shift towards stronger correlations over time. All the other diagnostic groups show a relatively consistent distribution over time. This is also consistent with Figure 7.
4.4Justification of SGR
For simple geodesic regression to be a useful model it should outperform pairwise image registration. The main conceptual difference is that the regression model will recover an average trend based on multiple image timepoints, i.e., the resulting regression geodesic will be a compromise between all the measurements. In contrast, for pairwise image registration (which can be seen as a trivial case of geodesic regression with two images only) the deformation will in general be able to match the target image well. However, just as in linear regression, this may accentuate the effects of noise. In both setups, images can be interpolated or extrapolated based on the estimated geodesic.
Tables Table 8 and Table 9 justify the use of SGR. Specifically, Table 8 shows linear regression results of atrophy measures over time as obtained via SGR (i.e., using an SGR fit over all timepoints followed by atrophy computations based on the deformations of the regression geodesic) compared with atrophy measures obtained by pairwise registration. For both the ADNI1
and the ADNI2
datasets, SGR outperforms the pairwise registration approach in two aspects: (1) the estimated intercept of SGR is generally closer to zero than for the pairwise method and the intercept 95% confidence interval is narrower; (2) 11 out of 24 of the 95% confidence intervals of the pairwise methods show bias to either overestimate or underestimate volume change, while none of the SGR results show such significant bias. Table 9 compares the correlations between atrophy and clinical measures (MMSE and diagnostic category) of SGR and the pairwise approach. SGR performs better than the pairwise approach in 13 out of 18 cases for MMSE and in 15 out of 20 cases for the diagnostic category. Furthermore, when the pairwise method is better than SGR, the difference is much smaller compared to the differences observed for the cases where SGR is better than the pairwise method. Also note that the pairwise method shows better performance in later months compared to earlier months. This could, for example, be because the deformations are larger for later timepoints and hence the registration result becomes more stable, or because SGR is also heavily influenced by the last timepoint. To address the above observation, we used a ShapiroWilk normality test and a Wilcoxon signedrank test. From Table ? we see that we can reject the nullhypothesis of normality and hence, a paired test is not appropriate. As an alternative, we conducted a Wilcoxon signedrank test to compare the SGR prediction model and the pairwise prediction model. Table ? shows that the SGR prediction model is statistically significantly better than the pairwise prediction model. Based on the above points, we conclude that SGR is more stable over time than the pairwise method and in general also results in stronger correlations.
4.5Forecasting
MMSE  pvalue  DX  value  #data  

60mo 
Forecast 
LDDMM1  0.5242  1.34e13  0.5157  3.85e13  173 
Pred1  0.4727  5.16e11  0.4816  1.98e11  
Pred+Corr1  0.5193  2.48e13  0.5240  1.38e13  
LDDMM2  0.4501  2.32e10  0.4761  1.43e11  180  
Pred2  0.4527  1.77e10  0.4620  6.63e11  
Pred+Corr2  0.4582  9.97e11  0.4652  4.73e11  
72mo 
Forecast 
LDDMM1  0.4607  1.60e10  0.4507  4.37e10  174 
Pred1  0.4132  1.45e8  0.4364  1.75e9  
Pred+Corr1  0.4615  1.47e10  0.4667  8.52e11  
LDDMM2  0.3662  3.18e7  0.4233  2.15e9  184  
Pred2  0.3793  1.09e7  0.4273  1.46e9  
Pred+Corr2  0.3793  1.09e7  0.4259  1.67e9  
84mo 
Forecast 
LDDMM1  0.3986  1.40e6  0.4108  6.17e7  137 
Pred1  0.3495  2.84e5  0.4018  1.13e6  
Pred+Corr1  0.3946  1.83e6  0.4211  2.98e7  
LDDMM2  0.3293  4.65e5  0.3622  6.53e6  147  
Pred2  0.3199  7.81e5  0.3629  6.25e6  
Pred+Corr2  0.3187  8.35e5  0.3609  7.12e6  
MMSE  pvalue  DX  pvalue  #data  

36mo 
Original 
LDDMM1  0.5142  4.29e20  0.5300  1.81e21  277 
Pred1  0.4731  7.38e17  0.4926  2.42e18  
Pred+Corr1  0.5069  1.71e19  0.5296  1.99e21  
Forecast 
Pred1  0.4583  1.09e15  0.4825  1.93e17  
Pred+Corr1  0.4708  1.42e16  0.4980  1.21e18  
Replace 
Pred1  0.4923  3.43e18  0.5104  1.21e19  
Pred+Corr1  0.5097  1.37e19  0.5375  5.47e22  
Original 
LDDMM2  0.4334  3.79e13  0.4815  2.93e16  256 

Pred2  0.4425  1.07e13  0.4894  7.99e17  
Pred+Corr2  0.4393  1.67e13  0.4863  1.34e16  
Forecast 
Pred2  0.4078  1.36e11  0.4398  1.95e13  
Pred+Corr2  0.4005  3.34e11  0.4301  7.40e13  
Replace 
Pred2  0.4202  2.75e12  0.4635  6.27e15  
Pred+Corr2  0.4164  4.51e12  0.4582  1.38e14  
48mo 
Original 
LDDMM1  0.7456  2.01e13  0.6635  5.20e10  69 
Pred1  0.7294  1.18e12  0.6458  2.08e9  
Pred+Corr1  0.7443  2.30e13  0.6575  8.43e10  
Forecast 
Pred1  0.6332  5.29e9  0.6165  1.70e8  
Pred+Corr1  0.6541  1.10e9  0.6317  5.86e9  
Replace 
Pred1  0.6446  2.27e9  0.6478  1.78e9  
Pred+Corr1  0.6668  3.98e10  0.6800  1.31e10  
Original 
LDDMM2  0.6889  2.25e10  0.5927  1.98e7  65  
Pred2  0.6995  9.08e11  0.6048  9.49e8  
Pred+Corr2  0.7005  8.31e11  0.6067  8.49e8  
Forecast 
Pred2  0.6528  3.79e9  0.5568  1.46e6  
Pred+Corr2  0.6403  9.25e9  0.5460  2.55e6  
Replace 
Pred2  0.6334  1.49e8  0.5970  1.53e7  
Pred+Corr2  0.6307  1.79e8  0.5973  1.50e7  
Another interesting question for SGR and geodesic regression in general is the suitability of the model for the data. To address this question, we evaluate if SGR can forecast unseen future timepoints. Specifically we consider this question in two different scenarios:
Extrapolateclinical: Can we extrapolate the SGR results into the future (to timepoints that do not exist in the ADNI image dataset, but for the clinical data) while still obtaining strong correlations.
Extrapolateimage: How well can correlations between atrophy and clinical measures be predicted for timepoints when we do or do not use image data at that very timepoint. We artificially leave out image measurements so that we can compare prediction results to results when we have the image measurement.
For both scenarios we use two different forecasting approaches. In the first approach (Forecast) we simply compute SGR results with the available image timepoints and then extrapolate using the resulting regression geodesic to the desired timepoint in the future. In the second approach (Replace), we artificially impute the missing image timepoints by simply replacing them by the image at the closest measured timepoint. For example, if we have images at 6, 12, and 18 month, but we want to forecast at 24 month, we use the 18 month image as the imputed 24 month image and then perform SGR on the 6, 12, 18, and the imputed 24 month images. We then obtain the deformation at 24 months from the SGR result.
Table 10 shows correlations between atrophy and the clinical measures for the Forecast results for 60 month, 72 month and 84 month. The resulting correlations of atrophy with diagnostic category are all above 0.3 (or below 0.3). Furthermore, the Forecast correlations show a downward trend with respect to time, which means that the prediction of “faraway“ points is not as accurate as for the “near” future. On the other hand, SGR using the 6 month to 48 month time points results in correlation around 0.5 for MMSE and 0.5 for DX on average. Hence the correlation with the diagnostic category is consistent for that of 60 months. In other words, using 6 month to 48 month data, our prediction model can predict accurately up to 60 month. Our prediction+correction network performs as well as and even slightly better than SGR using optimizationbased LDDMM. Figure 10 shows that these forecasting results capture the trends of the changes in the temporal lobes near the hippocampus and changes in the ventricles.
Table 11 and Figure 9 show Forecast and Replace results for correlations between atrophy and clinical measures in comparison to using all images. Specifically, for the Forecast and Replace results we did not use the available images at 36 and 48 month so we could compare against the results obtained when using these images. If FPSGR is a good model, it should results in correlation results as close to the correlation results using all images as possible. The Forecast correlations are only slightly weaker (0.02 to 0.05 lower) than the original correlations using all images illustrating that FPSGR can approximately forecast future changes.
The overall correlations in Table 11 show that the Replace group performs better than the Forecast group. In particular, we are also interested in the prediction of MCI converters, namely, MCI to NC, MCI to MCI, and MCI to AD. The boxplots in Figure 9 show the correlations for such predictions. The Replace group in Figure 9 show relatively worse correlation performance than the Forecast group in ADNI1 Pred1 and consistent performance in ADNI1 Pred2. Hence SGR on a longitudinal image data can achieves good forecasting result for MCI converters.
Thus, both Extrapolateclinical and Extrapolateimage experiments justify the use of FPSGR in predicting near future longitudinal trends especially for MCI converters.
4.6Jacobian Determinant (JD)
The average JD images qualitatively agree with prior results [?]: severity of volume change increases with severity of diagnosis and time. Change is most substantial in the temporal lobes near the hippocampus (see Figure 10). In Figure 10, 6 month to 48 month are existing data points; 60 month to 84 month are forecast results. Blue indicates volume loss. Red indicates expansion. Results are consistent with expectations: volume loss increases with time and severity of diagnosis in temporal lobes; volume expansion increases with respect to time and severity of diagnosis around the ventricles / cerebrospinal fluid. The forecast results capture visually sensible volume loss or expansion over time, qualitatively illustrating the performance of our method.
5Conclusion and future work
In this work, we proposed a fast approach for geodesic regression (FPSGR) to study longitudinal image data. FPSGR incorporates the recently proposed FPIR [?] into the SGR [?] framework, thus leading to a computationally efficient solution to geodesic regression. Since FPSGR replaces the computationally intensive intermediate step of computing pairwise initial momenta via a deeplearning prediction method, it is orders of magnitude faster than existing approaches [?], without compromising accuracy. Consequently, FPSGR facilitates the analysis of largescale imaging studies. Experiments on the ADNI1/ADNI2
datasets demonstrate that FPSGR captures expected atrophy trends of normal aging, MCI and AD. It further (1) exhibits negligible bias towards volume changes within statROIs, (2) shows high correlations with clinical variables (MMSE and diagnosis) and (3) produces consistent forecasting results on unseen data.
In future work, it will be be interesting to explore FPSGR for the task of classifying stable Mild Cognitive Impairment (sMCI) and progressive Mild Cognitive Impairment (pMCI). Currently, FPSGR only shows modest accuracy for distinguishing these types within MCI. Extending our approach to timewarped geodesic regression models [?] might improve the accuracy in this context. Furthermore, endtoend prediction of averaged initial momenta would be an interesting future direction, as this would allow learning representations that characterize the geodesic path among multiple timeseries images, not only based on averages of momenta for two images as in FPIR [?].
Acknowledgements
Research reported in this publication was supported by the National Institutes of Health (NIH) and the National Science Foundation (NSF) under award numbers NIH R01AR072013, NSF ECCS1148870, and EECS1711776. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or the NSF. We also thank Nvidia for the donation of a TitanX GPU.
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH1220012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; BristolMyers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. HoffmannLa Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
References
Footnotes
 Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a publicprivate partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD).
 LDDMM results are generated using a vector momentum formulation: https://bitbucket.org/scicompanat/vectormomentum
 http://pytorch.org
 Here, we used 300 fixed iterations for each registration. 300 iterations can guarantee almost all the results converge. Note that the optimizationbased LDDMM also uses a GPU implementation.
 We combine MCI and LMCI mainly because (a) the diagnostic changes available on the IDA website (https://ida.loni.usc.edu/login.jsp) only provide these three diagnostic groups; (b) to be consistent with the experiments conducted by Hua et al. [?], where only Normal, MCI and AD were used as labels to classify
ADNI1
. Hereafter, in all discussions ofADNI1
, MCI is a combination of MCI and LMCI ofADNI1
 Similar to
ADNI1
, a detailed diagnosis forADNI2
is only available for the baseline images; MR images at later time points are only labeled as NC, MCI, and AD. Thus, we combine SMC and NC, as well as EMCI and LMCI to be consistent with the diagnostic changes in the ADNI Diagnosis Summary available on the IDA website. Hereafter, in all discussions ofADNI2
, NC includes NC and SMC and MCI includes EMCI and LMCI.  In
ADNI1/ADNI2
, there are two patients who show a reversion from AD to MCI. We omitted these cases in our experiment because the number of such cases is too small.  We used Spearman rankorder correlation instead of Pearson correlation, because the diagnostic groups imply an ordering only.
 In
ADNI1
48 month, the number was 60 because there was not enough data;ADNI2
36 month was omitted due to lack of data.