# A Stochastic Large Deformation Model for Computational Anatomy

## Abstract

In the study of shapes of human organs using computational anatomy, variations are found to arise from inter-subject anatomical differences, disease-specific effects, and measurement noise. This paper introduces a stochastic model for incorporating random variations into the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework. By accounting for randomness in a particular setup which is crafted to fit the geometrical properties of LDDMM, we formulate the template estimation problem for landmarks with noise and give two methods for efficiently estimating the parameters of the noise fields from a prescribed data set. One method directly approximates the time evolution of the variance of each landmark by a finite set of differential equations, and the other is based on an Expectation-Maximisation algorithm. In the second method, the evaluation of the data likelihood is achieved without registering the landmarks, by applying bridge sampling using a stochastically perturbed version of the large deformation gradient flow algorithm. The method and the estimation algorithms are experimentally validated on synthetic examples and shape data of human corpora callosa.

## 1Introduction

Computational anatomy (CA) concerns the modelling and computational analysis of shapes of human organs. In CA, observed shapes or images of shapes exhibit variations due to multiple factors, such as inter-subject anatomical differences, disease-specific effects, and measurement noise. These variations in anatomy occur naturally over all populations and they must be handled in cross-sectional studies. In addition, neurodegenerative diseases such as Alzheimer’s disease can cause temporal shape changes as the disease progresses. Finally, image acquisition and processing algorithms for extracting shape information can cause measurement variation in the estimated shapes.

While variation can be incorporated into shape models via different approaches, e.g. via random initial conditions as in the random orbit model [11] or in a mixed-effects setting [1], most models specify random variation relative to a base object, usually a template, and they involve some form of linearization. Here we will model these variations by inserting stochasticity directly into the nonlinear dynamics of shape transformations using Large Deformation Diffeomorphic Metric Mapping (LDDMM) [16].

In this paper, we use the approach of [2], based on the stochastic fluid models of [9], to introduce a model for incorporating stochastic variation into the shape deformation paths. This approach is particularly designed to be compatible with four geometric properties of LDDMM. Namely, (1) the noise is right-invariant in the same way as the LDDMM metric is right-invariant. (2) Evolution equations arise as extremal paths for a stochastic variational principle. (3) The Euler-Poincaré (EPDiff) equations in the deterministic case have stochastic versions. (4) Remarkably, the momentum maps arising via reduction by symmetry, and used to reduce dimensionality from the infinite-dimensional diffeomorphism group to finite-dimensional shape spaces, such as landmarks in deterministic LDDMM, still persist in this stochastic geometric setting.

**Plan.** After a short description of large deformation models in Section 2 and the LDDMM framework, we discuss the model of [2] in the CA context and use it to develop the stochastic equations for the landmark trajectories in Section 3. We then formulate two approaches for estimating the noise fields and initial shape configuration and momentum from prescribed data in Section 4. The first approach is a deterministic approximation of the Fokker-Planck equation that allows matching of moments of the transition distribution. The second is an Energy-Maximization (EM) algorithm that requires sampling of diffusion bridges. From the bridge simulation scheme, we obtain (see Figure 1) a stochastic version of the large deformation gradient flow algorithm, see e.g. [16]. In particular, we can estimate data likelihood by sampling bridges without registering or matching the landmarks. Finally, we validate the approach in experiments on synthetic and real data in Section 5 before providing outlook and concluding remarks in Section 6.

## 2Background

This section briefly reviews large deformation shape modelling and LDDMM, referring to the monograph [16] for full details. A shape, defined as either a subset , , or an image can be modified by warping the domain under the action of a diffeomorphism . The diffeomorphism naturally acts on the left on a shape by . For images, a left-action is obtained by letting . Changes in shape are then represented by an element in the diffeomorphism group. The subset is obtained as endpoints of flows

where is a time-dependent vector field contained in a space . is typically a reproducing kernel Hilbert space (RKHS) with inner product defined in terms of a reproducing kernel and with corresponding momentum operator . Under reasonable assumptions, the RKHS structure defines a Riemannian metric on by right-invariance, i.e. by defining for tangent vectors . The corresponding Riemannian metric is

where evolves according to . Elements of act on shapes, and descends to shape spaces by and similarly for images. Shapes can be matched by finding a minimising in this equation. In inexact matching of shapes, the distance is weighted against a dissimilarity measure. For images, the inexact matching problem becomes , where . Here controls the dissimilarity penalty, and is the dissimilarity measure, e.g. the difference . Optimal paths in the LDDMM framework satisfy the EPDiff equation , where and is the coadjoint action on the dual of the Lie algebra of vector fields on the plane, see [8] for more details.

For a set of landmarks at positions with momentum , the singular momentum map of [8] is given by . The corresponding deformation velocity can be written as . The ODE for the positions and momentum are the canonical Hamiltonian equations and , with Hamiltonian . The 1D versions of these equations go back to the dynamics of singular solutions of the Camassa-Holm equation [5].

## 3Stochastic landmark dynamics

Stochastic differential equations (SDEs) have previously been considered in context of LDDMM by [14]. More recently [12] introduced a stochastic landmark model with dissipation in the momentum equation which corresponds to a Langevin equation. This allowed then to use the technology of statistical mechanics and Gibbs measure to study this stochastic system. We will use here the stochastic model of [2], based on the fluid models of [9]. In short, the noise is introduced in the reconstruction relation in the following sense

where we have introduced arbitrary fields . Here, the symbol denotes stochastic evolution and , when adjacent to the process , means stochastic integrals are interpreted in the Stratonovich sense. Given a realisation of the noise, we impose the stochastic dynamics and seek the that minimises the cost functional in . A short computation gives the stochastic EPDiff equation . As in the deterministic case, the landmarks can be introduced, and their stochastic ODE dynamics are given by

This is a Hamiltonian stochastic process in the sense of Bismut [3], where the stochastic extensions of the Hamiltonian, or stochastic potentials, are the momentum maps whose Hamilton equations generate the cotangent lift of the stochastic infinitesimal transformations of the landmark paths.

More details about the derivation of these equations can be found in [2]. The noise can be seen as almost additive in the position equation, and it couples to the momentum with the gradient of the noise field in the momentum equation. This last term ensures that the solution corresponds to a stochastic path in the diffeomorphism group. By comparison with previous works such as [14], this model has its noise amplitudes fixed to the domain, and not to each landmark. The noise is thus relative to a Eulerian frame and is right-invariant similar to the metric.

### 3.1The Fokker-Planck equation

For each time , we will denote the transition probability density by , a function which integrates to and represents the probability of finding the stochastic process at a given position in the phase space. For a given initial distribution , it is possible to compute the partial differential equation which governs the evolution of in time. Using the canonical Hamiltonian bracket , we directly compute the Fokker-Planck equation in a compact form as

The right-hand side of this equation has two parts, the first is an advection with first order derivatives of , arising from the deterministic Hamiltonian dynamics of the landmarks. The second part is a diffusion term with second order derivatives of which arises only from the noise. This is the term which will describe the error in each landmark path subject to noise.

### 3.2Equivalent formulation with ‘Lagrangian noise’

The covariance between the stochastic displacements , conditioned on the position of two landmarks , may be computed as

where is the variance of the -th Brownian motion . Thus, for a finite time discretization of the process in Itô form and an increment , . The stochastic differential can therefore formally be viewed as a Gaussian process on . The matrix is symmetric. If is positive definite, we let be a square root, i.e. . The following stochastic landmark dynamics is then equivalent to the original dynamics

where runs on the landmarks and the spatial dimensions. Note that in , as compared to previously. With this approach, instead of starting with the spatial noise fields , we can specify the stochastic system directly in terms of the matrix . One natural choice is to set for some scalar kernel and where is the identity matrix. The possible reduction in dimensionality of the noise from to has computational benefits that we will exploit in the following.

## 4Estimation of Noise and Initial Conditions

We now aim for estimating a set of parameters for the model from sets of observed landmarks at time . The parameters can be both parameters for the noise fields as described below and the initial landmark configuration and momentum . The initial configuration can be considered an estimated template of the dataset. The template will be optimal in the sense of reproducing the moments of the distribution or in being fitted by maximum likelihood.

We use spatial noise fields of the form

where is the spacial direction of the noise , its centre, and the kernel is either Gaussian or cubic B-spline with scale . The Gaussian kernels simplify calculations of the moment equations. The B-spline representation has the advantage of providing a partition of unity when used in a grid. The model is not limited to this set of noise functions.

### 4.1Matching of Moments

Our first method relies on the Fokker-Planck equation of the landmarks and aims at minimising the cost functional

where is the initial mean momentum, the final mean position, the target sample mean position estimated from the observed landmarks, the centred sample covariance of the observations, the centered final covariance, and two parameters. The covariances are in components and are matrices if . For the norm, we used a normalised norm for each landmark such that they contribute equally in the sum. More explicitly, we have , and . This cost functional corresponds to the error in matching the mean and covariance of the final probability density .

It is not possible to solve the Fokker-Planck equation completely; so we will derive a set of ODEs that approximates the evolution of the mean and covariance by applying the standard cluster expansion method of quantum mechanics to the Fokker-Planck equation. We refer to [2] for the explicit forms of these equations. The expected values of the higher-order products are approximated by only the and variables, upon neglecting higher-order correlations such as . In order to capture the effect of the noise in the momentum equation, the other correlations such as and must be taken into account. Their equations of motion are directly computed by taking the derivative in the definition of the correlation. For simplicity, we have used Gaussian kernels for both the Hamiltonian and noise fields so that derivatives have simple forms, and we have approximated . Higher-order corrections of this approximation are possible, but they would not result in significantly better results in practice.

To avoid local minima in the minimization of , we use a global optimisation method based on genetic algorithms, the differential evolution method [13]. This method turns out to be rather successful for the examples presented below.

### 4.2Maximum Likelihood

We now derive a method for parameter estimation using maximum likelihood (ML). Upon assuming the landmark observations are independent, the likelihood for the set of unknown parameters satisfies where is the transition density at marginalized over and with parameters . We will denote by the law of the corresponding process. An ML estimate of is then . The classic EM algorithm [7] finds parameter estimates converging to a by iterating the following two steps:

We describe below a method for obtaining a Monte Carlo approximation of the conditional expectation given in .

### 4.3Stochastically Perturbed Gradient Flow

The landmark trajectory for and momentum for can be considered the missing data for estimating the parameters . The conditional expectation in amounts to marginalizing out the stochasticity when finding the expected log-likelihood of the full data. We approximate this marginalization by sampling diffusion bridges, i.e. producing sample paths conditioned on hitting at time T. In [6], a guidance scheme is constructed that modifies a general diffusion process for a generic variable conditioned on hitting a point at time by adding a term of the form to the SDE. This scheme allows sample based approximation of for functions of the original stochastic process by the formula for the modified process with a -dependent constant and a path-dependent correction factor . We need to modify this scheme for application to landmarks, primarily because the diffusion field in may not be invertible as required in the scheme of [6]. A scheme based on the pseudo-inverse can be derived, but it is computationally infeasible for high-dimensional problems. Instead, we construct the following landmark guidance scheme for the modified variables

Here, is a bounded approximation of the drift term in with the Itô correction; is an approximation of the landmark position at time , given their position at time ; and . The boundedness condition together with appropriate conditions on is necessary to show that but it is not needed in practice. As in [6], we can compute the correction factor for the modified process . Because of the multiplication on , the correction factor does not depend on the inverse or pseudo-inverse of and the scheme is thus computationally much more efficient.

Interestingly, the forcing term of is a time-rescaled gradient flow. This can already be seen in the original scheme of [6] by noticing that has the form of a gradient flow. In the present case, with appropriate conditions on the noise fields , we can define an admissible norm . The forcing term in is then a derivative of a gradient flow for the norm in the variable. The gradient is taken with respect to the predicted endpoint and it couples to the variable through . The flow is time rescaled with , , and . This rescaling slows done the time when the original time is close to . This makes sure that the system has enough time to converge and reach the target . Deterministic gradient flows are used in greedy matching algorithms using the LDDMM right-invariant metric as described in e.g. [16]. With the present stochastic flow, sampling allows efficient evaluation of the -function in EM and a similar evaluation of the data likelihood. In both cases, no matching or registration of the data is needed.

## 5Numerical examples

In this section, we will illustrate the estimation algorithms on simple synthetic test cases, and on a data set of corpora callosa shapes.

### 5.1Synthetic Test Case

This synthetic test case addresses matching between two ellipses discretized by 5 landmarks. We used a low number of landmarks here in order to have readable pictures. (The algorithms scale well and are not limited by the number of landmarks.) For the noise fields , we use a grid of by Gaussian noise fields in the region with a fixed scale corresponding to the distance between the grid points. For testing purposes, we let the bottom of the noise fields have larger amplitude in the direction, and the top fields have larger amplitudes in the direction. With these parameters, we produced sample landmark configurations from the model to estimate the sample mean and covariance of the landmarks at the final simulation time .

On the left panel of Figure ?, we display the result of the moment algorithm. In black are shown the density and variance in black ellipses as well as the original fields with the correct parameters. The estimated parameters of the noise fields after running the differential evolution algorithm are shown in blue. The algorithm performs the estimation given only the final sample mean and covariance of the sampled landmarks. The result is in good agreement for the final variances and shows small differences for the estimation of the parameters of the fields . The differences arise from three sources: the approximation used in the moment evolution; errors in the estimation of sample mean and variance; and the error in the solution of the minimisation algorithm. The minimisation algorithm may not have found the global minimum, but it did converge, as shown on the right panel of Figure ? where we display the value of the cost for each iteration of the genetic algorithm. Standard derivative-based algorithms would typically be caught in non-optimal local minima and thereby give worse results.

### 5.2Stochastic Gradient Flows

We consider matching two ellipses using the stochastically perturbed gradient flow discussed in Section 4.3. In Figure 2 (left), the initial set of landmarks is displayed along the stochastic path that is pulled by the gradient term to target set . The guidance is computed from the predicted endpoint . The prediction and the guidance term is showed at . The guidance attraction in becomes increasingly strong with time as . This enforces .

Compared to the scheme of [6], the use of the function to predict the endpoint from removes much of the coupling between the momentum and the guidance term. Without , the scheme would guide based solely on the difference . However, this would result in the scheme overshooting the target and producing samples of very low probability. The term on the guidance makes the scheme computationally feasible since the inverse of is not needed in the computation of the correction factor.

### 5.3Corpus Callosum Variation and Template Estimation

From a dataset of 65 corpus callosum shapes represented by 77 2D landmarks , , evenly distributed along the shape outline, here we aim at estimating the template and noise correlation represented in this case by a correlation matrix . The parameters of the model are . For the MLE, we use the ‘Lagrangian’ scheme with components of the spatial correlation matrix . The range is . We initialize with the Euclidean mean and run the EM algorithm for estimation of until convergence. The evolution of the variances is plotted in the right panel of Figure 3. On the left panel of Figure 3 shows the estimated template . The variance magnitude is plotted with arrows on each landmark and can be compared with the landmark-wise empirical variance from the dataset. The variance specified by is axis-aligned, and results in differences when the eigenvalues of the empirical variance are not axis-aligned.

This experiment is repeated in Figure ? with moment matching algorithm but with several differences of the model. First, the noise fields are the original spatially fixed Gaussian represented by red arrows. We also allow to be non-axis aligned and we placed them at the position of every landmarks. We then applied the genetic algorithms using only the landmarks at the same position as the noise fields, fixing the initial momenta to and the initial position to the mean positions of the landmarks. In addition to its rapid convergence, as shown in the right panel of Figure ?, the global minimisation algorithm also gave a reliable estimate of the final variance for all the landmarks, even when fewer landmarks and noise fields were used.

## 6Conclusion and Open Problems

We have presented and implemented a model for large deformation stochastic variations in computational anatomy. The extremal paths are governed by stochastic EPDiff equations which arise from a right-invariant stochastic variational principle, and admit reduction from the diffeomorphism group to lower dimensional shape spaces (landmarks).

We have shown that accurate estimation of the noise fields of the model can be achieved either by approximating the Fokker-Planck equations with a finite set of moments, or by Monte Carlo sampling and EM-estimation. We have derived and implemented the methods in both cases. The second approach introduces the concept of stochastically perturbed gradient flows for data likelihood evaluation.

It can be hypothesised that shape evolution of human organs under the influence of diseases do not follow smooth geodesics as in conventional models used in computational anatomy, but rather it exhibits stochastic variations in shape as the disease progresses. The approaches presented enable modelling of such variations. We expect to test this hypothesis on additional shape spaces and with multiple time point longitudinal shape datasets in future work.

#### Acknowledgements

We are grateful to M. Bruveris, M. Bauer, N. Ganaba C. Tronci and T. Tyranowski for helpful discussions of this material. AA acknowledges partial support from an Imperial College London Roth Award. AA and DH are partially supported by the European Research Council Advanced Grant 267382 FCCA held by DH. DH is also grateful for support from EPSRC Grant EP/N023781/1. SS is partially supported by the CSGB Centre for Stochastic Geometry and Advanced Bioimaging funded by a grant from the Villum foundation.

### References

**Towards a coherent statistical framework for dense deformable template estimation.**

S. Allassonnière, Y. Amit, and A. Trouvé.*Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, 69(1):3–29, 2007.**Stochastic Models of Large Deformations.**

Alexis Arnaudon, Darryl Holm, and Stefan Sommer.*in preparation*, 2016.**Mécanique aléatoire.**

J.-M. Bismut. In*Tenth Saint Flour Probability Summer School—1980 (Saint Flour, 1980)*, volume 929 of*Lecture Notes in Math.*, pages 1–100. Springer, Berlin-New York, 1982.**Bayesian data assimilation in shape registration.**

C J Cotter, S L Cotter, and F-X Vialard.*Inverse Problems*, 29(4):045011, 2013.**An integrable shallow water equation with peaked solitons.**

Roberto Camassa and Darryl D. Holm.*Phys. Rev. Lett.*, 71(11):1661–1664, 1993.**Simulation of conditioned diffusion and application to parameter estimation.**

Bernard Delyon and Ying Hu.*Stochastic Processes and their Appl.*, 116(11):1660–1675, 2006.**Maximum likelihood from incomplete data via the EM algorithm.**

A. P. Dempster, N. M. Laird, and D. B. Rubin.*JRSS, Series B*, 39(1):1–38, 1977.**Momentum maps and measure-valued solutions (peakons, filaments, and sheets) for the epdiff equation.**

Darryl D Holm and Jerrold E Marsden. In*The breadth of symplectic and Poisson geometry*, pages 203–235. Springer, 2005.**Variational principles for stochastic fluid dynamics.**

Darryl D Holm.*Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences*, 471(2176):20140963, 2015.**Conditioning diffusions with respect to partial observations.**

Jean-Louis Marchand.*arXiv*, 2011.**Statistical methods in computational anatomy.**

M. Miller, A. Banerjee, G. Christensen, S. Joshi, N. Khaneja, U. Grenander, and L. Matejic.*Statistical Methods in Medical Research*, 6(3):267–299, 1997.**Langevin equations for landmark image registration with uncertainty.**

Stephen Marsland and Tony Shardlow.*arXiv preprint arXiv:1605.09276*, 2016.**Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces.**

Rainer Storn and Kenneth Price.*Journal of global optim.*, 11(4):341–359, 1997.**Shape splines and stochastic shape evolutions: a second order point of view.**

Alain Trouvé and François-Xavier Vialard.*Quarterly of Applied Mathematics*, 70(2):219–251, 2012.**Extension to infinite dimensions of a stochastic second-order model associated with shape splines.**

François-Xavier Vialard.*Stochastic Processes and their Applications*, 123(6):2110–2157, 2013.*Shapes and Diffeomorphisms*.

Laurent Younes. Springer, 2010.