A Nonlinear Variational Approach to
Motion-Corrected Reconstruction of Density Images
Abstract.
The aim of this paper is to establish a nonlinear variational approach to the reconstruction of moving density images from indirect dynamic measurements. Our approach is to model the dynamics as a hyperelastic deformation of an initial density, including preservation of mass. Consequently, we derive a variational regularization model for the reconstruction, which, besides the usual data fidelity and total variation regularization of the images, also includes a motion constraint and a hyperelastic regularization energy.
Under suitable assumptions we prove the existence of a minimizer, which relies on the concept of weak diffeomorphisms for the motion. Moreover, we study natural parameter asymptotics and regularizing properties of the variational model. Finally, we develop a computational solution method based on alternating minimization and splitting techniques, with a particular focus on dynamic PET. The potential improvements of our approach compared to conventional reconstruction techniques are investigated in appropriately designed examples.
1. Introduction
With current advances in imaging devices, in particular in biomedical imaging, dynamic studies including motion attract more and more attention. Prominent examples are live microscopy, MRI, and emission tomography (PET and SPECT). For the latter, cardiac applications are of particular interest, since they are subject to both cardiac and respiratory motion. In most such applications a density image (e.g. the density of a radiochemical or fluorescent tracer), which is subject to motion between consecutive time steps, is to be reconstructed from indirect measurements. This problem is tackled with a novel variational approach in this paper.
The need for advanced motion-corrected reconstruction methods is caused by the fact that separate reconstructions in small time intervals, in which the motion effect is negligible, are of inferior quality. The latter problem is caused either by severe undersampling (e.g. in MRI due to time restrictions in acquiring slice data) or by the bad signal-to-noise ratio in small time steps (e.g. in optical imaging and emission tomography techniques based on counting photons). Several recent approaches exploiting sparsity properties of the images to be reconstructed achieve strong improvements in these two situations (cf. MR compressed sensing [39], Bregman-EM-TV for PET [44]). However, those approaches are still limited in certain practical situations, and since separate reconstruction on small time scales does not exploit natural temporal correlation, spatio-temporal image reconstruction methods are an obvious next step.
Several approaches have tried to obtain improved reconstructions by appropriately averaging reconstructions at different time steps. To do so, the motion between those time steps needs to be estimated, which can be performed either by flow-type (optical flow) techniques [15] or by image registration [32]. Subsequently, the motion in the image can be corrected and improved images can be obtained by averaging, cf. e.g. [32] for an implementation in a clinical PET setup. Further improvements are often achieved by iterating a reconstruction step (in separate time steps) and the motion estimation, where some motion-corrected density (initial or average) serves as a prior for the next reconstruction step. Remaining disadvantages are the artefacts that can be created this way, as well as the unclear convergence properties of such alternating iteration approaches. Those can be cured by formulating an appropriate variational model, which is then minimized by an alternating iteration method. In this way, consistent and convergent iteration schemes can be obtained, and the properties of the limits can be understood from the structure of the underlying variational problem. A key point is that smoothness and smallness of deformations or velocities can be used as reasonable priors (via appropriate energy terms in the variational problems), which finally leads to physically reasonable reconstructions with significantly improved signal-to-noise ratio. Examples are the approaches of [41, 8, 30] to motion-corrected PET reconstruction.
Similarly to [41, 8], we are going to minimize a functional of the form
(1) 
for an appropriate transformation model. Here the first term is the data fidelity between estimated and measured data, while the remaining terms are regularization functionals on the image and on the motion, respectively, each weighted by a nonnegative regularization parameter.
In our setup we reconstruct a sequence of nonnegative densities on a spatial domain, typically such that
(2) 
with a reasonably smooth deformation field. The different densities are thought of as evaluations at (ascending) time steps.
The measurements are noisy versions of the projected image at the respective time step, with a stationary forward operator, e.g. a convolution in fluorescence microscopy or versions of the X-ray transform in PET and SPECT. Motivated by the above applications, in particular cardiac PET, we will focus on noisy data drawn from Poisson statistics, i.e. each measurement is interpreted as a Poisson distributed random variable whose expected value is the projected density scaled by a typical length of the time interval. However, the extension to various other commonly used noise models, such as additive noise or additional background noise, is straightforward in our variational setting by simply exchanging the data fidelity term for one related to the negative log-likelihood of the noise distribution.
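To make the measurement model above concrete, the following minimal sketch simulates Poisson data for a one-dimensional toy density. The moving-average blur standing in for the forward operator, as well as all sizes and parameters, are our own illustrative choices, not the setup used later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_blur(rho, width=3):
    """Toy stationary forward operator K: a moving-average blur
    (an illustrative stand-in for a convolution or X-ray transform)."""
    kernel = np.ones(width) / width
    return np.convolve(rho, kernel, mode="same")

def simulate_counts(rho, dt=1.0):
    """Draw Poisson measurements with expected value dt * (K rho)."""
    return rng.poisson(dt * forward_blur(rho))

# toy density: a single bump on a grid of 50 pixels
rho = np.zeros(50)
rho[20:30] = 100.0
f = simulate_counts(rho, dt=2.0)
```

Shortening the time interval `dt` directly lowers the expected counts, which reproduces the bad signal-to-noise ratio in small time steps discussed above.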
The main contributions of the paper are the following:

An appropriate modelling of the mass-conserving density transformation and its implementation in the reconstruction process, which has not been considered previously.

A model with suitable regularization functionals for images with edges (total variation) and for physically reasonable, (weakly) one-to-one deformations (hyperelastic).

A detailed mathematical analysis of the arising variational problem in a function space setting.

The construction and implementation of computational methods for a setup in cardiac PET, with applications to realistic datasets.
As we shall see below, a major issue in the analysis of the model is the fact that we only deal with BV images and Sobolev deformations, whose composition is not defined in a straightforward way, and continuity with respect to weak convergence is a rather difficult issue. The analysis section is therefore split into two main parts: in the first, we present some results from geometric measure theory, which allow for a generalized version of the change of variables formula. With the help of this formula we are able to prove weak convergence of the composition of functions with Sobolev mappings (Theorem 18). This convergence result is the backbone of the existence result we formulate for injective transformations (Theorem 15).
The next section focuses on our numerical optimization. Since we assume time-discrete data, we pick a standard space discretization and follow a first-discretize-then-optimize approach. Similarly to [41], we end up with an alternating two-step minimization: the first step is reconstruction with a fixed motion estimate, while in the second we optimize the motion estimate with a fixed reconstructed image. We describe both steps in detail and show that standard EM-TV algorithms can also be performed with motion-corrected reconstruction. The minimization in the motion step leads to image registration, but we present a new distance measure which is defined on the detector domain. Apart from that, we follow the widely known FAIR framework [43] for the registration.
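The alternating structure described above can be illustrated on a deliberately simple toy problem, where the "motion" is a cyclic shift and the "reconstruction" step has a closed form. The quadratic model and the brute-force shift search below are illustrative stand-ins for the EM-TV and registration steps, not the actual algorithms of the paper.

```python
import numpy as np

def objective(u, s, f, alpha):
    """Toy analogue of the functional: data term with motion plus image penalty."""
    return np.sum((np.roll(u, s) - f) ** 2) + alpha * np.sum(u ** 2)

def reconstruction_step(s, f, alpha):
    """Minimize over the image u for a fixed shift s (closed form:
    np.roll is a permutation, so the problem decouples pointwise)."""
    return np.roll(f, -s) / (1.0 + alpha)

def motion_step(u, f, shifts):
    """Minimize over the discrete shift for a fixed image (brute force)."""
    return min(shifts, key=lambda s: np.sum((np.roll(u, s) - f) ** 2))

f = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0])
u, s, alpha = np.zeros_like(f), 0, 0.1
values = []
for _ in range(5):
    u = reconstruction_step(s, f, alpha)   # image update, motion fixed
    s = motion_step(u, f, range(len(f)))   # motion update, image fixed
    values.append(objective(u, s, f, alpha))
```

Since each half-step exactly minimizes the objective over its own variable, the recorded objective values are monotonically non-increasing, which is the basic consistency property such alternating schemes inherit from the underlying variational problem.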
2. Motion Models and Variational Formulation
In the following we discuss the appropriate mathematical modelling of motion-corrected reconstruction of density images. Our goal is to obtain the reconstructed image, which is a sequence of nonnegative, integrable probability densities
(3) 
as well as the motion (deformation)
(4) 
as a minimizer of a suitable variational problem. We will further restrict the set of admissible deformations below and sometimes also look for densities in different spaces, e.g. as functions of bounded variation or relaxed as probability measures instead of densities. We will investigate the following variational model: Minimize
(5) 
over nonnegative densities subject to (2). Here the forward operator is assumed to be compact and the data are noisy measurements at the respective times; we refer to the subsequent sections for a precise definition of the BV-seminorm and of the hyperelastic energy. This can be interpreted as a MAP estimate for a (formal) Bayesian model, as we demonstrate in the following.
2.1. Bayesian Modelling
In order to derive a variational model we can resort to a formal Bayesian model, where for simplicity we use the above notations. For a rigorous modelling, discretized versions of the above variables or infinite-dimensional probability distributions should be used, which is beyond the scope of this paper. Bayes' theorem yields the posterior probability density
(6) 
where the first factor is the likelihood, which we simply assume to factorize as
(7) 
where each factor is the stationary noise likelihood of the corresponding measurement given the density at that time step. Note that in our model we make the natural assumption that the likelihood does not depend on the deformation explicitly, since the image formation, and hence the involved noise, is considered an instantaneous snapshot given some density at the respective time. For data obtained by collecting information over a larger time interval (or even the full interval), the modelling needs to be changed at this point.
The prior probability density for the image sequence given the deformations is given by
(8) 
where the first factor is a concentrated measure (centered at the origin) enforcing (2), and the second factor is some a priori probability for the image at each time step. Finally, there is a prior probability density on the deformation sequence.
In order to compute a maximum a posteriori probability (MAP) estimate, we need to restrict to the set of images and deformations satisfying (2), due to the concentrated measure, and minimize the negative log-likelihood of the remaining factors in the density. We obtain a problem of the form
(9) 
As usual the negative logarithms of the prior probabilities are related to regularization terms in standard inverse problems theory [31]. Hence, we can directly specify those using appropriate models. Since we aim at reconstructing images with sharp edges we employ total variation regularization on the image, i.e.,
(10) 
For the deformation we use a hyperelastic energy (cf. [49]), i.e.,
(11) 
which is natural since it also regularizes the Jacobian determinant. Note that hyperelastic regularization also enforces orientation preservation, and thus a positive Jacobian determinant a.e. [12]. In detail, the hyperelastic energy is given by
(12) 
with suitable penalty functions.
The three terms penalize deviations from the identity in volume, length and surface. This energy enforces locally one-to-one transformations, but there might still be globally non-injective transformations, see [53].
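For a single deformation gradient, the three ingredients can be evaluated directly. The concrete penalty functions below (quadratic deviation from the identity for length and surface, and a convex barrier for the volume) are illustrative choices of our own, not necessarily the exact functions used in [49].

```python
import numpy as np

def cofactor(F):
    """Cofactor matrix of an invertible 3x3 deformation gradient F."""
    return np.linalg.inv(F).T * np.linalg.det(F)

def length_term(F):
    # penalizes deviations of the gradient from the identity (length changes)
    return float(np.sum((F - np.eye(3)) ** 2))

def surface_term(F):
    # penalizes changes of surface area via the cofactor matrix
    return float(np.sum((cofactor(F) - np.eye(3)) ** 2))

def volume_term(F):
    # hypothetical convex barrier on the Jacobian determinant: it blows up
    # as det F -> 0+, enforcing orientation preservation (det F > 0)
    d = np.linalg.det(F)
    return (d - 1.0) ** 2 / d if d > 0 else float("inf")

identity = np.eye(3)
shrink = 0.5 * np.eye(3)  # uniform compression, det = 0.125
```

The identity deformation gradient incurs zero energy in all three terms, while any orientation-reversing gradient (negative determinant) is assigned infinite volume penalty.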
Having defined regularizations for the density as well as for the motion field, we look for a minimizer of (5), subject to (2).
In the particular case of Poisson data, the data fidelity equals the Kullback-Leibler divergence (for a continuous forward operator) up to a constant independent of the unknowns; more precisely,
(13) 
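For completeness, we recall the standard computation behind this statement; the notation (f for the measured counts, Kρ for the projected density) is ours and chosen only for this sketch.

```latex
% Poisson negative log-likelihood, up to terms independent of \rho:
-\log p(f \mid K\rho)
   = \sum_i \Big[ (K\rho)_i - f_i \log (K\rho)_i + \log(f_i!) \Big].
% Adding the \rho-independent quantity \sum_i [\, f_i \log f_i - f_i - \log(f_i!) \,]
% leaves the minimizers unchanged and yields exactly
\mathrm{KL}(f, K\rho)
   = \sum_i \Big[ f_i \log \frac{f_i}{(K\rho)_i} - f_i + (K\rho)_i \Big].
```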
3. Analysis
This section is devoted to the analysis of the functional derived in the previous section. The mass-preservation condition led to a functional involving transformed images, i.e. compositions of an image with a deformation. The transformation theorem for integrals is a powerful tool for dealing with such compositions. Unfortunately, the hyperelastic regularization we imposed on the transformation does not guarantee diffeomorphic transformations in the classical sense. Thus we describe the relaxation of classical infinitesimal calculus to Sobolev mappings before we focus on the presentation of our analytical results.
3.1. Preliminary Results
In this section we generalize known definitions from classical infinitesimal calculus to equivalence classes of functions in Lebesgue and Sobolev spaces (compare for example [22, 28, 25] for further reading on this matter). Our final goal is to derive a version of the transformation theorem for integrals for non-diffeomorphic functions. Since we cannot distinguish functions differing on a set of measure zero, the classical definition of differentiability is not feasible here, because we would like to obtain the same result for all representatives of an equivalence class. A natural way to define a coherent function value for all representatives is via averaging, which leads to the following definition.
Definition 1.
Let the underlying set be a domain and consider a locally integrable function. Then the set of points for which a value exists such that
(14) 
holds is called the Lebesgue set, while its elements are called Lebesgue points.
Remark 2.

It is known that the complement of the Lebesgue points is a zero set [20, Thm. 2.19].

For weakly differentiable functions we can define the set of Lebesgue points of the derivative analogously.

If the function is vector-valued, we call a point a Lebesgue point iff it is a Lebesgue point for each component function.
Differentiability can be generalized in a similar way [2]:
Definition 3 (Approximate differential).
We say that a function is approximately differentiable at a point iff there exists a matrix such that
(15) 
To overcome difficulties arising from changing functions on a set of measure zero, Nikolai Lusin introduced in his dissertation [40] the so-called Lusin condition, also known as the N-condition:
Definition 4 (Lusin’s condition).
A mapping satisfies Lusin’s condition, iff:
(16) 
where zero sets are understood with respect to the Lebesgue measure.
Having surmounted these difficulties, we need to take into account that a non-diffeomorphic function may hit some points several times. For this problem, Stefan Banach introduced the so-called Banach indicatrix, which counts the number of solutions of an equation [6]. This concept was generalized to discontinuous functions by Lozinski [38] and to higher dimensions by Kronrod [35] and by Vitushkin in his master's thesis [54]:
Definition 5 (Banach indicatrix).
The Banach indicatrix is the function given by
(17) 
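A toy one-dimensional version of the indicatrix makes the counting explicit. The sketch below approximates the number of solutions of f(x) = y by counting sign changes of f − y on a fine grid; it is a hypothetical illustration only, and tangential (non-transversal) solutions are missed by this crossing count.

```python
import numpy as np

def banach_indicatrix(f, y, a, b, n=1000):
    """Approximate the Banach indicatrix N(y, f, [a, b]): the number of
    solutions of f(x) = y, counted as sign changes of f - y on a grid.
    Toy 1-D version; assumes transversal crossings away from grid points."""
    x = np.linspace(a, b, n)
    g = f(x) - y
    return int(np.sum(np.diff(np.sign(g)) != 0))

square = lambda x: x ** 2  # non-injective on [-1, 1]
```

For the square map on [−1, 1], every level in (0, 1) is hit exactly twice, while levels outside the range are never hit.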
We are now ready to present a change of variables formula under minimal assumptions, which was given by Hajlasz [28]. The central idea is that points hit multiple times by the transformation need to be taken into account via a multiplicative factor.
Theorem 6 (Area formula).
If a mapping is approximately differentiable almost everywhere, then it can be redefined on a set of measure zero such that the new mapping fulfills Lusin's condition. Furthermore, the following statements hold for every measurable subset and every nonnegative measurable function:

Both functions appearing in (18) are measurable.

If moreover then
(18) 

If one of the two functions is integrable, then so is the other, and formula (18) holds.
Additionally we have
(19) 
Proof. See [28]. ∎
As we see, the generalization of the transformation theorem can take into account that points might be hit several times. Consequently, a mapping with no additional injectivity restriction might violate the mass-preservation condition.
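The role of the multiplicative factor can be checked numerically. In the sketch below (our own illustration), the non-injective map f(x) = x² on [−1, 1] hits every level in (0, 1) twice, so the Banach indicatrix equals 2 on the right-hand side of the area formula; a test function that is well-defined on the range of f is integrated on both sides.

```python
import numpy as np

def trapezoid(vals, grid):
    """Simple trapezoidal quadrature (kept self-contained on purpose)."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

# f(x) = x^2 on [-1, 1]: every y in (0, 1) has two preimages,
# so the indicatrix N(y) = 2 enters the right-hand side.
x = np.linspace(-1.0, 1.0, 200001)
y = np.linspace(0.0, 1.0, 100001)

# left-hand side:  integral of eta(f(x)) |f'(x)| dx  with eta = cos
lhs = trapezoid(np.cos(x ** 2) * np.abs(2.0 * x), x)
# right-hand side: integral of eta(y) N(y) dy  with N(y) = 2 on (0, 1)
rhs = trapezoid(np.cos(y) * 2.0, y)
```

Both sides evaluate to 2 sin(1), confirming that dropping the multiplicity factor would undercount the left-hand side by a factor of two.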
Furthermore, this subject is strongly linked to the topological degree, which was introduced by Brouwer in 1911 [10] and generalized to Sobolev mappings by Giaquinta et al. [24]. We will not focus on this topic, but we will use some of these results in the remainder of this paper. We start by giving the generalized definition of the topological degree, see also [24].
Definition 7 (Topological degree).
Let the underlying set be open and consider an almost everywhere approximately differentiable map with integrable Jacobian determinant; the degree of the map is defined as
(20) 
Remark 8.
The topological degree is strongly related to the Banach indicatrix, since it coincides with the Banach indicatrix for certain mappings. As a direct consequence of [23, Chapter 1, Proposition 2], we obtain for orientation-preserving mappings:
(21) 
An interesting property of the topological degree is that it is completely determined on the boundary for sufficiently regular functions. This is phrased in the following proposition.
Proposition 9.
Let the underlying set be a bounded Lipschitz domain and consider two Sobolev mappings with integrable Jacobians that coincide on the boundary in the sense of traces. Then their degrees agree.
Proof.
See [23], Chapter 2, Proposition 1. ∎
To conclude this brief summary, we mention that there is indeed a specialization to injective Sobolev functions. This leads to the field of weak diffeomorphisms [21]. With Theorem 10 we will only present a compactness result, which is central for our analysis; a short introduction is given in the appendix. A weak diffeomorphism can be expressed as the limit of a sequence of orientation-preserving diffeomorphisms [21]. Furthermore, closedness and compactness results for this class can be stated ([23, Chapter 5, Theorems 3 and 4]). These rely heavily on weak convergence in the space of weak diffeomorphisms. Thus we present a similar result published by Henao and Mora-Corral [29] instead, which follows directly from [29, Proposition 2 and Theorem 2].
Theorem 10 (Injectivity as closed constraint).
For each index, let the corresponding mapping be a.e. approximately differentiable. Assume furthermore that
(22) 
as well as
(23) 
Suppose that there exists such that a.e. and
(24) 
as . Assume that for each the function we have a.e. with a.e.. Then

a.e.,

a.e..
3.2. Regularization Functionals
In this section we briefly present the properties of the regularization functionals used in our reconstruction framework. Both regularizations guarantee stronger convergence properties in addition to compactness of sublevel sets. We do not focus on general possibilities for choosing these energies, but rather present the specific choices for the images (total variation) and the transformation (hyperelastic).
Total variation regularization was first introduced for image denoising in [48] and is now known as the ROF model. Since then, TV regularization has been applied to many different tasks in imaging.
In fact, there are many different equivalent ways to define the total variation and BV functions (compare [5, Chapter 10, Definition 10.1.1]), but we focus on the definition common in image processing (cf. [13]):
Definition 11 (BV seminorm and functions of bounded variation).
The BV-seminorm of an integrable function is given by
(25) 
Consequently we define the space of functions with bounded variation by
(26) 
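A standard forward-difference discretization of the BV-seminorm can be sketched in a few lines; the Neumann-type boundary handling via replication of the last row and column is our own illustrative choice.

```python
import numpy as np

def total_variation(u):
    """Isotropic discrete total variation of a 2-D image via forward
    differences; the last row/column is replicated at the boundary."""
    dx = np.diff(u, axis=1, append=u[:, -1:])
    dy = np.diff(u, axis=0, append=u[-1:, :])
    return float(np.sum(np.sqrt(dx ** 2 + dy ** 2)))

# a sharp vertical edge: piecewise-constant images have finite TV,
# which is why BV is a natural space for images with edges
u = np.zeros((8, 8))
u[:, 4:] = 1.0
```

The edge image has total variation 8 (a jump of height 1 along each of its 8 rows), while any constant image has total variation 0.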
The BV-seminorm is lower semicontinuous with respect to strong convergence in the Lebesgue space [2, Remark 3.5]; moreover, following [2, Remark 3.12], one can define a weak-star convergence on the space of BV functions.
Note that, due to compactness of the embedding operator, weak-star convergence in BV guarantees strong convergence in the Lebesgue space. As we will see later, this will be useful to prove convergence properties of compositions of functions. Furthermore, we can approximate any BV function by a sequence of smooth functions:
Theorem 12 (Approximation by smooth functions).
A function u is of bounded variation if and only if there exists a sequence of smooth functions converging to u in the Lebesgue space and a constant satisfying
(27) 
Proof.
[2, Thm. 3.9]. ∎
3.2.1. Hyperelastic Regularization
For the hyperelastic regularization energy presented earlier, Ruthotto [49] used the following set of admissible transformations
(28) 
where the hyperelastic energy is bounded by M, and the admissible set is defined by:
Remark 13.
As we see, the admissible transformations have strictly positive Jacobian determinants a.e. and are thus locally invertible. Note that this deduction is not trivial: since, by the Sobolev embedding theorem [18, Chapter 5.6.4, Thm 6], an admissible function need not have a continuous representative, we cannot use the implicit function theorem to deduce the existence of a local inverse. Even worse, the standard theory on (local) invertibility of Sobolev mappings is focused on more regular mappings (see for example [19, 34]), so this theory would only be applicable in special cases. However, we can use some results from the theory of Cartesian currents mentioned briefly earlier and use the fact that:
(29) 
with notation as defined in the Appendix. Then a result from Müller [45] yields the closedness of the graph [24] of the transformation, and thus the transformation is a weak local diffeomorphism as defined by Giaquinta et al. [26]. We will not elaborate on this further, because we used this argument only to demonstrate that we can expect local invertibility and that this property is not directly guaranteed by the positivity of the Jacobian determinant.
We conclude this short course on hyperelastic regularization by stating the convergence properties shown in [49, Chapter 3, Theorem 4]:
Theorem 14 (Convergence properties of admissible transformations ).
Let the underlying set be a domain with regular boundary and consider a sequence of admissible transformations. Then weak convergence of the transformations implies weak convergence of the corresponding cofactor matrices and Jacobian determinants.
3.3. Existence of a Minimizer
In this section we establish the following existence result for a compact subset of the injective admissible transformations:
Theorem 15 (Existence of a minimizer in motioncorrected reconstruction).
Let our assumptions (A1)-(A3) hold; furthermore, assume that we have
(30) 
Then the functional (5)
with a continuous distance term fulfilling the coercivity property (47) has at least one minimizer on a closed subset of the injective admissible transformations. In particular, this holds for the Kullback-Leibler divergence as distance term.
Injectivity is crucial to enforce the mass-preservation condition. Although our proof can be extended to transformations with uniformly bounded Banach indicatrix, we briefly give two closed subsets of the injective transformations:
Remark 16.

The set is closed [29].

The set is a closed subset of for an injective boundary value , where the equality is understood in the sense of traces.
Crucial for the proof is the control of sequences of functions composed with transformations, where the sequences are weak* convergent as a result of the coercivity properties. Before presenting our central theorem on this convergence, we define weak* convergence for the transformations as follows (compare [50]).
Definition 17 (Weak* Convergence in ).
Let . We say in , iff
Now we can deduce convergence properties of composed weak* convergent sequences.
Theorem 18.
Let in , in and in . Assume additionally that
(31) 
Then we obtain .
Proof.
For ease of presentation we use the following abbreviation:
For any fixed index we show:
We examine both summands separately and show that each of them converges to zero. We recall that weak-star convergence in BV implies strong convergence in the Lebesgue space [13]. Now the first term is treated in a straightforward way by the area formula and (31):
Since strong convergence in the Lebesgue space implies weak convergence, we obtain
(32) 
The second term needs further care. According to [2, Thm. 3.9], we find an approximating sequence of smooth functions.
Given any tolerance, we can pick a fixed N such that:
(33) 
We now expand the first term with this approximation and obtain:
We examine each term separately. We start by applying Hölder's inequality to the first one and obtain
which, together with (33), yields
(34) 
In the second term we start in a straightforward way with
(35) 
Using the boundedness and compactness of the operator together with the weak convergence, it follows directly from the Banach-Steinhaus theorem (see e.g. [1, Chapter 5, Theorem 3]) that the relevant sequence is bounded. Thus (35) is finite. Likewise, we can directly deduce
(36) 
and hence,
Note that the approximating function is smooth and therefore Lipschitz continuous with some constant L on the compact closure of the domain. We obtain
(37) 
By the Rellich-Kondrachov compactness theorem [18, Chapter 5.7, Theorem 1], the sequence of transformations converges strongly, and so we find an index such that
(38) 
This implies for each
(39) 
Now let us consider the third term. Since the integrand is bounded, it follows from the weak convergence of the determinants that there exists an index such that for every
(40) 
Note that the compactness of the domain ensures that this bound is finite. For the last term we can proceed as for the first one, with the approximation index as above, and deduce
(41) 
for any .
As we see, the boundedness of the Banach indicatrix is essential to control sequences of images composed with transformations. In order to guarantee the convergence properties given by Theorem 18, we restrict our analysis to injective transformations. Additionally, as a consequence of the area formula, this ensures that the mass-preservation condition is not violated.
Now we can give continuity properties for a wide range of distance measures, including the Kullback-Leibler divergence:
Lemma 19.
Let the assumptions from Theorem 18 be fulfilled. Then the distance part of our functional J, defined by
(43) 
with a continuous integrand function, is lower semicontinuous with respect to weak-star convergence of the images, weak convergence of the transformations, and weak convergence of the determinants.
Proof.
Let the sequences converge as in the assumptions. Since the transformation variable denotes the collection of all individual transformations, we understand the weak convergence componentwise. From Theorem 18 we obtain, for any fixed index,
Since K is a compact operator, it is completely continuous, which gives us:
Therefore, we can follow the proof of Lemma 3.4 (iii) from [47] for each summand and obtain lower semicontinuity of the Kullback-Leibler data fidelity term with the following reasoning:
Since strong convergence implies convergence almost everywhere along a subsequence, we can deduce
Now we can apply Fatou’s Lemma [25, Chapter 1, Theorem 2 (iii)] and obtain:
Having shown lower semicontinuity for an arbitrary summand with fixed index i, the assertion follows directly. ∎
We have now stated lower semicontinuity results for a wide range of distance terms, including the Kullback-Leibler data fidelity. We now turn our focus to the TV-regularization. For a single image, lower semicontinuity would follow directly from the properties of the BV-seminorm mentioned earlier. However, to formulate an existence result for the whole sequence of images, we give the following lemma, which follows with a proof as in [13]:
Lemma 20.
Let in . Then
(44) 
We have verified the first condition for the existence of a minimizer and now turn our focus to coercivity.
Lemma 21 (Coercivity for the KullbackLeibler divergence).
Let our assumptions (28) for the transformation and (A1)-(A3) for the operator be fulfilled. Then
is coercive with respect to the image variable.
Proof.
We begin by observing:
for any fixed index. We add a constant such that each summand is nonnegative. Note that the divergence is nonnegative, and therefore we can bound the Kullback-Leibler divergence from below by:
Therefore we can conclude
(45) 
∎
Remark 22.
The lemma holds not only for the Kullback-Leibler divergence, but for any distance of the form
(46) 
with a convex integrand, which is bounded below and satisfies
(47) 
with constants depending only on the data.
We have proved that the first part of our functional is coercive in the image variable, so it remains to show that the hyperelastic regularization is coercive with respect to the transformation. Fortunately, this has already been done in [49, Chapter 2, Lemma 1]. Combining these results, we can finally prove Theorem 15:
Proof.
(of Theorem 15) The coercivity properties of the distance (47), the TV and the hyperelastic regularization ensure the existence of a bounded level set. Compactness is then granted by the Banach-Alaoglu theorem. Thus a minimizing sequence has a convergent subsequence, where the convergence is understood componentwise. By the lower semicontinuity properties of the distance (Lemma 19), the TV (Lemma 20) and the hyperelastic regularization we obtain:
Finally, as a result of compactness, the limit lies in the admissible set. ∎
3.4. Convergence of the Regularization Method
Having stated existence results for the variational problem of motion-corrected reconstruction, we can show with analogous weak compactness and lower semicontinuity arguments that the minimization of our functional (5) can be understood as a (nonlinear) regularization method, i.e. there exist appropriate limits as the noise level and regularization parameters tend to zero [17].
Theorem 23.
Let be a sequence of noisy data with
(48) 
where the exact data correspond to an image and transformations such that:
(49) 
for a nonnegative distance which is lower semicontinuous in both arguments and fulfills (47). Furthermore, we define a sequence of functionals by:
(50) 
Then for and with
(51) 
the sequence of minimizers of these functionals has a convergent subsequence, and the limit fulfills:
(52) 
Note that estimate (52) only holds for the composition and not for the individual components. Since different transformations can lead to the same transformed image, we cannot in general expect convergence of both components.
Remark 24.
In order to deal with the non-uniqueness of the solution yielded by Theorem 23, we additionally assume