
# A Nonlinear Variational Approach to Motion-Corrected Reconstruction of Density Images

Martin Burger and Jan Modersitzki and Sebastian Suhr
###### Abstract.

The aim of this paper is to establish a nonlinear variational approach to the reconstruction of moving density images from indirect dynamic measurements. Our approach is to model the dynamics as a hyperelastic deformation of an initial density including preservation of mass. Consequently we derive a variational regularization model for the reconstruction, which - besides the usual data fidelity and total variation regularization of the images - also includes a motion constraint and a hyperelastic regularization energy.

Under suitable assumptions we prove the existence of a minimizer, which relies on the concept of weak diffeomorphisms for the motion. Moreover, we study natural parameter asymptotics and regularizing properties of the variational model. Finally, we develop a computational solution method based on alternating minimization and splitting techniques, with a particular focus on dynamic PET. The potential improvements of our approach compared to conventional reconstruction techniques are investigated in appropriately designed examples.

Institute for Computational and Applied Mathematics, University of Münster, Einsteinstr. 62, 48149 Münster, Germany.
Cells in Motion Cluster of Excellence, University of Münster
Institute of Mathematics and Image Computing, University of Lübeck, Maria-Goeppert-Straße 3, 23562 Lübeck, Germany
Fraunhofer MEVIS Project Group Image Registration, Maria-Goeppert-Straße 3, 23562 Lübeck, Germany
This work was supported by the Deutsche Forschungsgemeinschaft (DFG) through grants BU 2327/8-1 and MO-1053/2-1, as well as by ERC via Grant EU FP 7 - ERC Consolidator Grant 615216 LifeInverse. MB acknowledges further support by the German Science Foundation DFG via EXC 1003 Cells in Motion Cluster of Excellence, Münster, Germany.

## 1. Introduction

With current advances of imaging devices, in particular in biomedical imaging, dynamic studies including motion attract more and more attention. Prominent examples are live microscopy, MRI, and emission tomography (PET and SPECT). For the latter cardiac applications are particularly interesting, which are subject to cardiac and respiratory motion. In most of such applications a density image (e.g. the density of a radiochemical or fluorescent tracer), which is subject to motion between consecutive time steps, is to be reconstructed from indirect measurements. This problem shall be tackled with a novel variational approach in this paper.

The need for advanced motion-corrected reconstruction methods is caused by the fact that separate reconstructions in small time intervals - in which the motion effect is negligible - are of inferior quality. The latter problem is caused either by severe undersampling (e.g. in MRI due to time restrictions in acquiring slice data) or by the bad signal-to-noise ratio (e.g. in optical imaging and emission tomography techniques based on counting photons) in small time steps. Several recent approaches exploiting sparsity properties of the images to be reconstructed achieve strong improvements in these two situations (cf. MR compressed sensing [39], Bregman-EM-TV for PET [44]). However, those approaches are still limited in certain practical situations, and since separate reconstruction on small time scales does not exploit natural temporal correlation, spatio-temporal image reconstruction methods are an obvious next step.

Several approaches have tried to obtain improved reconstructions by appropriately averaging reconstructions at different time steps. To do so, the motion between those time steps needs to be estimated, which can be performed either by flow-type (optical flow) techniques [15] or by image registration [32]. Subsequently the motion in the image can be corrected and improved images can be obtained by averaging, cf. e.g. [32] for an implementation in a clinical PET setup. Further improvements are often achieved by iterating a reconstruction step (in separate time steps) and the motion estimation, where some motion-corrected density (initial or average) serves as a prior for the next reconstruction step. Remaining disadvantages are the artefacts that can be created this way as well as the unclear convergence properties of such alternating iteration approaches. These can be cured by formulating an appropriate variational model, which is then minimized by an alternating iteration method. In this way consistent and convergent iteration schemes can be obtained, and the properties of the limits can be understood from the structure of the underlying variational problem. A key point is that smoothness and smallness of deformations or velocities can be used as reasonable priors (via appropriate energy terms in the variational problems), which finally leads to physically reasonable reconstructions with significantly improved signal-to-noise ratio. Examples are the approaches of [41, 8, 30] to motion-corrected PET reconstruction.

Similar as in [41, 8] we are going to minimize a functional of the form

 J(ρ,y) = ∑_{i=0}^{N} ( D(Kρ_i, f_i) + α_i R_I(ρ_i) + β_i R_M(y_i) ) (1)

for an appropriate transformation model. Here D is the data fidelity between estimated and measured data, and R_I respectively R_M are regularization functionals on the image respectively the motion (with nonnegative regularization parameters α_i and β_i).

In our setup we reconstruct a sequence of nonnegative densities ρ_0, …, ρ_N on Ω, typically such that

 ρ_i(x) = ρ_0(y_i(x)) det(∇y_i(x)), (2)

with a reasonably smooth deformation field y_i. The different densities are thought of as evaluations at (ascending) time steps t_0, …, t_N.
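
As a quick numerical illustration, the mass preservation built into (2) can be checked in one dimension; the deformation y(x) = x² and the density ρ_0 below are illustrative choices, not part of the model.

```python
import numpy as np

# 1D sanity check of the mass-preserving transform (2):
# rho_1(x) = rho_0(y(x)) * det(grad y(x)), here with y(x) = x**2
# on Omega = [0, 1]; the deformation and density are illustrative.
def trap(fx, x):
    # simple trapezoidal quadrature
    return float(np.sum(0.5 * (fx[1:] + fx[:-1]) * np.diff(x)))

x = np.linspace(0.0, 1.0, 100001)
rho0 = lambda t: 3.0 * t**2          # a probability density on [0, 1]
det_grad_y = 2.0 * x                 # 1D Jacobian "determinant" of y(x) = x**2
rho1 = rho0(x**2) * det_grad_y       # transformed density according to (2)

mass0 = trap(rho0(x), x)
mass1 = trap(rho1, x)
print(round(mass0, 4), round(mass1, 4))  # -> 1.0 1.0: mass is preserved
```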

The measurements f_i are noisy versions of the projected image at the i-th time step, with a stationary forward operator K, e.g. a convolution in fluorescence microscopy or versions of the x-ray transform in PET and SPECT. Motivated by the above applications, in particular cardiac PET, we will focus on noisy data drawn from Poisson statistics, i.e. f_i is interpreted as a Poisson distributed random variable with expected value τKρ_i, where τ is a typical length of the time interval. However, extension to various other commonly used noise models, such as additive noise or additional background noise, is straightforward in our variational setting by simply exchanging the data fidelity term related to the negative log-likelihood of the noise distribution.
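
The Poisson measurement model can be sketched as follows; the convolution kernel stands in for the forward operator K, and all sizes and parameters are illustrative.

```python
import numpy as np

# Toy forward model: K is a 1D convolution (a stand-in for the x-ray
# transform), and the measurement f is Poisson with mean tau * K(rho).
rng = np.random.default_rng(0)

rho = np.zeros(64)
rho[20:30] = 5.0                               # piecewise-constant density
psf = np.exp(-0.5 * (np.arange(-5, 6) / 2.0) ** 2)
psf /= psf.sum()                               # normalized point-spread function
K_rho = np.convolve(rho, psf, mode="same")     # stationary forward operator K

tau = 10.0                                     # typical length of the time interval
f = rng.poisson(tau * K_rho)                   # noisy counts, E[f] = tau * K(rho)
print(f.sum())                                 # total counts scatter around tau * 50
```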

The main contributions of the paper are the following:

• An appropriate modelling of the mass-conserving density transformation and its implementation in the reconstruction process, which has not been considered previously.

• A model with suitable regularization functionals for images with edges (total variation) and for physically reasonable, (weakly) one-to-one deformations (hyperelastic).

• A detailed mathematical analysis of the arising variational problem in a function space setting.

• The construction and implementation of computational methods for a setup in cardiac PET, with applications to realistic datasets.

As we shall see below, a major issue in the analysis of the model is the fact that we only deal with BV images and Sobolev deformations, whose composition is not defined in a straight-forward way and continuity with respect to weak convergence is a rather difficult issue. The analysis section is therefore split into two main parts: In the first we present some results from geometric measure theory, which allow for a generalized version of the change of variables formula. With help of this formula we are able to prove weak convergence of the composition of functions with Sobolev mappings (Theorem 18). This convergence result is the backbone of the existence result we formulate for injective transformations (Theorem 15).

The next section lays the focus on our numerical optimization. Since we assume time-discrete data, we pick a standard space discretization and follow a First-Discretize-Then-Optimize approach. Similar to [41] we end up with an alternating two-step minimization: the first step is reconstruction with a fixed motion estimate, while in the second we optimize the motion estimate with a fixed reconstructed image. We describe both steps in detail and show that standard EM-TV algorithms can also be performed with motion-corrected reconstruction. The minimization in the motion step leads to image registration, but we present a new distance measure which is defined on the detector domain. Apart from that we follow the widely known FAIR framework [43] for the registration.
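
The structure of such an alternating two-step minimization can be sketched on a toy objective; the closed-form half-steps below replace the actual EM-TV reconstruction and registration subproblems.

```python
# Sketch of the alternating two-step minimization: update the image with
# the motion fixed, then the motion with the image fixed. The quadratic
# toy objective below stands in for the actual EM-TV / registration steps.
def J(img, mot):
    return (img - mot) ** 2 + (img - 1.0) ** 2 + (mot - 2.0) ** 2

img, mot = 0.0, 0.0
values = [J(img, mot)]
for _ in range(50):
    img = (mot + 1.0) / 2.0     # argmin over the image, motion fixed
    mot = (img + 2.0) / 2.0     # argmin over the motion, image fixed
    values.append(J(img, mot))

# The objective decreases monotonically and the iterates approach the
# joint minimizer (4/3, 5/3) of this toy problem.
print(round(img, 4), round(mot, 4))   # -> 1.3333 1.6667
```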

## 2. Motion Models and Variational Formulation

In the following we discuss the appropriate mathematical modelling of motion-corrected reconstruction of density images. Our goal is to obtain the reconstructed image, which is a sequence of nonnegative, integrable probability densities

 ρ = (ρ_0, ρ_1, …, ρ_N) ∈ L^1_+(Ω)^{N+1} (3)

as well as the motion (deformation)

 y = (y_1, …, y_N) ∈ H^1(Ω)^N (4)

as a minimizer of a suitable variational problem. We will further restrict the set of admissible deformations below and sometimes also look for densities in different spaces, e.g. as functions of bounded variation or relaxed as probability measures instead of densities. We will investigate the following variational model: Minimize

 J(ρ,y) = ∑_{i=0}^{N} ( D(Kρ_i, f_i) + α_i |ρ_i|_{BV(Ω)} ) + ∑_{i=1}^{N} β_i S_hyper(y_i) (5)

over nonnegative ρ subject to (2). Here K is assumed to be a compact operator and the f_i are noisy measurements at time t_i; we refer to the subsequent sections for precise definitions of the BV-seminorm and S_hyper. This can be interpreted as a MAP estimate for a (formal) Bayesian model, as we demonstrate in the following.

### 2.1. Bayesian Modelling

In order to derive a variational model we can resort to a formal Bayesian model, where for simplicity we use the above notations. For a rigorous modelling, discretized versions of the above variables or infinite-dimensional probability distributions should be used, which is beyond the scope of this paper. Bayes theorem yields the posterior probability density

 P((ρ,y)|f) ∼ P(f|ρ) P(ρ|y) P(y), (6)

where P(f|ρ) is the likelihood, which we simply assume to be of the form

 P(f|ρ) = ∏_{i=0}^{N} L_n(Kρ_i, f_i), (7)

where L_n is the stationary noise likelihood of observing f_i given Kρ_i. Note that in our model we make the natural assumption that L_n does not depend on i explicitly, since the image formation and hence the involved noise is considered as an instantaneous snapshot given some density at time t_i. For data obtained by collecting information over a larger time interval (or even the full interval), the modelling needs to be changed at this point.

The prior probability density for the image sequence given the deformations is given by

 P(ρ|y) ∼ P_0(ρ_0) ∏_{i=1}^{N} ε(ρ_i − ρ_0(y_i) det(∇y_i)) P_i(ρ_i), (8)

where ε is the concentrated measure (centered at the origin) and P_i is some a-priori probability for the image at time step t_i. Finally, P(y) is a prior probability density on the deformation sequence.

In order to compute a maximum a-posteriori probability (MAP) estimator, we need to restrict to the set of images and deformations satisfying (2) due to the concentrated measure, and minimize the negative log-likelihood of the remaining factors in the density. With D(Kρ_i, f_i) = −log L_n(Kρ_i, f_i) we obtain a problem of the form

 min_{ρ,y} ∑_{i=0}^{N} ( D(Kρ_i, f_i) − log P_i(ρ_i) ) − log P(y). (9)

As usual the negative logarithms of the prior probabilities are related to regularization terms in standard inverse problems theory [31]. Hence, we can directly specify those using appropriate models. Since we aim at reconstructing images with sharp edges we employ total variation regularization on the image, i.e.,

 −log P_i(ρ_i) = α_i |ρ_i|_{BV(Ω)}. (10)

For the deformation we use a hyperelastic energy (cf. [49]), i.e.,

 −log P(y) = ∑_{i=1}^{N} β_i S_hyper(y_i), (11)

which is natural since it also regularizes the Jacobian determinant det(∇y_i). Note that hyperelastic regularization also enforces orientation preservation and thus det(∇y_i) > 0 a.e. [12]. In detail the hyperelastic energy is given by

 S_hyper(y) = ∫_Ω α_1 len(∇y) + α_2 surf(cof(∇y)) + α_3 vol(det(∇y)) dx, (12)

with the penalty functions

 len(s) = ‖s − I‖_Fro^2 , surf(s) = (max(‖s‖_Fro^2 − 3, 0))^2 , vol(s) = { (s−1)^4 / s^2 if s > 0 ; +∞ else }.

The three terms penalize deviations from the identity in length, surface, and volume. This energy enforces locally one-to-one transformations, but globally non-injective transformations may still occur, see [53].
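
The three penalty functions can be written down directly; the sketch below follows the formulas above for the 3D case (function names are ours).

```python
import numpy as np

# The three hyperelastic penalties from (12), written out for the 3D case.
def pen_len(grad_y):
    # length control: squared Frobenius distance to the identity
    return np.linalg.norm(grad_y - np.eye(3), "fro") ** 2

def pen_surf(cof_grad_y):
    # surface control: only growth beyond the identity value 3 is penalized
    return max(np.linalg.norm(cof_grad_y, "fro") ** 2 - 3.0, 0.0) ** 2

def pen_vol(det_grad_y):
    # volume control: blows up as det -> 0+ and forbids det <= 0
    return (det_grad_y - 1.0) ** 4 / det_grad_y ** 2 if det_grad_y > 0 else np.inf

I = np.eye(3)                                 # the identity is penalty-free
print(pen_len(I), pen_surf(I), pen_vol(1.0))  # -> 0.0 0.0 0.0
```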

Having defined regularization for the density as well as for the motion field, we look for a minimizer of (5), subject to (2).

In the particular case of Poisson data, the data fidelity equals the Kullback-Leibler divergence (for K being a continuous operator between the corresponding Lebesgue spaces) up to a constant independent of ρ_i, more precisely

 D(Kρ_i, f_i) = ∫_Σ Kρ_i − f_i log(Kρ_i) dσ. (13)
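
Numerically, adding the ρ-independent constant ∫ f log f − f to (13) yields the full Kullback-Leibler divergence, which vanishes exactly when Kρ_i reproduces the data; a small sketch with illustrative values:

```python
import numpy as np

# Discrete version of the fidelity (13); adding the rho-independent
# constant sum(f log f - f) yields the full KL divergence, which is
# nonnegative and zero iff K(rho) = f. Values are illustrative.
def kl_fidelity(K_rho, f):
    return float(np.sum(K_rho - f * np.log(K_rho)))

def kl_shifted(K_rho, f):
    return kl_fidelity(K_rho, f) + float(np.sum(f * np.log(f) - f))

f = np.array([3.0, 1.0, 4.0])
print(round(kl_shifted(f, f), 6))      # -> 0.0 at the data itself
print(kl_shifted(f + 0.5, f) > 0.0)    # -> True away from the data
```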

## 3. Analysis

This section is devoted to the analysis of the functional we derived in the previous section. Resulting from the mass-preservation condition we obtained a functional containing transformed images of the form ρ_0(y) det(∇y). The transformation theorem for integrals is a powerful tool for dealing with such transformations. Unfortunately the hyperelastic regularization we imposed on the transformation does not guarantee diffeomorphic transformations in the classical sense. Thus we describe the relaxation of classical infinitesimal calculus to Sobolev mappings before we focus on the presentation of our analytical results.

### 3.1. Preliminary Results

In this section we generalize known definitions from classical infinitesimal calculus to equivalence classes of functions in Lebesgue respectively Sobolev spaces (compare for example [22, 28, 25] for further reading on this matter). Our final goal is to derive a version of the transformation theorem for integrals for non-diffeomorphic functions. Since we cannot distinguish functions differing on zero sets, the classical definition of differentiability is not feasible here, because we would like to obtain the same result for all representatives of the equivalence class. A natural way to define a coherent function value for all representatives is via averaging, which leads to the following definition.

###### Definition 1.

Let Ω ⊂ ℝ^d be a domain and u ∈ L^1_loc(Ω). Then the set of points x ∈ Ω for which a value ū(x) exists, such that

 lim_{r→0} 1/|B_r(x)| ∫_{B_r(x)} |u(z) − ū(x)| dz = 0, (14)

is called the Lebesgue set of u, while the points in it are called Lebesgue points.

###### Remark 2.

• It is known that the complement of the Lebesgue points is a zero set [20, Thm. 2.19].

• For weakly differentiable u we can define the set of Lebesgue points for the derivative analogously.

• If u is a vector-valued function, we say x is a Lebesgue point, iff it is a Lebesgue point for each component function.

Differentiability can be generalized in a similar way [2]:

###### Definition 3 (Approximate differential for L^1_loc).

Let u ∈ L^1_loc(Ω, ℝ^m) and let x ∈ Ω be a Lebesgue point of u; we say that u is approximately differentiable at x, iff there exists a matrix A ∈ ℝ^{m×d}, such that

 lim_{r→0} 1/|B_r(x)| ∫_{B_r(x)} |u(z) − u(x) − A(z − x)| / r dz = 0. (15)

To overcome difficulties arising from changing functions on a zero set, Nikolai Lusin imposed in his dissertation [40] the so called Lusin condition, also known as N-condition:

###### Definition 4 (Lusin’s condition).

A mapping y : Ω → ℝ^m satisfies Lusin's condition, iff:

 ∀E ⊂ Ω : λ(E) = 0 ⇒ λ(y(E)) = 0, (16)

with λ denoting the Lebesgue measure.

Having surmounted these difficulties we need to take into account that a non-diffeomorphic function may hit some points several times. For this problem Stefan Banach introduced the so-called Banach indicatrix, which gives the number of roots of an equation [6]. This concept was generalized to discontinuous functions by Lozinski [38] and to higher dimensions by Kronrod [35] and Vitushkin in his master thesis [54]:

###### Definition 5 (Banach indicatrix).

Let y : Ω → ℝ^m, Ω ⊂ ℝ^d. The Banach indicatrix is a function

 N_y(⋅, Ω) : ℝ^m → ℕ_0 ∪ {∞},

which is given by

 N_y(z, Ω) := card(y^{−1}(z) ∩ Ω). (17)

We are now ready to present a change of variables formula under minimal assumptions, which was given by Hajlasz [28]. The central idea is that points hit multiple times by the transformation need to be counted with the corresponding multiplicity.

###### Theorem 6 (Area formula).

Let y : Ω → ℝ^m be a mapping. If y is approximately differentiable almost everywhere, then y can be redefined on a zero set, such that the new y fulfills Lusin's condition. Furthermore the following statements hold for every measurable subset E ⊂ Ω and positive measurable function u:

• The functions u(y(⋅)) |det(∇y(⋅))| and u(⋅) N_y(⋅, E) are measurable.

• If moreover u(y(⋅)) |det(∇y(⋅))| is integrable, then

 ∫_E u(y(x)) |det(∇y(x))| dx = ∫_{ℝ^m} u(z) N_y(z, E) dz. (18)

• If one of the functions u(⋅) |det(∇y(⋅))| and z ↦ ∑_{w ∈ y^{−1}(z) ∩ E} u(w) is integrable, then so is the other, formula (18) holds, and

 ∫_E u(x) |det(∇y(x))| dx = ∫_{ℝ^m} ∑_{w ∈ y^{−1}(z) ∩ E} u(w) dz. (19)
###### Proof.

See [28, Theorem 2] for the first part of the theorem. The second part is given in [25, Chapter 1.2 Theorem 2] for Lipschitz mappings and can be generalized by following the proof by Hajlasz [28, Theorem 2]. ∎

As we see, the generalization of the transformation theorem takes into account that points might be hit several times. Consequently a mapping with no additional injectivity restriction might violate the mass-preservation condition.
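
A 1D numerical check makes this concrete: for the non-injective map y(x) = x² on Ω = (−1, 1) every z ∈ (0, 1) has two preimages, so N_y(z, Ω) = 2 and the transformed mass doubles (the map and density are illustrative).

```python
import numpy as np

# 1D check of the area formula (18) for the non-injective y(x) = x**2 on
# Omega = (-1, 1): N_y(z, Omega) = 2 on (0, 1), so the mass doubles.
def trap(fx, x):
    # simple trapezoidal quadrature
    return float(np.sum(0.5 * (fx[1:] + fx[:-1]) * np.diff(x)))

u = lambda z: 2.0 * z                     # unit-mass density on (0, 1)

x = np.linspace(-1.0, 1.0, 200001)
lhs = trap(u(x**2) * np.abs(2.0 * x), x)  # integral of u(y(x)) |det grad y(x)|

z = np.linspace(0.0, 1.0, 100001)
rhs = trap(u(z) * 2.0, z)                 # integral of u(z) N_y(z, Omega)
print(round(lhs, 4), round(rhs, 4))       # -> 2.0 2.0: twice the unit mass
```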

Furthermore, this subject is strongly linked to the topological degree, which was introduced by Brouwer in 1911 [10] and generalized to Sobolev mappings by Giaquinta et al. [24]. We will not focus on this topic, but we will use some of these results in the remaining part of this paper. We start by giving the generalized definition of the topological degree, see also [24]:

###### Definition 7 (Topological degree).

Let Ω ⊂ ℝ^d be an open set and y : Ω → ℝ^d an almost everywhere approximately differentiable map with Jacobian Dy. The degree of y is defined as

 deg(y, Ω, z) := ∑_{x ∈ y^{−1}(z)} sgn(det(Dy(x))). (20)
###### Remark 8.

The topological degree is strongly related to the Banach indicatrix, since it coincides with the Banach indicatrix for certain mappings. As a direct consequence of [23, Chapter 1, Proposition 2] we obtain for orientation preserving mappings:

 deg(y, Ω, ⋅) = N_y(⋅, Ω) a.e. (21)
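
A 1D example separates the two notions: for the non-monotone map y(x) = x³ − x on Ω = (−2, 2) and z = 0, the preimages are −1, 0, 1 with derivative signs +, −, +, so the degree is 1 while the indicatrix is 3 (the example is ours; for orientation-preserving maps, as in (21), the two coincide).

```python
import numpy as np

# Degree vs. indicatrix for y(x) = x**3 - x on Omega = (-2, 2) at z = 0:
# preimages are x = -1, 0, 1 with y'(x) = 3x**2 - 1 of signs +, -, +.
roots = np.array([-1.0, 0.0, 1.0])        # y^{-1}(0) inside Omega
dy = 3.0 * roots**2 - 1.0                 # derivative at the preimages
deg = int(np.sum(np.sign(dy)))            # definition (20) in 1D
indicatrix = len(roots)                   # Banach indicatrix N_y(0, Omega)
print(deg, indicatrix)                    # -> 1 3
```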

An interesting property of the topological degree is that it is completely determined on the boundary for sufficiently regular functions. This is phrased in the following proposition.

###### Proposition 9.

Let Ω be a bounded Lipschitz domain in ℝ^d and let y_1, y_2 be a.e. approximately differentiable mappings with integrable Jacobian determinants. Suppose that y_1 = y_2 on ∂Ω in the sense of traces. Then

 deg(y_1, Ω, z) = deg(y_2, Ω, z).
###### Proof.

See [23], Chapter 2, Proposition 1. ∎

To conclude this brief summary, we mention that there is indeed a specialization to injective Sobolev functions. This leads to the field of weak diffeomorphisms [21]. With Theorem 10 we will only present a compactness result, which is central for our analysis; a short introduction is given in the appendix. A weak diffeomorphism can be expressed as the limit of a sequence of orientation-preserving diffeomorphisms [21]. Furthermore closedness and compactness results for this class can be stated ([23, Chapter 5, Theorems 3 and 4]). These rely heavily on weak convergence in the space of weak diffeomorphisms. Thus we present a similar result published by Henao and Mora-Corral [29] instead, which follows directly from [29, Proposition 2 and Theorem 2].

###### Theorem 10 (Injectivity as closed constraint).

For each j ∈ ℕ let y_j : Ω → ℝ^m be a.e. approximately differentiable. Assume furthermore, that

 y_j ∈ W^{1,p}(Ω, ℝ^m), p ≥ d − 1, det(Dy_j) ∈ L^1(Ω) (22)

as well as

 cof(Dy_j) ∈ L^q(Ω), q ≥ p/(p−1), sup_{j∈ℕ} ‖cof(Dy_j)‖_1 < ∞. (23)

Suppose that there exists θ ∈ L^1(Ω), such that y_j → y a.e. and

 det(Dy_j) ⇀ θ in L^1(Ω) (24)

as j → ∞. Assume that each function y_j is injective a.e. with det(Dy_j) > 0 a.e. Then

• y is injective a.e.,

• θ = det(Dy) a.e.

### 3.2. Regularization Functionals

In this section we shortly present the properties of the regularization functionals we use in our reconstruction framework. Both regularizations guarantee stronger convergence properties additionally to the compactness of sublevel sets. We do not focus on general possibilities for choosing these energies, but rather present the specific choices for images (total variation) and transformation (hyperelastic).

Total variation regularization was first introduced for image denoising in [48], now known as the ROF model. Since then, TV regularization has been applied to many different tasks in imaging. In fact there are many different equivalent ways to define the total variation and BV functions (compare [5, Chapter 10, Definition 10.1.1]), but we focus on the common definition in image processing (cf. [13]):

###### Definition 11 (BV seminorm and functions of bounded variation).

Let u ∈ L^1(Ω). Then the BV-seminorm is given by

 |u|_{BV(Ω)} := sup_{g ∈ C^∞_0(Ω; ℝ^d), ‖g‖_∞ ≤ 1} ∫_Ω u ∇⋅g dx. (25)

Consequently we define the space of functions with bounded variation by

 BV(Ω) := { u ∈ L^1(Ω) : |u|_{BV(Ω)} < ∞ }, ‖u‖_BV := ‖u‖_1 + |u|_{BV(Ω)}. (26)

The BV-seminorm is lower semicontinuous in the L^1 topology [2, Remark 3.5], and one can define a weak-star convergence on BV(Ω) [2, Remark 3.12].

Note that due to compactness of the embedding BV(Ω) ↪ L^1(Ω), weak-star convergence in BV(Ω) guarantees strong convergence in L^1(Ω). As we will see later, this will be useful to prove convergence properties of compositions of functions. Furthermore we can approximate any BV-function by a sequence of smooth functions:

###### Theorem 12 (Approximation by smooth functions).

Let u ∈ L^1(Ω). Then u ∈ BV(Ω) if and only if there exists a sequence (u_n) in C^∞(Ω) converging to u in L^1(Ω) such that the limit

 lim_{n→∞} ∫_Ω |∇u_n| dx < ∞ (27)

exists and is finite [2, Thm. 3.9]. ∎
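
A discrete analogue of the BV-seminorm (25) in one dimension is the sum of absolute jumps; the sketch below shows why TV is well suited to images with edges: the value for a step function is its total jump height, independent of the grid resolution.

```python
import numpy as np

# Discrete 1D total variation: the finite-difference analogue of (25).
def tv_1d(u):
    return float(np.sum(np.abs(np.diff(u))))

for n in (100, 1000, 10000):
    x = np.linspace(0.0, 1.0, n)
    u = (np.abs(x - 0.5) < 0.2).astype(float)  # indicator of (0.3, 0.7)
    print(n, tv_1d(u))                         # TV = 2.0 on every grid
```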

#### 3.2.1. Hyperelastic Regularization

For the hyperelastic regularization energy we presented earlier, Ruthotto [49] used the following set of admissible transformations

 A := { y ∈ A_0 : |∫_Ω y(x) dx| ≤ vol(Ω)(M + diam(Ω)) }, (28)

where the domain Ω is bounded by M, and A_0 is defined by:

 A_0 := { y ∈ W^{1,2}(Ω, ℝ^3) : cof(∇y) ∈ L^4(Ω, ℝ^{3×3}), det(∇y) ∈ L^2(Ω, ℝ), det(∇y) > 0 a.e. }
###### Remark 13.

A transformation y ∈ A is called an admissible transformation. Note that all admissible transformations fulfill the conditions on the cofactor in Proposition 9 and Theorem 10.

As we see, the admissible transformations have strictly positive Jacobians a.e. and are thus locally invertible. Note that this deduction is not trivial: since by the Sobolev embedding theorem [18, Chapter 5.6.4, Thm 6] an admissible function need not have a continuous version, we cannot use the implicit function theorem to deduce the existence of a local inverse. Even worse, the standard theory on (local) invertibility of Sobolev mappings is focused on more regular mappings (see for example [19, 34]), so this theory is not directly applicable in our setting. However, we can use some results from the theory of Cartesian currents, which we mentioned briefly earlier, and use the fact that:

with the notation as defined in the Appendix. Then a result from Müller [45] yields the closedness of the graph [24] of the transformation, and thus y is a weak local diffeomorphism as defined by Giaquinta et al. [26]. We will not elaborate on this further, because we used this argumentation only to demonstrate that we can expect local invertibility and that this property is not directly guaranteed by the positivity of the Jacobian determinant.

We conclude this short course on hyperelastic regularization by stating the convergence properties, shown in [49, Chapter 3, Theorem 4]:

###### Theorem 14 (Convergence properties of admissible transformations).

Let Ω ⊂ ℝ^3 be a domain with Lipschitz boundary and y_k, y be admissible transformations. Then the convergence

 y_k ⇀ y in W^{1,2}(Ω, ℝ^3), cof(∇y_k) ⇀ H in L^4(Ω, ℝ^{3×3}), det(∇y_k) ⇀ v in L^2(Ω, ℝ)

implies H = cof(∇y) and v = det(∇y).

### 3.3. Existence of a Minimizer

In this section we establish the following existence result for a compact subset of the injective admissible transformations:

###### Theorem 15 (Existence of a minimizer in motion-corrected reconstruction).

Let our assumptions (A1)-(A3) hold; furthermore we assume that

 J(ρ_0, (Id)^N) = ∑_{i=0}^{N} ( D(Kρ_0, f_i) + α_i |ρ_0|_{BV(Ω)} ) + ∑_{i=1}^{N} β_i S_hyper(Id) < ∞ (30)

for some feasible ρ_0. Then the functional (5) with a continuous distance term fulfilling coercivity property (47) has at least one minimizer, where the transformations are taken in a closed subset of the injective admissible transformations. In particular this holds for the Kullback-Leibler divergence as distance term.

Injectivity is crucial to enforce the mass-preservation condition. Although our proof can be extended to transformations with bounded Banach indicatrix, we briefly give two closed subsets of the injective transformations:

###### Remark 16.
• The set of a.e. injective admissible transformations is closed [29].

• The set {y ∈ A : y = y_0 on ∂Ω} is a closed subset of A for an injective boundary value y_0, where the equality is understood in the sense of traces.

Crucial for the proof is the control of sequences of functions composed with transformations, where the sequences are weak-* convergent as a result of the coercivity properties. Before actually presenting our central theorem on this convergence, we define weak-* convergence for the transformations as follows (compare [50]).

###### Definition 17 (Weak-* Convergence in A).

Let y_k, y ∈ A. We say y_k ⇀* y in A, iff

 y_k ⇀* y in W^{1,2}(Ω, ℝ^3), cof(∇y_k) ⇀* cof(∇y) in L^4(Ω, ℝ^{3×3}), det(∇y_k) ⇀* det(∇y) in L^2(Ω, ℝ).

Now we can deduce convergence properties of composed weak-* convergent sequences.

###### Theorem 18.

Let ρ_0^k ⇀* ρ_0 in BV(Ω), y_k ⇀ y in W^{1,2}(Ω, ℝ^3) and det(∇y_k) ⇀ det(∇y) in L^2(Ω). Assume additionally, that

 sup_k ‖N_{y_k}(⋅, Ω)‖_∞ ≤ C ∈ ℝ. (31)

Then we obtain ρ_0^k(y_k) det(∇y_k) ⇀ ρ_0(y) det(∇y).

###### Proof.

For ease of presentation we use the following abbreviations:

 ρ_k := ρ_0^k, ρ := ρ_0, d_k := det(∇y_k), d := det(∇y).

For any fixed φ ∈ L^∞(Ω) we show:

 0 = lim_{k→∞} ∫_Ω (ρ_k(y_k) d_k − ρ(y) d) φ dx
 = lim_{k→∞} ∫_Ω (ρ_k(y_k) d_k − ρ(y_k) d_k + ρ(y_k) d_k − ρ(y) d) φ dx
 = lim_{k→∞} ∫_Ω (ρ_k(y_k) d_k − ρ(y_k) d_k) φ dx + lim_{k→∞} ∫_Ω (ρ(y_k) d_k − ρ(y) d) φ dx

We examine both summands separately and show that each of them converges to zero. We recall that weak-star convergence in BV(Ω) implies strong convergence in L^1(Ω) [13]. Now the first term is treated in a straightforward way by the area formula and (31):

 lim_{k→∞} ∫_Ω |ρ_k(y_k) d_k − ρ(y_k) d_k| dx = lim_{k→∞} ∫_{ℝ^n} |ρ_k − ρ| N_{y_k}(z, Ω) dz ≤ lim_{k→∞} C ∫_{ℝ^n} |ρ_k − ρ| dz

Since ρ_k → ρ in L^1, the difference thus converges to zero strongly in L^1(Ω) and we obtain

 lim_{k→∞} ∫_Ω (ρ_k(y_k) d_k − ρ(y_k) d_k) φ dx = 0 ∀φ ∈ L^∞(Ω). (32)

The second term needs further care: according to [2, Thm. 3.9] we find a sequence of smooth functions (ξ_j) with ξ_j → ρ in L^1(Ω).

Let ε > 0. Thus we can pick a fixed N, such that:

 ∫_Ω |ξ_N − ρ| dx ≤ ε / (4 ‖φ‖_∞ C). (33)

We now expand the first term with said ξ_N and obtain:

 ∫_Ω (ρ(y_k) d_k − ρ(y) d) φ dx
 = ∫_Ω (ρ(y_k) d_k − ξ_N(y_k) d_k) φ dx + ∫_Ω (ξ_N(y_k) d_k − ξ_N(y) d_k) φ dx
 + ∫_Ω (ξ_N(y) d_k − ξ_N(y) d) φ dx + ∫_Ω (ξ_N(y) d − ρ(y) d) φ dx.

We examine each term separately: We start with applying Hölder’s inequality to the first one and obtain

 ∫_Ω (ρ(y_k) d_k − ξ_N(y_k) d_k) φ dx ≤ ∫_Ω |ρ(y_k) d_k − ξ_N(y_k) d_k| dx ‖φ‖_∞ = ∫_{ℝ^n} |ρ − ξ_N| N_{y_k}(z, Ω) dz ‖φ‖_∞ ≤ C ‖ρ − ξ_N‖_1 ‖φ‖_∞,

which yields together with (33)

 ∫_Ω (ρ(y_k) d_k − ξ_N(y_k) d_k) φ dx ≤ ε/4. (34)

In the second term we start in a straightforward way with

 ∫_Ω |ξ_N(y_k) d_k| dx ≤ ‖ξ_N(y_k)‖_2 ‖d_k‖_2. (35)

By using that ξ_N is bounded, Ω is bounded and (d_k) is weakly convergent, it directly follows that ξ_N(y_k) ∈ L^2(Ω) and that ‖d_k‖_2 is bounded by some E (Banach-Steinhaus, see e.g. [1, Chapter 5, Theorem 3]). Thus (35) is finite. Likewise we can directly deduce

 ∫_Ω |ξ_N(y) d_k| dx ≤ ‖ξ_N(y)‖_2 ‖d_k‖_2 < ∞, (36)

and hence,

 ∫_Ω (ξ_N(y_k) d_k − ξ_N(y) d_k) φ dx ≤ ‖(ξ_N(y_k) − ξ_N(y)) φ‖_2 ‖d_k‖_2 ≤ ‖(ξ_N(y_k) − ξ_N(y)) φ‖_2 E.

Note that ξ_N is smooth and therefore Lipschitz-continuous with some constant L, since Ω̄ is compact. We obtain

 ∫_Ω ((ξ_N(y_k) − ξ_N(y)) φ)^2 dx ≤ ‖φ‖_∞^2 L^2 ‖y_k − y‖_2^2. (37)

By the Rellich-Kondrachov compactness theorem [18, Chapter 5.7, Theorem 1] y_k converges strongly to y in L^2(Ω), and so we find K_2, such that

 ‖y_k − y‖_2 < ε / (4 ‖φ‖_∞ L E) ∀k ≥ K_2. (38)

This implies for each k ≥ K_2

 ∫_Ω (ξ_N(y_k) d_k − ξ_N(y) d_k) φ dx < ε/4. (39)

Now let us consider the third term. Since ξ_N(y) φ is bounded, it follows from the weak convergence of the determinants that there exists K_3, such that for every k ≥ K_3

 ∫_Ω (ξ_N(y) d_k − ξ_N(y) d) φ dx ≤ ε/4. (40)

Note that the compactness of Ω̄ ensures that ξ_N(y) φ ∈ L^2(Ω). For the last term we can proceed as for the first one, with N as above, and deduce

 ∫_Ω (ξ_N(y) d − ρ(y) d) φ dx ≤ ε/4 (41)

independently of k.

By combining (34), (39), (40) and (41), we obtain for every k ≥ max{K_2, K_3}

 ∫_Ω (ρ(y_k) d_k − ρ(y) d) φ dx ≤ ε

and thus

 lim_{k→∞} ∫_Ω (ρ(y_k) d_k − ρ(y) d) φ dx = 0. (42)

The assertion follows by combining (32) and (42). ∎

As we see, the boundedness of the Banach indicatrix is essential to control sequences of images under transformations. In order to guarantee the convergence properties given by Theorem 18, we restrict our analysis to injective transformations. Additionally this has the effect that, as a consequence of the area formula, the mass-preservation condition is not violated.

Now we can give continuity properties for a wide range of distance measures including the Kullback-Leibler divergence:

###### Lemma 19.

Let the assumptions from Theorem 18 be fulfilled. Then the distance part of our functional J, defined by

 D(ρ_0, y) = ∑_{i=0}^{N} ∫_Σ g(K(ρ_0(y_i) det(∇y_i)), f_i) dσ (43)

with a continuous integrand function g is lower semicontinuous with respect to weak-star convergence in BV(Ω), weak convergence in W^{1,2}(Ω, ℝ^3) for the transformations and weak convergence in L^2(Ω) for the determinants.

###### Proof.

Let ρ_0^k ⇀* ρ_0 in BV(Ω), y_i^k ⇀ y_i in W^{1,2}(Ω, ℝ^3) and det(∇y_i^k) ⇀ det(∇y_i) in L^2(Ω). Since y denotes the collection of all transformations, we understand the weak convergence componentwise. From Theorem 18 we obtain for any fixed i

 ρ_0^k(y_i^k) det(∇y_i^k) ⇀ ρ_0(y_i) det(∇y_i).

Since K is a compact operator, it is completely continuous, which gives us:

 K(ρ_0^k(y_i^k) det(∇y_i^k)) → K(ρ_0(y_i) det(∇y_i)) in L^1(Σ).

Therefore we can follow the proof of Lemma 3.4 (iii) from [47] for each summand and obtain lower semicontinuity of the Kullback-Leibler data fidelity term with the following reasoning:

Since the above convergence is strong in L^1(Σ), we have (up to a subsequence) convergence almost everywhere. Thus, by the continuity of g, we can deduce

 g(K(ρ_0^k(y_i^k) det(∇y_i^k)), f_i) → g(K(ρ_0(y_i) det(∇y_i)), f_i) almost everywhere.

Now we can apply Fatou's Lemma [25, Chapter 1, Theorem 2 (iii)] and obtain:

 ∫_Σ g(K(ρ_0(y_i) det(∇y_i)), f_i) dσ ≤ lim inf_{k→∞} ∫_Σ g(K(ρ_0^k(y_i^k) det(∇y_i^k)), f_i) dσ.

Having shown lower semicontinuity for an arbitrary summand with fixed i, the assertion follows directly. ∎

We have now stated lower semicontinuity results for a wide range of distance terms, including the Kullback-Leibler data fidelity. We now turn our focus to the TV-regularization. If the total variation were taken of each ρ_i directly in (5), lower semicontinuity would follow directly from the properties of the BV-seminorm we mentioned earlier. However, to formulate an existence result for the transformed densities, we give the following lemma, which follows with a proof as in [13]:

###### Lemma 20.

Let ρ_k ⇀ ρ in L^1(Ω). Then

 |ρ|_{BV(Ω)} ≤ lim inf_{k→∞} |ρ_k|_{BV(Ω)}. (44)

We have verified the first condition for the existence of a minimizer and now turn our focus to coercivity.

###### Lemma 21 (ρ-Coercivity for the Kullback-Leibler divergence).

Let our assumptions (28) for the transformations and (A1)-(A3) for the operator K be fulfilled. Then

 J_1(ρ_0, y) := ∑_i [ ∫_Σ K(ρ_0(y_i) det(∇y_i)) − f_i log(K(ρ_0(y_i) det(∇y_i))) dσ + α_i |ρ_0(y_i) det(∇y_i)|_{BV(Ω)} ]

is coercive with respect to the variable ρ_0.

###### Proof.

We begin by observing that

 log(x) ≤ (1/a) x + log(a) ∀x > 0

for any fixed a > 0. We add the constant ∑_i ∫_Σ f_i log(f_i) − f_i dσ, such that each summand is non-negative. Note that we may hence drop all summands but the first, and with y_0 = Id we can bound the Kullback-Leibler divergence from below by:

 ∑_{i=0}^{N} ∫_Σ K(ρ_0(y_i) det(∇y_i)) − f_i log(K(ρ_0(y_i) det(∇y_i))) + f_i log(f_i) − f_i dσ
 ≥ ∫_Σ K(ρ_0(y_0) det(∇y_0)) − f_0 log(K(ρ_0(y_0) det(∇y_0))) + f_0 log(f_0) − f_0 dσ
 = ∫_Σ Kρ_0 − f_0 log(Kρ_0) + f_0 log(f_0) − f_0 dσ
 ≥ ∫_Σ Kρ_0 − f_0 ( (1/(f_0+1)) Kρ_0 + log(f_0+1) ) + f_0 log(f_0) − f_0 dσ
 = ∫_Σ (1/(f_0+1)) Kρ_0 dσ + ∫_Σ f_0 log(f_0/(f_0+1)) − f_0 dσ
 = ∫_Ω K^*(1/(f_0+1)) ρ_0 dx + c_2 ≥ ∫_Ω c_1 ρ_0 dx + c_2
 = c_1 ‖ρ_0‖_1 + c_2

Therefore we can conclude

 J_1(ρ_0, y) ≥ c_1 ‖ρ_0‖_1 + α_0 |ρ_0|_{BV(Ω)} ≥ min{c_1, α_0} (‖ρ_0‖_1 + |ρ_0|_{BV(Ω)}). (45) ∎

###### Remark 22.

The lemma holds not only for the Kullback-Leibler divergence, but for any distance of the form

 D(Kρ_i, f_i) = ∫_Σ g_i(Kρ_i) dσ (46)

with g_i convex, bounded below, and satisfying

 g_0(Kρ_0) ≥ c_1 Kρ_0 + c_2 (47)

with constants c_1 > 0 and c_2, depending only on the data.

We have proved that the first part of our functional is coercive in ρ_0, so it remains to be shown that the hyperelastic regularization is coercive with respect to y. Fortunately this has already been done in [49, Chapter 2, Lemma 1]. Combining these results, we can finally prove Theorem 15:

###### Proof.

(of Theorem 15) The coercivity properties of the distance (47), TV and hyperelastic regularization ensure the existence of a bounded level set. Compactness is then granted by the Banach-Alaoglu theorem. Thus a minimizing sequence (ρ_k, y_k) has a convergent subsequence with limit (ρ, y), where the convergence is understood componentwise. By the lower semicontinuity properties of the distance (Lemma 19), TV (Lemma 20) and hyperelastic regularization we obtain:

 J(ρ, y) ≤ lim inf J(ρ_k, y_k) = inf J.

Finally, as a result of the closedness of the set of admissible transformations, the limit is itself admissible. ∎

### 3.4. Convergence of the Regularization Method

Having stated existence results for the variational problem of motion-corrected reconstruction, we can show with analogous weak compactness and lower semicontinuity arguments that the minimization of our functional (5) can be understood as a (nonlinear) regularization method, i.e. there exist appropriate limits as noise and regularization parameters tend to zero [17].

###### Theorem 23.

Let (f^k) be a sequence of noisy data with

 lim_k f_i^k = f_i^*, (48)

where f^* is the exact data for an image ρ^* and transformations y^*, such that:

 (ρ^*, y^*) = argmin_{ρ,y} ∑_{i=0}^{N} D(K(ρ(y_i) det(∇y_i)), f_i^*) (49)

for a non-negative distance D, which is lower semicontinuous in both arguments and fulfills (47). Furthermore, we define a sequence of functionals by:

 J_k = ∑_{i=0}^{N} ( D(K(ρ_0(y_i) det(∇y_i)), f_i^k) + α_i^k |ρ_i|_{BV(Ω)} ) + ∑_{i=1}^{N} β_i^k S_hyper(y_i) (50)

Then for α_i^k, β_i^k → 0 and with

 ∑_{i=0}^{N} D(f_i^*, f_i^k) / min_i{α_i^k, β_i^k} → 0, max_i{α_i^k, β_i^k} / min_i{α_i^k, β_i^k} ≤ C ∈ ℝ ∀k (51)

the sequence (ρ^k, y^k), with (ρ^k, y^k) being minimizers of J_k, has a convergent subsequence and the limit (ρ̂, ŷ) fulfills:

 ∑_i D(ρ̂(ŷ_i) det(∇ŷ_i), f_i^*) = ∑_i D(ρ^*(y_i^*) det(∇y_i^*), f_i^*) (52)

Note that the identity (52) only holds for the composition and not for the individual components. Since different transformations can lead to the same transformed image, we can in general not expect convergence of both components.

###### Remark 24.

In order to deal with the non uniqueness of the solution yielded by Theorem 23 we assume additionally