Enhancing joint reconstruction and segmentation with non-convex Bregman iteration

Veronica Corona, Martin Benning, Matthias J. Ehrhardt, Lynn F. Gladden, Richard Mair, Andi Reci, Andrew J. Sederman, Stefanie Reichelt, Carola-Bibiane Schönlieb

Department of Applied Mathematics and Theoretical Physics, University of Cambridge

Department of Chemical Engineering and Biotechnology, University of Cambridge

Cancer Research UK Cambridge Institute, University of Cambridge

Abstract

Imaging modalities such as computed tomography (CT), emission tomography and magnetic resonance imaging (MRI) require a reconstruction approach to produce an image. A common image processing task for applications that utilise these modalities is image segmentation, typically performed after the reconstruction. We explore a new approach that combines reconstruction and segmentation in a unified framework. We derive a variational model that consists of a total variation regularised reconstruction from undersampled measurements and a Chan-Vese based segmentation. We extend the variational regularisation scheme to a Bregman iteration framework to improve the reconstruction and therefore the segmentation. We develop a novel alternating minimisation scheme that solves the non-convex optimisation problem with provable convergence guarantees. Our results for synthetic and real data show that both reconstruction and segmentation are improved compared to the classical sequential approach.

1 Introduction

Image reconstruction plays a central role in many imaging modalities for medical and non-medical applications. The majority of imaging techniques deal with incomplete data and noise, making the inverse problem of reconstruction severely ill-posed. Based on compressed sensing (CS), it is possible to tackle this problem by exploiting prior knowledge of the signal [1, 2, 3]. Nevertheless, reconstructions from very noisy and undersampled data will present some errors that will be propagated into further analysis, e.g. image segmentation. Segmentation is an image processing task used to partition the image into meaningful regions. Its goal is to identify objects of interest, based on contours or similarities in the interior. Typically segmentation is performed after reconstruction, hence its result strongly depends on the quality of the reconstruction. Recently the idea of combining reconstruction and segmentation has become more popular. The main motivation is to avoid the error propagation that occurs in the sequential approach by estimating edges simultaneously from the data, ultimately improving the reconstruction. In this paper, we propose a new model for joint reconstruction and segmentation from undersampled MRI data. The underlying idea is to incorporate prior knowledge about the objects that we want to segment in the reconstruction step, thus introducing additional regularity in our solution. In this unified framework, we expect that the segmentation will also benefit from sharper reconstructions. We demonstrate that our joint approach improves the reconstruction quality and yields better segmentations compared to sequential approaches. In Figure 1, we consider a brain phantom from which we simulated the undersampled k-space data and added Gaussian noise. (b) and (e) present the reconstructions and segmentations obtained with the sequential approaches, while (c) and (f) show the results for our joint approach.
The reconstruction using our method clearly shows more detail and is able to recover finer structures that are not recovered with the classical separate approach. As a consequence, the joint segmentation is also improved. In the following section we present the mathematical models used in our comparison. We investigated the performance of our model for two different applications: bubbly flow and cancer imaging. We show that both reconstruction and segmentation benefit from this method, compared to the traditional sequential approaches, suggesting that error propagation is reduced.

Our contribution.

In our proposed joint method, we obtain an image reconstruction that preserves its intrinsic structures and edges, possibly enhancing them, thanks to the joint segmentation, and simultaneously we achieve an accurate segmentation. We consider the edge-preserving total variation regularisation for both the reconstruction and segmentation term using Bregman distances. In this unified Bregman iteration framework, we have the advantage of improving the reconstruction by reducing the contrast bias in the TV formulation, which leads to more accurate segmentation. In addition, the segmentation constitutes another prior for the reconstruction by enhancing edges of the regions of interest. Furthermore, we propose a non-convex alternating direction algorithm in a Bregman iteration scheme for which we prove global convergence.

The paper is organised as follows. In Section 2 we describe the problems of MRI reconstruction and region-based segmentation. We then introduce our joint reconstruction and segmentation approach in a Bregman iteration framework. This section also contains a detailed comparison of other joint models in the literature. In Section 3 we study the non-convex optimisation problem and present the convergence analysis for this class of problems. Finally in Section 4 we present numerical results for MRI data for different applications. Here we investigate the robustness of our model by testing the undersampling rate up to its limit and by considering different noise levels.

2 MRI reconstruction and segmentation

In the following section we introduce the mathematical tools to perform image reconstruction and image segmentation. In this work, we focus on the specific MRI application; however, our proposed joint method can be applied to other imaging problems in which the measured data is connected to the image via a linear and bounded forward operator, cf. below. Finally we present our model that combines the two tasks of reconstruction and segmentation in a unified framework.

2.1 Reconstruction

In image reconstruction problems, we have the general setting

 f=Au+η, (1)

where A is a bounded and linear operator mapping between two vector spaces. The measured data f is usually corrupted by some noise η and often only observed partially. In this formulation we are interested in recovering the image u given the data f.
In this work, we focus on the application of MRI and we refer to the measurements as the k-space data. In standard MRI acquisitions, the Fourier coefficients are collected in the k-space by radio-frequency (RF) coils. Because the k-space data is acquired sequentially, the scanning time cannot be arbitrarily fast. One of the most common ways to perform fast imaging consists of undersampling the k-space; this, however, only yields satisfactory results if the dimension of the parameter space can implicitly be reduced, for example by exploiting sparsity in certain domains. In the reconstruction, this assumption is incorporated in the regularisation term. Let Ω be a discrete image domain. Let f be our given undersampled k-space data, whose entries are the measured Fourier coefficients that fulfil the relationship (1). The operator A is composed as A = SF, where S is a sampling operator that selects measurements from the data according to the locations provided by a binary sampling matrix (see e.g. (d)) and F is the discrete Fourier transform. In MRI, the noise η is drawn from a complex-valued Gaussian distribution with zero mean and standard deviation σ [4].
In problem (1) for MRI, the aim is to recover the complex-valued image u. However, in this work we follow the standard assumption that in many applications we have negligible phase, i.e. we are working with real-valued, non-negative images. Therefore, we are only interested in the real part; hence we consider the MRI forward operator and its adjoint as modelled in [5]. Problem (1) is ill-posed due to noise and incomplete measurements. The easiest approach to approximate a solution of (1) is to compute the zero-filled reconstruction, for which the missing entries are replaced with zero,

 u_z = A^∗f.

However, images reconstructed with this approach will suffer from aliasing artifacts because undersampling the k-space violates the Nyquist–Shannon sampling theorem. Therefore, we consider a mathematical model that incorporates prior knowledge by using a variational regularisation approach. A popular model is to find an approximate solution of (1) as a minimiser of the Tikhonov-type regularisation approach

 u∗∈argminu{12∥Au−f∥22+αJ(u)}, (2)

where the first term is the data fidelity, which forces the reconstruction to be close to the measurements, and the second term is the regularisation, which imposes some regularity on the solution. The parameter α is a regularisation parameter that balances the two terms in the variational scheme. In this setting, different regularisation functionals J can be chosen (see [6] for a survey of variational regularisation approaches).
Although problems of the form (2) are very effective, they also lead to a systematic loss of contrast [7, 8, 9]. To overcome this problem, [10] proposed an iterative regularisation method based on the generalised Bregman distance [11, 12]. The Bregman distance with respect to J is defined as

 D_J^{p_k}(u, u_k) = J(u) − J(u_k) − ⟨p_k, u − u_k⟩ (3)

with p_k ∈ ∂J(u_k), where ∂J is called the sub-differential; it is a generalisation of the classical differential for convex functions. We replace problem (2) with a sequence of minimisation problems

 u_{k+1} ∈ argmin_u { ½∥Au − f∥²₂ + α D_J^{p_k}(u, u_k) }. (4)

The update on the subgradient can be conveniently computed by the optimality condition of (4)

 p_{k+1} = p_k − (1/α) A^∗(Au_{k+1} − f). (5)

In this work, we will focus on one particular choice for J, namely the total variation. Total variation (TV) regularisation is a well-known edge-preserving approach, first introduced by Rudin, Osher and Fatemi in [13] for image denoising. In the discrete setting, TV is the 1-norm penalty on a finite difference approximation ∇ = (∇₁, ∇₂) of the two-dimensional image gradient,

 J(u) = TV(u) = ∥∇u∥_{2,1} = ∑_{(i,j)∈Ω} √(|∇₁u(i,j)|² + |∇₂u(i,j)|²), (6)

for the isotropic case.
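As an illustration, the discrete isotropic TV (6) can be computed in a few lines. This sketch assumes forward differences with replicated boundary values (zero gradient at the last row/column), one common convention:

```python
import numpy as np

def tv(u):
    """Isotropic discrete total variation (6): sum over all pixels of the
    pointwise gradient magnitude, using forward differences with
    replicated boundary values (zero gradient at the boundary)."""
    g1 = np.diff(u, axis=0, append=u[-1:, :])   # finite difference in direction 1
    g2 = np.diff(u, axis=1, append=u[:, -1:])   # finite difference in direction 2
    return np.sqrt(g1**2 + g2**2).sum()
```

For example, a constant image has TV equal to zero, while a unit step across 4 rows contributes a jump of 1 per row.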
We then consider the Bregman iteration scheme (4) for J = TV. This approach is usually carried out by initialising the regularisation parameter α with a large value, producing over-regularised initial solutions. At every step k, finer details are added. A suitable criterion to stop iterations (4) and (5) (see [6]) is Morozov's discrepancy principle [14]. The discrepancy principle suggests choosing the smallest k such that u_{k+1} satisfies

 ∥f−Auk+1∥2≤σ√m (7)

where m is the number of samples and σ is the standard deviation of the noise in the data. Note that using Bregman iterations, the contrast is improved and in some cases even recovered exactly, compared to the variational regularisation model. In addition, it makes the choice of the regularisation parameter less challenging.
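The iteration (4)-(5) with stopping rule (7) can be sketched as follows. For a closed-form inner solve, this toy example replaces TV by the quadratic J(u) = ½∥u∥²₂, so the Bregman distance reduces to ½∥u − u_k∥²₂ and the scheme becomes iterated Tikhonov regularisation; a row-subsampling matrix serves as a hypothetical stand-in for the undersampled Fourier operator:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 32
A = np.eye(n)[rng.choice(n, size=m, replace=False)]  # undersampling operator
u_true = np.zeros(n)
u_true[20:40] = 1.0
sigma = 0.01
f = A @ u_true + sigma * rng.standard_normal(m)      # noisy, undersampled data

alpha = 10.0   # deliberately large: early iterates are over-regularised
u = np.zeros(n)
for k in range(100):
    # Bregman step (4): with J(u) = 0.5||u||^2 the subproblem is
    # min_u 0.5||Au - f||^2 + (alpha/2)||u - u_k||^2, solved in closed form
    u = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ f + alpha * u)
    # discrepancy principle (7): stop once the residual reaches the noise level
    if np.linalg.norm(f - A @ u) <= sigma * np.sqrt(m):
        break
```

Each Bregman step adds back detail; the residual decreases until (7) triggers, well before the iteration starts fitting noise.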

2.2 Segmentation

Image segmentation refers to the process of automatically dividing the image into meaningful regions. Mathematically, one is interested in finding a partition {Ω_i} of the image domain Ω subject to ∪_i Ω_i = Ω and Ω_i ∩ Ω_j = ∅ for i ≠ j. One way to do this is to use region-based segmentation models, which identify regions based on similarities of their pixels. The segmentation model we are considering was originally proposed by Chan and Vese in [15]. Given an image function f, the goal is to divide the image domain into two separated regions Ω₁ and Ω₂ by minimising the following energy function

 ∫_{Ω₁}(f(x) − c₁)² dx + ∫_{Ω₂}(f(x) − c₂)² dx + β · Length(C) → min_{c₁,c₂,C},

where C is the desired contour separating Ω₁ and Ω₂, and the constants c₁ and c₂ represent the average intensity values of f inside and outside C, respectively. The parameter β penalises the length of the contour C, controlling the scale of the objects in the segmentation. From this formulation we can make two observations: first, the regions Ω₁ and Ω₂ can be represented by the characteristic function

 v(x) = { 1, if x ∈ Ω₁,
          0, if x ∈ Ω₂,

second, the perimeter of the contour identified by the characteristic function corresponds to its total variation, as shown by the coarea formula [16]. This leads to the new formulation

 ∫_Ω v(x)(f(x) − c₁)² + (1 − v(x))(f(x) − c₂)² dx + β TV(v) → min_{c₁,c₂,v∈{0,1}}.

Even assuming fixed constants c₁, c₂, the problem is non-convex due to the binary constraint. In [17] the authors proposed to relax the constraint, allowing v to assume values in the interval [0,1]. They showed that for fixed constants c₁, c₂, global minimisers can be obtained by minimising the following energy

 ∫Ωv(x)(f(x)−c1)2dx+(1−v(x))(f(x)−c2)2dx+βTV(v)→minv∈[0,1] (8)

followed by thresholding the minimiser at some value μ ∈ (0,1). As the problem is convex but not strictly convex, the global minimiser may not be unique. In practice we obtain solutions which are almost binary, hence the choice of μ is not crucial.
Setting

 s(x)=(f(x)−c1)2−(f(x)−c2)2

the energy (8) can be written in a more general form as

 ∫Ωv(x)s(x)dx+βTV(v)→minv∈[0,1].
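In the limiting case β = 0 the length penalty drops out and ∫ v s is minimised pointwise by setting v(x) = 1 wherever s(x) < 0, i.e. assigning each pixel to the closer of the two means. Alternating this with the update of c₁, c₂ gives a two-means-style sketch of the model on synthetic data (the image and all parameters below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic test image: a bright disc (intensity 1) on a dark background (0)
xx, yy = np.meshgrid(np.arange(64), np.arange(64))
truth = ((xx - 32)**2 + (yy - 32)**2 < 15**2).astype(float)
f = truth + 0.1 * rng.standard_normal(truth.shape)   # additive Gaussian noise

c1, c2 = f.max(), f.min()            # initial guesses for the region means
for _ in range(10):
    s = (f - c1)**2 - (f - c2)**2    # pointwise cost difference s(x)
    v = (s < 0).astype(float)        # beta = 0: minimise the integral pointwise
    c1 = f[v == 1].mean()            # mean intensity inside the region
    c2 = f[v == 0].mean()            # mean intensity outside
```

With β > 0 the TV term would additionally remove isolated misclassified pixels; here the noise level is low enough that the pointwise rule already recovers the disc almost exactly.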

In this paper, we are interested in the extension of the two-class problem to the multi-class formulation [18]. Following the simplex-constrained vector function representation for multiple regions and its convex relaxation proposed in [19], we obtain as a special case a convex relaxation of the Chan-Vese model for an arbitrary number of regions ℓ, which reads

 ∫_Ω ∑_{i=1}^ℓ v_i(x)(c_i − u(x))² dx + β TV(v) → min_{v∈C}, (9)

where C is a convex set which restricts v(x) to lie in the standard probability simplex. As in the binary case, the constants c_i describe the average intensity value inside the region Ω_i. In this case we consider the vector-valued formulation of TV

 TV(v)=∫Ω√∥∇v1∥2+⋯+∥∇vℓ∥2dx.

2.3 Joint reconstruction and segmentation

MRI reconstructions from highly undersampled data are subject to errors, even when prior knowledge about the underlying object is incorporated in the mathematical model. It is often required to find a trade-off between filtering out the noise and retrieving the intrinsic structures while preserving the intensity configuration and small details. As a consequence, segmentations in the presence of artifacts are likely to fail.
In this paper, we propose to solve the two image processing tasks of reconstruction and segmentation in a unified framework. The underlying idea is to inform the reconstruction with prior knowledge of the regions of interest, and simultaneously update this belief according to the actual measurements. Mathematically, given the undersampled and noisy k-space data f, we want to recover the image u and compute its segmentation v in ℓ disjoint regions, by solving the following problem

 (u,v) ∈ argmin_{u,v} { ½∥Au − f∥²₂ + α TV(u)   [reconstruction]
 + δ ∑_{i=1}^n ∑_{j=1}^ℓ v_{ij}(c_j − u_i)² + β TV(v) + ı_C(v) },   [segmentation] (10)

where ı_C is the characteristic function of the set C. However, instead of solving (10), we consider the iterative regularisation procedure using Bregman distances. The main motivation is to exploit the contrast enhancement aspect for the reconstruction thanks to the Bregman iterative scheme. By improving the reconstruction, the segmentation is in turn refined. Therefore, we replace (10) with the following sequence of minimisation problems for k = 0, 1, 2, …

 u_{k+1} = argmin_u ½∥Au − f∥²₂ + α D_{TV}^{p_k}(u, u_k) + δ ∑_{i=1}^n ∑_{j=1}^ℓ v^k_{ij}(c_j − u_i)² (11a)
 p_{k+1} = p_k − (1/α)(A^∗(Au_{k+1} − f) − 2δ ∑_{j=1}^ℓ v^k_j(c_j − u_{k+1})) (11b)
 v_{k+1} = argmin_v δ ∑_{i=1}^n ∑_{j=1}^ℓ v_{ij}(c_j − u_{k+1,i})² + ı_C(v) + β D_{TV}^{q_k}(v, v_k) (11c)
 q_{k+1} = q_k − (δ/β)(c_j − u_{k+1})² = −((k+1)δ/β)(c_j − u_{k+1})². (11d)

Under suitable assumptions, we are going to show global convergence of the sequences generated by (11) in Section 3.
This model combines the reconstruction approach described in (4) and the discretised multi-class segmentation in (9) with a variation in the regularisation term, which is now embedded in the Bregman iteration scheme. In [20] the authors used Bregman distances for the Chan-Vese formulation (8), combined with spectral analysis, to produce multiscale segmentations.
As described in the previous subsection, the parameters α and β describe the scale of the details in u and the scale of the segmented objects in v. By integrating the two regularisations into the same Bregman iteration framework, these scales are now determined by the iteration k. At the first Bregman iteration, when α is effectively very large, we obtain an over-smoothed u, and the value of β is not very important. Intuitively, u is almost piecewise constant with small total variation and a broad range of values of β may lead to very similar segmentations v. However, at every iteration k, finer scales are added to the solution u_{k+1} with the update p_{k+1}. Accordingly, the update q_{k+1}, which is independent of v, in fact updates the regularisation parameter as β/(k+1). Therefore, the segmentation keeps up with the scale in the reconstructed image u_{k+1}.
The novelty of this approach is also represented by the role of the parameter δ. This parameter weighs the effect of the segmentation on the reconstruction, imposing regularity on u in terms of sharp edges in the regions of interest. In Section 4 we show how different ranges of δ affect the reconstruction (see Figure 12). Intuitively, large values of δ force the solution to be close to the piecewise constant solution described by the constants c_j. This is beneficial in applications where MRI is a means to extract shapes and sizes of underlying objects (e.g. bubbly flow in Subsection 4.1). On the other hand, with very small δ, the segmentation has little impact and the solutions for u are close to the ones obtained by solving the individual problem (4). Intermediate values of δ impose sharper boundaries in the reconstruction while preserving the texture.
Obviously, we need to stop the iteration before the residual brings back noise from the data f. As we cannot use the Morozov discrepancy principle in this case (because the residual may rather increase due to the effect of the coupling term controlled by the parameter δ), we stop when the difference between two consecutive iterates of v is smaller than a certain tolerance, following the observation that the rate at which v changes close to the optimal solution is low, in contrast to the more abrupt changes at the beginning of the Bregman iteration and later on, when the iteration starts to add noise.

Clearly, problem (11) is non-convex in the joint argument (u,v) due to the coupling term. However, it is convex in each individual variable. We propose to solve the joint problem by iteratively alternating the minimisation with respect to u and to v (see Section 3 for the numerical optimisation and convergence analysis).
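The alternating structure of (11) can be sketched in a strongly simplified setting: a hypothetical 1D denoising example (A = I) with fixed class means, with TV replaced by the quadratic ½∥·∥²₂ so that the u-step has a closed form, and with β = 0 so that the v-step is a pointwise assignment. This only illustrates the data flow between the two subproblems, not the full model:

```python
import numpy as np

rng = np.random.default_rng(2)
u_true = np.zeros(100)
u_true[30:70] = 1.0
f = u_true + 0.2 * rng.standard_normal(100)   # A = I: pure denoising

alpha, delta = 1.0, 0.5
c = np.array([0.0, 1.0])                      # fixed class means c_j
u = f.copy()
for k in range(20):
    # v-step (beta = 0): assign each pixel to the nearest class mean
    labels = np.argmin((u[:, None] - c)**2, axis=1)
    v = np.eye(2)[labels]                     # one-hot segmentation v_ij
    # u-step, quadratic stand-in for the TV Bregman distance:
    # min_u 0.5||u - f||^2 + (alpha/2)||u - u_k||^2 + delta * sum_j v_j (u - c_j)^2
    # (closed form since sum_j v_ij = 1 for every pixel i)
    u = (f + alpha * u + 2 * delta * (v @ c)) / (1 + alpha + 2 * delta)
```

The coupling term pulls the reconstruction towards the current class means, while the data term keeps it close to the measurements.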

2.4 Comparison to other joint reconstruction and segmentation approaches

In this section we will provide an overview of some existing simultaneous reconstruction and segmentation (SRS) approaches with respect to different imaging applications.

CT/SPECT.

Ramlau and Ring [21] first proposed a simultaneous reconstruction and segmentation model for CT, which was later extended to SPECT in [22] and to limited data tomography [23]. In these works, the authors aim to simultaneously reconstruct and segment the data acquired from SPECT and CT. CT measures the mass density distribution μ, which represents the attenuation of x-rays through the material; SPECT measures the activity distribution f, the concentration of the radio tracer injected in the material. Given the two measurements z^δ and y^δ, from CT and SPECT, they consider the following energy functional

 E(f,μ,Γf,Γμ)=∥A(f,μ)−yδ∥2+β∥Rμ−zδ∥2+α(Length(Γf)+Length(Γμ)).

They propose a joint model based on a Mumford-Shah like functional, in which the reconstructions of f and μ and the given data are embedded in the data term in a least-squares sense. The operators A and R are the attenuated Radon transform (SPECT operator) and the Radon transform (CT operator), respectively. The penalty term is a multiple of the lengths of the contours Γ_f of f and Γ_μ of μ. These boundaries are modelled using level set functions. In the segmented partitions of the domain, f and μ are assumed to be piecewise constant. The optimisation problem is then solved alternately with respect to the functional variables f and μ with fixed geometric variables Γ_f and Γ_μ, and the other way around.

In [24] simultaneous reconstruction and segmentation is applied to dynamic SPECT imaging, via a variational framework consisting of a Kullback-Leibler (KL) data fidelity and different regularisation terms that enforce sharp edges and sparsity for the segmentation and smoothness for the reconstruction. The cost function is

 E(u,c)=KL(R(u⋅c),g)+αK∑k=1∥∇uk∥+βK∑k=1∥uk∥1+δ2K∑k=1∥∂∂tck∥22.

Given the data g, they want to retrieve the concentration curves c in time for K disjoint regions and their indicator functions u in space. The optimisation is carried out by alternating the minimisation over u having c fixed and then over c having u fixed.

In [25] the authors propose a variational approach for reconstruction and segmentation of CT images with limited field of view and occluded geometry. The cost function is

 E(u,c,v) = ½∥Au − y∥² + α∥∇u∥ + (β/2)(λ ∑_{i=1}^n ∑_{k=1}^K v_{ik}(u_i − c_k)² + ½∥Dv∥²₂)

subject to a box constraint on the image values u and the simplex constraint on the labelling function v. The operator A is the undersampled Radon transform modelling the occluded geometry and y is the given data. The second term is the edge-preserving regularisation for u, and the third term is the segmentation term, which aims at finding regions in u that are close to the value c_k in region k. The operator D is a finite difference approximation of the gradient. The non-convex problem is solved by alternating minimisation between updates of u, c and v.

PET and Transmission Tomography.

In [26], the authors propose a maximum likelihood reconstruction and doubly stochastic segmentation for emission and transmission tomography. In their model they use a Hidden Markov Measure Field Model (HMMFM) to estimate the different classes of objects from the given data r. They want to maximise the following cost function

 E(u,p,θ)=logP(r|u)+logP(u|p,θ)+logP(p).

The first term is the data likelihood, which is modelled differently for emission and transmission tomography. The second term is the conditional probability or class fitting term, for which they use the HMMFM. The third term is the regularisation on the HMMFM. The optimisation is carried out in three steps: first they solve for u (image update) fixing p and θ, then for p (measure field update) holding u and θ, and finally for θ (parameter update) having u and p fixed.

A variant of this method was presented in [27], in which prior information about the segmentation classes is incorporated through a HMMFM. Here, the reconstruction is the minimisation of a constrained Bayesian formulation that involves a data fidelity term as a classical least-squares fitting term, a class fitting term as a Gaussian mixture for each pixel given K classes and dependent on the class probabilities defined by the HMMFM, and a regulariser also dependent on the class probabilities. The model to minimise is

 E(u,δ) = λ_noise∥Au − b∥²₂ − ∑_{j=1}^N log( ∑_{k=1}^K (δ_{jk}/√(2πσ_k)) exp(−(u_j − μ_k)²/(2σ_k²)) ) + λ_class ∑_{k=1}^K R(δ_k)
 s.t. ∑_{k=1}^K δ_{jk} = 1, δ_{jk} ≥ 0, j = 1,…,N, k = 1,…,K.

The operator A is modelled as the Radon transform in the case of CT and b represents the measured data; N is the number of pixels in the image; λ_noise and λ_class are the regularisation parameters; μ_k, σ_k are the class parameters. The cost function is non-convex and they solve the problem in an alternating scheme, where they either update the pixel values u or the class probabilities δ for each pixel.

Storath and others [28] model joint reconstruction and segmentation using the Potts model, with application to PET imaging and CT. They consider the variational formulation of the Potts model for the reconstruction. Since the solution is piecewise constant, it directly induces a partition of the image domain, and thus a segmentation. Given the data f and an operator A (e.g. the Radon transform), the energy functional has the following form

 E(u)=λ∥∇u∥0+∥Au−f∥22

where the first term is the jump penalty enforcing piecewise constant solutions and the second term is the data fidelity. As the Potts model is NP-hard, they propose a discretisation scheme that allows the Potts problem to be split into subproblems that can be solved efficiently.

MRI.

In [29], the authors proposed a joint model with application to MRI. Their reconstruction-segmentation model consists of a fitting term, a patch-based dictionary to sparsely represent the image, and a term that models the segmentation as a mixture of Gaussian distributions with means, standard deviations and mixture weights μ, σ, π. Their model is

 E(u,Γ,μ,σ,π) = ∥Au − y∥² + λ ∑_{n=1}^N ∥R_n u − Dγ_n∥² − β ln P(u|μ,σ,π)  s.t. ∥γ_n∥₀ ≤ T ∀n,

where A is the undersampled Fourier transform, y is the given data, R_n is a patch extraction operator, λ is a weighting parameter, T is the sparsity threshold, and γ_n is the sparse representation of patch n, organised as a column of the matrix Γ. The problem is highly non-convex and is solved iteratively using conjugate gradient (CG) on u, orthogonal matching pursuit on Γ and the Expectation-Maximisation (EM) algorithm on (μ, σ, π).

Summary.

Recently, the idea of solving the problems of reconstruction and segmentation simultaneously has become more popular. The majority of these joint methods have been proposed for CT, SPECT and PET data. Mainly, they differ in how they encode prior information in terms of regularisers and how they link reconstruction and segmentation in the coupling term. Some impose smoothness in the reconstruction [24], others sparsity in the gradient [21, 25, 28]; others consider a patch-dictionary sparsifying approach [29]. In [28] a segmentation is not explicitly obtained, but the reconstruction is forced to be piecewise constant. Depending on the application, the coupling term is the data fitting term itself (e.g. SPECT) or the segmentation term. In [26, 27, 29] the authors model the segmentation as a mixture of Gaussian distributions, while [25] has a region-based segmentation approach similar to what we propose. However, [25] penalises the squared 2-norm of the segmentation, imposing spatial smoothness.
In our proposed joint approach, we perform reconstruction and segmentation in a unified Bregman iteration scheme, exploiting the advantage of improving the reconstruction, which results in a more accurate segmentation. Furthermore, the segmentation constitutes another prior imposing regularity in the reconstruction in terms of sharp edges in the regions of interest. We propose a novel numerical optimisation problem in a non-convex Bregman iteration framework for which we present a rigorous convergence result in the following section.

3 Optimisation

The cost function (11) is non-convex in the joint argument (u,v), but it is convex in each individual variable. To solve this problem we derive a splitting approach, where we solve the two minimisation problems in an alternating fashion with respect to u and v. We present the general algorithm and its convergence analysis in the next subsection. First, we describe the solution of each subproblem.

Problem in u.

 u_{k+1} = argmin_u ½∥Au − f∥²₂ + α(TV(u) − ⟨p_k, u⟩) + δ ∑_{i=1}^n ∑_{j=1}^ℓ v^k_{ij}(c_j − u_i)².

We solve the optimisation for u, fixing v = v^k, using the primal-dual algorithm proposed in [30, 31, 32, 33]. Writing the total variation via its dual formulation, we obtain the following iterates for the dual variable y, the primal variable u and step sizes σ, τ:

 y_{n+1} = (y_n + σ∇ū_n) / max(1, ∥y_n + σ∇ū_n∥)
 u_{n+1} = (u_n + τ∇·y_{n+1} + 2τδ ∑_{j=1}^ℓ v^k_j c_j + τα p_k + τA^∗f) / (1 + 2τδ + τA^∗A)
 ū_{n+1} = 2u_{n+1} − u_n.

After sufficiently many iterations we set u_{k+1} = u_{n+1} and compute the update p_{k+1} from the optimality condition of the u-subproblem, as in (11b).
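For illustration, here is a hedged sketch of these primal-dual iterates in the simplest special case A = I, δ = 0 and p_k = 0 (plain TV denoising); the dual-ball radius is α here rather than 1, which is an equivalent rescaling of the iterates above. The finite-difference gradient and divergence are the standard discrete adjoint pair, and the step sizes satisfy στ∥∇∥² ≤ 1 with ∥∇∥² ≤ 8:

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary (last row/column gradient zero)."""
    g1 = np.zeros_like(u); g1[:-1, :] = u[1:, :] - u[:-1, :]
    g2 = np.zeros_like(u); g2[:, :-1] = u[:, 1:] - u[:, :-1]
    return g1, g2

def div(p1, p2):
    """Discrete divergence, defined so that div is the negative adjoint of grad."""
    d1 = np.zeros_like(p1)
    d1[0, :] = p1[0, :]; d1[1:-1, :] = p1[1:-1, :] - p1[:-2, :]; d1[-1, :] = -p1[-2, :]
    d2 = np.zeros_like(p2)
    d2[:, 0] = p2[:, 0]; d2[:, 1:-1] = p2[:, 1:-1] - p2[:, :-2]; d2[:, -1] = -p2[:, -2]
    return d1 + d2

rng = np.random.default_rng(3)
truth = np.zeros((32, 32)); truth[8:24, 8:24] = 1.0
f = truth + 0.1 * rng.standard_normal(truth.shape)   # noisy image to denoise

alpha, tau, sig = 0.2, 0.25, 0.25    # step sizes: tau * sig * 8 <= 1
u = f.copy(); u_bar = u.copy()
p1 = np.zeros_like(u); p2 = np.zeros_like(u)
for _ in range(200):
    # dual ascent step, then projection onto the pointwise ball |y| <= alpha
    g1, g2 = grad(u_bar)
    p1, p2 = p1 + sig * g1, p2 + sig * g2
    scale = np.maximum(1.0, np.sqrt(p1**2 + p2**2) / alpha)
    p1, p2 = p1 / scale, p2 / scale
    # primal step: resolvent of the quadratic data term 0.5||u - f||^2
    u_old = u
    u = (u + tau * div(p1, p2) + tau * f) / (1 + tau)
    u_bar = 2 * u - u_old
```

In the full scheme the primal resolvent additionally inverts (1 + 2τδ + τA^∗A), which for A = SF can be done cheaply in Fourier space.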

Problem in v.

 vk+1=argminv∈C⟨v,δg−βqk⟩+βTV(v)

with g_{ij} = (c_j − u_{k+1,i})². We now solve a variant of the primal-dual method [30], as suggested in [33, 34], which considers the general problem including pointwise linear terms of the form

 minx∈Cmaxy∈A⟨Kx,y⟩+⟨g,x⟩−⟨h,y⟩

where C and A are closed, convex sets.
Setting K = ∇ and identifying the linear terms accordingly, with step sizes σ, τ, the updates are

 w_{n+1} = Π_A(w_n + σ(∇v̄_n − h))
 v_{n+1} = Π_C(v_n + τ(∇·w_{n+1} − δg + βq_k))
 v̄_{n+1} = 2v_{n+1} − v_n.

At the end, we set v_{k+1} = v_{n+1} and obtain the update q_{k+1} as in (11d).
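The projection Π_C onto the probability simplex, needed in the v-update, can be computed pixelwise with the classical sorting-based construction; a sketch:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of each row of v onto the probability simplex
    {x : x_j >= 0, sum_j x_j = 1}, via the classical sorting construction."""
    u = np.sort(v, axis=-1)[..., ::-1]                     # sort descending
    css = np.cumsum(u, axis=-1) - 1.0
    idx = np.arange(1, v.shape[-1] + 1)
    rho = (u - css / idx > 0).sum(axis=-1, keepdims=True)  # number of active components
    theta = np.take_along_axis(css, rho - 1, axis=-1) / rho
    return np.maximum(v - theta, 0.0)
```

For instance, the point (2, 0) projects to the simplex vertex (1, 0), and points already on the simplex are left unchanged.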

3.1 Convergence analysis

The proposed joint approach (11) is an optimisation problem of the form

 minu,vE(u,v)+DpkJ1(u,uk)+DqkJ2(v,vk) (12)

in the general Bregman distance framework, for a (non-convex) function E, convex functionals J₁, J₂ and some positive parameters. The functions J₁ and J₂ impose some regularity on the solution. In this work we consider a finite-dimensional setting and we refer to the next section for the required definitions. To prove global convergence of (12) we make the following assumptions.

Assumption 1

1. is a function

2. is a KL function (see Definition 5)

3. is proper, lower semi-continuous (l.s.c.) and strongly convex

4. is a KL function

5. for any fixed v, the function u ↦ E(u,v) is convex. Likewise, for any fixed u, the function v ↦ E(u,v) is convex.

6. for any fixed v, the function u ↦ E(u,v) is C¹; hence the partial gradient ∇_u E(·,v) is L₁(v)-Lipschitz continuous,

 ∥∇uE(u1,v)−∇uE(u2,v)∥≤L1(v)∥u1−u2∥∀u1,u2∈Rn.

Likewise, for any fixed u, the function v ↦ E(u,·) is C¹ with L₂(u)-Lipschitz continuous partial gradient.

We want to study the convergence properties of the alternating scheme

 u_{k+1} = argmin_u { E(u, v_k) + D^{p_k}_{J₁}(u, u_k) } (A.1)
 p_{k+1} = p_k − ∇_u E(u_{k+1}, v_k) (A.2)
 v_{k+1} = argmin_v { E(u_{k+1}, v) + D^{q_k}_{J₂}(v, v_k) } (A.3)
 q_{k+1} = q_k − ∇_v E(u_{k+1}, v_{k+1}) (A.4)

for initial values u₀, v₀ and subgradients p₀ ∈ ∂J₁(u₀), q₀ ∈ ∂J₂(v₀).
We want to show that the whole sequence generated by (13) converges to a critical point of E.

In order for the updates (A.1) and (A.3) to exist, we want J_i to be of the form J_i = α_i R_i + ε_i G_i (e.g. R₁ = TV and G₁ = ½∥·∥²₂, see [35]), where R_i and G_i fulfil the following assumptions.

Assumption 2

1. The functions G₁ and G₂ are strongly convex with constants γ₁ and γ₂, respectively. They have Lipschitz continuous gradients ∇G₁ and ∇G₂ with Lipschitz constants L_{G₁} and L_{G₂}, respectively.

2. The functions R₁ and R₂ are proper, l.s.c. and convex.

For J₁ = α₁R₁ + ε₁G₁ and J₂ = α₂R₂ + ε₂G₂, we can write (13) as

 u_{k+1} = argmin_u { E(u, v_k) + α₁ D^{p_k}_{R₁}(u, u_k) + ε₁ D_{G₁}(u, u_k) } (14a)
 p_{k+1} = p_k − (1/α₁)(∇_u E(u_{k+1}, v_k) + ε₁(∇G₁(u_{k+1}) − ∇G₁(u_k))) (14b)
 v_{k+1} = argmin_v { E(u_{k+1}, v) + α₂ D^{q_k}_{R₂}(v, v_k) + ε₂ D_{G₂}(v, v_k) } (14c)
 q_{k+1} = q_k − (1/α₂)(∇_v E(u_{k+1}, v_{k+1}) + ε₂(∇G₂(v_{k+1}) − ∇G₂(v_k))). (14d)
Theorem 1

(Global convergence).

Suppose the objective in (12) is a KL function. Assume Assumptions 1 and 2 hold. Let {u_k} and {v_k} be sequences generated by (14), which are assumed to be bounded. Then

1. The sequence {z_k} = {(u_k, v_k)} has finite length, that is

 ∑_{k=0}^∞ ∥z_{k+1} − z_k∥ < ∞. (15)
2. The sequence {z_k} converges to a critical point of E.

3.2 Proof of Theorem 1

In the following we are going to show global convergence of this algorithm. The first step in our convergence analysis is to show a sufficient decrease property of a surrogate of the energy function (12) and a subgradient bound of the norm of the iterates gap. We first define the surrogate functions.

Definition 1

(Surrogate objective).

Let J₁ = α₁R₁ + ε₁G₁ and J₂ = α₂R₂ + ε₂G₂ satisfy Assumption 1 and Assumption 2, respectively. For any u_k, v_k and subgradients p_k ∈ ∂R₁(u_k), q_k ∈ ∂R₂(v_k), we define the following surrogate objectives E^{p_k,q_k}, E^{p_k} and E^{q_k}:

 E^{p_k,q_k}(u_{k+1}, v_{k+1}, u_k, v_k) = E(u_{k+1}, v_{k+1}) + α₁ D^{p_k}_{R₁}(u_{k+1}, u_k) + α₂ D^{q_k}_{R₂}(v_{k+1}, v_k), (16)
 E^{p_k}(u_{k+1}, u_k) = E(u_{k+1}, v_{k+1}) + α₁ D^{p_k}_{R₁}(u_{k+1}, u_k), (17)
 E^{q_k}(v_{k+1}, v_k) = E(u_{k+1}, v_{k+1}) + α₂ D^{q_k}_{R₂}(v_{k+1}, v_k). (18)

For convenience we will use the following notations

 z_k := (u_k, v_k) ∀k ≥ 0,  r_k := (p_k, q_k) with p_k ∈ ∂R₁(u_k), q_k ∈ ∂R₂(v_k).

The surrogate function will then read

 Erk(zk+1,zk)=Epk,qk(uk+1,vk+1,uk,vk).

For brevity, we will also use the notation E_k(z_{k+1}) := E^{r_k}(z_{k+1}, z_k). For the rest of the proof we also recall the following definitions.

Definition 2

(Strong convexity).

Let G be a proper, l.s.c. and convex function. Then G is said to be γ-strongly convex if there exists a constant γ > 0 such that

 DpG(u,v)≥γ2∥u−v∥2

holds true for all u, v and p ∈ ∂G(v).

Definition 3

(Symmetric Bregman distance).

Let G be a proper, l.s.c. and convex function. Then the symmetric generalised Bregman distance is defined as

 DsymmG(u,v):=DpG(u,v)+DqG(v,u)=⟨p−q,u−v⟩

for u, v with p ∈ ∂G(u) and q ∈ ∂G(v). We also observe that in case G is γ-strongly convex we have

 DsymmG(u,v)≥γ∥u−v∥2.
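As a simple sanity check of these two definitions, take G(u) = ½∥u∥²₂, which is 1-strongly convex with ∂G(v) = {v}:

```latex
D^{p}_{G}(u,v) = \tfrac{1}{2}\|u\|^2 - \tfrac{1}{2}\|v\|^2 - \langle v, u - v\rangle
             = \tfrac{1}{2}\|u - v\|^2,
\qquad
D^{\mathrm{symm}}_{G}(u,v) = \langle u - v, u - v\rangle = \|u - v\|^2,
```

so both lower bounds hold with equality for γ = 1.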
Definition 4

(Lipschitz continuity).

A function G is (globally) Lipschitz continuous if there exists a constant L > 0 such that

 ∥G(u)−G(v)∥≤L∥u−v∥

is satisfied for all u, v.

We can now show the sufficient decrease property of (16) for subsequent iterates.

Lemma 1

(Sufficient decrease property).

The iterates generated by (14) satisfy the descent estimate

 Ek(zk+1)+ρ2∥zk+1−zk∥2≤Ek−1(zk). (19)

Moreover,

 lim_{k→∞} D^{symm}_{R₁}(u_{k+1}, u_k) = 0,  lim_{k→∞} D^{symm}_{R₂}(v_{k+1}, v_k) = 0,
 lim_{k→∞} D^{symm}_{G₁}(u_{k+1}, u_k) = 0,  lim_{k→∞} D^{symm}_{G₂}(v_{k+1}, v_k) = 0.

Proof. From (14a) we consider the following step for u:

 u_{k+1} = argmin_u { E(u, v_k) + α₁ D^{p_k}_{R₁}(u, u_k) + ε₁ D_{G₁}(u, u_k) }
        = argmin_u { E(u, v_k) + α₁ R₁(u) + ε₁ G₁(u) − ⟨α₁ p_k + ε₁ ∇G₁(u_k), u − u_k⟩ }.

Computing the optimality condition we obtain

 α₁(p_{k+1} − p_k) + ∇_u E(u_{k+1}, v_k) + ε₁(∇G₁(u_{k+1}) − ∇G₁(u_k)) = 0.

Taking the dual product with u_{k+1} − u_k yields

 0 = α₁⟨p_{k+1} − p_k, u_{k+1} − u_k⟩ + ⟨∇_u E(u_{k+1}, v_k), u_{k+1} − u_k⟩ + ε₁⟨∇G₁(u_{k+1}) − ∇G₁(u_k), u_{k+1} − u_k⟩
   = α₁ D^{symm}_{R₁}(u_{k+1}, u_k) + ⟨∇_u E(u_{k+1}, v_k), u_{k+1} − u_k⟩ + ε₁ D^{symm}_{G₁}(u_{k+1}, u_k).

Using the convexity estimate ⟨∇_u E(u_{k+1}, v_k), u_{k+1} − u_k⟩ ≥ E(u_{k+1}, v_k) − E(u_k, v_k), we obtain the inequality

 α₁ D^{symm}_{R₁}(u_{k+1}, u_k) + ε₁ D^{symm}_{G₁}(u_{k+1}, u_k) + E(u_{k+1}, v_k) − E(u_k, v_k) ≤ 0,
and hence, expanding D^{symm}_{R₁},
 α₁(D^{p_k}_{R₁}(u_{k+1}, u_k) + D^{p_{k+1}}_{R₁}(u_k, u_{k+1})) + ε₁ D^{symm}_{G₁}(u_{k+1}, u_k) + E(u_{k+1}, v_k) ≤ E(u_k, v_k).

Adding to both sides, using the strong convexity of and the surrogate function notation, we get