
# Landweber-Kaczmarz method in Banach spaces with inexact inner solvers

Qinian Jin Mathematical Sciences Institute, Australian National University, Canberra, ACT 2601, Australia
###### Abstract.

In recent years the Landweber(-Kaczmarz) method has been proposed for solving nonlinear ill-posed inverse problems in Banach spaces using general convex penalty functions. The implementation of this method involves solving a (nonsmooth) convex minimization problem at each iteration step, and the existing theory requires its exact resolution, which in general is impossible in practical applications. In this paper we propose a version of the Landweber-Kaczmarz method in Banach spaces in which the minimization problem involved in each iteration step is solved inexactly. Based on the $\varepsilon$-subdifferential calculus we give a convergence analysis of our method. Furthermore, using Nesterov's strategy, we propose a possible accelerated version of the Landweber-Kaczmarz method. Numerical results on computed tomography and parameter identification in partial differential equations are provided to support our theoretical results and to demonstrate our accelerated method.

## 1. Introduction

Regularization of inverse problems has been considered extensively and significant progress has been made; see [10, 18, 21, 26, 34] and references therein. Due to the demand of capturing special features of the reconstructed objects and the demand of dealing with general noise, regularization in Banach spaces has emerged as a highly active research field and many new regularization methods have been proposed and investigated in recent years; one may refer to [5, 17, 19, 23, 24, 25, 33] and the references therein for recent developments.

Because of its simplicity and relatively small complexity per iteration, Landweber iteration and its Kaczmarz version have received extensive attention in the inverse problems community [9, 11, 12, 14, 27]. In recent years, several versions of Landweber iteration have been formulated in Banach spaces, see [5, 20, 22, 31]. When solving ill-posed systems of the form

$$F_i(x) = y_i, \qquad i = 0, \cdots, N-1, \tag{1.1}$$

consisting of $N$ equations, a Kaczmarz version of Landweber iteration in Banach spaces with general uniformly convex penalty functions has been proposed in [22] which cyclically considers each equation in (1.1) in a Gauss-Seidel manner. For these modern versions of the Landweber method, each iteration step essentially requires the computation of the next iterate from the current iterate via

$$\xi_{n+1} = \xi_n - \mu_n L(x_n)^* J_s^Y\big(F(x_n) - y\big), \qquad x_{n+1} = \arg\min_{z \in X}\big\{\Theta(z) - \langle \xi_{n+1}, z\rangle\big\},$$

where $\mu_n$ is a step size, $J_s^Y$ is a duality mapping, $F$ denotes one of the $F_i$, $L(x)$ denotes the Fréchet derivative of $F$ at $x$, and $\Theta$ is a uniformly convex function. Therefore, the implementation of the Landweber(-Kaczmarz) method in Banach spaces requires solving a minimization problem of the form

$$x = \arg\min_{z \in X}\big\{\Theta(z) - \langle \xi, z\rangle\big\} \tag{1.2}$$

associated with a given $\xi \in X^*$ in each iteration step.

The existing convergence theory on the Landweber(-Kaczmarz) method in Banach spaces requires the exact resolution of the minimization problem (1.2). For some special $\Theta$ its exact resolution is possible. However, this minimization problem in general can only be solved inexactly by an iterative procedure which may produce an approximate solution $\bar x$ satisfying

$$\Theta(\bar x) - \langle \xi, \bar x\rangle \le \min_{z \in X}\big\{\Theta(z) - \langle \xi, z\rangle\big\} + \varepsilon \tag{1.3}$$

for some small $\varepsilon > 0$. Furthermore, numerical simulations indicate that solving (1.2) very accurately in every step does not improve the final reconstruction result but wastes a huge amount of computational time. Therefore, it is necessary to formulate a Landweber-Kaczmarz method with an inexact inner solver in each iteration step and to develop the corresponding convergence theory. The inequality (1.3) suggests that the $\varepsilon$-subdifferential calculus might be a useful tool for this purpose.

It is well known that Landweber iteration converges slowly ([10]), which restricts its application to a wide range of problems. To expand its range of applicability, it is necessary to introduce some acceleration strategy to speed up its convergence. In this paper we will use Nesterov's strategy in optimization ([29]) to propose an accelerated version of the Landweber-Kaczmarz method in Banach spaces in which some extrapolation steps are incorporated. We do not have a theory to guarantee its acceleration effect at this moment; however, we will provide numerical simulations to support its fast convergence property.

This paper is organized as follows. In Section 2 we provide some preliminaries on Banach spaces and convex analysis and derive some useful results concerning the $\varepsilon$-subdifferential. In Section 3 we propose the Landweber-Kaczmarz method with inexact inner solvers, show that it is well-defined, and prove its convergence and regularization property. We then use Nesterov's strategy to propose an accelerated version. We also discuss how to produce the inexact solvers for the inner minimization problem at each iteration step of the Landweber-Kaczmarz method. Finally, we provide numerical simulations to verify the theoretical results and to demonstrate the fast convergence of the accelerated method.

## 2. Preliminaries

Let $X$ be a Banach space whose norm is denoted by $\|\cdot\|$. We use $X^*$ to denote its dual space. For any $x \in X$ and $\xi \in X^*$, we write $\langle \xi, x\rangle$ for the duality pairing. Let $Y$ be another Banach space. By $\mathscr{L}(X, Y)$ we denote the space of all bounded linear operators from $X$ to $Y$. For any $A \in \mathscr{L}(X, Y)$ we use $A^*: Y^* \to X^*$ to denote its adjoint, i.e.

$$\langle A^*\zeta, x\rangle = \langle \zeta, Ax\rangle$$

for any $\zeta \in Y^*$ and $x \in X$.

For each $s > 1$, the set-valued mapping $J_s^X: X \to 2^{X^*}$ defined by

$$J_s^X(x) := \big\{\xi \in X^* : \|\xi\| = \|x\|^{s-1} \text{ and } \langle \xi, x\rangle = \|x\|^s\big\}$$

is called the duality mapping of $X$ with gauge function $t \mapsto t^{s-1}$. When $X$ is uniformly smooth in the sense that its modulus of smoothness

$$\rho_X(t) := \sup\big\{\|\bar x + x\| + \|\bar x - x\| - 2 : \|\bar x\| = 1, \ \|x\| \le t\big\}$$

satisfies $\lim_{t \searrow 0} \rho_X(t)/t = 0$, the duality mapping $J_s^X$, for each $s > 1$, is single valued and uniformly continuous on bounded sets.

Given a convex function $\Theta: X \to (-\infty, \infty]$, we use

$$D(\Theta) := \{x \in X : \Theta(x) < \infty\}$$

to denote its effective domain. It is called proper if $D(\Theta) \ne \emptyset$. For a proper convex function $\Theta$ and $\varepsilon \ge 0$, we define for any $x \in X$ the set

$$\partial_\varepsilon\Theta(x) := \big\{\xi \in X^* : \Theta(\bar x) \ge \Theta(x) + \langle \xi, \bar x - x\rangle - \varepsilon \text{ for all } \bar x \in X\big\},$$

which is called the $\varepsilon$-subdifferential of $\Theta$ at $x$. Any element in $\partial_\varepsilon\Theta(x)$ is called an $\varepsilon$-subgradient of $\Theta$ at $x$. When $\varepsilon = 0$, the $\varepsilon$-subdifferential of $\Theta$ reduces to the subdifferential $\partial\Theta$. It is clear that $\partial_\varepsilon\Theta(x) \ne \emptyset$ for some $\varepsilon \ge 0$ implies $x \in D(\Theta)$. If $\Theta$ is lower semi-continuous, then for any $x \in D(\Theta)$, the $\varepsilon$-subdifferential $\partial_\varepsilon\Theta(x)$ is always non-empty for any $\varepsilon > 0$, see [35, Theorem 2.4.4]; however, $\partial\Theta(x)$ can be empty in general.
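The definition above can be probed numerically. The following sketch (a hypothetical 1-D example, not taken from the paper) checks membership in the $\varepsilon$-subdifferential of $\Theta(x) = x^2$ both via the closed form, obtained by minimizing the defining gap over $\bar x$, and via a brute-force grid search:

```python
import numpy as np

# Hypothetical 1-D illustration: for Theta(x) = x^2 the eps-subdifferential
# has the closed form  xi in d_eps Theta(x)  <=>  (xi - 2x)^2 <= 4*eps,
# obtained by minimizing the defining gap over x_bar (a quadratic in x_bar).
# We verify membership against a brute-force grid search over x_bar.

theta = lambda z: z ** 2
grid = np.linspace(-50.0, 50.0, 200001)

def in_eps_subdiff(xi, x, eps):
    """Check Theta(x_bar) >= Theta(x) + xi*(x_bar - x) - eps on a grid."""
    gap = theta(grid) - (theta(x) + xi * (grid - x) - eps)
    return gap.min() >= -1e-9          # tolerance for grid/rounding error

x, eps = 1.0, 0.25                     # closed form: |xi - 2| <= 1
assert in_eps_subdiff(2.9, x, eps)     # inside the admissible interval
assert not in_eps_subdiff(3.2, x, eps) # outside it
assert in_eps_subdiff(2.0, x, 0.0)     # eps = 0: the ordinary subgradient
```

Note how the $\varepsilon$-subdifferential is a whole interval around the ordinary derivative, which is exactly the slack an inexact inner solver is allowed to exploit.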

For $\xi \in \partial_\varepsilon\Theta(x)$ with $\varepsilon \ge 0$, we may introduce

$$D_\xi^\varepsilon\Theta(\bar x, x) := \Theta(\bar x) - \Theta(x) - \langle \xi, \bar x - x\rangle + \varepsilon, \qquad \forall \bar x \in X,$$

which is called the $\varepsilon$-Bregman distance induced by $\Theta$ at $x$ in the direction $\xi$. It is clear that

$$D_\xi^\varepsilon\Theta(\bar x, x) \ge 0.$$

When $\varepsilon = 0$, the $\varepsilon$-Bregman distance becomes the well-known Bregman distance [7] which will be denoted by $D_\xi\Theta(\bar x, x)$. It should be pointed out that the $\varepsilon$-Bregman distance is not a metric distance in general. Nevertheless, as the following result shows, the $\varepsilon$-Bregman distance can be used to detect information under the norm if $\Theta$ is $p$-convex for some $p > 1$ in the sense that there is a constant $c_0 > 0$ such that

$$\Theta(t\bar x + (1-t)x) + c_0 t(1-t)\|\bar x - x\|^p \le t\Theta(\bar x) + (1-t)\Theta(x) \tag{2.1}$$

for all $\bar x, x \in X$ and $0 \le t \le 1$.

###### Lemma 2.1.

Let $\Theta: X \to (-\infty, \infty]$ be a proper, lower semi-continuous function that is $p$-convex in the sense of (2.1). If $\xi \in \partial_\varepsilon\Theta(x)$ for some $\varepsilon \ge 0$, then

$$c_0\|\bar x - x\|^p \le 2 D_\xi^\varepsilon\Theta(\bar x, x) + 2\varepsilon \tag{2.2}$$

for any $\bar x \in X$.

###### Proof.

Since $\Theta$ is $p$-convex and $\xi \in \partial_\varepsilon\Theta(x)$, we have for any $\bar x \in X$ and $0 \le t < 1$ that

$$\begin{aligned}
c_0 t(1-t)\|\bar x - x\|^p
&\le t\Theta(x) + (1-t)\Theta(\bar x) - \Theta(tx + (1-t)\bar x) \\
&\le t\Theta(x) + (1-t)\Theta(\bar x) - \big[\Theta(x) + (1-t)\langle \xi, \bar x - x\rangle - \varepsilon\big] \\
&= (1-t)\big[\Theta(\bar x) - \Theta(x) - \langle \xi, \bar x - x\rangle + \varepsilon\big] + t\varepsilon \\
&= (1-t) D_\xi^\varepsilon\Theta(\bar x, x) + t\varepsilon.
\end{aligned}$$

By taking $t = 1/2$ we then obtain (2.2). ∎

In convex analysis, the Legendre-Fenchel conjugate is an important notion. Given a proper convex function $\Theta: X \to (-\infty, \infty]$, its Legendre-Fenchel conjugate $\Theta^*$ is defined by

$$\Theta^*(\xi) := \sup_{x \in X}\big\{\langle \xi, x\rangle - \Theta(x)\big\}, \qquad \forall \xi \in X^*.$$

As an immediate consequence of the definition, one can see, for any $\varepsilon \ge 0$, that

$$\xi \in \partial_\varepsilon\Theta(x) \iff \Theta(x) + \Theta^*(\xi) \le \langle \xi, x\rangle + \varepsilon. \tag{2.3}$$

If, in addition, $\Theta$ is lower semi-continuous, then ([35, Theorem 2.4.4])

$$\xi \in \partial_\varepsilon\Theta(x) \iff x \in \partial_\varepsilon\Theta^*(\xi). \tag{2.4}$$
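The characterization (2.3) is easy to test numerically. In the following sketch (an illustrative quadratic example, not from the paper) the conjugate of $\Theta(x) = x^2$ is computed as a grid supremum and compared with the closed form $\Theta^*(\xi) = \xi^2/4$:

```python
import numpy as np

# Illustrative sketch: with Theta(x) = x^2 on R the conjugate is
# Theta*(xi) = xi^2/4. We compute Theta* as a grid supremum and test the
# characterization (2.3):
#   xi in d_eps Theta(x)  <=>  Theta(x) + Theta*(xi) <= <xi, x> + eps.

theta = lambda z: z ** 2
grid = np.linspace(-50.0, 50.0, 200001)

def theta_star(xi):
    """Legendre-Fenchel conjugate via a supremum over a grid."""
    return np.max(xi * grid - theta(grid))

assert abs(theta_star(3.0) - 3.0 ** 2 / 4) < 1e-6   # matches xi^2/4

def in_eps_subdiff(xi, x, eps):
    return theta(x) + theta_star(xi) <= xi * x + eps + 1e-9

# For this Theta the condition works out to |xi - 2x| <= 2*sqrt(eps):
assert in_eps_subdiff(2.9, 1.0, 0.25)
assert not in_eps_subdiff(3.2, 1.0, 0.25)
```

The inequality test in (2.3) needs only one evaluation of $\Theta^*$, which is why it is a convenient form of the $\varepsilon$-subgradient condition in computations.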

For a proper, lower semi-continuous, $p$-convex function, even if it is non-smooth, its Legendre-Fenchel conjugate can have enough regularity, as the following result indicates.

###### Lemma 2.2.

Let $X$ be a reflexive Banach space and let $\Theta: X \to (-\infty, \infty]$ be a proper, lower semi-continuous function that is $p$-convex in the sense of (2.1). Then $D(\Theta^*) = X^*$, $\Theta^*$ is Fréchet differentiable, and its gradient $\nabla\Theta^*$ satisfies

$$\|\nabla\Theta^*(\xi) - \nabla\Theta^*(\eta)\| \le \left(\frac{\|\xi - \eta\|}{2c_0}\right)^{\frac{1}{p-1}}, \tag{2.5}$$

which consequently implies

$$\big|\Theta^*(\eta) - \Theta^*(\xi) - \langle \eta - \xi, \nabla\Theta^*(\xi)\rangle\big| \le \frac{1}{p^*(2c_0)^{p^*-1}}\|\xi - \eta\|^{p^*} \tag{2.6}$$

for any $\xi, \eta \in X^*$, where $p^*$ is the number conjugate to $p$, i.e. $1/p + 1/p^* = 1$.

###### Proof.

See [35, Theorem 3.5.10 and Corollary 3.5.11]. ∎
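For $p = 2$ the estimates of Lemma 2.2 can be checked in closed form. The following sketch (a quadratic example with illustrative values, not from the paper) shows that (2.5) and (2.6) are attained with equality by $\Theta(x) = c_0 x^2$, whose conjugate is $\Theta^*(\xi) = \xi^2/(4c_0)$:

```python
# Sanity check of (2.5) and (2.6) for p = 2 (so p* = 2): Theta(x) = c0*x^2
# on R is 2-convex with constant c0, Theta*(xi) = xi^2/(4*c0) and
# grad Theta*(xi) = xi/(2*c0). Both bounds hold with equality here.

c0 = 0.7
theta_star = lambda xi: xi ** 2 / (4 * c0)
grad_theta_star = lambda xi: xi / (2 * c0)

for xi, eta in [(1.0, -2.0), (3.5, 0.25)]:
    # (2.5): |grad Theta*(xi) - grad Theta*(eta)| <= |xi - eta| / (2*c0)
    lhs = abs(grad_theta_star(xi) - grad_theta_star(eta))
    assert abs(lhs - abs(xi - eta) / (2 * c0)) < 1e-12
    # (2.6): |Theta*(eta) - Theta*(xi) - (eta - xi)*grad Theta*(xi)|
    #        <= |xi - eta|^2 / (4*c0)
    lin = abs(theta_star(eta) - theta_star(xi)
              - (eta - xi) * grad_theta_star(xi))
    assert abs(lin - (eta - xi) ** 2 / (4 * c0)) < 1e-12
```

Since equality is attained, the constants in (2.5) and (2.6) cannot be improved in general.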

Finally we conclude this section by providing a result which shows that, when $\Theta$ is $p$-convex, then, for any $\xi \in \partial_\varepsilon\Theta(x)$, the distance from $x$ to $\nabla\Theta^*(\xi)$ can be controlled in terms of $\varepsilon$.

###### Lemma 2.3.

Let $X$ be a reflexive Banach space and let $\Theta: X \to (-\infty, \infty]$ be a proper, lower semi-continuous function that is $p$-convex in the sense of (2.1). If $x \in X$ and $\xi \in X^*$ satisfy $\xi \in \partial_\varepsilon\Theta(x)$ for some $\varepsilon \ge 0$, then for any $\eta \in X^*$ there holds

$$\langle \eta, x - \nabla\Theta^*(\xi)\rangle \le \varepsilon + \frac{1}{p^*(2c_0)^{p^*-1}}\|\eta\|^{p^*} \tag{2.7}$$

and hence

$$\|x - \nabla\Theta^*(\xi)\|^p \le \frac{p}{2c_0}\,\varepsilon. \tag{2.8}$$
###### Proof.

Since $\xi \in \partial_\varepsilon\Theta(x)$, by (2.4) we have $x \in \partial_\varepsilon\Theta^*(\xi)$ and hence

$$\Theta^*(\xi + \eta) \ge \Theta^*(\xi) + \langle \eta, x\rangle - \varepsilon.$$

By using (2.6) in Lemma 2.2, we also have

$$\Theta^*(\xi + \eta) \le \Theta^*(\xi) + \langle \eta, \nabla\Theta^*(\xi)\rangle + \frac{1}{p^*(2c_0)^{p^*-1}}\|\eta\|^{p^*}.$$

Combining the above two inequalities we obtain

$$\langle \eta, x\rangle - \varepsilon \le \langle \eta, \nabla\Theta^*(\xi)\rangle + \frac{1}{p^*(2c_0)^{p^*-1}}\|\eta\|^{p^*},$$

which shows (2.7). By taking $\eta = 2c_0 J_p^X\big(x - \nabla\Theta^*(\xi)\big)$ in (2.7) and using the properties of $J_p^X$, we then obtain

$$2c_0\|x - \nabla\Theta^*(\xi)\|^p \le \varepsilon + \frac{2c_0}{p^*}\|x - \nabla\Theta^*(\xi)\|^p,$$

which shows (2.8). ∎
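In the quadratic case the bound (2.8) is sharp. The following sketch (hypothetical values, $p = 2$, $\Theta(x) = c_0 x^2$) probes random pairs satisfying the $\varepsilon$-subgradient condition and checks (2.8):

```python
import numpy as np

# Sketch verifying (2.8) for p = 2: with Theta(x) = c0*x^2 one has
# grad Theta*(xi) = xi/(2*c0), and xi in d_eps Theta(x) exactly when
# (xi - 2*c0*x)^2 <= 4*c0*eps. Then (2.8) reads |x - xi/(2*c0)|^2 <= eps/c0,
# which is the membership condition divided by 4*c0^2, so the bound of
# Lemma 2.3 is attained in the limit. We probe random admissible pairs.

rng = np.random.default_rng(0)
c0, eps = 1.3, 0.05
for _ in range(1000):
    x = rng.uniform(-5.0, 5.0)
    # sample xi with |xi - 2*c0*x| <= 2*sqrt(c0*eps), so xi in d_eps Theta(x)
    xi = 2 * c0 * x + rng.uniform(-1.0, 1.0) * 2 * np.sqrt(c0 * eps)
    assert (x - xi / (2 * c0)) ** 2 <= eps / c0 + 1e-12   # inequality (2.8)
```

This is the quantitative sense in which an $\varepsilon$-subgradient pins down $x$ near $\nabla\Theta^*(\xi)$, which is exactly what the convergence analysis of the inexact method relies on.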

## 3. The method

We consider the system

$$F_i(x) = y_i, \qquad i = 0, \cdots, N-1, \tag{3.1}$$

consisting of $N$ equations, where, for each $i$, $F_i: D(F_i) \subset X \to Y_i$ is an operator between two reflexive Banach spaces $X$ and $Y_i$. Such systems arise in many practical applications including various tomography problems using multiple exterior measurements.

We will assume that (3.1) has a solution which consequently implies that

$$D := \bigcap_{i=0}^{N-1} D(F_i) \ne \emptyset.$$

In practical applications, instead of $y_i$ we only have noisy data $y_i^\delta$ satisfying

$$\|y_i^\delta - y_i\| \le \delta, \qquad i = 0, \cdots, N-1, \tag{3.2}$$

with a small noise level $\delta > 0$. How to use the data $y_i^\delta$ to produce an approximate solution of (3.1) is an important question. In [22] we proposed a Landweber iteration of Kaczmarz type which makes use of every equation in (3.1) cyclically. In order to capture the features of the sought solution, general convex functions have been used in [22] as penalty terms.

We will make the following assumption, where $B_{2\rho}(x_0) := \{x \in X : \|x - x_0\| \le 2\rho\}$.

###### Assumption 3.1.
(a) $\Theta: X \to (-\infty, \infty]$ is proper, lower semi-continuous and $p$-convex in the sense of (2.1).

(b) There exist $x_0 \in X$, $\xi_0 \in X^*$ and $\rho > 0$ such that $\xi_0 \in \partial\Theta(x_0)$ and (3.1) has a solution $x^\dagger$ with

$$D_{\xi_0}\Theta(x^\dagger, x_0) \le \frac{1}{4} c_0 \rho^p. \tag{3.3}$$

(c) For each $i$ there exists a family of bounded linear operators $\{L_i(x) \in \mathscr{L}(X, Y_i) : x \in B_{2\rho}(x_0)\}$ such that $x \mapsto L_i(x)$ is continuous on $B_{2\rho}(x_0)$ and there is a constant $0 \le \gamma < 1$ such that

$$\|F_i(\bar x) - F_i(x) - L_i(x)(\bar x - x)\| \le \gamma\|F_i(\bar x) - F_i(x)\|$$

for all $\bar x, x \in B_{2\rho}(x_0)$.

According to Assumption 3.1 (c), we can find a constant $B > 0$ such that

$$\|L_i(x)\| \le B \qquad \forall x \in B_{2\rho}(x_0) \text{ and } i = 0, \cdots, N-1. \tag{3.4}$$

Moreover

$$\|F_i(\bar x) - F_i(x)\| \le \frac{1}{1-\gamma}\|L_i(x)(\bar x - x)\| \le \frac{B}{1-\gamma}\|\bar x - x\| \tag{3.5}$$

for all $\bar x, x \in B_{2\rho}(x_0)$, which shows that $F_i$ is continuous on $B_{2\rho}(x_0)$ for each $i$.

The formulation of the Landweber iteration of Kaczmarz type in [22] involves in each iteration step a minimization problem of the form

$$x := \arg\min_{z \in X}\big\{\Theta(z) - \langle \xi, z\rangle\big\} \tag{3.6}$$

for any $\xi \in X^*$. The convergence result developed there requires solving (3.6) exactly. The exact solution of (3.6) can be found for some special $\Theta$. However, this minimization problem in general can only be solved inexactly by iterative procedures. Furthermore, numerical simulations indicate that solving (3.6) very accurately in every step does not improve the final reconstruction result. Therefore, it is necessary to formulate a Landweber-Kaczmarz method with an inexact inner solver in each iteration step and to develop the corresponding convergence theory.

Concerning the inexact resolution of (3.6), we make the following assumption.

###### Assumption 3.2.

For any given $\xi \in X^*$ there is a procedure for solving (3.6) such that for any $\varepsilon > 0$, the element $x := \mathcal{S}_\varepsilon(\xi)$ produced by this procedure satisfies

$$\Theta(x) - \langle \xi, x\rangle \le \min_{z \in X}\big\{\Theta(z) - \langle \xi, z\rangle\big\} + \varepsilon. \tag{3.7}$$

Moreover, for each $\varepsilon > 0$, the mapping $\xi \mapsto \mathcal{S}_\varepsilon(\xi)$ is continuous.

In subsection 3.5 we will discuss how to produce the inexact procedure $\mathcal{S}_\varepsilon$ by using concrete examples of $\Theta$, including total variation-like convex penalty functions.
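As a minimal illustration of Assumption 3.2 (a sketch with a hypothetical penalty, not one of the examples treated later): for $\Theta(z) = \frac{1}{2}\|z\|^2 + \alpha\|z\|_1$ on $\mathbb{R}^n$ the inner problem (3.6) is solved exactly by soft-thresholding of $\xi$, which makes it easy to build an iterative solver and verify the accuracy requirement (3.7) explicitly:

```python
import numpy as np

# Hypothetical inner solver S_eps for (3.6) with Theta(z) = 0.5*||z||^2
# + alpha*||z||_1 on R^n. For this Theta the exact minimizer of
# Theta(z) - <xi, z> is soft-thresholding of xi, so (3.7) can be checked.

alpha = 0.5
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(z, xi):
    return 0.5 * z @ z + alpha * np.abs(z).sum() - xi @ z

def S_eps(xi, eps, step=0.5, max_iter=10000):
    """Proximal-gradient iterations on z -> Theta(z) - <xi, z>, stopped
    once the objective is within eps of the minimum. The exact minimum is
    used as an oracle here for the demo; in practice one would stop via a
    computable optimality-gap estimate instead."""
    f_min = objective(soft(xi, alpha), xi)
    z = np.zeros_like(xi)
    for _ in range(max_iter):
        if objective(z, xi) <= f_min + eps:
            break
        z = soft(z - step * (z - xi), step * alpha)
    return z

xi = np.array([3.0, -0.2, 1.5, -4.0])
for eps in (1e-1, 1e-3, 1e-6):
    z = S_eps(xi, eps)
    # accuracy requirement (3.7) of Assumption 3.2:
    assert objective(z, xi) <= objective(soft(xi, alpha), xi) + eps
```

The point of the sketch is that looser tolerances $\varepsilon$ terminate the inner loop much earlier, which is precisely the saving the inexact method is designed to exploit.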

### 3.1. The method with noisy data

We are now ready to formulate our Landweber-Kaczmarz method with inexact inner solvers. We will take $s > 1$ and let $J_s^{Y_i}$ denote the duality mapping over $Y_i$ with gauge function $t \mapsto t^{s-1}$. Given an integer $n \ge 0$, we set $i_n := n \ (\mathrm{mod}\ N) \in \{0, \cdots, N-1\}$.

###### Algorithm 3.1 (Landweber-Kaczmarz method with noisy data).

Let $\tau > 1$, $\beta_0 > 0$, $\beta_1 > 0$ and $\sigma \ge 0$ be suitably chosen numbers, and let $\{\varepsilon_n\}$ be a sequence of positive numbers satisfying $\sum_{n=0}^\infty \varepsilon_n < \infty$.

1. Pick $\xi_0 \in X^*$ and $x_0 \in X$ such that $\xi_0 \in \partial\Theta(x_0)$.

2. Let $\xi_0^\delta := \xi_0$ and $x_0^\delta := x_0$. Let $q_{-1} := 0$. For $n \ge 0$ we define $r_n^\delta := F_{i_n}(x_n^\delta) - y_{i_n}^\delta$ and

$$q_n = \begin{cases} q_{n-1} + 1 & \text{if } \|r_n^\delta\|^p + \sigma\varepsilon_n \le (\tau\delta)^p, \\ 0 & \text{otherwise.} \end{cases}$$

We then update

$$\left\{\begin{aligned} \xi_{n+1}^\delta &= \xi_n^\delta - \mu_n^\delta L_{i_n}(x_n^\delta)^* J_s^{Y_{i_n}}(r_n^\delta), \\ x_{n+1}^\delta &= \mathcal{S}_{\varepsilon_{n+1}}(\xi_{n+1}^\delta), \end{aligned}\right. \tag{3.8}$$

where

$$\mu_n^\delta = \begin{cases} \tilde\mu_n^\delta\big(\|r_n^\delta\|^p + \sigma\varepsilon_n\big)^{1-\frac{s}{p}} & \text{if } \|r_n^\delta\|^p + \sigma\varepsilon_n > (\tau\delta)^p, \\ 0 & \text{otherwise,} \end{cases} \tag{3.9}$$

with

$$\tilde\mu_n^\delta = \min\left\{\frac{\beta_0\|r_n^\delta\|^{p(s-1)}}{\big\|L_{i_n}(x_n^\delta)^* J_s^{Y_{i_n}}(r_n^\delta)\big\|^p},\ \beta_1\right\}.$$
3. Let $n_\delta$ be the first integer such that $q_{n_\delta} = N$ and use $x_{n_\delta}^\delta$ as an approximate solution.
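To make the structure of the algorithm concrete, the following toy sketch instantiates it in the Hilbert-space setting $p = s = 2$ with linear equations $F_i(x) = A_i x$ and the hypothetical penalty $\Theta(z) = \frac{1}{2}\|z\|^2 + \alpha\|z\|_1$, for which the inner problem (3.6) is solved exactly by soft-thresholding (so $\mathcal{S}_\varepsilon$ ignores $\varepsilon$ and we take $\sigma = 0$). All parameter values are illustrative, not prescribed by the analysis above.

```python
import numpy as np

# Toy instance of the Landweber-Kaczmarz loop: p = s = 2, F_i(x) = A_i x,
# Theta(z) = 0.5*||z||^2 + alpha*||z||_1, so J_s^Y is the identity and the
# inner solver is exact soft-thresholding (sigma = 0, eps_n ignored).

rng = np.random.default_rng(1)
n, m, N = 50, 30, 4                       # unknowns, rows per block, blocks
A = [rng.standard_normal((m, n)) for _ in range(N)]
x_true = np.zeros(n)
x_true[[3, 17, 40]] = [2.0, -1.0, 3.0]    # sparse ground truth
delta = 1e-3                              # noise level per equation, cf. (3.2)
y = []
for Ai in A:
    e = rng.standard_normal(m)
    y.append(Ai @ x_true + delta * e / np.linalg.norm(e))

alpha, tau, beta1 = 0.05, 1.2, 1.0
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
S = lambda xi: soft(xi, alpha)            # exact inner solver for this Theta
res = lambda x: np.sqrt(sum(np.linalg.norm(Ai @ x - yi) ** 2
                            for Ai, yi in zip(A, y)))

xi, q = np.zeros(n), 0
x = S(xi)
r0 = res(x)
for it in range(20000):
    i = it % N                            # cyclic (Kaczmarz) sweep
    r = A[i] @ x - y[i]                   # residual of the current equation
    if np.linalg.norm(r) <= tau * delta:  # discrepancy test, cf. q_n
        q += 1
        if q >= N:                        # a full cycle below the tolerance
            break
        continue
    q = 0
    g = A[i].T @ r                        # L_i(x)^* J_s^Y(r) for p = s = 2
    mu = min(np.linalg.norm(r) ** 2 / np.linalg.norm(g) ** 2, beta1)
    xi = xi - mu * g                      # dual Landweber step, cf. (3.8)
    x = S(xi)

assert res(x) < 0.1 * r0                  # residual reduced substantially
assert np.linalg.norm(x - x_true) < np.linalg.norm(x_true)
```

Note the splitting visible in the loop: the dual update uses only the forward operators and data, while the primal update uses only the penalty through the inner solver.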

In Algorithm 3.1, each $\xi_{n+1}^\delta$ is determined by $\xi_n^\delta$ and $x_n^\delta$ completely without involving $\Theta$, and each $x_{n+1}^\delta$ is defined by the inexact procedure $\mathcal{S}_{\varepsilon_{n+1}}$ specified in Assumption 3.2 for solving (3.6), which is independent of $F_i$ and $y_i^\delta$. This splitting character can make the implementation of Algorithm 3.1 efficient. Furthermore, the definition of $x_n^\delta = \mathcal{S}_{\varepsilon_n}(\xi_n^\delta)$ implies that

$$\Theta(x_n^\delta) - \langle \xi_n^\delta, x_n^\delta\rangle \le \Theta(z) - \langle \xi_n^\delta, z\rangle + \varepsilon_n \qquad \text{for all } z \in X,$$

which shows that

$$\xi_n^\delta \in \partial_{\varepsilon_n}\Theta(x_n^\delta). \tag{3.10}$$

We will use this fact in the forthcoming convergence analysis.

We first prove the following basic result which shows that Algorithm 3.1 is well-defined.

###### Lemma 3.1.

Let $X$ and $Y_i$ be reflexive Banach spaces with each $Y_i$ being uniformly smooth, let $\Theta$ and $F_i$, $L_i$ satisfy Assumption 3.1, and let $\{\varepsilon_n\}$ be a sequence of positive numbers satisfying

$$16\sum_{n=0}^\infty \varepsilon_n \le c_0\rho^p. \tag{3.11}$$

Let $\beta > 1$ be a given constant. If $\tau > 1$ and $\beta_0 > 0$ are chosen such that

$$c_1 := \frac{1}{\beta} - \gamma - \frac{1+\gamma}{\tau} - \frac{2}{p^*}\left(\frac{\beta_0}{2c_0}\right)^{p^*-1} > 0 \tag{3.12}$$

and if $\sigma \ge 0$ is chosen such that $\kappa\beta_1\sigma \le 1$, where

$$\kappa = \begin{cases} 1 & \text{if } p \ge s, \\ \big(\beta^{\frac{p}{s-p}} - 1\big)^{\frac{p-s}{p}} & \text{if } p < s, \end{cases} \tag{3.13}$$

then for Algorithm 3.1 there hold

1. $x_n^\delta \in B_{2\rho}(x_0)$ for all $0 \le n \le n_\delta$;

2. the method terminates after $n_\delta < \infty$ iteration steps;

3. for any solution $\hat x$ of (3.1) in $B_{2\rho}(x_0) \cap D(\Theta)$ there holds

$$D_{\xi_{n+1}^\delta}^{\varepsilon_{n+1}}\Theta(\hat x, x_{n+1}^\delta) - D_{\xi_n^\delta}^{\varepsilon_n}\Theta(\hat x, x_n^\delta) \le 2\varepsilon_n + \varepsilon_{n+1} \tag{3.14}$$

for all $n \ge 0$. Here we may take $\hat x = x^\dagger$ because $x^\dagger \in B_{2\rho}(x_0) \cap D(\Theta)$.

###### Proof.

Let $\hat x$ be any solution of (3.1) in $B_{2\rho}(x_0) \cap D(\Theta)$. We first show that, if $x_n^\delta \in B_{2\rho}(x_0)$ for some $n \ge 0$, then

$$D_{\xi_{n+1}^\delta}^{\varepsilon_{n+1}}\Theta(\hat x, x_{n+1}^\delta) - D_{\xi_n^\delta}^{\varepsilon_n}\Theta(\hat x, x_n^\delta) \le -c_1\mu_n^\delta\big(\|r_n^\delta\|^p + \sigma\varepsilon_n\big)^{\frac{s}{p}} + 2\varepsilon_n + \varepsilon_{n+1}. \tag{3.15}$$

To see this, we consider

$$\Delta_n := D_{\xi_{n+1}^\delta}^{\varepsilon_{n+1}}\Theta(\hat x, x_{n+1}^\delta) - D_{\xi_n^\delta}^{\varepsilon_n}\Theta(\hat x, x_n^\delta),$$

which can be written as

$$\begin{aligned}
\Delta_n &= \big[\Theta(x_n^\delta) - \langle \xi_n^\delta, x_n^\delta\rangle - \varepsilon_n\big] + \big[\langle \xi_{n+1}^\delta, x_{n+1}^\delta\rangle - \Theta(x_{n+1}^\delta)\big] \\
&\quad - \langle \xi_{n+1}^\delta - \xi_n^\delta, \hat x\rangle + \varepsilon_{n+1}.
\end{aligned}$$

Since $\xi_n^\delta \in \partial_{\varepsilon_n}\Theta(x_n^\delta)$, we have from (2.3) that

$$\Theta(x_n^\delta) - \langle \xi_n^\delta, x_n^\delta\rangle - \varepsilon_n \le -\Theta^*(\xi_n^\delta).$$

By the definition of $\Theta^*$ we also have

$$\langle \xi_{n+1}^\delta, x_{n+1}^\delta\rangle - \Theta(x_{n+1}^\delta) \le \Theta^*(\xi_{n+1}^\delta).$$

Therefore

$$\begin{aligned}
\Delta_n &\le \Theta^*(\xi_{n+1}^\delta) - \Theta^*(\xi_n^\delta) - \langle \xi_{n+1}^\delta - \xi_n^\delta, \hat x\rangle + \varepsilon_{n+1} \\
&= \Theta^*(\xi_{n+1}^\delta) - \Theta^*(\xi_n^\delta) - \langle \xi_{n+1}^\delta - \xi_n^\delta, \nabla\Theta^*(\xi_n^\delta)\rangle \\
&\quad + \langle \xi_{n+1}^\delta - \xi_n^\delta, \nabla\Theta^*(\xi_n^\delta) - x_n^\delta\rangle \\
&\quad + \langle \xi_{n+1}^\delta - \xi_n^\delta, x_n^\delta - \hat x\rangle + \varepsilon_{n+1}.
\end{aligned}$$

Because $\Theta$ is $p$-convex, we may use (2.6) in Lemma 2.2 and the definition of $\xi_{n+1}^\delta$ to obtain

$$\begin{aligned}
\Delta_n &\le \frac{1}{p^*(2c_0)^{p^*-1}}\|\xi_{n+1}^\delta - \xi_n^\delta\|^{p^*} + \langle \xi_{n+1}^\delta - \xi_n^\delta, \nabla\Theta^*(\xi_n^\delta) - x_n^\delta\rangle \\
&\quad + \mu_n^\delta\big\langle J_s^{Y_{i_n}}(r_n^\delta), L_{i_n}(x_n^\delta)(\hat x - x_n^\delta)\big\rangle + \varepsilon_{n+1}.
\end{aligned} \tag{3.16}$$

Since $\xi_n^\delta \in \partial_{\varepsilon_n}\Theta(x_n^\delta)$, we may use Lemma 2.3 to derive that

$$\langle \xi_{n+1}^\delta - \xi_n^\delta, \nabla\Theta^*(\xi_n^\delta) - x_n^\delta\rangle \le \varepsilon_n + \frac{1}{p^*(2c_0)^{p^*-1}}\|\xi_{n+1}^\delta - \xi_n^\delta\|^{p^*}.$$

Plugging this estimate into (3.16) and using the definition of $\xi_{n+1}^\delta$ it follows that

$$\begin{aligned}
\Delta_n &\le \frac{2}{p^*(2c_0)^{p^*-1}}\big(\mu_n^\delta\big)^{p^*}\big\|L_{i_n}(x_n^\delta)^* J_s^{Y_{i_n}}(r_n^\delta)\big\|^{p^*} + \varepsilon_n + \varepsilon_{n+1} \\
&\quad + \mu_n^\delta\big\langle J_s^{Y_{i_n}}(r_n^\delta), L_{i_n}(x_n^\delta)(\hat x - x_n^\delta)\big\rangle.
\end{aligned}$$

By writing

$$L_{i_n}(x_n^\delta)(\hat x - x_n^\delta) = -r_n^\delta - \big[y_{i_n}^\delta - F_{i_n}(x_n^\delta) - L_{i_n}(x_n^\delta)(\hat x - x_n^\delta)\big],$$

we may use the condition $\|y_{i_n}^\delta - y_{i_n}\| \le \delta$, Assumption 3.1 (c), and the properties of $J_s^{Y_{i_n}}$ to obtain

$$\begin{aligned}
\big\langle J_s^{Y_{i_n}}(r_n^\delta), L_{i_n}(x_n^\delta)(\hat x - x_n^\delta)\big\rangle
&\le -\|r_n^\delta\|^s + \|r_n^\delta\|^{s-1}\big\|y_{i_n}^\delta - F_{i_n}(x_n^\delta) - L_{i_n}(x_n^\delta)(\hat x - x_n^\delta)\big\| \\
&\le -\|r_n^\delta\|^s + \|r_n^\delta\|^{s-1}\big((1+\gamma)\delta + \gamma\|r_n^\delta\|\big).
\end{aligned}$$

Therefore

$$\begin{aligned}
\Delta_n &\le \frac{2}{p^*(2c_0)^{p^*-1}}\big(\mu_n^\delta\big)^{p^*}\big\|L_{i_n}(x_n^\delta)^* J_s^{Y_{i_n}}(r_n^\delta)\big\|^{p^*} + \varepsilon_n + \varepsilon_{n+1} \\
&\quad - \mu_n^\delta\|r_n^\delta\|^s + \mu_n^\delta\|r_n^\delta\|^{s-1}\big((1+\gamma)\delta + \gamma\|r_n^\delta\|\big).
\end{aligned} \tag{3.17}$$

By the definition of $\mu_n^\delta$ we can see that

$$\begin{aligned}
\big(\mu_n^\delta\big)^{p^*}\big\|L_{i_n}(x_n^\delta)^* J_s^{Y_{i_n}}(r_n^\delta)\big\|^{p^*}
&= \mu_n^\delta\big(\mu_n^\delta\big)^{p^*-1}\big\|L_{i_n}(x_n^\delta)^* J_s^{Y_{i_n}}(r_n^\delta)\big\|^{p^*} \\
&\le \beta_0^{p^*-1}\mu_n^\delta\|r_n^\delta\|^{(s-1)p^*}\big(\|r_n^\delta\|^p + \sigma\varepsilon_n\big)^{\frac{(p-s)(p^*-1)}{p}} \\
&\le \beta_0^{p^*-1}\mu_n^\delta\big(\|r_n^\delta\|^p + \sigma\varepsilon_n\big)^{\frac{s}{p}}
\end{aligned}$$

and

$$\begin{aligned}
\mu_n^\delta\|r_n^\delta\|^{s-1}\big((1+\gamma)\delta + \gamma\|r_n^\delta\|\big)
&\le \frac{1+\gamma}{\tau}\mu_n^\delta\|r_n^\delta\|^{s-1}\big(\|r_n^\delta\|^p + \sigma\varepsilon_n\big)^{\frac{1}{p}} + \gamma\mu_n^\delta\|r_n^\delta\|^s \\
&\le \Big(\frac{1+\gamma}{\tau} + \gamma\Big)\mu_n^\delta\big(\|r_n^\delta\|^p + \sigma\varepsilon_n\big)^{\frac{s}{p}}.
\end{aligned}$$

Combining the above two estimates with (3.17) we can obtain

$$\Delta_n \le \left[\frac{2}{p^*}\left(\frac{\beta_0}{2c_0}\right)^{p^*-1} + \frac{1+\gamma}{\tau} + \gamma\right]\mu_n^\delta\big(\|r_n^\delta\|^p + \sigma\varepsilon_n\big)^{\frac{s}{p}} - \mu_n^\delta\|r_n^\delta\|^s + \varepsilon_n + \varepsilon_{n+1}. \tag{3.18}$$

We next consider the term $\mu_n^\delta\|r_n^\delta\|^s$. We claim that

$$\mu_n^\delta\|r_n^\delta\|^s \ge \frac{1}{\beta}\mu_n^\delta\big(\|r_n^\delta\|^p + \sigma\varepsilon_n\big)^{\frac{s}{p}} - \kappa\tilde\mu_n^\delta\sigma\varepsilon_n, \tag{3.19}$$

where $\kappa$ is the constant defined by (3.13). Indeed, this is trivial when $\mu_n^\delta = 0$. We only need to consider the case that $\mu_n^\delta > 0$. If $p \ge s$, then

$$\begin{aligned}
\mu_n^\delta\|r_n^\delta\|^s
&= \tilde\mu_n^\delta\big(\|r_n^\delta\|^p + \sigma\varepsilon_n\big)^{\frac{p-s}{p}}\|r_n^\delta\|^s \ge \tilde\mu_n^\delta\|r_n^\delta\|^p \\
&= \tilde\mu_n^\delta\big(\|r_n^\delta\|^p + \sigma\varepsilon_n\big) - \tilde\mu_n^\delta\sigma\varepsilon_n \\
&= \mu_n^\delta\big(\|r_n^\delta\|^p + \sigma\varepsilon_n\big)^{\frac{s}{p}} - \tilde\mu_n^\delta\sigma\varepsilon_n,
\end{aligned}$$

which implies (3.19) since $\beta > 1$ and $\kappa = 1$ in this case.

If $p < s$, we may use the elementary inequality $a^{\frac{s}{p}} \ge \frac{1}{\beta}(a+b)^{\frac{s}{p}} - \big(\beta^{\frac{p}{s-p}} - 1\big)^{\frac{p-s}{p}} b^{\frac{s}{p}}$ for $a, b \ge 0$ to derive that

$$\|r_n^\delta\|^s \ge \frac{1}{\beta}\big(\|r_n^\delta\|^p + \sigma\varepsilon_n\big)^{\frac{s}{p}} - \big(\beta^{\frac{p}{s-p}} - 1\big)^{\frac{p-s}{p}}\big(\sigma\varepsilon_n\big)^{\frac{s}{p}}.$$

Thus, by using $\mu_n^\delta\big(\sigma\varepsilon_n\big)^{\frac{s}{p}} \le \tilde\mu_n^\delta\sigma\varepsilon_n$, which holds because $p < s$, we have

$$\mu_n^\delta\|r_n^\delta\|^s \ge \frac{1}{\beta}\mu_n^\delta\big(\|r_n^\delta\|^p + \sigma\varepsilon_n\big)^{\frac{s}{p}} - \big(\beta^{\frac{p}{s-p}} - 1\big)^{\frac{p-s}{p}}\tilde\mu_n^\delta\sigma\varepsilon_n.$$

We therefore obtain (3.19).

Combining (3.18) and (3.19) we thus have

$$\Delta_n \le -c_1\mu_n^\delta\big(\|r_n^\delta\|^p + \sigma\varepsilon_n\big)^{\frac{s}{p}} + \kappa\tilde\mu_n^\delta\sigma\varepsilon_n + \varepsilon_n + \varepsilon_{n+1},$$

where $c_1$ is the constant defined by (3.12). Since $\tilde\mu_n^\delta \le \beta_1$ and $\kappa\beta_1\sigma \le 1$, we therefore obtain (3.15).

Now we use an induction argument to show that $x_n^\delta \in B_{2\rho}(x_0)$ for all $n \ge 0$. This is trivial when $n = 0$. Assume that there is some $m \ge 0$ such that $x_n^\delta \in B_{2\rho}(x_0)$ for $0 \le n \le m$. Thus (3.15) holds for all $0 \le n \le m$. Taking $\hat x = x^\dagger$ in (3.15) and summing over these $n$ gives

 Dε