Landweber iteration of Kaczmarz type with general non-smooth convex penalty functionals

# Landweber iteration of Kaczmarz type with general non-smooth convex penalty functionals

Qinian Jin and Wei Wang Mathematical Sciences Institute, Australian National University, Canberra, ACT 0200, Australia College of Mathematics, Physics and Information Engineering, Jiaxing University, Zhejiang 314001, China School of Mathematical Sciences, Fudan University, Shanghai 200433, China
###### Abstract

The determination of solutions of many inverse problems usually requires a set of measurements which leads to solving systems of ill-posed equations. In this paper we propose the Landweber iteration of Kaczmarz type with general uniformly convex penalty functional. The method is formulated by using tools from convex analysis. The penalty term is allowed to be non-smooth to include the and total variation (TV) like penalty functionals, which are significant in reconstructing special features of solutions such as sparsity and piecewise constancy in practical applications. Under reasonable conditions, we establish the convergence of the method. Finally we present numerical simulations on tomography problems and parameter identification in partial differential equations to indicate the performance.

Qinian.Jin@anu.edu.au and weiwangmath@gmail.com

## 1 Introduction

Landweber iteration is one of the most well-known regularization methods for solving inverse problems formulated in Hilbert spaces. A complete account on this method for linear inverse problems can be found in [5] including the convergence analysis and its various accelerated versions. A nonlinear version of Landweber iteration was proposed in [10] for solving nonlinear inverse problems, where an elegant convergence analysis was present. Although Landweber iteration converges slowly, it still receives a lot of attention because it is simple to implement and is robust with respect to noise.

The classical Landweber iteration in Hilbert spaces, however, has the tendency to over-smooth solutions which makes it difficult to capture the special features of the sought solutions such as sparsity and piecewise constancy. It is therefore necessary to reformulate this method either in Banach space setting or in a manner that modern non-smooth penalty functionals, such as the and total variation like functionals, can be incorporated.

Let be a linear compact operator between two Banach spaces and with norms whose dual spaces are denoted by and respectively. Some recent advances on Landweber iteration for linear inverse problems

 Ax=y (1.1)

in Banach space setting have been reported using only the noisy data satisfying

 ∥yδ−y∥≤δ

with a small known noise level . In particular, when is uniformly smooth and uniformly convex, by virtue of the duality mappings, a version of Landweber iteration for solving (1.1) was proposed in [15]. Although the method excludes the use of the and total variation like penalty functionals, new ideas were introduced in [15] which promote the study of Landweber iteration in modern setup. Recently a version of Landweber iteration was proposed in [2] using non-smooth uniformly convex penalty functionals. Let be a proper, lower semi-continuous, uniformly convex functional, then the method in [2] reads as

 {ξn+1=ξn−μnA∗Jr(Axn−yδ),xn+1=argminx∈X{Θ(x)−⟨ξn+1,x⟩}, (1.2)

where denotes the adjoint of , with is the duality mapping of with gauge function , are suitable chosen step-lengths, and denotes the duality pairing between and . The method (1.2) reduces to the one in [15] when taking with . However, (1.2) has more freedom on so that it can be used to detect special features of solutions.

The convergence analysis of (1.2) is given in [2] when it is terminated by the discrepancy principle

 ∥Axnδ−yδ∥≤τδ<∥Axn−yδ∥,0≤n

with . The argument in [2], however, requires that , the interior of , must be non-empty and that (1.1) must have a solution in . These conditions are indeed quite restrictive; for instance, the domain of the total variation like functional

 Θ(x):=12β∫Ω|x(ω)|2dω+∫Ω|Dx|

with over the space on a bounded domain does not have any interior point in , where

 ∫Ω|Dx|:=sup{∫Ωxdivfdω:f∈C10(Ω;RN) and ∥f∥L∞(Ω)≤1}

denotes the total variation of over ([6]). Therefore, the theoretical result in [2] can not be applied to this important penalty functional.

It is natural to ask if the convergence of (1.2) can be proved without assuming . An affirmative answer would theoretically justify the applicability of (1.2) to a wider class of penalty functionals including the total variation like functionals. The control of presents one of the major challenges. The analysis in [2] is based on proving the boundedness of in which consequently enforces to assume that . We observe that the boundedness of is not essential in the convergence analysis, the most essential ingredient is to control for any solution of (1.1). Due to the lack of monotonicity on the residual , it turns out to be difficult to consider for all . Fortunately, with a careful chosen subsequence of integers, we can derive what we expect on which together with some monotonicity results enables us to prove a stronger result, i.e. converges to a solution of (1.1) in Bregman distance.

Instead of considering (1.2) for solving (1.1) directly, we consider a more general setup in which (1.2) is extended for solving linear as well as nonlinear inverse problems. Instead of studying a single equation, we consider the system

 Fi(x)=yi,i=0,⋯,N−1 (1.4)

consisting of equations, where, for each , is an operator between two Banach spaces and . Such systems arise naturally in many practical applications including various tomography techniques using multiple exterior measurements. By introducing

 F:=(F0,⋯,FN−1):D=N−1⋂i=0D(Fi)→Y0×⋯×YN−1

and

 y:=(y0,⋯,yN−1),

the system (1.4) could be reformulated as a single equation . One might consider extending (1.2) to solve directly. This procedure, however, becomes inefficient if is large because it destroys the special structure of (1.4) and results in an equation requiring huge memory to save the intermediate computational results. Therefore, it seems advantageous to use the Kaczmarz-type methods, which cyclically consider each equation in (1.4) separately and hence require only reasonable memory consumption.

Some Landweber-Kaczmarz methods were formulated in [11, 7] for solving the system (1.4) when and are Hilbert spaces, and the numerical results indicate that artefacts can appear in the reconstructed solutions due to oversmoothness. Recently a Landweber-Kaczmarz method was proposed in [13] for solving (1.4) in Banach space setting in the spirit of [15] and hence the possible use of the and total variation like penalty functionals is excluded. Furthermore, the convergence analysis in [13] unfortunately contains an error (see the first line on page 12 in [13]). In this paper, we propose a Landweber iteration of Kaczmarz type in which (1.2) is adapted to solve each equation in (1.4) and thus general non-smooth uniformly convex penalty functionals are incorporated into the method with the hope of removing artefacts and of capturing special features of solutions. We give the detailed convergence analysis of our method. It is worthy pointing out that our analysis does not require the interior of be nonempty and therefore the convergence result applies for the total variation like penalty functionals.

This paper is organized as follows. In section 2 we give some preliminary results from convex analysis. In section 3, we first formulate the Landweber iteration of Kaczmarz type with general uniformly convex penalty term for solving the system (1.4), and then present the detail convergence analysis. In section 4 we give the proof of an important proposition which plays an important role in section 3. Finally, in section 5 we present some numerical simulations on tomography problems in imaging and parameter identification in partial differential equations to test the performance of the method.

## 2 Preliminaries

Let be a Banach space with norm . We use to denote its dual space, and for any and we write for the duality pairing. If is another Banach space and is a bounded linear operator, we use to denote its adjoint, i.e. for any and . Let be the null space of and let

 N(A)⊥:={ξ∈X∗:⟨ξ,x⟩=0 % for all x∈N(A)}

be the annihilator of . When is reflexive, there holds

 N(A)⊥=¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯R(A∗),

where denotes the closure of , the range space of , in .

Given a convex function , we use

 D(Θ):={x∈X:Θ(x)<+∞}

to denote its effective domain. It is called proper if . The subgradient of at is defined as

 ∂Θ(x):={ξ∈X∗:Θ(z)−Θ(x)−⟨ξ,z−x⟩≥0 for all z∈X}.

The multi-valued mapping is called the subdifferential of . We set

 D(∂Θ):={x∈D(Θ):∂Θ(x)≠∅}.

For and we define ([3])

 DξΘ(z,x):=Θ(z)−Θ(x)−⟨ξ,z−x⟩,∀z∈X

which is called the Bregman distance induced by at in the direction . Clearly and

 DξΘ(x2,x)−DξΘ(x1,x)=Dξ1Θ(x2,x1)+⟨ξ1−ξ,x2−x1⟩ (2.1)

for all , , and .

Bregman distance can be used to obtain information under the Banach space norm when has stronger convexity. A proper convex function is called uniformly convex if there is a continuous increasing function , with the property that implies , such that

 Θ(λ¯x+(1−λ)x)+λ(1−λ)h(∥¯x−x∥)≤λΘ(¯x)+(1−λ)Θ(x)

for all and . If can be taken as for some and , then is called -convex. It can be shown that if is uniformly convex then

 DξΘ(¯x,x)≥h(∥¯x−x∥)

for all , and . In particular, if is -convex with , then

 DξΘ(¯x,x)≥c0∥¯x−x∥p (2.2)

for all , and .

For a proper, lower semi-continuous, convex function , its Legendre-Fenchel conjugate is defined by

 Θ∗(ξ):=supx∈X{⟨ξ,x⟩−Θ(x)},ξ∈X∗.

It is well known that is also proper, lower semi-continuous, and convex. If, in addition, is reflexive, then

 ξ∈∂Θ(x)⟺x∈∂Θ∗(ξ)⟺Θ(x)+Θ∗(ξ)=⟨ξ,x⟩. (2.3)

When is -convex satisfying (2.2) with , it follows from [19, Corollary 3.5.11] that , is Fréchet differentiable and its gradient satisfies

 ∥∇Θ∗(ξ1)−∇Θ∗(ξ2)∥≤(∥ξ1−ξ2∥2c0)1p−1,∀ξ1,ξ2∈X∗. (2.4)

Moreover

 Θ∗(ξ2)−Θ∗(ξ1)−⟨ξ2−ξ1,∇Θ∗(ξ1)⟩≤1p∗(2c0)p∗−1∥ξ2−ξ1∥p∗ (2.5)

for any , where is the number conjugate to , i.e. . By the subdifferential calculus, there also holds

 x=∇Θ∗(ξ)⟺x=argminz∈X{Θ(z)−⟨ξ,z⟩}. (2.6)

On a Banach space , we consider for the convex function . Its subdifferential at is given by

 JXr(x):={ξ∈X∗:∥ξ∥=∥x∥r−1% and ⟨ξ,x⟩=∥x∥r}

which gives the duality mapping of with gauge function . The duality mapping , for each , is single valued and uniformly continuous on bounded sets if is uniformly smooth in the sense that its modulus of smoothness

 ρX(s):=sup{∥¯x+x∥+∥¯x−x∥−2:∥¯x∥=1,∥x∥≤s}

satisfies .

In many practical applications, proper, weakly lower semi-continuous, -convex functions can be easily constructed. For instance, consider , where and is a bounded domain in . It is known that the functional

 Θ0(x):=∫Ω|x(ω)|pdω

is -convex on . We can construct the new -convex functions

 Θ(x):=μ∫Ω|x(ω)|pdω+a∫Ω|x(ω)|dω+b∫Ω|Dx|, (2.7)

where , , and denotes the total variation of over . For and the corresponding function is useful for sparsity reconstruction ([17]); while for and the corresponding function is useful for detecting the discontinuities, in particular, when the solutions are piecewise-constant ([14]).

## 3 Landweber iteration of Kaczmarz type

We consider the system (1.4), i.e.

 Fi(x)=yi,i=0,⋯,N−1 (3.1)

consisting of equations, where, for each , is an operator between two Banach spaces and . We will assume that

 D:=N−1⋂i=0D(Fi)≠∅

and each is Fréchet differentiable with the Fréchet derivative denoted by for . We will also assume that (3.1) has a solution. In general, (3.1) may have many solutions. In order to find the desired one, some selection criteria should be enforced. We choose a proper, lower semi-continuous, -convex function . By picking and as the initial guess, which may incorporate some available information on the sought solution, we define to be the solution of (3.1) with the property

 Dξ0Θ(x†,x0):=minx∈D(Θ)∩D{Dξ0Θ(x,x0):Fi(x)=yi,i=0,⋯,N−1}. (3.2)

We will work under the following conditions on the operators where .

###### Assumption 3.1
1. There is such that and (3.1) has a solution in ;

2. Each operator is weakly closed on and is Fréchet differentiable on , and is continuous on .

3. Each is properly scaled so that for .

4. There exists such that

 ∥Fi(x)−Fi(¯x)−F′i(¯x)(x−¯x)∥≤η∥Fi(x)−Fi(¯x)∥

for all and .

All the conditions in Assumption 3.1 are standard. Condition (d) is called the tangential cone condition and is widely used in the analysis of regularization methods for solving nonlinear ill-posed inverse problems ([10]) The weakly closedness of over in (b) means that if converges weakly to some and converges weakly to some , then and .

When is a reflexive Banach space, by using the -convexity and the weakly lower semi-continuity of together with the weakly closedness of for it is standard to show that exists. The following result shows that is in fact uniquely defined.

###### Lemma 3.2

Let be reflexive and satisfy Assumption 3.1. If , then is the unique solution of (3.1) in satisfying (3.2).

Proof. Assume that (3.1) has another solution in satisfying (3.2) with . Since for , we can use Assumption 3.1 (d) to derive that

 F′i(x†)(^x−x†)=0,i=0,⋯,N−1.

Let for . Then and

 F′i(x†)(xλ−x†)=0,i=0,⋯,N−1.

Thus we can use Assumption 3.1 (d) to conclude that

 ∥Fi(xλ)−Fi(x†)∥≤η∥Fi(xλ)−Fi(x†)∥.

Since , this implies that for . Consequently, by the minimal property of we have

 Dξ0Θ(xλ,x0)≥Dξ0Θ(x†,x0). (3.3)

On the other hand, it follows from the strictly convexity of that

 Dξ0Θ(xλ,x0)<λDξ0Θ(^x,x0)+(1−λ)Dξ0Θ(x†,x0)=Dξ0Θ(x†,x0)

for which is a contradiction to (3.3). □

In practical application, instead of we only have noisy data satisfying

 ∥yδi−yi∥≤δ,i=0,⋯,N−1

with a small known noise level . We will use , , to construct an approximate solution to (3.1). We assume that each is uniformly smooth so that, for each , the duality mapping is single valued and continuous. By introducing a proper, lower-semi continuous, -convex function satisfying (2.2) for some constant , we propose the following Landweber iteration of Kaczmarz type:

###### Algorithm 3.3
1. Pick and set ;

2. Let and . Assume that and are defined for some , we set , , and define

 ξδn,i+1 =ξδn,i−μδn,iF′i(xδn,i)∗JYir(Fi(xδn,i)−yδi), xδn,i+1 =argminx∈X{Θ(x)−⟨ξδn,i+1,x⟩}

for , where

 μδn,i={μ0∥Fi(xδn,i)−yδi∥p−r, if ∥Fi(xδn,i)−yδi∥>τδ,0, otherwise

for some . We then define and .

3. Let be the first integer such that

 μδnδ,i=0for all i=0,⋯,N−1

and use to approximate the solution of (3.1).

In Algorithm 3.3, is defined as the minimizer of a -convex functional over which is independent of and therefore it could be found by efficient solvers. By using (2.6), one can see that

 xδn,i+1=∇Θ∗(ξδn,i+1)

which is useful for the forthcoming theoretical analysis.

In this section we will show that Algorithm 3.3 is well-defined by showing that is finite and establish a convergence result on as .

###### Lemma 3.4

Let Assumption 3.1 hold and let be a proper, lower-semi continuous, -convex function with satisfying (2.2) for some . Assume that

 Dξ0Θ(x†,x0)≤c0ρp. (3.4)

Let and be defined by Algorithm 3.3 with and such that

 c1:=1−η−1+ητ−p−1p(μ02c0)1p−1>0.

Then and for all and . Moreover, for any solution of (3.1) in and all there hold

 Dξδn+1Θ(^x,xδn+1)≤DξδnΘ(^x,xδn), (3.5) c1N−1∑i=0μδn,i∥Fi(xδn,i)−yδi∥r≤DξδnΘ(^x,xδn)−Dξδn+1Θ(^x,xδn+1). (3.6)

Proof. In order to obtain (3.5) and (3.6), it suffices to show that and

 Dξδn,i+1Θ(^x,xδn,i+1)−Dξδn,iΘ(^x,xδn,i)≤−c1μδn,i∥Fi(xδn,i)−yδi∥r (3.7)

for all and . From the definition of Bregman distance and (2.3) it follows that

 Dξδn,i+1Θ(^x,xδn,i+1)−Dξδn,iΘ(^x,xδn,i) =Θ(xδn,i)−Θ(xδn,i+1)−⟨ξδn,i+1,^x−xδn,i+1⟩+⟨ξδn,i,^x−xδn,i⟩ =Θ∗(ξδn,i+1)−Θ∗(ξδn,i)−⟨ξδn,i+1−ξδn,i,^x⟩.

Using , we can write

 Dξδn,i+1Θ(^x,xδn,i+1)−Dξδn,iΘ(^x,xδn,i) =Θ∗(ξδn,i+1)−Θ∗(ξδn,i) −⟨ξδn,i+1−ξδn,i,∇Θ∗(ξδn,i)⟩ +⟨ξδn,i+1−ξδn,i,xδn,i−^x⟩.

Since is -convex, we may use (2.5) to obtain

 Dξδn,i+1Θ(^x,xδn,i+1)−Dξδn,iΘ(^x,xδn,i)≤1p∗(2c0)p∗−1∥ξδn,i+1−ξδn,i∥p∗ −μδn,i⟨JYir(Fi(xδn,i)−yδi),F′i(xδn,i)(xδn,i−^x)⟩,

where . By using the properties of the duality mapping and Assumption 3.1 it follows that

 Dξδn,i+1Θ(^x,xδn,i+1)−Dξδn,iΘ(^x,xδn,i) ≤μδn,i∥Fi(xδn,i)−yδi∥r−1∥yδi−Fi(xδn,i)−F′i(xδn,i)(^x−xδn,i)∥ −μδn,i∥Fi(xδn,i)−yδi∥r+1p∗(2c0)p∗−1∥ξδn,i+1−ξδn,i∥p∗ ≤(1+η)μδn,i∥Fi(xδn,i)−yδi∥r−1δ−(1−η)μδn,i∥Fi(xδn,i)−yδi∥r +1p∗(2c0)p∗−1(μδn,i)p∗∥F′i(xδn,i)∗JYir(Fi(xδn,i)−yδi)∥p∗. (3.8)

According to the definition of , the scaling condition in Assumption 3.1 (c), and the property of , it is easy to see that

 μδn,i∥Fi(xδn,i)−yδi∥r−1δ ≤1τμδn,i∥Fi(xδn,i)−yδi∥r, (μδn,i)p∗−1∥F′i(xδn,i)∗JYir(Fi(xδn,i)−yδi)∥p∗ ≤μp∗−10∥Fi(xδn,i)−yδi∥r.

Combining these two inequalities with (3) we can obtain (3.7). To show we first use (3.7) with and (3.4) to obtain

 Dξδn,i+1Θ(x†,xδn,i+1)≤Dξ0Θ(x†,x0)≤c0ρp.

In view of (2.2), we then have and . Consequently .

We next show . According to the definition of , for any there is at least one such that . Consequently

 μδn,in=μ0∥Fin(xδn,in)−yδin∥p−r

and

 N−1∑i=0μδn,i∥Fi(xδn,i)−yδi∥r≥μ0∥Fin(xδn,in)−yδin∥p>μ0τpδp.

By summing (3.6) over from to for any and using the above inequality we obtain . Since this is true for any , we must have . □

When using the exact data instead of the noisy data in Algorithm 3.3, we will drop the superscript in all the quantities involved, for instance, we will write as , as , and so on. Observing that

 μn,i∥Fi(xn,i)−yi∥r=μ0∥Fi(xn,i)−yi∥p.

The proof of Lemma 3.4 in fact shows that, under Assumption 3.1, if

 c2:=1−η−p−1p(μ02c0)1p−1>0,

then

 xn,i∈B2ρ(x0)∀n≥0 and i=0,⋯,N−1

and for any solution of (3.1) in and all there hold

 Dξn+1Θ(^x,xn+1)≤DξnΘ(^x,xn), (3.9) c2μ0N−1∑i=0∥Fi(xn,i)−yi∥p≤DξnΘ(^x,xn)−Dξn+1Θ(^x,xn+1). (3.10)

These two inequalities imply immediately that

 limn→∞N−1∑i=0∥Fi(xn,i)−yi∥p=0. (3.11)

The next result gives an estimate on and shows that as for all