# On the iteratively regularized Gauss–Newton method in Banach spaces with applications to parameter identification problems

Qinian Jin, Mathematical Sciences Institute, Australian National University, Canberra, ACT 0200, Australia. Email: Qinian.Jin@anu.edu.au

Min Zhong, School of Mathematical Sciences, Fudan University, Shanghai 200433, China. Email: 09110180007@fudan.edu.cn
###### Abstract

In this paper we propose an extension of the iteratively regularized Gauss–Newton method to the Banach space setting by defining the iterates via convex optimization problems. We consider some a posteriori stopping rules to terminate the iteration and present the detailed convergence analysis. The remarkable point is that in each convex optimization problem we allow non-smooth penalty terms including L1 and total variation (TV) like penalty functionals. This enables us to reconstruct special features of solutions such as sparsity and discontinuities in practical applications. Some numerical experiments on parameter identification in partial differential equations are reported to test the performance of our method.

###### Msc:
65J15 65J20 47H17

## 1 Introduction

Inverse problems arise from many practical applications whenever one searches for unknown causes based on observation of their effects. A characteristic property of inverse problems is their ill-posedness in the sense that their solutions do not depend continuously on the data. Due to errors in the measurements, in practical applications one never has the exact data; instead only noisy data are available. Therefore, how to use the noisy data to produce a stable approximate solution is an important topic.

We are interested in solving nonlinear inverse problems in Banach spaces which can be formulated as the nonlinear operator equation

 F(x)=y, (1.1)

where F : D(F)⊂X→Y is a nonlinear operator between two Banach spaces X and Y with domain D(F). We will use the same notation ∥·∥ to denote the norms of X and Y, which should be clear from the context. Let yδ be the only available approximate data to y satisfying

 ∥yδ−y∥≤δ (1.2)

with a given small noise level δ>0. Due to the ill-posedness, regularization methods should be employed to produce from yδ a stable approximate solution.

When both X and Y are Hilbert spaces and F is Fréchet differentiable, a lot of regularization methods have been developed during the last two decades, see BK04; Jin2011; Jin2012; JT09; KNS2008 and the references therein. The iteratively regularized Gauss–Newton method is one of the well known methods and it takes the form (B92)

 xδn+1=xδn−(αn+F′(xδn)∗F′(xδn))−1(F′(xδn)∗(F(xδn)−yδ)+αn(xδn−x0)),

where F′(x) denotes the Fréchet derivative of F at x, F′(x)∗ denotes the adjoint of F′(x), x0 is an initial guess, and {αn} is a sequence of positive numbers satisfying

 αn>0, 1 ≤ αn/αn+1 ≤ θ and lim_{n→∞} αn = 0 (1.3)

for some constant θ>1. When terminated by the discrepancy principle, the regularization property of the iteratively regularized Gauss–Newton method has been studied extensively, see JT09; KNS2008 and references therein. It is worthwhile to point out that xδn+1 is the unique minimizer of the quadratic functional

 ∥yδ−F(xδn)−F′(xδn)(x−xδn)∥^2 + αn∥x−x0∥^2 over X. (1.4)
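
In finite dimensions the iteration above is a regularized Gauss–Newton step. The following Python sketch implements it for a toy two-dimensional forward map; the map F, the parameter values and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def irgnm(F, Fprime, y_delta, x0, alpha0=1.0, theta=2.0, n_max=20):
    """Classical IRGNM update:
    x_{n+1} = x_n - (alpha_n I + J^T J)^{-1} (J^T (F(x_n) - y_delta) + alpha_n (x_n - x0)),
    with alpha_n / alpha_{n+1} = theta as in (1.3)."""
    x = x0.copy()
    alpha = alpha0
    for _ in range(n_max):
        J = Fprime(x)
        rhs = J.T @ (F(x) - y_delta) + alpha * (x - x0)
        x = x - np.linalg.solve(alpha * np.eye(len(x)) + J.T @ J, rhs)
        alpha /= theta  # geometric decay of the regularization parameters
    return x

# toy forward map (hypothetical example)
F = lambda x: np.array([x[0] + x[0] * x[1], x[1] + x[0] ** 2])
Fp = lambda x: np.array([[1.0 + x[1], x[0]], [2.0 * x[0], 1.0]])
x_true = np.array([0.5, 0.3])
y_delta = F(x_true)  # noise-free data, for illustration only
x_rec = irgnm(F, Fp, y_delta, x0=np.zeros(2))
```

With exact data the regularization bias vanishes as αn→0, so the iterates approach the pure Gauss–Newton limit.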

Regularization methods in Hilbert spaces can produce good results when the sought solution is smooth. However, because such methods have a tendency to over-smooth solutions, they may not produce good results in applications where the sought solution has special features such as sparsity or discontinuities. In order to capture the special features, the methods in Hilbert spaces must be modified by incorporating the information of some adapted penalty functionals such as the L1 and the total variation (TV) like functionals, for which the theories in the Hilbert space setting are no longer applicable. On the other hand, due to their intrinsic features, many inverse problems are more natural to formulate in Banach spaces than in Hilbert spaces. Therefore, it is necessary to develop regularization methods to solve inverse problems in the framework of Banach spaces with general penalty functions.

In this paper we will extend the iteratively regularized Gauss–Newton method to the Banach space setting. Motivated by the variational formulation (1.4) in Hilbert spaces, it is natural to use convex optimization problems to define the iterates. To this end, we take a proper, lower semi-continuous, convex function Θ : X→(−∞,∞] whose subdifferential is denoted as ∂Θ. By picking an initial guess x0∈D(∂Θ) and ξ0∈∂Θ(x0), we define

 xδn+1:=argminx∈X{∥yδ−F(xδn)−F′(xδn)(x−xδn)∥p+αnDξ0Θ(x,x0)} (1.5)

where 1<p<∞, {αn} is as in (1.3), and Dξ0Θ(x,x0) denotes the Bregman distance induced by Θ at x0 in the direction ξ0. When Θ is a power of the norm, this method has been considered in KH2010 under essentially the nonlinearity condition

 ∥(F′(x)−F′(z))h∥≤κ∥F′(z)(x−z)∥1/2∥F′(z)h∥1/2 (1.6)

with the iteration terminated by an a priori stopping rule. It turns out that (1.6) is difficult to verify for nonlinear inverse problems, and the restriction of Θ to powers of the norm may prevent the method from capturing the special features of solutions. Moreover, since a priori stopping rules depend crucially on the unknown source conditions, they are of little use in practical applications. In this paper we will develop a convergence theory on the iteratively regularized Gauss–Newton method in Banach spaces with a general convex penalty function Θ. We will propose some a posteriori stopping rules, including the discrepancy principle, to terminate the method and give a detailed convergence analysis under reasonable nonlinearity conditions.

This paper is organized as follows. In section 2 we give some preliminary facts on convex analysis. In section 3 we then formulate the iteratively regularized Gauss–Newton method in Banach spaces and propose some a posteriori stopping rules. We show that the method is well-defined and obtain a weak convergence result. In section 4 we derive the rates of convergence when the solution satisfies certain source conditions formulated as variational inequalities. In section 5 we prove a strong convergence result without assuming any source conditions when Y is a Hilbert space and Θ is a 2-convex function, which is useful for sparsity reconstruction and discontinuity detection. Finally, in section 6 we present some numerical experiments to test our method for parameter identification in partial differential equations.

## 2 Preliminaries

Let X be a Banach space with norm ∥·∥. We use X∗ to denote its dual space. Given x∈X and ξ∈X∗ we write ⟨ξ,x⟩ for the duality pairing. If Y is another Banach space and A : X→Y is a bounded linear operator, we use A∗ : Y∗→X∗ to denote its adjoint, i.e. ⟨A∗η,x⟩ = ⟨η,Ax⟩ for any x∈X and η∈Y∗.

Let Θ : X→(−∞,∞] be a convex function. We use D(Θ):={x∈X : Θ(x)<∞} to denote its effective domain. We call Θ proper if D(Θ)≠∅. Given x∈X we define

 ∂Θ(x):={ξ∈X∗:Θ(z)−Θ(x)−⟨ξ,z−x⟩≥0 for all z∈X}

which is called the subgradient of Θ at x. It is clear that ∂Θ(x) is convex and closed in X∗ for each x∈X. The multi-valued mapping x→∂Θ(x) is called the subdifferential of Θ. It could happen that ∂Θ(x)=∅ for some x∈D(Θ). We set

 D(∂Θ):={x∈D(Θ):∂Θ(x)≠∅}.

For x∈D(∂Θ) and ξ∈∂Θ(x) we define

 DξΘ(z,x):=Θ(z)−Θ(x)−⟨ξ,z−x⟩,∀z∈X

which is called the Bregman distance induced by Θ at x in the direction ξ. Clearly DξΘ(z,x) ≥ 0. By direct calculation we can see that

 DξΘ(x2,x)−DξΘ(x1,x)=Dξ1Θ(x2,x1)+⟨ξ1−ξ,x2−x1⟩ (2.1)

for all x, x1∈D(∂Θ), x2∈X, ξ∈∂Θ(x) and ξ1∈∂Θ(x1).
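
Since the Bregman distance and the three-point identity (2.1) drive the later estimates, a quick numerical check may be helpful; the sketch below uses the smooth choice Θ(x)=∥x∥² on R³ (a hypothetical example, whose single-valued subgradient is 2x).

```python
import numpy as np

# Bregman distance for Theta(x) = ||x||^2, with subgradient xi = 2x.
Theta = lambda x: float(x @ x)
grad = lambda x: 2.0 * x

def bregman(z, x, xi):
    # D_xi Theta(z, x) = Theta(z) - Theta(x) - <xi, z - x>
    return Theta(z) - Theta(x) - xi @ (z - x)

rng = np.random.default_rng(0)
x, x1, x2 = (rng.standard_normal(3) for _ in range(3))
xi, xi1 = grad(x), grad(x1)

# for Theta(x) = ||x||^2 the Bregman distance is exactly ||z - x||^2
assert np.isclose(bregman(x2, x, xi), np.linalg.norm(x2 - x) ** 2)
# three-point identity (2.1), which holds for any convex Theta
lhs = bregman(x2, x, xi) - bregman(x1, x, xi)
rhs = bregman(x2, x1, xi1) + (xi1 - xi) @ (x2 - x1)
assert np.isclose(lhs, rhs)
```

The identity (2.1) holds for any proper convex Θ; the quadratic choice merely makes both sides easy to evaluate.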

A proper function Θ : X→(−∞,∞] is said to be p-convex for some p≥2 if there is a constant C0>0 such that for all z,x∈X and 0≤λ≤1 there holds

 Θ(λz+(1−λ)x)+C0λ(1−λ)∥z−x∥p≤λΘ(z)+(1−λ)Θ(x).

It can be shown that Θ is p-convex if and only if there is a constant γ>0 such that

 ∥z−x∥ ≤ γ [DξΘ(z,x)]^{1/p} (2.2)

for all z∈X, x∈D(∂Θ) and ξ∈∂Θ(x).

For a proper, lower semi-continuous, convex function we can define its Fenchel conjugate

 Θ∗(ξ):=supx∈X{⟨ξ,x⟩−Θ(x)},ξ∈X∗.

It is well known that Θ∗ is also proper, lower semi-continuous, and convex. If, in addition, X is reflexive, then ξ∈∂Θ(x) if and only if x∈∂Θ∗(ξ). When Θ is p-convex satisfying (2.2) with p≥2, it follows from (Z2002, Corollary 3.5.11) that D(Θ∗)=X∗, Θ∗ is Fréchet differentiable and its gradient ∇Θ∗ satisfies

 ∥∇Θ∗(ξ)−∇Θ∗(η)∥ ≤ γ^{p/(p−1)} ∥ξ−η∥^{1/(p−1)}, ∀ξ,η∈X∗. (2.3)

Many examples of p-convex functions can be provided by powers of the norms in p-convex Banach spaces. We say a Banach space X is p-convex with p≥2 if there is a positive constant C such that δX(ε) ≥ Cε^p for all 0<ε≤2, where

 δX(ε):=inf{2−∥x+z∥:x,z∈X,∥x∥=∥z∥=1 and ∥x−z∥≥ε}

is the modulus of convexity of X. According to a characterization of uniform convexity of Banach spaces in XR91, it is easy to see that, for any x0∈X, the functional

 Θ(x):=∥x−x0∥p

is p-convex and its subgradient at x is given by ∂Θ(x) = pJp(x−x0), where Jp denotes the duality mapping of X with gauge function t→t^{p−1} which is defined for each x∈X by

 Jp(x):={ξ∈X∗:∥ξ∥=∥x∥p−1 and ⟨ξ,x⟩=∥x∥p}.
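
For a Hilbert space the definition above reduces to the explicit formula Jp(x)=∥x∥^{p−2}x. The following small sketch verifies the two defining properties on R²; the function name and example values are ours, not the paper's.

```python
import numpy as np

# Duality mapping J_p with gauge t -> t^{p-1} on a Hilbert space (R^n):
# J_p(x) = ||x||^{p-2} x, so that ||J_p(x)|| = ||x||^{p-1} and
# <J_p(x), x> = ||x||^p, matching the definition in the text.
def J_p(x, p):
    nx = np.linalg.norm(x)
    return nx ** (p - 2) * x if nx > 0 else np.zeros_like(x)

x = np.array([3.0, 4.0])  # ||x|| = 5
xi = J_p(x, p=3)
assert np.isclose(np.linalg.norm(xi), 5.0 ** 2)  # ||xi|| = ||x||^{p-1}
assert np.isclose(xi @ x, 5.0 ** 3)              # <xi, x> = ||x||^p
```

In a general smooth Banach space Jp is single-valued but no longer given by this scaling formula.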

The sequence spaces ℓp, the Lebesgue spaces Lp, the Sobolev spaces Wk,p and the Besov spaces Bs,p with 2≤p<∞ are the most commonly used function spaces that are p-convex (Adams; C1990).

Given a proper, lower semi-continuous, p-convex function Θ on X, we can produce new such functions by adding to Θ any proper, lower semi-continuous, convex function on X. In this way, we can construct non-smooth p-convex functions that can be used to detect special features of solutions when solving inverse problems. For instance, let X=L2(Ω), where Ω is a bounded domain in RN. It is clear that the functional

 x→∫Ω|x(ω)|2dω

is 2-convex on L2(Ω). By adding the function ∫Ω|x(ω)|dω to the λ-multiple of the above function we can obtain the 2-convex function

 Θ1(x):=λ∫Ω|x(ω)|2dω+∫Ω|x(ω)|dω

with small λ>0 which is useful for sparsity recovery (T96). Similarly, we may produce on L2(Ω) the 2-convex function

 Θ2(x):=λ∫Ω|x(ω)|2dω+∫Ω|Dx|,

where ∫Ω|Dx| denotes the total variation of x over Ω that is defined by (G84)

 ∫Ω|Dx| := sup{ ∫Ω x div φ dω : φ∈C^1_0(Ω;R^N) and ∥φ∥_{L∞(Ω)} ≤ 1 }.

This functional is useful for detecting discontinuities, in particular, when the solutions are piecewise constant (ROF92).
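
On a uniform grid, discrete analogues of Θ1 and Θ2 are straightforward to evaluate; the following sketch is a simple one-dimensional illustration under our own discretization choices, not the scheme used in the paper's experiments.

```python
import numpy as np

# Discrete analogues of Theta_1 (sparsity) and Theta_2 (TV) for a
# 1-D grid function x on a mesh of cell size h (illustrative only).
def theta1(x, h, lam=1e-3):
    # lam * integral of x^2 + integral of |x|, via midpoint quadrature
    return lam * h * np.sum(x ** 2) + h * np.sum(np.abs(x))

def tv(x):
    # total variation of a piecewise-constant grid function:
    # the sum of the absolute jumps between neighboring cells
    return np.sum(np.abs(np.diff(x)))

def theta2(x, h, lam=1e-3):
    return lam * h * np.sum(x ** 2) + tv(x)

x = np.zeros(100)
x[40:60] = 1.0          # piecewise-constant bump of height 1
h = 1.0 / 100
# the bump has two jumps of size 1, so its total variation is 2
assert np.isclose(tv(x), 2.0)
```

Note that the TV term is blind to the height offset of a constant region, which is exactly why it preserves piecewise-constant structures.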

## 3 The method and its weak convergence

In this section we formulate the iteratively regularized Gauss–Newton method in the framework of Banach spaces to produce a stable approximate solution of (1.1) from the available noisy data yδ satisfying (1.2). In order to capture the features of solutions, we take a proper, lower semi-continuous, p-convex function Θ : X→(−∞,∞] with p≥2; we assume that Θ satisfies (2.2) and D(F)∩D(Θ)≠∅. We will work under the following conditions on the nonlinear operator F.

###### Assumption 3.1
1. D(F) is a closed convex set in X and the equation (1.1) has a solution x†∈D(F);

2. There is ρ>0 such that for each x∈Bρ(x†)∩D(F) there is a bounded linear operator F′(x) : X→Y such that

 lim_{t↘0} [F(x+t(z−x))−F(x)] / t = F′(x)(z−x), ∀z∈Bρ(x†)∩D(F),

where Bρ(x†):={x∈X : ∥x−x†∥<ρ};

3. The operator F is properly scaled so that ∥F′(x)∥ ≤ γ^{−1} α0^{1/p} for all x∈Bρ(x†)∩D(F);

4. There exist two constants K0≥0 and K1≥0 such that

 ∥[F′(z)−F′(x)]w∥≤K0∥z−x∥∥F′(x)w∥+K1∥F′(x)(z−x)∥∥w∥

for all z,x∈Bρ(x†)∩D(F) and w∈X.

It is easy to see that condition (b) in Assumption 3.1 implies, for any z,x∈Bρ(x†)∩D(F), that the function t→F(x+t(z−x)) is differentiable on [0,1] and

 (d/dt) F(x+t(z−x)) = F′(x+t(z−x))(z−x).

The condition (d) was first formulated in JT09. In section 6 we will present several examples from parameter identification in partial differential equations to indicate that this condition indeed can be verified for a wide range of applications. As direct consequences of (b) and (d), we have for z,x∈Bρ(x†)∩D(F) that

 ∥F(z)−F(x)−F′(x)(z−x)∥ ≤ (1/2)(K0+K1) ∥z−x∥ ∥F′(x)(z−x)∥

and

 ∥F(z)−F(x)−F′(z)(z−x)∥ ≤ (3/2)(K0+K1) ∥z−x∥ ∥F′(x)(z−x)∥.

In order to formulate the method, let

 χD(F)(x) := { 0, x∈D(F); +∞, x∉D(F) }

be the characteristic function of D(F) and define

 ΘF(x):=Θ(x)+χD(F)(x). (3.1)

Since D(F) is closed and convex, χD(F) is a proper, lower semi-continuous, convex function on X. Consequently, ΘF is a proper, lower semi-continuous, p-convex function on X satisfying

 ∥z−x∥ ≤ γ [DξΘF(z,x)]^{1/p}, ∀z∈X, x∈D(∂ΘF) and ξ∈∂ΘF(x). (3.2)

We pick ξ0∈X∗ and define x0:=∇Θ∗F(ξ0), where Θ∗F denotes the Fenchel conjugate of ΘF and is known to be Fréchet differentiable with gradient ∇Θ∗F. We have x0∈D(∂ΘF) and ξ0∈∂ΘF(x0). Consequently

 x0=argminx∈X{ΘF(x)−⟨ξ0,x⟩}=argminx∈D(F){Θ(x)−⟨ξ0,x⟩}.

We use x0 and ξ0 as initial data. We then pick a sequence {αn} of positive numbers satisfying (1.3) and define {xδn} successively by setting xδ0:=x0 and letting xδn+1 be the unique minimizer of the convex minimization problem

 minx∈X{∥yδ−F(xδn)−F′(xδn)(x−xδn)∥p+αnDξ0ΘF(x,x0)}. (3.3)

By the properties of ΘF, xδn+1 is uniquely defined and xδn+1∈D(F).

Considering practical applications, the iteration must be terminated by some a posteriori stopping rule to output an integer nδ and hence xδnδ, which is used as an approximate solution of (1.1). In this paper we will consider the following three stopping rules.

###### Rule 3.1

Let τ>1 be a given number. We define nδ to be the first integer such that

 ∥F(xδnδ)−yδ∥ ≤ τδ < ∥F(xδn)−yδ∥, 0≤n<nδ.
###### Rule 3.2

Let τ>1 be a given number. If ∥F(x0)−yδ∥ ≤ τδ we define nδ:=0; otherwise we define nδ to be the first integer such that

 (1/2)(∥F(xδnδ)−yδ∥ + ∥F(xδnδ−1)−yδ∥) ≤ τδ.
###### Rule 3.3

Let τ>1 be a given number. If ∥F(x0)−yδ∥ ≤ τδ we define nδ:=0; otherwise we define nδ to be the first integer such that

 max{∥F(xδnδ)−yδ∥,∥F(xδnδ−1)−yδ∥}≤τδ. (3.4)

Rule 3.1 is known as the discrepancy principle and is widely used to terminate regularization methods. Rule 3.3 appeared first in K99 to deal with some Newton-type regularization methods in Hilbert spaces. It is easy to see that Rule 3.1 terminates the iteration no later than Rule 3.2, and Rule 3.2 terminates the iteration no later than Rule 3.3. Most of the results in this paper hold for Rule 3.1, except the ones in Section 4 concerning the rates of convergence under certain source conditions formulated as variational inequalities; those convergence rates, however, can be derived when the iteration is terminated by either Rule 3.2 or Rule 3.3.
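
Given the residual history r_n = ∥F(xδn)−yδ∥, the three stopping rules can be sketched as follows; the helper function and the toy residual history are our own illustration, not from the paper.

```python
def stop_index(residuals, tau, delta):
    """Return (n1, n2, n3), the stopping indices of Rules 3.1-3.3 for a
    residual history r_n = ||F(x_n^delta) - y^delta|| (a sketch)."""
    r = residuals
    tol = tau * delta
    # Rule 3.1 (discrepancy principle): first n with r_n <= tau*delta
    n1 = next(n for n, rn in enumerate(r) if rn <= tol)
    # Rule 3.2: n_delta = 0 if r_0 qualifies, else the first n >= 1
    # with (r_n + r_{n-1}) / 2 <= tau*delta
    n2 = 0 if r[0] <= tol else next(
        n for n in range(1, len(r)) if (r[n] + r[n - 1]) / 2 <= tol)
    # Rule 3.3: n_delta = 0 if r_0 qualifies, else the first n >= 1
    # with max(r_n, r_{n-1}) <= tau*delta
    n3 = 0 if r[0] <= tol else next(
        n for n in range(1, len(r)) if max(r[n], r[n - 1]) <= tol)
    return n1, n2, n3

# a decreasing toy history: Rule 3.1 stops no later than Rule 3.2,
# and Rule 3.2 no later than Rule 3.3, as stated in the text
n1, n2, n3 = stop_index([10.0, 5.0, 2.4, 1.1, 0.6], tau=2.0, delta=1.0)
assert n1 <= n2 <= n3
```

On this history the three rules stop at indices 3, 3 and 4, illustrating the ordering among the rules.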

In this section we show that the method together with any one of the above three stopping rules with τ>1 satisfying (3.6) is well-defined. To this end, we introduce the integer n̂δ defined by

 α_{n̂δ} ≤ μ^{−p} δ^p ∥ξ0−ξ†∥^{−p∗} < αn, 0≤n<n̂δ, (3.5)

where p∗ is the number conjugate to p, i.e. 1/p + 1/p∗ = 1, the number μ>0 is chosen to satisfy

 γ^{1/(p−1)} θ^{2/p} μ^{−1} < (τ−1)/2, (3.6)

and ξ†∈∂ΘF(x†) is the unique element that realizes the distance from ξ0 to the closed convex set ∂ΘF(x†) in X∗, i.e.

 d(ξ0,∂ΘF(x†))=∥ξ0−ξ†∥.

Because the sequence {αn} satisfies (1.3), the integer n̂δ exists and is finite. We will show that xδn is well-defined for all 0≤n≤n̂δ and that nδ≤n̂δ for the integer nδ defined by any one of the above three stopping rules. For simplicity of presentation, we use the notation T:=F′(x†) and eδn:=xδn−x†. We also use CE to denote a generic constant that is independent of δ and n when its explicit formula is not important.

###### Lemma 1

Let X and Y be Banach spaces, let Θ : X→(−∞,∞] be a proper, lower semi-continuous, p-convex function with p≥2, let {αn} be a sequence satisfying (1.3), and let F satisfy Assumption 3.1. If μ satisfies (3.6) and ∥ξ0−ξ†∥ is sufficiently small, then xδn∈Bρ(x†)∩D(F) and

 ∥xδn−x†∥ ≤ (p∗γμ + 2γ^{p/(p−1)}) ∥ξ0−ξ†∥^{1/(p−1)}, (3.7)
 ∥T(xδn−x†)∥ ≤ (3μ + γ^{1/(p−1)}) θ^{1/p} ∥ξ0−ξ†∥^{1/(p−1)} αn^{1/p} (3.8)

for all 0≤n≤n̂δ. Moreover, nδ≤n̂δ for the integer nδ defined by either Rule 3.1, 3.2, or 3.3 with τ>1 satisfying (3.6).

###### Proof

Since ξ†∈∂ΘF(x†) implies x†=∇Θ∗F(ξ†), from the definition of x0 and (2.3) it follows that

 ∥x0−x†∥ ≤ γ^{p/(p−1)} ∥ξ0−ξ†∥^{1/(p−1)} < ρ.

Thus x0∈Bρ(x†) and (3.7) holds for n=0. In view of the scaling condition (c) we can obtain

 ∥Te0∥ ≤ γ^{1/(p−1)} α0^{1/p} ∥ξ0−ξ†∥^{1/(p−1)}. (3.9)

Therefore the result holds for n=0. Now we assume that the estimates for xδn have been proved for some 0≤n<n̂δ and show that the estimates for xδn+1 are also true. By the minimizing property of xδn+1 we have

 ∥yδ−F(xδn)−F′(xδn)(xδn+1−xδn)∥^p + αnDξ0ΘF(xδn+1,x0) ≤ ∥yδ−F(xδn)−F′(xδn)(x†−xδn)∥^p + αnDξ0ΘF(x†,x0).

By using the identity (2.1) we have

 Dξ0ΘF(xδn+1,x0)−Dξ0ΘF(x†,x0)=Dξ†ΘF(xδn+1,x†)−⟨ξ0−ξ†,xδn+1−x†⟩.

Therefore, it follows from the above inequality that

 ∥yδ−F(xδn)−F′(xδn)(xδn+1−xδn)∥^p + αnDξ†ΘF(xδn+1,x†) ≤ ∥yδ−F(xδn)−F′(xδn)(x†−xδn)∥^p + αn⟨ξ0−ξ†,xδn+1−x†⟩. (3.10)

In view of Young's inequality ab ≤ a^p/p + b^{p∗}/p∗ for a,b≥0, applied with a=γ^{−1}∥eδn+1∥ and b=γ∥ξ0−ξ†∥, we have

 ⟨ξ0−ξ†,xδn+1−x†⟩ ≤ (1/p)(γ^{−1}∥eδn+1∥)^p + (1/p∗)(γ∥ξ0−ξ†∥)^{p∗}.

Combining this with (3.10) and using the p-convexity of ΘF, we can obtain

 ∥yδ−F(xδn)−F′(xδn)(xδn+1−xδn)∥^p + (1/p∗) αn (γ^{−1}∥eδn+1∥)^p ≤ ∥yδ−F(xδn)−F′(xδn)(x†−xδn)∥^p + (1/p∗) αn (γ∥ξ0−ξ†∥)^{p∗}.

By using the fact that (a+b)^{1/p} ≤ a^{1/p} + b^{1/p} for a,b≥0 and p≥1, we have from the above inequality that

 ∥eδn+1∥ ≤ (p∗)^{1/p} γ αn^{−1/p} ∥yδ−F(xδn)−F′(xδn)(x†−xδn)∥ + γ^{p/(p−1)} ∥ξ0−ξ†∥^{1/(p−1)} (3.11)

and

 ∥yδ−F(xδn)−F′(xδn)(xδn+1−xδn)∥ ≤ ∥yδ−F(xδn)−F′(xδn)(x†−xδn)∥ + ((1/p∗) γ^{p∗} αn ∥ξ0−ξ†∥^{p∗})^{1/p}. (3.12)

By using (1.2) and Assumption 3.1 we have

 ∥yδ−F(xδn)−F′(xδn)(x†−xδn)∥ ≤ δ + (3/2)(K0+K1) ∥eδn∥ ∥Teδn∥. (3.13)

Since n<n̂δ, it follows from (3.5) that

 δ ≤ μ ∥ξ0−ξ†∥^{1/(p−1)} αn^{1/p}. (3.14)

In view of the induction hypotheses we thus have

 ∥yδ−F(xδn)−F′(xδn)(x†−xδn)∥ ≤ (μ+CE) ∥ξ0−ξ†∥^{1/(p−1)} αn^{1/p}.

Combining this with (3.11) gives

 ∥eδn+1∥ ≤ ((p∗)^{1/p} γμ + γ^{p/(p−1)} + CE) ∥ξ0−ξ†∥^{1/(p−1)}.

Therefore, if ∥ξ0−ξ†∥ is sufficiently small, then

 ∥eδn+1∥ ≤ (p∗γμ + 2γ^{p/(p−1)}) ∥ξ0−ξ†∥^{1/(p−1)} < ρ.

Next we estimate ∥Teδn+1∥. From (3.12) and (3.13) it follows that

 ∥yδ−F(xδn)−F′(xδn)(xδn+1−xδn)∥ ≤ δ + (3/2)(K0+K1) ∥eδn∥ ∥Teδn∥ + ((1/p∗) γ^{p∗} αn ∥ξ0−ξ†∥^{p∗})^{1/p}. (3.15)

Observe that

 ∥yδ−y−Teδn+1∥ ≤ ∥yδ−F(xδn)−F′(xδn)(xδn+1−xδn)∥ + ∥y−F(xδn)−F′(xδn)(x†−xδn)∥ + ∥(T−F′(xδn))eδn+1∥. (3.16)

Thus, we may use Assumption 3.1, (3.15), and the estimates on ∥eδn∥ and ∥eδn+1∥ to derive that

 ∥yδ−y−Teδn+1∥ ≤ δ + CE∥Teδn∥ + CE∥Teδn+1∥ + ((1/p∗) γ^{p∗} αn ∥ξ0−ξ†∥^{p∗})^{1/p}. (3.17)

Therefore, by using the induction hypothesis on ∥Teδn∥, the fact αn ≤ θαn+1, and (3.14), we can obtain for sufficiently small ∥ξ0−ξ†∥ that

 ∥Teδn+1∥ ≤ (3μ + γ^{1/(p−1)}) (θ αn+1 ∥ξ0−ξ†∥^{p∗})^{1/p}.

We therefore obtain the desired estimates (3.7) and (3.8).

Finally we show that nδ ≤ n̂δ. We first claim that for 0≤n≤n̂δ there holds

 ∥yδ−y−Teδn∥ ≤ δ + (γ^{1/(p−1)} θ^{1/p} + CE) ∥ξ0−ξ†∥^{1/(p−1)} αn^{1/p}.

In fact, for n=0 this inequality follows from (1.2) and (3.9), and for 1≤n≤n̂δ it follows from (3.17), (3.8) and (1.3). Therefore, by using Assumption 3.1 and the estimates (3.7) and (3.8), we can obtain

 ∥yδ−F(xδn)∥ ≤ ∥yδ−y−Teδn∥ + ∥y−F(xδn)+Teδn∥ ≤ δ + (γ^{1/(p−1)} θ^{1/p} + CE) ∥ξ0−ξ†∥^{1/(p−1)} αn^{1/p}. (3.18)

If n̂δ=0, then α0^{1/p} ∥ξ0−ξ†∥^{1/(p−1)} ≤ μ^{−1}δ. Therefore

 ∥F(x0)−yδ∥ ≤ δ + (γ^{1/(p−1)} θ^{1/p} μ^{−1} + CE) δ.

In view of (3.6) we have for sufficiently small ∥ξ0−ξ†∥ that ∥F(x0)−yδ∥ ≤ τδ. Consequently nδ=0≤n̂δ.

In the following we assume that n̂δ≥1. Observe from (1.3) and (3.5) that for n=n̂δ and n=n̂δ−1 there holds

 αn^{1/p} ≤ (θ α_{n̂δ})^{1/p} ≤ μ^{−1} θ^{1/p} ∥ξ0−ξ†∥^{−1/(p−1)} δ.

Thus, from (3.18) we have for n=n̂δ and n=n̂δ−1 that

 ∥yδ−F(xδn)∥ ≤ (1+CE) δ + γ^{1/(p−1)} θ^{2/p} μ^{−1} δ.

Since μ is chosen to satisfy (3.6), we have for sufficiently small ∥ξ0−ξ†∥ that

 ∥yδ−F(xδn)∥ ≤ τδ for n = n̂δ and n = n̂δ−1.

Therefore, by the definition of nδ we have nδ ≤ n̂δ.

###### Remark 1

We use ΘF in (3.3) to guarantee that xδn+1∈D(F) without assuming that x† is an interior point of D(F). If x† is an interior point of D(F) so that Bρ0(x†)⊂D(F) for a ball of radius ρ0>0, we can replace ΘF in (3.3) by Θ and define xδn+1 to be the unique minimizer of the convex minimization problem

 minx∈X {∥yδ−F(xδn)−F′(xδn)(x−xδn)∥^p + αnDξ0Θ(x,x0)}.