On the iteratively regularized Gauss–Newton method in Banach spaces with applications to parameter identification problems

Qinian Jin
Mathematical Sciences Institute, Australian National University, Canberra, ACT 0200, Australia
Email: Qinian.Jin@anu.edu.au

Min Zhong
School of Mathematical Sciences, Fudan University, Shanghai 200433, China
Email: 09110180007@fudan.edu.cn
Abstract

In this paper we propose an extension of the iteratively regularized Gauss–Newton method to the Banach space setting by defining the iterates via convex optimization problems. We consider some a posteriori stopping rules to terminate the iteration and present the detailed convergence analysis. The remarkable point is that in each convex optimization problem we allow non-smooth penalty terms including $L^1$ and total variation (TV) like penalty functionals. This enables us to reconstruct special features of solutions such as sparsity and discontinuities in practical applications. Some numerical experiments on parameter identification in partial differential equations are reported to test the performance of our method.

MSC: 65J15, 65J20, 47H17

1 Introduction

Inverse problems arise from many practical applications whenever one searches for unknown causes based on observation of their effects. A characteristic property of inverse problems is their ill-posedness in the sense that their solutions do not depend continuously on the data. Due to errors in the measurements, in practical applications one never has the exact data; instead only noisy data are available. Therefore, how to use the noisy data to produce a stable approximate solution is an important topic.

We are interested in solving nonlinear inverse problems in Banach spaces which can be formulated as the nonlinear operator equation

$F(x) = y,$   (1.1)

where $F: D(F) \subset X \to Y$ is a nonlinear operator between two Banach spaces $X$ and $Y$ with domain $D(F)$. We will use the same notation $\|\cdot\|$ to denote the norms of $X$ and $Y$, which should be clear from the context. Let $y^\delta$ be the only available approximate data to $y$ satisfying

$\|y^\delta - y\| \le \delta$   (1.2)

with a given small noise level $\delta > 0$. Due to the ill-posedness, regularization methods should be employed to produce from $y^\delta$ a stable approximate solution.

When both $X$ and $Y$ are Hilbert spaces and $F$ is Fréchet differentiable, a lot of regularization methods have been developed during the last two decades; see [BK04, Jin2011, Jin2012, JT09, KNS2008] and the references therein. The iteratively regularized Gauss–Newton method is one of the well-known methods and it takes the form ([B92])

$x_{n+1}^\delta = x_n^\delta - \bigl(\alpha_n I + F'(x_n^\delta)^* F'(x_n^\delta)\bigr)^{-1}\bigl(F'(x_n^\delta)^*(F(x_n^\delta) - y^\delta) + \alpha_n (x_n^\delta - x_0)\bigr),$

where $F'(x)$ denotes the Fréchet derivative of $F$ at $x$, $F'(x)^*$ denotes the adjoint of $F'(x)$, $x_0$ is an initial guess, and $\{\alpha_n\}$ is a sequence of positive numbers satisfying

$\alpha_n > 0, \qquad \alpha_n \to 0 \qquad \text{and} \qquad 1 \le \alpha_n/\alpha_{n+1} \le c_1$   (1.3)

for some constant $c_1 > 1$. When terminated by the discrepancy principle, the regularization property of the iteratively regularized Gauss–Newton method has been studied extensively; see [JT09, KNS2008] and references therein. It is worthwhile to point out that $x_{n+1}^\delta$ is the unique minimizer of the quadratic functional

$x \mapsto \|y^\delta - F(x_n^\delta) - F'(x_n^\delta)(x - x_n^\delta)\|^2 + \alpha_n \|x - x_0\|^2.$   (1.4)
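For illustration only (this sketch is ours and not part of the paper), one step of the classical Hilbert space iteration can be realized in a finite dimensional discretization as follows; the functions `F` and `jacF`, the data `y_delta` and the parameters are assumed to be supplied by the user.

```python
import numpy as np

def irgnm_step(F, jacF, x_n, x0, y_delta, alpha_n):
    """One classical IRGNM step in a discretized Hilbert space setting.

    Solves the normal equations of the quadratic functional (1.4):
      minimize ||y_delta - F(x_n) - A (x - x_n)||^2 + alpha_n ||x - x0||^2,
    where A = F'(x_n).  F, jacF, x0, y_delta and alpha_n are assumed inputs.
    """
    A = jacF(x_n)                       # Jacobian of F at the current iterate
    residual = F(x_n) - y_delta         # current data misfit
    lhs = A.T @ A + alpha_n * np.eye(x_n.size)
    rhs = A.T @ residual + alpha_n * (x_n - x0)
    return x_n - np.linalg.solve(lhs, rhs)
```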

Regularization methods in Hilbert spaces can produce good results when the sought solution is smooth. However, because such methods have a tendency to over-smooth solutions, they may not produce good results in applications where the sought solution has special features such as sparsity or discontinuities. In order to capture these special features, the methods in Hilbert spaces must be modified by incorporating adapted penalty functionals such as the $L^1$ and the total variation (TV) like functionals, for which the theories in the Hilbert space setting are no longer applicable. On the other hand, due to their intrinsic features, many inverse problems are more natural to formulate in Banach spaces than in Hilbert spaces. Therefore, it is necessary to develop regularization methods for solving inverse problems in the framework of Banach spaces with general penalty functions.

In this paper we will extend the iteratively regularized Gauss–Newton method to the Banach space setting. Motivated by the variational formulation (1.4) in Hilbert spaces, it is natural to use convex optimization problems to define the iterates. To this end, we take a proper, lower semi-continuous, convex function $\Theta: X \to (-\infty, \infty]$ whose subdifferential is denoted as $\partial\Theta$. By picking an initial guess $x_0$ and $\xi_0 \in \partial\Theta(x_0)$, we define

$x_{n+1}^\delta := \arg\min_{x \in D(F)}\bigl\{\|y^\delta - F(x_n^\delta) - F'(x_n^\delta)(x - x_n^\delta)\|^r + \alpha_n D_{\xi_0}\Theta(x, x_0)\bigr\},$   (1.5)

where $1 < r < \infty$, $\{\alpha_n\}$ is a sequence of positive numbers satisfying (1.3), and $D_{\xi_0}\Theta(x, x_0)$ denotes the Bregman distance induced by $\Theta$ at $x_0$ in the direction $\xi_0$. When $\Theta$ is restricted to powers of the norm, this method has been considered in [KH2010] under essentially the nonlinearity condition

(1.6)

with the iteration terminated by an a priori stopping rule. It turns out that (1.6) is difficult to verify for nonlinear inverse problems, and the restriction of $\Theta$ to such special choices may prevent the method from capturing the special features of solutions. Moreover, since a priori stopping rules depend crucially on the unknown source conditions, they are of little use in practical applications. In this paper we will develop a convergence theory for the iteratively regularized Gauss–Newton method in Banach spaces with a general convex penalty function $\Theta$. We will propose some a posteriori stopping rules, including the discrepancy principle, to terminate the method and give a detailed convergence analysis under reasonable nonlinearity conditions.
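For orientation, nonlinearity conditions of this kind are frequently stated as tangential cone type conditions; a commonly used example, given here only for illustration and not necessarily the exact form of (1.6), is

$\|F(x) - F(\bar x) - F'(\bar x)(x - \bar x)\| \le \eta\,\|F(x) - F(\bar x)\|, \qquad 0 \le \eta < 1,$

for all $x, \bar x$ in a neighborhood of the initial guess.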

This paper is organized as follows. In section 2 we give some preliminary facts on convex analysis. In section 3 we then formulate the iteratively regularized Gauss–Newton method in Banach spaces and propose some a posteriori stopping rules. We show that the method is well-defined and obtain a weak convergence result. In section 4 we derive the rates of convergence when the solution satisfies certain source conditions formulated as variational inequalities. In section 5 we prove a strong convergence result without assuming any source conditions when the underlying space is a Hilbert space and $\Theta$ is a 2-convex function, which is useful for sparsity reconstruction and discontinuity detection. Finally, in section 6 we present some numerical experiments to test our method for parameter identification in partial differential equations.

2 Preliminaries

Let $X$ be a Banach space with norm $\|\cdot\|$. We use $X^*$ to denote its dual space. Given $x \in X$ and $\xi \in X^*$ we write $\langle \xi, x\rangle$ for the duality pairing. If $Y$ is another Banach space and $A: X \to Y$ is a bounded linear operator, we use $A^*: Y^* \to X^*$ to denote its adjoint, i.e. $\langle A^*\eta, x\rangle = \langle \eta, A x\rangle$ for any $x \in X$ and $\eta \in Y^*$.

Let $\Theta: X \to (-\infty, \infty]$ be a convex function. We use $D(\Theta) := \{x \in X: \Theta(x) < \infty\}$ to denote its effective domain. We call $\Theta$ proper if $D(\Theta) \ne \emptyset$. Given $x \in X$ we define

$\partial\Theta(x) := \{\xi \in X^*: \Theta(\bar x) \ge \Theta(x) + \langle \xi, \bar x - x\rangle \text{ for all } \bar x \in X\},$

which is called the subgradient of $\Theta$ at $x$. It is clear that $\partial\Theta(x)$ is convex and closed in $X^*$ for each $x \in X$. The multi-valued mapping $x \mapsto \partial\Theta(x)$ is called the subdifferential of $\Theta$. It could happen that $\partial\Theta(x) = \emptyset$ for some $x \in D(\Theta)$. We set

$D(\partial\Theta) := \{x \in D(\Theta): \partial\Theta(x) \ne \emptyset\}.$
For $x \in D(\partial\Theta)$, $\xi \in \partial\Theta(x)$ and $\bar x \in X$ we define

$D_\xi\Theta(\bar x, x) := \Theta(\bar x) - \Theta(x) - \langle \xi, \bar x - x\rangle,$

which is called the Bregman distance induced by $\Theta$ at $x$ in the direction $\xi$. Clearly $D_\xi\Theta(\bar x, x) \ge 0$. By direct calculation we can see that

$D_{\xi_2}\Theta(\bar x, x_2) - D_{\xi_1}\Theta(\bar x, x_1) = D_{\xi_2}\Theta(x_1, x_2) + \langle \xi_1 - \xi_2, \bar x - x_1\rangle$   (2.1)

for all $\bar x \in X$, $x_1, x_2 \in D(\partial\Theta)$, $\xi_1 \in \partial\Theta(x_1)$ and $\xi_2 \in \partial\Theta(x_2)$.
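As a standard illustration (not specific to this paper): if $X$ is a Hilbert space and $\Theta(x) = \frac{1}{2}\|x\|^2$, then $\partial\Theta(x) = \{x\}$ and the Bregman distance reduces to

$D_x\Theta(\bar x, x) = \tfrac{1}{2}\|\bar x\|^2 - \tfrac{1}{2}\|x\|^2 - \langle x, \bar x - x\rangle = \tfrac{1}{2}\|\bar x - x\|^2,$

so in this special case it coincides with half of the squared norm distance.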

A proper function $\Theta: X \to (-\infty, \infty]$ is said to be $p$-convex for some $p \ge 2$ if there is a constant $c_0 > 0$ such that for all $x, \bar x \in D(\Theta)$ and $\lambda \in [0, 1]$ there holds

$\Theta(\lambda \bar x + (1-\lambda)x) + c_0 \lambda(1-\lambda)\|\bar x - x\|^p \le \lambda\Theta(\bar x) + (1-\lambda)\Theta(x).$

It can be shown that $\Theta$ is $p$-convex if and only if there is a constant $c_0 > 0$ such that

$D_\xi\Theta(\bar x, x) \ge c_0\|\bar x - x\|^p$   (2.2)

for all $\bar x \in X$, $x \in D(\partial\Theta)$ and $\xi \in \partial\Theta(x)$.

For a proper, lower semi-continuous, convex function $\Theta: X \to (-\infty, \infty]$ we can define its Fenchel conjugate

$\Theta^*(\xi) := \sup_{x \in X}\{\langle \xi, x\rangle - \Theta(x)\}, \qquad \xi \in X^*.$

It is well known that $\Theta^*$ is also proper, lower semi-continuous, and convex. If, in addition, $X$ is reflexive, then $\xi \in \partial\Theta(x)$ if and only if $x \in \partial\Theta^*(\xi)$. When $\Theta$ is $p$-convex satisfying (2.2), it follows from [Z2002, Corollary 3.5.11] that $D(\Theta^*) = X^*$, $\Theta^*$ is Fréchet differentiable and its gradient satisfies

$\|\nabla\Theta^*(\xi_1) - \nabla\Theta^*(\xi_2)\| \le \left(\frac{\|\xi_1 - \xi_2\|}{2c_0}\right)^{1/(p-1)}$ for all $\xi_1, \xi_2 \in X^*$.   (2.3)
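For instance (a standard example, not specific to this paper), if $X$ is a Hilbert space and $\Theta(x) = \frac{1}{2}\|x\|^2$, so that $p = 2$ and $c_0 = \frac{1}{2}$, then $\Theta^*(\xi) = \frac{1}{2}\|\xi\|^2$ and $\nabla\Theta^*$ is the canonical identification of $X^*$ with $X$; it is Lipschitz continuous with constant one, in agreement with the Hölder exponent $1/(p-1) = 1$.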

Many examples of $p$-convex functions can be provided by functions of the norms in $p$-convex Banach spaces. We say a Banach space $X$ is $p$-convex with $2 \le p < \infty$ if there is a positive constant $K$ such that $\delta_X(\epsilon) \ge K\epsilon^p$ for all $0 \le \epsilon \le 2$, where

$\delta_X(\epsilon) := \inf\{2 - \|x + \bar x\| : \|x\| = \|\bar x\| = 1, \ \|x - \bar x\| \ge \epsilon\}$

is the modulus of convexity of $X$. According to a characterization of uniform convexity of Banach spaces in [XR91], it is easy to see that, for any $p \le s < \infty$, the functional

$\Theta(x) := \frac{1}{s}\|x\|^s$

is $s$-convex and its subgradient at $x$ is given by $\partial\Theta(x) = J_s(x)$, where $J_s: X \to 2^{X^*}$ denotes the duality mapping of $X$ with gauge function $t \mapsto t^{s-1}$ which is defined for each $x \in X$ by

$J_s(x) := \{\xi \in X^*: \langle \xi, x\rangle = \|x\|^s \ \text{and} \ \|\xi\| = \|x\|^{s-1}\}.$

The sequence spaces $\ell^s$, the Lebesgue spaces $L^s$, the Sobolev spaces $W^{k,s}$ and the Besov spaces $B^{\sigma}_{s,s}$ with $1 < s < \infty$ are the most commonly used function spaces that are $\max\{s, 2\}$-convex ([Adams], [C1990]).

Given a proper, lower semi-continuous, $p$-convex function $\Theta_0$ on $X$, we can produce new such functions by adding any available proper, lower semi-continuous, convex functions to $\Theta_0$. In this way, we can construct non-smooth $p$-convex functions that can be used to detect special features of solutions when solving inverse problems. For instance, let $X = L^2(\Omega)$, where $\Omega$ is a bounded domain in $\mathbb{R}^d$. It is clear that the functional

$\Theta_0(x) := \frac{1}{2}\int_\Omega |x(\omega)|^2\, d\omega$

is 2-convex on $L^2(\Omega)$. By adding the function $\|x\|_{L^1(\Omega)}$ to the $\beta$ multiple of the above function we can obtain the 2-convex function

$\Theta(x) := \frac{\beta}{2}\int_\Omega |x(\omega)|^2\, d\omega + \int_\Omega |x(\omega)|\, d\omega$

with small $\beta > 0$, which is useful for sparsity recovery ([T96]). Similarly, we may produce on $L^2(\Omega)$ the 2-convex function

$\Theta(x) := \frac{\beta}{2}\int_\Omega |x(\omega)|^2\, d\omega + |x|_{TV},$

where $|x|_{TV}$ denotes the total variation of $x$ over $\Omega$ that is defined by ([G84])

$|x|_{TV} := \sup\left\{\int_\Omega x\, \mathrm{div}\, f\, d\omega : f \in C_0^1(\Omega; \mathbb{R}^d) \ \text{and} \ \|f\|_{L^\infty(\Omega)} \le 1\right\}.$

This functional is useful for detecting discontinuities, in particular, when the solutions are piecewise-constant ([ROF92]).
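The following small sketch (our illustration; the grid, the weight `beta` and the function names are assumptions rather than notation from the paper) evaluates discrete analogues of the two penalty functionals above on a one-dimensional grid.

```python
import numpy as np

def sparsity_penalty(x, h, beta):
    """Discrete analogue of  beta/2 * ||x||_{L^2}^2 + ||x||_{L^1}  on a grid
    with mesh size h; a small beta emphasizes the sparsity-promoting L^1 term."""
    return 0.5 * beta * h * np.sum(x**2) + h * np.sum(np.abs(x))

def tv_penalty(x, h, beta):
    """Discrete analogue of  beta/2 * ||x||_{L^2}^2 + |x|_TV, with the total
    variation approximated by the sum of absolute increments."""
    return 0.5 * beta * h * np.sum(x**2) + np.sum(np.abs(np.diff(x)))

# Example: a piecewise-constant profile has small TV but is not sparse.
x = np.concatenate([np.zeros(50), np.ones(50)])
print(sparsity_penalty(x, h=0.01, beta=0.01), tv_penalty(x, h=0.01, beta=0.01))
```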

3 The method and its weak convergence

In this section we formulate the iteratively regularized Gauss–Newton method in the framework of Banach spaces to produce a stable approximate solution of (1.1) from the available noisy data $y^\delta$ satisfying (1.2). In order to capture the features of solutions, we take a proper, lower semi-continuous, $p$-convex function $\Theta: X \to (-\infty, \infty]$ with $p \ge 2$; we assume that $\Theta$ satisfies (2.2). We will work under the following conditions on the nonlinear operator $F$.

Assumption 3.1
    (a) $D(F)$ is a closed convex set in $X$ and the equation (1.1) has a solution $x^\dagger$;

    (b) There is $\rho > 0$ such that for each $x \in B_\rho(x_0) \cap D(F)$ there is a bounded linear operator $L(x): X \to Y$ such that

      where ;

    (c) The operator $F$ is properly scaled so that $\|L(x)\| \le 1$ for all such $x$;

    (d) There exist two constants $K_0$ and $K_1$ such that

      for all $x, \bar x \in B_\rho(x_0) \cap D(F)$.

It is easy to see that condition (b) in Assumption 3.1 implies, for any , that the function is differentiable and

The condition (d) was first formulated in [JT09]. In section 6 we will present several examples from parameter identification in partial differential equations to indicate that this condition can indeed be verified for a wide range of applications. As direct consequences of (b) and (d), we have that

and

In order to formulate the method, let

$\iota_{D(F)}(x) := 0$ if $x \in D(F)$ and $\iota_{D(F)}(x) := +\infty$ otherwise

be the characteristic function of $D(F)$ and define

$\hat\Theta := \Theta + \iota_{D(F)}.$   (3.1)

Since $D(F)$ is closed and convex, $\iota_{D(F)}$ is a proper, lower semi-continuous, convex function on $X$. Consequently, $\hat\Theta$ is a proper, lower semi-continuous, $p$-convex function on $X$ satisfying

$D_\xi\hat\Theta(\bar x, x) \ge c_0\|\bar x - x\|^p$ for all $\bar x \in X$, $x \in D(\partial\hat\Theta)$ and $\xi \in \partial\hat\Theta(x)$.   (3.2)

We pick $\xi_0 \in X^*$ and define $x_0 := \nabla\hat\Theta^*(\xi_0)$, where $\hat\Theta^*$ denotes the Fenchel conjugate of $\hat\Theta$ and is known to be Fréchet differentiable with gradient $\nabla\hat\Theta^*$. We have $x_0 \in D(F)$ and $\xi_0 \in \partial\hat\Theta(x_0)$. Consequently

We use $x_0$ and $\xi_0$ as initial data. We then pick a sequence of positive numbers $\{\alpha_n\}$ satisfying (1.3) and define the iterates $\{x_n^\delta\}$ successively by setting $x_0^\delta := x_0$ and letting $x_{n+1}^\delta$ be the unique minimizer of the convex minimization problem

$x_{n+1}^\delta := \arg\min_{x \in X}\bigl\{\|y^\delta - F(x_n^\delta) - L(x_n^\delta)(x - x_n^\delta)\|^r + \alpha_n D_{\xi_0}\hat\Theta(x, x_0)\bigr\}.$   (3.3)

By the properties of $\hat\Theta$, $x_{n+1}^\delta$ is uniquely defined and $x_{n+1}^\delta \in D(F)$.
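For illustration only, the convex subproblem arising in each step can be solved numerically by standard first order methods. The sketch below (ours, not from the paper) treats a simplified variant with quadratic misfit and a plain $\ell^1$ penalty in place of the Bregman distance penalty, using the iterative soft thresholding algorithm; `F` and `jacF` are assumed to be user-supplied.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal map of t*||.||_1 (componentwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def linearized_step_l1(F, jacF, x_n, y_delta, alpha_n, n_iter=500):
    """One linearized Gauss-Newton step with an l1 penalty, solved by ISTA.

    Minimizes  0.5*||F(x_n) + A(x - x_n) - y_delta||^2 + alpha_n*||x||_1
    with A = F'(x_n).  This is a simplified stand-in for the convex
    subproblem (3.3): quadratic misfit (r = 2) and a plain l1 penalty in
    place of the Bregman-distance penalty used in the paper.
    """
    A = jacF(x_n)
    b = y_delta - F(x_n) + A @ x_n        # so the misfit is 0.5*||A x - b||^2
    step = 1.0 / np.linalg.norm(A, 2)**2  # step size 1/L with L = ||A||_2^2
    x = x_n.copy()
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - step * grad, step * alpha_n)
    return x
```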

In practical applications, the iteration must be terminated by some a posteriori stopping rule to output an integer $n_\delta$ and hence $x_{n_\delta}^\delta$, which is used as an approximate solution of (1.1). In this paper we will consider the following three stopping rules.

Rule 3.1

Let $\tau > 1$ be a given number. We define $n_\delta$ to be the integer such that

$\|F(x_{n_\delta}^\delta) - y^\delta\| \le \tau\delta < \|F(x_n^\delta) - y^\delta\|, \qquad 0 \le n < n_\delta.$

Rule 3.2

Let $\tau > 1$ be a given number. If $\|F(x_0^\delta) - y^\delta\| \le \tau\delta$ we define $n_\delta := 0$; otherwise we define $n_\delta$ to be the first integer such that

Rule 3.3

Let $\tau > 1$ be a given number. If $\|F(x_0^\delta) - y^\delta\| \le \tau\delta$ we define $n_\delta := 0$; otherwise we define $n_\delta$ to be the first integer such that

(3.4)

Rule 3.1 is known as the discrepancy principle and is widely used to terminate regularization methods. Rule 3.3 appeared first in [K99] to deal with some Newton-type regularization methods in Hilbert spaces. It is easy to see that Rule 3.1 terminates the iteration no later than Rule 3.2, and Rule 3.2 terminates the iteration no later than Rule 3.3. Most of the results in this paper are true for Rule 3.1 except the ones in Section 4 concerning the rates of convergence under certain source conditions formulated as variational inequalities; the convergence rates, however, can be derived when the iteration is terminated by either Rule 3.2 or Rule 3.3.
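As an illustration of Rule 3.1, the following sketch (ours, with an assumed helper `newton_step` that returns the next iterate, for instance by solving the convex subproblem (3.3)) runs the iteration until the residual first drops below $\tau\delta$.

```python
import numpy as np

def run_with_discrepancy_principle(F, newton_step, x0, y_delta, delta,
                                   alphas, tau=1.5, n_max=100):
    """Terminate the iteration by the discrepancy principle (Rule 3.1):
    stop at the first n with ||F(x_n) - y_delta|| <= tau * delta.

    `newton_step(x, alpha)` is an assumed helper returning the next iterate;
    `alphas` is the sequence of regularization parameters alpha_n.
    """
    x = x0
    for n in range(n_max):
        if np.linalg.norm(F(x) - y_delta) <= tau * delta:
            return x, n              # n is the stopping index n_delta
        x = newton_step(x, alphas[n])
    return x, n_max                  # safeguard if the criterion never fires
```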

In this section we show that the method together with any one of the above three stopping rules with $\tau$ sufficiently large is well-defined. To this end, we introduce the integer defined by

(3.5)

where $p^*$ is the number conjugate to $p$, i.e. $1/p + 1/p^* = 1$, and the remaining number in (3.5) is chosen to satisfy

(3.6)

and is the unique element that realizes the distance from to the closed convex set in , i.e.

Because the sequence $\{\alpha_n\}$ satisfies (1.3), this integer exists and is finite. We will show that the estimates below hold for all $n$ up to this integer and that it bounds the integer $n_\delta$ defined by any one of the above three stopping rules. For simplicity of presentation, we will use abbreviated notation for frequently occurring quantities. We also use $c$ to denote a universal constant that is independent of $n$ and $\delta$ when its explicit formula is not important.

Lemma 1

Let $X$ and $Y$ be Banach spaces, let $\Theta: X \to (-\infty, \infty]$ be a proper, lower semi-continuous, $p$-convex function with $p \ge 2$, let $\{\alpha_n\}$ be a sequence satisfying (1.3), and let $F$ satisfy Assumption 3.1. If $\delta > 0$ is sufficiently small, then the iterates $x_n^\delta$ are well-defined and

(3.7)
(3.8)

hold for all $n$ up to the integer defined in (3.5). Moreover, $n_\delta$ does not exceed this integer when $n_\delta$ is defined by either Rule 3.1, 3.2, or 3.3 with $\tau$ sufficiently large.

Proof

Since implies , from the definition of and (2.3) it follows that

Thus and (3.7) holds. In view of the scaling condition we can obtain

(3.9)

Therefore the result holds for $n = 0$. Now we assume that the estimates have been proved for some $n \ge 0$ and show that the estimates for $n+1$ are also true. By the minimizing property of $x_{n+1}^\delta$ we have

By using the identity (2.1) we have

Therefore, it follows from the above inequality that

(3.10)

In view of Young's inequality $ab \le \frac{a^q}{q} + \frac{b^{q^*}}{q^*}$ for $a, b \ge 0$ and conjugate exponents $q, q^* > 1$ with $1/q + 1/q^* = 1$, we have

Combining this with (3.10) and using the $p$-convexity of $\hat\Theta$, we can obtain

By using the fact that for and , we have from the above inequality that

(3.11)

and

(3.12)

By using and Assumption 3.1 we have

(3.13)

Since , it follows from (3.5) that

(3.14)

In view of the induction hypotheses we thus have

Combining this with (3.11) gives

Therefore, if is sufficiently small, then

Next we estimate . From (3) and (3.13) it follows that

(3.15)

Observe that

(3.16)

Thus, we may use Assumption 3.1, (3), and the estimates on and to derive that

(3.17)

Therefore, by using the induction hypothesis on , the fact , and (3.14), we can obtain for sufficiently small that

We therefore obtain the desired estimates (3.7) and (3.8).

Finally we show that . We first claim that for there holds

In fact, for $n = 0$ this inequality follows from (1.2) and (3.9), and for $n \ge 1$ it follows from (3.17), (3.8) and (1.3). Therefore, by using Assumption 3.1 and the estimates (3.7) and (3.8), we can obtain

(3.18)

If , then . Therefore

In view of (3.6) we have for sufficiently small that . Consequently .

In the following we assume that . Observing from (1.3) and (3.5) that for and there holds

Thus, from (3) we have for and that

Since is chosen to satisfy (3.6), we have for sufficiently small that

Therefore, by the definition of we have .

Remark 1

We use $\hat\Theta$ in (3.3) to guarantee that $x_{n+1}^\delta \in D(F)$ without assuming that the sought solution is an interior point of $D(F)$. If the sought solution is an interior point of $D(F)$, so that $D(F)$ contains a ball of some radius $\rho > 0$ around it, we can replace $\hat\Theta$ in (3.3) by $\Theta$ and define $x_{n+1}^\delta$ to be the unique minimizer of the convex minimization problem