On the iteratively regularized Gauss–Newton method in Banach spaces with applications to parameter identification problems
Abstract
In this paper we propose an extension of the iteratively regularized Gauss–Newton method to the Banach space setting by defining the iterates via convex optimization problems. We consider some a posteriori stopping rules to terminate the iteration and present a detailed convergence analysis. The remarkable point is that in each convex optimization problem we allow nonsmooth penalty terms, including $L^1$ and total variation (TV) like penalty functionals. This enables us to reconstruct special features of solutions, such as sparsity and discontinuities, in practical applications. Some numerical experiments on parameter identification in partial differential equations are reported to test the performance of our method.
Msc:
65J15 65J20 47H17
1 Introduction
Inverse problems arise from many practical applications whenever one searches for unknown causes based on observation of their effects. A characteristic property of inverse problems is their ill-posedness in the sense that their solutions do not depend continuously on the data. Due to errors in the measurements, in practical applications one never has the exact data; instead only noisy data are available. Therefore, how to use the noisy data to produce a stable approximate solution is an important topic.
We are interested in solving nonlinear inverse problems in Banach spaces which can be formulated as the nonlinear operator equation
(1.1) $F(x) = y,$
where $F : \mathcal{D}(F) \subset \mathcal{X} \to \mathcal{Y}$ is a nonlinear operator between two Banach spaces $\mathcal{X}$ and $\mathcal{Y}$ with domain $\mathcal{D}(F)$. We will use the same notation $\|\cdot\|$ to denote the norms of $\mathcal{X}$ and $\mathcal{Y}$, which should be clear from the context. Let $y^\delta$ be the only available approximate data to $y$ satisfying
(1.2) $\|y^\delta - y\| \le \delta$
with a given small noise level $\delta > 0$. Due to the ill-posedness, regularization methods should be employed to produce from $y^\delta$ a stable approximate solution.
When both $\mathcal{X}$ and $\mathcal{Y}$ are Hilbert spaces and $F$ is Fréchet differentiable, many regularization methods have been developed during the last two decades, see BK04 (); Jin2011 (); Jin2012 (); JT09 (); KNS2008 () and the references therein. The iteratively regularized Gauss–Newton method is one of the well known methods and it takes the form (B92 ())
$$x_{n+1}^\delta = x_n^\delta - \left(\alpha_n I + F'(x_n^\delta)^* F'(x_n^\delta)\right)^{-1}\left(F'(x_n^\delta)^*\left(F(x_n^\delta) - y^\delta\right) + \alpha_n (x_n^\delta - x_0)\right),$$
where $F'(x)$ denotes the Fréchet derivative of $F$ at $x$, $F'(x)^*$ denotes the adjoint of $F'(x)$, $x_0$ is an initial guess, and $\{\alpha_n\}$ is a sequence of positive numbers satisfying
(1.3) $\alpha_n > 0, \qquad \lim_{n\to\infty} \alpha_n = 0 \qquad \text{and} \qquad 1 \le \frac{\alpha_n}{\alpha_{n+1}} \le r$
for some constant $r > 1$. When terminated by the discrepancy principle, the regularization property of the iteratively regularized Gauss–Newton method has been studied extensively, see JT09 (); KNS2008 () and references therein. It is worthwhile to point out that $x_{n+1}^\delta$ is the unique minimizer of the quadratic functional
(1.4) $x \mapsto \|y^\delta - F(x_n^\delta) - F'(x_n^\delta)(x - x_n^\delta)\|^2 + \alpha_n \|x - x_0\|^2.$
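In finite dimensions the classical iteration above amounts to solving a regularized normal equation at each step. The following sketch (with a hypothetical two-dimensional forward operator and hypothetical parameter values, not taken from this paper) illustrates one way to implement it, with $\alpha_n$ decaying geometrically as in (1.3).

```python
import numpy as np

def F(x):
    # hypothetical smooth nonlinear forward operator R^2 -> R^2
    return np.array([x[0] + 0.1 * x[1] ** 2, x[1] + 0.1 * x[0] ** 2])

def F_prime(x):
    # Frechet derivative (Jacobian) of F at x
    return np.array([[1.0, 0.2 * x[1]], [0.2 * x[0], 1.0]])

def irgnm(y_delta, x0, alpha0=1.0, ratio=2.0, n_iter=15):
    """Classical IRGNM step: x_{n+1} = x_n - (alpha_n I + A^T A)^{-1}
    (A^T (F(x_n) - y_delta) + alpha_n (x_n - x0)), A = F'(x_n);
    alpha_{n+1} = alpha_n / ratio matches (1.3) with r = ratio."""
    x, alpha = x0.copy(), alpha0
    for _ in range(n_iter):
        J = F_prime(x)
        rhs = J.T @ (F(x) - y_delta) + alpha * (x - x0)
        x = x - np.linalg.solve(alpha * np.eye(len(x)) + J.T @ J, rhs)
        alpha /= ratio
    return x

x_true = np.array([1.0, 2.0])
y_delta = F(x_true) + 1e-4 * np.array([1.0, -1.0])  # noisy data
x_rec = irgnm(y_delta, x0=np.zeros(2))
```

Each update here is exactly the minimizer of the quadratic functional (1.4), obtained by setting its gradient to zero.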
Regularization methods in Hilbert spaces can produce good results when the sought solution is smooth. However, because such methods have a tendency to oversmooth solutions, they may not produce good results in applications where the sought solution has special features such as sparsity or discontinuities. In order to capture the special features, the methods in Hilbert spaces must be modified by incorporating the information of some adapted penalty functionals such as the $L^1$ and the total variation (TV) like functionals, for which the theories in the Hilbert space setting are no longer applicable. On the other hand, due to their intrinsic features, many inverse problems are more natural to formulate in Banach spaces than in Hilbert spaces. Therefore, it is necessary to develop regularization methods to solve inverse problems in the framework of Banach spaces with general penalty functions.
In this paper we will extend the iteratively regularized Gauss–Newton method to the Banach space setting. Motivated by the variational formulation (1.4) in Hilbert spaces, it is natural to use convex optimization problems to define the iterates. To this end, we take a proper, lower semicontinuous, convex function $\Theta : \mathcal{X} \to (-\infty, \infty]$ whose subdifferential is denoted as $\partial\Theta$. By picking an initial guess $x_0$ and $\xi_0 \in \partial\Theta(x_0)$, we define
(1.5) $x_{n+1}^\delta := \arg\min_{x \in \mathcal{D}(F)} \left\{\frac{1}{r} \left\|y^\delta - F(x_n^\delta) - F'(x_n^\delta)(x - x_n^\delta)\right\|^r + \alpha_n D_{\xi_n}\Theta(x, x_n^\delta)\right\},$
where $1 < r < \infty$, $\xi_n \in \partial\Theta(x_n^\delta)$, and $D_{\xi_n}\Theta(\cdot, x_n^\delta)$ denotes the Bregman distance induced by $\Theta$ at $x_n^\delta$ in the direction $\xi_n$. For a particular choice of $r$ and $\Theta$, this method has been considered in KH2010 () under essentially the nonlinearity condition
(1.6) 
with the iteration terminated by an a priori stopping rule. It turns out that (1.6) is difficult to verify for nonlinear inverse problems, and the restriction of to the special choice may prevent the method from capturing the special features of solutions. Moreover, since a priori stopping rules depend crucially on the unknown source conditions, it is useless in practical applications. In this paper we will develop a convergence theory on the iteratively regularized Gauss–Newton method in Banach spaces with general convex penalty function . We will propose some a posteriori stopping rules, including the discrepancy principle, to terminate the method and give detailed convergence analysis under reasonable nonlinearity conditions.
This paper is organized as follows. In section 2 we give some preliminary facts on convex analysis. In section 3 we then formulate the iteratively regularized Gauss–Newton method in Banach spaces and propose some a posteriori stopping rules. We show that the method is well-defined and obtain a weak convergence result. In section 4 we derive the rates of convergence when the solution satisfies certain source conditions formulated as variational inequalities. In section 5 we prove a strong convergence result without assuming any source conditions when the underlying space is a Hilbert space and the penalty is a convex function, which is useful for sparsity reconstruction and discontinuity detection. Finally, in section 6 we present some numerical experiments to test our method for parameter identification in partial differential equations.
2 Preliminaries
Let $\mathcal{X}$ be a Banach space with norm $\|\cdot\|$. We use $\mathcal{X}^*$ to denote its dual space. Given $x \in \mathcal{X}$ and $\xi \in \mathcal{X}^*$ we write $\langle \xi, x \rangle := \xi(x)$ for the duality pairing. If $\mathcal{Y}$ is another Banach space and $A : \mathcal{X} \to \mathcal{Y}$ is a bounded linear operator, we use $A^* : \mathcal{Y}^* \to \mathcal{X}^*$ to denote its adjoint, i.e. $\langle A^* \eta, x \rangle = \langle \eta, A x \rangle$ for any $x \in \mathcal{X}$ and $\eta \in \mathcal{Y}^*$.
Let $\Theta : \mathcal{X} \to (-\infty, \infty]$ be a convex function. We use $\mathcal{D}(\Theta) := \{x \in \mathcal{X} : \Theta(x) < \infty\}$ to denote its effective domain. We call $\Theta$ proper if $\mathcal{D}(\Theta) \ne \emptyset$. Given $x \in \mathcal{X}$ we define
$$\partial\Theta(x) := \left\{\xi \in \mathcal{X}^* : \Theta(\bar x) \ge \Theta(x) + \langle \xi, \bar x - x \rangle \text{ for all } \bar x \in \mathcal{X}\right\},$$
which is called the subgradient of $\Theta$ at $x$. It is clear that $\partial\Theta(x)$ is convex and closed in $\mathcal{X}^*$ for each $x \in \mathcal{X}$. The multivalued mapping $x \mapsto \partial\Theta(x)$ is called the subdifferential of $\Theta$. It could happen that $\partial\Theta(x) = \emptyset$ for some $x \in \mathcal{D}(\Theta)$. We set
$$\mathcal{D}(\partial\Theta) := \{x \in \mathcal{D}(\Theta) : \partial\Theta(x) \ne \emptyset\}.$$
For $x \in \mathcal{D}(\partial\Theta)$ and $\xi \in \partial\Theta(x)$ we define
$$D_\xi\Theta(\bar x, x) := \Theta(\bar x) - \Theta(x) - \langle \xi, \bar x - x \rangle, \qquad \bar x \in \mathcal{X},$$
which is called the Bregman distance induced by $\Theta$ at $x$ in the direction $\xi$. Clearly $D_\xi\Theta(\bar x, x) \ge 0$. By direct calculation we can see that
(2.1) $D_{\xi_2}\Theta(\bar x, x_2) - D_{\xi_1}\Theta(\bar x, x_1) = D_{\xi_2}\Theta(x_1, x_2) + \langle \xi_2 - \xi_1, x_1 - \bar x \rangle$
for all $\bar x \in \mathcal{X}$, $x_1, x_2 \in \mathcal{D}(\partial\Theta)$, $\xi_1 \in \partial\Theta(x_1)$ and $\xi_2 \in \partial\Theta(x_2)$.
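The nonnegativity of the Bregman distance and a standard three-point identity of the kind stated above can be checked numerically for a smooth convex function. The sketch below is illustrative only (the function $\Theta(x) = \frac{1}{4}\sum_i x_i^4$ and the sample points are hypothetical, not from the paper).

```python
import numpy as np

def theta(x):
    # Theta(x) = (1/4) * sum x_i^4, a smooth convex function on R^n
    return 0.25 * np.sum(x ** 4)

def grad_theta(x):
    # here the subdifferential is the singleton {x^3} (componentwise)
    return x ** 3

def bregman(xbar, x):
    # D_xi Theta(xbar, x) = Theta(xbar) - Theta(x) - <xi, xbar - x>
    return theta(xbar) - theta(x) - grad_theta(x) @ (xbar - x)

rng = np.random.default_rng(0)
xbar, x1, x2 = rng.standard_normal((3, 5))

# nonnegativity of the Bregman distance of a convex function
assert bregman(xbar, x1) >= 0.0

# a standard three-point identity:
# D Theta(xbar, x2) - D Theta(xbar, x1)
#     = D Theta(x1, x2) + <xi2 - xi1, x1 - xbar>
lhs = bregman(xbar, x2) - bregman(xbar, x1)
rhs = bregman(x1, x2) + (grad_theta(x2) - grad_theta(x1)) @ (x1 - xbar)
assert abs(lhs - rhs) < 1e-10
```

Note that the Bregman distance is not symmetric in its two arguments, which is why the direction $\xi$ must be recorded.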
A proper function $\Theta : \mathcal{X} \to (-\infty, \infty]$ is said to be $p$-convex for some $p \ge 2$ if there is a constant $c_0 > 0$ such that for all $\bar x, x \in \mathcal{X}$ and $\lambda \in [0, 1]$ there holds
$$\Theta(\lambda \bar x + (1 - \lambda) x) + c_0 \lambda (1 - \lambda) \|\bar x - x\|^p \le \lambda \Theta(\bar x) + (1 - \lambda) \Theta(x).$$
It can be shown that $\Theta$ is $p$-convex if and only if there is a constant $c_0 > 0$ such that
(2.2) $D_\xi\Theta(\bar x, x) \ge c_0 \|\bar x - x\|^p$
for all $\bar x \in \mathcal{X}$, $x \in \mathcal{D}(\partial\Theta)$ and $\xi \in \partial\Theta(x)$.
For a proper, lower semicontinuous, convex function $\Theta : \mathcal{X} \to (-\infty, \infty]$ we can define its Fenchel conjugate
$$\Theta^*(\xi) := \sup_{x \in \mathcal{X}} \left\{\langle \xi, x \rangle - \Theta(x)\right\}, \qquad \xi \in \mathcal{X}^*.$$
It is well known that $\Theta^*$ is also proper, lower semicontinuous, and convex. If, in addition, $\mathcal{X}$ is reflexive, then $\xi \in \partial\Theta(x)$ if and only if $x \in \partial\Theta^*(\xi)$. When $\Theta$ is $p$-convex satisfying (2.2) with $c_0 > 0$, it follows from (Z2002, , Corollary 3.5.11) that $\mathcal{D}(\Theta^*) = \mathcal{X}^*$, $\Theta^*$ is Fréchet differentiable and its gradient $\nabla\Theta^*$ satisfies
(2.3) $\|\nabla\Theta^*(\xi_1) - \nabla\Theta^*(\xi_2)\| \le \left(\frac{\|\xi_1 - \xi_2\|}{2 c_0}\right)^{1/(p-1)}, \qquad \xi_1, \xi_2 \in \mathcal{X}^*.$
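For power-of-norm functionals the conjugate and its gradient are explicit. The scalar sanity check below (illustrative, with the hypothetical choice $p = 4$) verifies that $\nabla\Theta^*$ inverts $\nabla\Theta$, reflecting the duality $\xi \in \partial\Theta(x)$ iff $x \in \partial\Theta^*(\xi)$, and that the Fenchel–Young inequality holds with equality at a subgradient pair.

```python
import numpy as np

# For Theta(x) = |x|^p / p the conjugate is Theta*(xi) = |xi|^q / q
# with 1/p + 1/q = 1 (a classical fact, used here for p = 4).
p = 4.0
q = p / (p - 1.0)

def grad_theta(x):
    return np.sign(x) * np.abs(x) ** (p - 1)

def grad_theta_star(xi):
    return np.sign(xi) * np.abs(xi) ** (q - 1)

for x in [-2.0, -0.3, 0.7, 1.5]:
    xi = grad_theta(x)
    # xi in dTheta(x) iff x in dTheta*(xi) in the reflexive setting
    assert abs(grad_theta_star(xi) - x) < 1e-10

# Fenchel-Young holds with equality at a subgradient pair:
# Theta(x) + Theta*(xi) = <xi, x> when xi = grad Theta(x)
x = 1.5
xi = grad_theta(x)
theta_val = np.abs(x) ** p / p
theta_star_val = np.abs(xi) ** q / q
assert abs(theta_val + theta_star_val - xi * x) < 1e-10
```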
Many examples of $p$-convex functions can be provided by functionals of the norms in uniformly convex Banach spaces. We say a Banach space $\mathcal{X}$ is $p$-convex with $p \ge 2$ if there is a positive constant $K$ such that $\delta_{\mathcal{X}}(\epsilon) \ge K \epsilon^p$ for all $0 < \epsilon \le 2$, where
$$\delta_{\mathcal{X}}(\epsilon) := \inf\left\{2 - \|\bar x + x\| : \|\bar x\| = \|x\| = 1, \ \|\bar x - x\| \ge \epsilon\right\}$$
is the modulus of convexity of $\mathcal{X}$. According to a characterization of uniform convexity of Banach spaces in XR91 (), it is easy to see that, for any $s \ge p$, the functional
$$\Theta(x) := \frac{1}{s} \|x\|^s$$
is $s$-convex and its subgradient at $x$ is given by $\partial\Theta(x) = J_s(x)$, where $J_s$ denotes the duality mapping of $\mathcal{X}$ with gauge function $t \mapsto t^{s-1}$ which is defined for each $x \in \mathcal{X}$ by
$$J_s(x) := \left\{\xi \in \mathcal{X}^* : \langle \xi, x \rangle = \|x\|^s \text{ and } \|\xi\| = \|x\|^{s-1}\right\}.$$
The sequence spaces $\ell^s$, the Lebesgue spaces $L^s$, the Sobolev spaces $W^{k,s}$ and the Besov spaces $B^{\sigma}_{s,s}$ with $1 < s < \infty$ are the most commonly used function spaces that are $p$-convex with $p = \max\{s, 2\}$ (Adams (); C1990 ()).
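In the sequence spaces the duality mapping acts componentwise. The sketch below (finite-dimensional, with the hypothetical choice $s = 3$ and a random test vector) checks numerically the two defining properties of the duality mapping with gauge function $t \mapsto t^{s-1}$.

```python
import numpy as np

# In l^s the subgradient of (1/s)||x||_s^s is xi_i = |x_i|^{s-1} sign(x_i),
# which is the duality mapping with gauge t -> t^{s-1}.
s = 3.0
s_conj = s / (s - 1.0)  # conjugate exponent of s

rng = np.random.default_rng(1)
x = rng.standard_normal(7)

xi = np.abs(x) ** (s - 1) * np.sign(x)  # candidate element of J_s(x)

norm_x = np.sum(np.abs(x) ** s) ** (1.0 / s)
norm_xi = np.sum(np.abs(xi) ** s_conj) ** (1.0 / s_conj)

# defining properties of the duality mapping with gauge t^{s-1}:
assert abs(xi @ x - norm_x ** s) < 1e-10         # <xi, x> = ||x||^s
assert abs(norm_xi - norm_x ** (s - 1)) < 1e-10  # ||xi|| = ||x||^{s-1}
```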
Given a proper, lower semicontinuous, $p$-convex function $\Theta_0$ on $\mathcal{X}$, we can produce new such functions by adding to $\Theta_0$ any available proper, lower semicontinuous, convex functions. In this way, we can construct nonsmooth convex functions that can be used to detect special features of solutions when solving inverse problems. For instance, let $\mathcal{X} = L^2(\Omega)$, where $\Omega$ is a bounded domain in $\mathbb{R}^d$. It is clear that the functional
$$x \mapsto \frac{1}{2} \int_\Omega |x|^2\, d\omega$$
is $2$-convex on $L^2(\Omega)$. By adding the function $x \mapsto \int_\Omega |x|\, d\omega$ to the $\beta$ multiple of the above function we can obtain the convex function
$$\Theta(x) := \frac{\beta}{2} \int_\Omega |x|^2\, d\omega + \int_\Omega |x|\, d\omega$$
with small $\beta > 0$ which is useful for sparsity recovery (T96 ()). Similarly, we may produce on $L^2(\Omega)$ the convex function
$$\Theta(x) := \frac{\beta}{2} \int_\Omega |x|^2\, d\omega + |x|_{TV},$$
where $|x|_{TV}$ denotes the total variation of $x$ over $\Omega$ that is defined by (G84 ())
$$|x|_{TV} := \sup\left\{\int_\Omega x\, \mathrm{div} f\, d\omega : f \in C_0^1(\Omega; \mathbb{R}^d) \text{ and } \|f\|_{L^\infty(\Omega)} \le 1\right\}.$$
This functional is useful for detecting discontinuities, in particular when the solutions are piecewise constant (ROF92 ()).
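Discrete analogues of these two penalties are easy to compute. The sketch below (one-dimensional grids and hypothetical signals, purely for illustration) shows why the TV term favors piecewise constant reconstructions: a single jump is cheap, while oscillation of the same amplitude is expensive.

```python
import numpy as np

def l1_penalty(x, beta=0.01):
    # discrete analogue of (beta/2)*||x||_2^2 + ||x||_1
    return 0.5 * beta * np.sum(x ** 2) + np.sum(np.abs(x))

def tv_penalty(x, beta=0.01):
    # discrete analogue of (beta/2)*||x||_2^2 + |x|_TV,
    # with |x|_TV approximated by the sum of absolute differences
    return 0.5 * beta * np.sum(x ** 2) + np.sum(np.abs(np.diff(x)))

# a piecewise-constant signal has small TV (one jump of height 1) ...
step = np.concatenate([np.zeros(50), np.ones(50)])
# ... while an oscillating signal of the same amplitude has large TV
osc = np.tile([0.0, 1.0], 50)

assert np.sum(np.abs(np.diff(step))) == 1.0
assert np.sum(np.abs(np.diff(osc))) == 99.0
# both signals have the same L2 energy, so TV alone decides:
assert tv_penalty(step) < tv_penalty(osc)
```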
3 The method and its weak convergence
In this section we formulate the iteratively regularized Gauss–Newton method in the framework of Banach spaces to produce a stable approximate solution of (1.1) from the available noisy data $y^\delta$ satisfying (1.2). In order to capture the features of solutions, we take a proper, lower semicontinuous, convex function $\Theta : \mathcal{X} \to (-\infty, \infty]$; we assume that $\Theta$ satisfies (2.2) with some $p \ge 2$ and $c_0 > 0$. We will work under the following conditions on the nonlinear operator $F$.
Assumption 3.1

(a) $\mathcal{D}(F)$ is a closed convex set in $\mathcal{X}$ and the equation (1.1) has a solution $x^\dagger \in \mathcal{D}(F)$;

(b) there is $\rho > 0$ such that for each $x \in B_\rho(x_0) \cap \mathcal{D}(F)$ there is a bounded linear operator $F'(x) : \mathcal{X} \to \mathcal{Y}$ such that
where $B_\rho(x_0) := \{x \in \mathcal{X} : \|x - x_0\| \le \rho\}$;

(c) the operator $F$ is properly scaled so that $\|F'(x)\| \le 1$ for all $x \in B_\rho(x_0) \cap \mathcal{D}(F)$;

(d) there exist two constants $K_0$ and $K_1$ such that
for all $x, \bar x \in B_\rho(x_0) \cap \mathcal{D}(F)$.
It is easy to see that condition (b) in Assumption 3.1 implies, for any , that the function is differentiable and
The condition (d) was first formulated in JT09 (). In section 6 we will present several examples of parameter identification in partial differential equations to indicate that this condition can indeed be verified for a wide range of applications. As direct consequences of (b) and (d), we have for $x, \bar x \in B_\rho(x_0) \cap \mathcal{D}(F)$ that
and
In order to formulate the method, let
$$\iota_{\mathcal{D}(F)}(x) := \begin{cases} 0, & x \in \mathcal{D}(F), \\ +\infty, & x \notin \mathcal{D}(F), \end{cases}$$
be the characteristic function of $\mathcal{D}(F)$ and define
(3.1) $\widehat\Theta := \Theta + \iota_{\mathcal{D}(F)}.$
Since $\mathcal{D}(F)$ is closed and convex, $\iota_{\mathcal{D}(F)}$ is a proper, lower semicontinuous, convex function on $\mathcal{X}$. Consequently, $\widehat\Theta$ is a proper, lower semicontinuous, convex function on $\mathcal{X}$ satisfying
(3.2) $D_\xi\widehat\Theta(\bar x, x) \ge c_0 \|\bar x - x\|^p, \qquad \bar x \in \mathcal{X}, \ x \in \mathcal{D}(\partial\widehat\Theta), \ \xi \in \partial\widehat\Theta(x).$
We pick $\xi_0 \in \mathcal{X}^*$ and define $x_0$ through the Fenchel conjugate of the function in (3.1), which is known to be Fréchet differentiable. We have $x_0 \in \mathcal{D}(F)$ and $\xi_0 \in \partial\widehat\Theta(x_0)$. Consequently
We use $x_0$ and $\xi_0$ as initial data. We then pick a sequence $\{\alpha_n\}$ of positive numbers satisfying (1.3) and define $\{x_n^\delta\}$ successively by setting $x_0^\delta := x_0$ and letting $x_{n+1}^\delta$ be the unique minimizer of the convex minimization problem
(3.3) 
By the properties of $\widehat\Theta$, $x_{n+1}^\delta$ is uniquely defined and $x_{n+1}^\delta \in \mathcal{D}(F)$.
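Each iteration therefore requires solving a nonsmooth convex subproblem of the type in (3.3). As a minimal sketch (assuming $r = 2$, a Hilbert-space data term, a quadratic-plus-$\ell^1$ penalty, and a zero initial subgradient; none of these choices is fixed by the paper), one such subproblem can be solved by proximal gradient iteration with soft thresholding:

```python
import numpy as np

def soft_threshold(v, t):
    # proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def newton_subproblem(A, b, x0, alpha, beta, n_iter=500):
    """Minimize 0.5*||A x - b||^2 + alpha*(0.5*||x - x0||^2 + beta*||x||_1)
    by ISTA; A plays the role of the linearized operator and b the shifted
    data.  All parameter values are hypothetical."""
    x = x0.copy()
    L = np.linalg.norm(A, 2) ** 2 + alpha  # Lipschitz constant, smooth part
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b) + alpha * (x - x0)
        x = soft_threshold(x - grad / L, alpha * beta / L)
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 40))      # underdetermined linearized operator
x_true = np.zeros(40)
x_true[[3, 17]] = [2.0, -1.5]          # sparse ground truth
b = A @ x_true
x = newton_subproblem(A, b, x0=np.zeros(40), alpha=1e-3, beta=1.0)
```

The $\ell^1$ part of the penalty is what produces sparse minimizers; the quadratic part keeps the subproblem strongly convex and hence uniquely solvable.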
In practical applications, the iteration must be terminated by some a posteriori stopping rule to output an integer $n_\delta$ and hence $x_{n_\delta}^\delta$, which is used as an approximate solution of (1.1). In this paper we will consider the following three stopping rules.
Rule 3.1
Let $\tau > 1$ be a given number. We define $n_\delta$ to be the first integer such that $\|F(x_{n_\delta}^\delta) - y^\delta\| \le \tau \delta$.
Rule 3.2
Let $\tau > 1$ be a given number. If $\|F(x_0) - y^\delta\| \le \tau \delta$ we define $n_\delta = 0$; otherwise we define $n_\delta$ to be the first integer such that
Rule 3.3
Let $\tau > 1$ be a given number. If $\|F(x_0) - y^\delta\| \le \tau \delta$ we define $n_\delta = 0$; otherwise we define $n_\delta$ to be the first integer such that
(3.4) 
Rule 3.1 is known as the discrepancy principle and is widely used to terminate regularization methods. Rule 3.3 appeared first in K99 () to deal with some Newtontype regularization methods in Hilbert spaces. It is easy to see that Rule 3.1 terminates the iteration no later than Rule 3.2, and Rule 3.2 terminates the iteration no later than Rule 3.3. Most of the results in this paper are true for Rule 3.1 except the ones in Section 4 concerning the rates of convergence under certain source conditions formulated as variational inequalities; the convergence rates, however, can be derived when the iteration is terminated by either Rule 3.2 or Rule 3.3.
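The discrepancy principle of Rule 3.1 can be illustrated on a toy linear problem: iterate with a geometrically decaying regularization parameter and stop at the first residual below $\tau\delta$. Everything in the sketch below (the operator, noise level, and parameter values) is hypothetical and not taken from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(0.9 ** np.arange(n)) @ U.T   # ill-conditioned operator
x_true = rng.standard_normal(n)

delta = 1e-3
noise = rng.standard_normal(n)
y_delta = A @ x_true + delta * noise / np.linalg.norm(noise)  # ||noise||=delta

tau = 1.1
alpha = 1.0
x = np.zeros(n)
for _ in range(200):
    # Tikhonov approximation with regularization parameter alpha
    x = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y_delta)
    if np.linalg.norm(A @ x - y_delta) <= tau * delta:  # Rule 3.1
        break
    alpha /= 2.0  # alpha_n satisfies 1 <= alpha_n/alpha_{n+1} <= r with r = 2
```

Stopping at the discrepancy level prevents the iteration from fitting the noise: continuing past this point would only amplify the data error through the small singular values of the operator.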
In this section we show that the method together with any one of the above three stopping rules with suitably chosen $\tau$ is well-defined. To this end, we introduce the integer defined by
(3.5) 
where $p^*$ is the number conjugate to $p$, i.e. $1/p + 1/p^* = 1$, and the number is chosen to satisfy
(3.6) 
and is the unique element that realizes the distance from to the closed convex set in , i.e.
Because the sequence $\{\alpha_n\}$ satisfies (1.3), this integer exists and is finite. We will show that the iterates are well-defined for all $0 \le n \le n_\delta$ for the integer $n_\delta$ defined by any one of the above three stopping rules. For simplicity of presentation, we use the notation . We also use $C$ to denote a universal constant that is independent of $n$ and $\delta$ when its explicit formula is not important.
Lemma 1
Proof
Since implies , from the definition of and (2.3) it follows that
Thus and (3.7) holds. In view of the scaling condition we can obtain
(3.9) 
Therefore the result holds for . Now we assume that the estimates for have been proved for some and show that the estimates for are also true. By the minimizing property of we have
By using the identity (2.1) we have
Therefore, it follows from the above inequality that
(3.10) 
In view of Young's inequality for , and , we have
Combining this with (3) and using the convexity of , we can obtain
By using the fact that for and , we have from the above inequality that
(3.11) 
and
(3.12) 
By using and Assumption 3.1 we have
(3.13) 
Since , it follows from (3.5) that
(3.14) 
In view of the induction hypotheses we thus have
Combining this with (3.11) gives
Therefore, if is sufficiently small, then
Next we estimate . From (3) and (3.13) it follows that
(3.15) 
Observe that
(3.16) 
Thus, we may use Assumption 3.1, (3), and the estimates on and to derive that
(3.17) 
Therefore, by using the induction hypothesis on , the fact , and (3.14), we can obtain for sufficiently small that
Finally we show that . We first claim that for there holds
In fact, for this inequality follows from (1.2) and (3.9), and for it follows from (3.17), (3.8) and (1.3). Therefore, by using Assumption 3.1 and the estimates (3.7) and (3.8), we can obtain
(3.18) 
If , then . Therefore
In view of (3.6) we have for sufficiently small that . Consequently .