# Convergence analysis in convex regularization depending on the smoothness degree of the penalizer

Erdem Altuntac Institute for Numerical and Applied Mathematics, University of Göttingen, Lotzestr. 16-18, D-37083, Göttingen, Germany
###### Abstract

We consider the minimization of the least squares functional with a smooth, lower semi-continuous, convex penalizer $J(\cdot)$. Over some compact and convex subset $\Omega$ of the Hilbert space $\mathcal{H}$, the regularizer $J(\cdot)$ is defined over $C^{k}(\Omega,\mathcal{H})$, where $k\in\{1,2+\}$. The cost functional associated with a given linear, compact and injective forward operator $T$ is

$$F_\alpha(\cdot,f^\delta):=\frac{1}{2}\|T(\cdot)-f^\delta\|_{\mathcal{H}}^{2}+\alpha J(\cdot),$$

where $f^\delta$ is the given perturbed data with perturbation amount $\delta$. Convergence of the regularized optimum solution $\varphi_{\alpha(\delta)}$ to the true solution $\varphi^\dagger$ is analysed depending on the smoothness degree of the penalizer, i.e. the cases $k\in\{1,2+\}$ in $C^{k}(\Omega,\mathcal{H})$. In both cases, we define a regularization parameter that is compatible with the condition

$$\alpha(\delta,f^\delta)\in\{\alpha>0 \;:\; \|T\varphi_{\alpha(\delta)}-f^\delta\|\le\tau\delta\},$$

for some fixed $\tau$. In the case $k=2+$, we are able to evaluate the discrepancy $\|T\varphi_{\alpha(\delta)}-f^\delta\|$ with the Hessian Lipschitz constant $L_H$ of the functional.

Keywords. convex regularization, Bregman divergence, Hessian Lipschitz constant, discrepancy principle.

## 1 Introduction

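In this work, over some compact and convex subset $\Omega$ of the Hilbert space $\mathcal{H}$, we formulate our main variational minimization problem,

$$\operatorname*{arg\,min}_{\Omega\subset\mathcal{H}}\left\{F_\alpha(\cdot,f^\delta):=\frac{1}{2}\|T(\cdot)-f^\delta\|_{\mathcal{H}}^{2}+\alpha J(\cdot)\right\}. \tag{1.1}$$

Here, the penalizer $J(\cdot)$, defined over $C^{k}(\Omega,\mathcal{H})$ for $k\in\{1,2+\}$, is convex and $\alpha>0$ is the regularization parameter. Following [10, 13, 18], we construct the parametrized solution $\varphi_{\alpha(\delta)}$ for the problem (1.1) satisfying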

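1. For any data $f^\delta$ there exists a solution $\varphi_{\alpha(\delta)}$ to the problem (1.1);

2. For any data $f^\delta$ there is no more than one solution $\varphi_{\alpha(\delta)}$;

3. Convergence of the regularized solution $\varphi_{\alpha(\delta)}$ to the true solution $\varphi^\dagger$ must depend on the given data, i.e.

$$\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|_{\mathcal{H}}\to 0 \quad\text{as}\quad \alpha(\delta)\to 0 \quad\text{for}\quad \delta\to 0$$

whilst

$$\|f^\dagger-f^\delta\|\le\delta$$

where $f^\dagger$ is the true measurement and $\delta$ is the noise level.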

What item 3 states is that when the given measurement $f^\delta$ lies in some ball centered at the true measurement $f^\dagger$, then the expected solution $\varphi_{\alpha(\delta)}$ must lie in the corresponding ball centered at $\varphi^\dagger$. It is also required that this solution depends on the data $f^\delta$. Therefore, we are always tasked with finding an approximation of the unbounded inverse operator $T^{-1}$ by a bounded linear operator $R_\alpha$.

###### Definition 1.1 (Regularization operator).

[10, Definition 4.3], [20, Theorem 2.2] Let $T$ be some given linear injective operator. Then a family of bounded operators $R_\alpha$, $\alpha>0$, with the property of pointwise convergence

$$\lim_{\alpha\to 0}R_\alpha T\varphi^\dagger=\varphi^\dagger \tag{1.2}$$

is called a regularization scheme for the operator $T$. The parameter $\alpha$ is called the regularization parameter.

As an alternative to the well-established Tikhonov regularization, [21, 22], studying convex variational regularization with a general penalizer $J(\cdot)$ has become important over the last decade. The introduction of a new image denoising method named total variation, [24], marks the commencement of this study. Application and analysis of the method have been widely carried out in the inverse problems and optimization communities, [1, 2, 4, 7, 8, 9, 11, 12, 25]. In particular, formulating the minimization problem as a variational problem and estimating convergence rates with variational source conditions has also become popular recently, [6, 15, 16, 17, 20]. Unlike the available literature, we take one fact into account: for some given measurement $f^\delta$ with noise level $\delta$ and forward operator $T$, the regularized solution $\varphi_{\alpha(\delta)}$ to the problem (1.1) should satisfy $\|T\varphi_{\alpha(\delta)}-f^\delta\|\le\tau\delta$ for some fixed $\tau$. With this fact, we manage to obtain tight convergence rates for $\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|$, and we can carry out this analysis for a general smooth, convex penalty $J(\cdot)$ in the cases $k\in\{1,2+\}$. We will be able to quantify the tight convergence rates under the assumption that $J(\cdot)$ is defined over the space $C^{k}(\Omega,\mathcal{H})$ for $k\in\{1,2+\}$. More specifically, we will observe that the rule for the choice of the regularization parameter must contain a Lipschitz constant in addition to the noise level $\delta$. That is, when $k=2$, we will need the $C^{2+}$ class.

## 2 Notations and prerequisite knowledge

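Let $C(\Omega)$ be the space of continuous functions on the compact domain $\Omega$. Then the function space $C^{k}(\Omega)$ is defined by

$$C^{k}(\Omega):=\{\varphi\in C(\Omega):\nabla^{(k)}\varphi\in C(\Omega)\}.$$

In addition to the traditional $C^{k}$ spaces, we will need to address the class $C^{k+}$ for the purpose of convergence analysis. In general, for an open set $O$, a mapping defined on $O$ is said to be of class $C^{k+}$ if it is of class $C^{k}$ and its $k$th partial derivatives are not just continuous but strictly continuous on $O$, [23, pp. 355]. Then, for a smooth and convex functional $J(\cdot)$ defined over $C^{k+}(\Omega,\mathcal{H})$, there exists a Lipschitz constant $\tilde{L}$ such that

$$\|\nabla^{(k)}J(\varphi)-\nabla^{(k)}J(\Psi)\|\le\tilde{L}\,\|\varphi-\Psi\|. \tag{2.1}$$

When $k=1$, $\tilde{L}$ is the well-known Lipschitz constant of the gradient. When $k=2$, $\tilde{L}$ is the Hessian Lipschitz constant $L_H$, [14].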

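Over some compact and convex domain $\Omega\subset\mathcal{H}$, the variational minimization problem is formulated as

$$\operatorname*{arg\,min}_{\varphi\in\mathcal{H}}\left\{F_\alpha(\cdot,f^\delta):=\frac{1}{2}\|T(\cdot)-f^\delta\|_{\mathcal{H}}^{2}+\alpha J(\cdot)\right\} \tag{2.2}$$

with its penalty $J(\cdot)$ defined over $C^{k}(\Omega,\mathcal{H})$, where $k\in\{1,2+\}$ and $\alpha>0$ is the regularization parameter. A dual minimization problem to (2.2) is given by

$$J(\cdot)\to\min_{\mathcal{H}},\quad\text{subject to}\quad \|T(\cdot)-f^\delta\|\le\delta. \tag{2.3}$$

In Hilbert scales, it is known that the solution of the penalized minimization problem (2.2) equals the solution of the constrained minimization problem (2.3), [6, Subsection 3.1]. The regularized solution $\varphi_{\alpha(\delta)}$ of the problem (2.2) satisfies the following first order optimality conditions,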

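$$\begin{aligned} 0 &= \nabla F_\alpha(\varphi_{\alpha(\delta)}) \\ 0 &= T^{*}(T\varphi_{\alpha(\delta)}-f^\delta)+\alpha(\delta)\nabla J(\varphi_{\alpha(\delta)}) \\ T^{*}(f^\delta-T\varphi_{\alpha(\delta)}) &= \alpha(\delta)\nabla J(\varphi_{\alpha(\delta)}). \end{aligned} \tag{2.4}$$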
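The optimality condition (2.4) can be checked numerically in a simple special case. The following is only a minimal sketch under assumptions that are not part of the paper: the penalty is taken as $J(\varphi)=\frac{1}{2}\|\varphi\|^{2}$ (so that $\nabla J(\varphi)=\varphi$), the operator $T$ is modelled as a diagonal operator with decaying singular values, and the names `tikhonov_minimizer`, `optimality_residual`, `sigma` and `f_delta` are hypothetical.

```python
# Minimal sketch: verify T^*(f_delta - T phi) = alpha * grad J(phi) from (2.4)
# for the assumed quadratic penalty J(phi) = 0.5 * ||phi||^2 and a diagonal T.

def tikhonov_minimizer(sigma, f_delta, alpha):
    """Minimizer of 0.5*||T phi - f_delta||^2 + alpha*0.5*||phi||^2, T = diag(sigma)."""
    return [s * f / (s * s + alpha) for s, f in zip(sigma, f_delta)]

def optimality_residual(sigma, f_delta, alpha, phi):
    """Componentwise T^*(f_delta - T phi) - alpha * grad J(phi), with grad J(phi) = phi."""
    return [s * (f - s * p) - alpha * p for s, f, p in zip(sigma, f_delta, phi)]

sigma = [1.0 / (i + 1) for i in range(5)]   # hypothetical decaying singular values
f_delta = [1.0, 0.4, 0.3, 0.1, 0.05]        # hypothetical noisy data
alpha = 1e-2

phi_alpha = tikhonov_minimizer(sigma, f_delta, alpha)
residual = optimality_residual(sigma, f_delta, alpha, phi_alpha)
max_residual = max(abs(r) for r in residual)
print("largest violation of (2.4):", max_residual)
```

For a general smooth convex $J$ no closed-form minimizer is available, and the same residual could instead serve as a stopping criterion for an iterative solver.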

In this work, the radii of the ball are estimated by means of the Bregman divergence with potential $J(\cdot)$. The choice of regularization parameter in this work does not require any a priori knowledge about the true solution. We always work with perturbed data $f^\delta$ and introduce the rates according to the perturbation amount $\delta$.

### 2.1 Bregman divergence

We will be able to quantify the rate of convergence of $\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|$ by means of different formulations of the Bregman divergence. The following formulation emphasizes the role of the Bregman divergence in proving the norm convergence of the minimizer of the convex minimization problem to the true solution.

###### Definition 2.1 (Total convexity and Bregman divergence).

[5, Def.1]

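Let $\Phi$ be a smooth and convex functional. Then $\Phi$ is called totally convex in $u^{*}$ if, for $u$ and $u^{*}$ in the domain of $\Phi$, it holds that

$$D_\Phi(u,u^{*})=\Phi(u)-\Phi(u^{*})-\langle\nabla\Phi(u^{*}),u-u^{*}\rangle\to 0 \;\Rightarrow\; \|u-u^{*}\|_{\mathcal{H}}\to 0,$$

where $D_\Phi(u,u^{*})$ represents the Bregman divergence.

It is said that $\Phi$ is $q$-convex in $u^{*}$, for an exponent $q$, if there exists a constant $c^{*}>0$ such that for all $u$ in a bounded neighbourhood of $u^{*}$ we have

$$D_\Phi(u,u^{*})=\Phi(u)-\Phi(u^{*})-\langle\nabla\Phi(u^{*}),u-u^{*}\rangle\ge c^{*}\|u-u^{*}\|_{\mathcal{H}}^{q}. \tag{2.5}$$

Throughout our norm convergence estimates, we refer to this definition for the case of 2-convexity. We will also study different formulations of the Bregman divergence, which we introduce below.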
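As a concrete instance of (2.5), the quadratic potential $\Phi(u)=\frac{1}{2}\|u\|^{2}$ gives $D_\Phi(u,u^{*})=\frac{1}{2}\|u-u^{*}\|^{2}$, so it is 2-convex with $c^{*}=\frac{1}{2}$. The sketch below evaluates the Bregman divergence directly from its definition; the helper names `bregman`, `quad` and `quad_grad` and the test vectors are illustrative assumptions.

```python
# Minimal sketch: Bregman divergence D_Phi(u, u*) from its definition,
# evaluated for the illustrative potential Phi(u) = 0.5 * ||u||^2.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bregman(phi, grad_phi, u, u_star):
    """D_Phi(u, u*) = Phi(u) - Phi(u*) - <grad Phi(u*), u - u*>."""
    diff = [x - y for x, y in zip(u, u_star)]
    return phi(u) - phi(u_star) - dot(grad_phi(u_star), diff)

def quad(u):
    return 0.5 * dot(u, u)

def quad_grad(u):
    return list(u)

u = [1.0, 2.0, -1.0]
u_star = [0.5, 0.0, 1.0]

d = bregman(quad, quad_grad, u, u_star)
dist_sq = sum((x - y) ** 2 for x, y in zip(u, u_star))
print(d, 0.5 * dist_sq)
```

Passing another smooth convex potential and its gradient to `bregman` gives the corresponding divergence without further changes.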

###### Remark 2.2 (Different formulations of the Bregman divergence).

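Let $\varphi_{\alpha(\delta)}$ and $\varphi^\dagger$, both defined on $\Omega$, be the regularized and the true solutions of the problem (2.2), respectively. Then we give the following definitions of the Bregman divergence:

• Bregman distance associated with the cost functional $F(\cdot)$:

$$D_F(\varphi_{\alpha(\delta)},\varphi^\dagger)=F(\varphi_{\alpha(\delta)})-F(\varphi^\dagger)-\langle\nabla F(\varphi^\dagger),\varphi_{\alpha(\delta)}-\varphi^\dagger\rangle, \tag{2.6}$$

• Bregman distance associated with the penalty $J(\cdot)$:

$$D_J(\varphi_{\alpha(\delta)},\varphi^\dagger)=J(\varphi_{\alpha(\delta)})-J(\varphi^\dagger)-\langle\nabla J(\varphi^\dagger),\varphi_{\alpha(\delta)}-\varphi^\dagger\rangle, \tag{2.7}$$

• Bregman distance associated with the misfit term $G_\delta(\cdot,f^\delta):=\frac{1}{2}\|T(\cdot)-f^\delta\|^{2}$:

$$D_{G_\delta}(\varphi_{\alpha(\delta)},\varphi^\dagger)=\frac{1}{2}\|T\varphi_{\alpha(\delta)}-f^\delta\|^{2}-\frac{1}{2}\|T\varphi^\dagger-f^\delta\|^{2}-\langle\nabla G_\delta(\varphi^\dagger,f^\delta),\varphi_{\alpha(\delta)}-\varphi^\dagger\rangle. \tag{2.8}$$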

The reader may also refer to Appendix A for further properties of the Bregman divergence. In fact, an estimate similar to (2.5), for $q=2$, can also be derived by making a further assumption about the functional $\Phi$, one of which is strong convexity with modulus $c$, [3, Definition 10.5]. Below is this alternative way of obtaining (2.5) when $q=2$.

###### Proposition 2.3.

Let $\Phi$ be strongly convex with modulus of convexity $c$. Then

$$D_\Phi(u,v)>c\|u-v\|^{2}+O(\|u-v\|^{2}). \tag{2.9}$$

###### Proof.

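Let us begin by considering the Taylor expansion of $\Phi$,

$$\Phi(u)=\Phi(v)+\langle\nabla\Phi(v),u-v\rangle+\frac{1}{2}\langle\nabla^{2}\Phi(v)(u-v),u-v\rangle+O(\|u-v\|^{2}). \tag{2.10}$$

Then the Bregman divergence reads

$$\begin{aligned} D_\Phi(u,v) &= \Phi(u)-\Phi(v)-\langle\nabla\Phi(v),u-v\rangle \\ &= \langle\nabla\Phi(v),u-v\rangle+\frac{1}{2}\langle\nabla^{2}\Phi(v)(u-v),u-v\rangle+O(\|u-v\|^{2})-\langle\nabla\Phi(v),u-v\rangle \\ &= \frac{1}{2}\langle\nabla^{2}\Phi(v)(u-v),u-v\rangle+O(\|u-v\|^{2}). \end{aligned}$$

Since $\Phi$ is strictly convex due to strong convexity, one obtains that

$$D_\Phi(u,v)>c\|u-v\|^{2}+O(\|u-v\|^{2}), \tag{2.11}$$

where $c$ is the modulus of convexity. ∎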

Above, in (2.8), we have set $\Phi:=G_\delta(\cdot,f^\delta)$. In this case, one must assume even more than stated about the existence of the modulus of convexity $c_{f^\delta}$. These assumptions can be formulated in the following way. Suppose that there exists some measurement $f^\delta$ lying in the $\delta$-ball around $f^\dagger$, for all $\delta$ small enough, such that the following hold,

 00. (2.13)

Then $G_\delta$ is 2-convex and, according to Proposition 2.3,

$$D_{G_\delta}(\varphi_{\alpha(\delta)},\varphi^\dagger)>c_{f^\delta}\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|^{2}+O(\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|^{2}). \tag{2.14}$$

In addition to the traditional definition of the Bregman divergence in (2.5), the symmetric Bregman divergence is also given below, [16, Definition 2.1],

$$D^{\mathrm{sym}}_\Phi(u,u^{*}):=D_\Phi(u,u^{*})+D_\Phi(u^{*},u). \tag{2.15}$$

With the symmetric Bregman divergence formulated, and following from Definition 2.1, we give the last proposition of this section.

###### Proposition 2.4.

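[16, as appears in the proof of Theorem 4.4] Let $\Phi$ be a smooth and $q$-convex functional with $q=2$. Then there exist positive constants $c^{*}$ and $c$ such that for all $u$ and $u^{*}$ we have

$$\begin{aligned} D^{\mathrm{sym}}_\Phi(u,u^{*}) &= \langle\nabla\Phi(u)-\nabla\Phi(u^{*}),u-u^{*}\rangle \\ &\ge (c^{*}+c)\|u-u^{*}\|_{\mathcal{H}}^{2}. \end{aligned} \tag{2.16}$$

###### Proof.

Inserting the definition of $D_\Phi$ into (2.15), the function values cancel and only the gradient terms remain, which gives the equality. The lower bound follows by applying the estimate (2.5) to each of the two summands in (2.15). ∎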
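Expanding (2.15) with the definition of $D_\Phi$ shows that the function values cancel, leaving $D^{\mathrm{sym}}_\Phi(u,u^{*})=\langle\nabla\Phi(u)-\nabla\Phi(u^{*}),u-u^{*}\rangle$. The sketch below checks this identity and a quadratic lower bound numerically. The potential $\Phi(u)=\sum_i\left(u_i^{4}/4+u_i^{2}/2\right)$ is an illustrative assumption, chosen because its Hessian dominates the identity, so that the lower bound holds with $c^{*}+c=1$; all names are hypothetical.

```python
# Minimal sketch: symmetric Bregman divergence versus the gradient inner product
# for the assumed potential Phi(u) = sum(u_i^4 / 4 + u_i^2 / 2).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phi_fn(u):
    return sum(t ** 4 / 4.0 + t ** 2 / 2.0 for t in u)

def phi_grad(u):
    return [t ** 3 + t for t in u]

def bregman(u, v):
    """D_Phi(u, v) = Phi(u) - Phi(v) - <grad Phi(v), u - v>."""
    return phi_fn(u) - phi_fn(v) - dot(phi_grad(v), [a - b for a, b in zip(u, v)])

u = [1.0, -0.5, 2.0]
u_star = [0.2, 0.3, 1.5]

d_sym = bregman(u, u_star) + bregman(u_star, u)            # definition (2.15)
diff = [a - b for a, b in zip(u, u_star)]
grad_diff = [a - b for a, b in zip(phi_grad(u), phi_grad(u_star))]
inner = dot(grad_diff, diff)                               # gradient form
dist_sq = dot(diff, diff)
print(d_sym, inner, dist_sq)
```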

### 2.2 Appropriate regularization parameter with discrepancy principle

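A regularization parameter $\alpha$ is admissible for $\delta$ when

$$\|T\varphi_\alpha-f^\delta\|\le\tau\delta \tag{2.17}$$

for some fixed $\tau$. We seek a rule for choosing $\alpha(\delta)$ as a function of $\delta$ such that (2.17) is satisfied and

$$\alpha(\delta)\to 0,\quad\text{as}\quad\delta\to 0.$$

Following [13, Eq. (4.57) and (4.58)], [19, Definition 2.3], in order to obtain tight rates of convergence of $\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|$ we define $\alpha(\delta,f^\delta)$ such that

$$\alpha(\delta,f^\delta)\in\{\alpha>0 \;:\; \|T\varphi_\alpha-f^\delta\|\le\tau\delta,\ \text{for all given } (\delta,f^\delta)\}. \tag{2.18}$$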
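One common way to pick an element of the admissible set in (2.18) is to scan a decreasing geometric grid and accept the first (hence largest) grid value whose residual satisfies (2.17). The sketch below does this under assumptions that are not taken from the paper: the penalty is $J(\varphi)=\frac{1}{2}\|\varphi\|^{2}$, the operator $T$ is diagonal, and `choose_alpha`, `tau`, `alpha0` and `q` are hypothetical names and values.

```python
import math

# Minimal sketch of a discrepancy-principle parameter choice for (2.17)/(2.18),
# assuming J(phi) = 0.5*||phi||^2 and a diagonal operator T = diag(sigma).

def residual_norm(sigma, f_delta, alpha):
    """||T phi_alpha - f_delta|| for phi_alpha = (T^*T + alpha I)^{-1} T^* f_delta."""
    return math.sqrt(sum((alpha * f / (s * s + alpha)) ** 2
                         for s, f in zip(sigma, f_delta)))

def choose_alpha(sigma, f_delta, delta, tau=1.5, alpha0=1.0, q=0.5, max_steps=200):
    """Largest alpha on the grid alpha0 * q**j with residual <= tau * delta."""
    alpha = alpha0
    for _ in range(max_steps):
        if residual_norm(sigma, f_delta, alpha) <= tau * delta:
            return alpha
        alpha *= q
    raise RuntimeError("no admissible alpha found on the grid")

sigma = [1.0 / (i + 1) for i in range(5)]            # hypothetical singular values
phi_true = [1.0, 0.5, 0.25, 0.125, 0.0625]           # hypothetical true solution
delta = 1e-2
f_true = [s * p for s, p in zip(sigma, phi_true)]
f_delta = [f + delta / math.sqrt(len(sigma)) for f in f_true]  # noise of norm delta

alpha = choose_alpha(sigma, f_delta, delta)
print(alpha, residual_norm(sigma, f_delta, alpha))
```

The grid search terminates here because the residual of this particular scheme tends to zero as $\alpha\to 0$; other penalties would need their own solver inside `residual_norm`.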

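The strong relation between the discrepancy $\|T\varphi_{\alpha(\delta)}-f^\delta\|$ and the norm convergence of $\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|$ can be formulated in the following lemma.

###### Lemma 2.5.

Let $T$ be a linear and compact operator. Denote by $\varphi_{\alpha(\delta)}$ the regularized solution and by $\varphi^\dagger$ the true solution to the problem (2.2). Then

$$\|T\varphi_{\alpha(\delta)}-f^\delta\|\le\delta+\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|\,\|T^{*}\|, \tag{2.19}$$

where the noisy data $f^\delta$ and the true data $f^\dagger$ satisfy $\|f^\dagger-f^\delta\|\le\delta$ for a sufficiently small amount of noise $\delta$.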

###### Proof.

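The desired result follows from the following straightforward calculation, in which $T\varphi^\dagger=f^\dagger$ is used,

$$\begin{aligned} \|T\varphi_{\alpha(\delta)}-f^\delta\|^{2} &= \langle T\varphi_{\alpha(\delta)}-f^\dagger+f^\dagger-f^\delta,\,T\varphi_{\alpha(\delta)}-f^\delta\rangle \\ &= \langle T\varphi_{\alpha(\delta)}-f^\dagger,\,T\varphi_{\alpha(\delta)}-f^\delta\rangle+\langle f^\dagger-f^\delta,\,T\varphi_{\alpha(\delta)}-f^\delta\rangle \\ &= \langle T(\varphi_{\alpha(\delta)}-\varphi^\dagger),\,T\varphi_{\alpha(\delta)}-f^\delta\rangle+\langle f^\dagger-f^\delta,\,T\varphi_{\alpha(\delta)}-f^\delta\rangle \\ &= \langle\varphi_{\alpha(\delta)}-\varphi^\dagger,\,T^{*}(T\varphi_{\alpha(\delta)}-f^\delta)\rangle+\langle f^\dagger-f^\delta,\,T\varphi_{\alpha(\delta)}-f^\delta\rangle \\ &\le \|\varphi_{\alpha(\delta)}-\varphi^\dagger\|\,\|T^{*}\|\,\|T\varphi_{\alpha(\delta)}-f^\delta\|+\delta\,\|T\varphi_{\alpha(\delta)}-f^\delta\|. \end{aligned}$$

Dividing both sides by $\|T\varphi_{\alpha(\delta)}-f^\delta\|$ yields (2.19). ∎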
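The bound (2.19) can be observed numerically for several values of $\alpha$. The following sketch again assumes, purely for illustration, a diagonal operator $T$ (so that $\|T\|=\|T^{*}\|$ is the largest diagonal entry) and the quadratic penalty $J(\varphi)=\frac{1}{2}\|\varphi\|^{2}$; every name in it is hypothetical.

```python
import math

# Minimal sketch: compare both sides of (2.19) for a diagonal T and the
# assumed Tikhonov-type minimizer phi_alpha = (T^*T + alpha I)^{-1} T^* f_delta.

def norm(v):
    return math.sqrt(sum(x * x for x in v))

sigma = [1.0 / (i + 1) for i in range(5)]            # hypothetical singular values
phi_true = [1.0, 0.5, 0.25, 0.125, 0.0625]           # hypothetical true solution
delta = 1e-2
f_true = [s * p for s, p in zip(sigma, phi_true)]
f_delta = [f + delta / math.sqrt(len(sigma)) for f in f_true]
op_norm = max(sigma)                                  # ||T|| = ||T^*|| for diagonal T

def tikhonov(alpha):
    return [s * f / (s * s + alpha) for s, f in zip(sigma, f_delta)]

results = []
for alpha in (1e-1, 1e-2, 1e-3, 1e-4):
    phi = tikhonov(alpha)
    lhs = norm([s * p - f for s, p, f in zip(sigma, phi, f_delta)])
    noise = norm([a - b for a, b in zip(f_delta, f_true)])
    rhs = noise + op_norm * norm([p - t for p, t in zip(phi, phi_true)])
    results.append((alpha, lhs, rhs))
    print(alpha, lhs, rhs)

bound_holds = all(lhs <= rhs + 1e-12 for _, lhs, rhs in results)
```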

## 3 Monotonicity of the gradient of convex functionals

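If the positive real valued convex functional $P$ is of class $C^{1}$, then for all $\varphi$ and $\Psi$ defined on $\Omega$,

$$P(\Psi)\ge P(\varphi)+\langle\nabla P(\varphi),\Psi-\varphi\rangle. \tag{3.1}$$

This inequality means that at each $\varphi$ the tangent of the functional lies below the functional itself. The same is also true from the subdifferentiability point of view. Following from (3.1), one can also write that

$$P(\varphi)-P(\Psi)\le\langle\nabla P(\varphi),\varphi-\Psi\rangle. \tag{3.2}$$

Still from (3.1), by interchanging $\varphi$ and $\Psi$ one obtains

$$P(\varphi)\ge P(\Psi)+\langle\nabla P(\Psi),\varphi-\Psi\rangle, \tag{3.3}$$

or equivalently

$$P(\varphi)-P(\Psi)\ge\langle\nabla P(\Psi),\varphi-\Psi\rangle. \tag{3.4}$$

Combining (3.2) and (3.4) gives

$$\langle\nabla P(\Psi),\varphi-\Psi\rangle\le P(\varphi)-P(\Psi)\le\langle\nabla P(\varphi),\varphi-\Psi\rangle. \tag{3.5}$$

Eventually this implies

$$0\le\langle\nabla P(\varphi)-\nabla P(\Psi),\varphi-\Psi\rangle \tag{3.6}$$

which is the monotonicity of the gradient of convex functionals, [3, Proposition 17.10].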
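The chain (3.5) and the monotonicity (3.6) are easy to test on sample points. The sketch below uses the smooth convex functional $P(x)=\sum_i\left(x_i^{4}/4+x_i^{2}/2\right)$, which is an illustrative assumption and not a functional from the paper, together with pseudo-random points generated from a fixed seed.

```python
import random

# Minimal sketch: check the sandwich inequality (3.5) and the gradient
# monotonicity (3.6) for the assumed functional P(x) = sum(x_i^4/4 + x_i^2/2).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def P(x):
    return sum(t ** 4 / 4.0 + t ** 2 / 2.0 for t in x)

def grad_P(x):
    return [t ** 3 + t for t in x]

rng = random.Random(0)      # fixed seed so that the run is reproducible
tol = 1e-12                 # slack for floating point round-off
checks = []
for _ in range(100):
    phi = [rng.uniform(-2.0, 2.0) for _ in range(4)]
    psi = [rng.uniform(-2.0, 2.0) for _ in range(4)]
    diff = [a - b for a, b in zip(phi, psi)]
    lower = dot(grad_P(psi), diff)                      # left side of (3.5)
    middle = P(phi) - P(psi)
    upper = dot(grad_P(phi), diff)                      # right side of (3.5)
    monotone = upper - lower                            # inner product in (3.6)
    checks.append(lower <= middle + tol and middle <= upper + tol
                  and monotone >= -tol)

all_hold = all(checks)
print("(3.5) and (3.6) hold on all samples:", all_hold)
```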

Initially, owing to the relation in (3.5), the weak convergence of the regularized solution $\varphi_{\alpha(\delta)}$ to the true solution $\varphi^\dagger$ can easily be shown with the choice of regularization parameter $\alpha(\delta)=\delta^{p}$ for $p\in(0,2)$.

###### Theorem 3.1 (Weak convergence of the regularized solution).

In the same conditions of Lemma 2.5, if the regularized minimum to the problem (2.2) exists and then

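$$\varphi_{\alpha(\delta)}\rightharpoonup\varphi^\dagger,\quad\text{as}\quad \alpha(\delta)=\delta^{p}\to 0 \quad\text{for any}\quad p\in(0,2). \tag{3.7}$$

###### Proof.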

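Since $\varphi_{\alpha(\delta)}$ is the minimizer of the cost functional $F$, we have

$$F(\varphi_{\alpha(\delta)},f^\delta)=\frac{1}{2}\|T\varphi_{\alpha(\delta)}-f^\delta\|_{L^{2}}^{2}+\alpha J(\varphi_{\alpha(\delta)})\le\frac{1}{2}\|T\varphi^\dagger-f^\delta\|_{L^{2}}^{2}+\alpha J(\varphi^\dagger)=F(\varphi^\dagger,f^\delta),$$

which is, in other words,

$$\alpha\left(J(\varphi_{\alpha(\delta)})-J(\varphi^\dagger)\right)\le\frac{1}{2}\|T\varphi^\dagger-f^\delta\|_{L^{2}}^{2}-\frac{1}{2}\|T\varphi_{\alpha(\delta)}-f^\delta\|_{L^{2}}^{2}. \tag{3.8}$$

From the convexity of the penalization term $J(\cdot)$, a lower bound has already been found in (3.5). Then, following from (3.5), the last inequality implies

$$\alpha\langle\nabla J(\varphi^\dagger),\varphi_{\alpha(\delta)}-\varphi^\dagger\rangle\le\alpha\left(J(\varphi_{\alpha(\delta)})-J(\varphi^\dagger)\right)\le\frac{1}{2}\delta^{2}, \tag{3.9}$$

since $\|T\varphi^\dagger-f^\delta\|\le\delta$. With the choice $\alpha(\delta)=\delta^{p}$ for any $p\in(0,2)$, dividing (3.9) by $\alpha$ gives the desired result

$$\langle\nabla J(\varphi^\dagger),\varphi_{\alpha(\delta)}-\varphi^\dagger\rangle\le\frac{1}{2}\delta^{2-p}. \tag{3.10}$$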
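The chain of inequalities leading from the minimizing property to the bound by $\frac{1}{2}\delta^{2}$ can be reproduced numerically with $\alpha(\delta)=\delta^{p}$. This is a sketch under illustrative assumptions only: $J(\varphi)=\frac{1}{2}\|\varphi\|^{2}$ (so $\nabla J(\varphi^\dagger)=\varphi^\dagger$), a diagonal operator $T$, and hypothetical data; it does not cover a general penalty.

```python
import math

# Minimal sketch: alpha*<grad J(phi_true), phi - phi_true>
#                 <= alpha*(J(phi) - J(phi_true)) <= 0.5*delta^2
# for the assumed penalty J(phi) = 0.5*||phi||^2 and alpha = delta**p.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def J(phi):
    return 0.5 * dot(phi, phi)

sigma = [1.0 / (i + 1) for i in range(5)]            # hypothetical singular values
phi_true = [1.0, 0.5, 0.25, 0.125, 0.0625]           # hypothetical true solution
delta = 1e-2
f_true = [s * p for s, p in zip(sigma, phi_true)]
f_delta = [f + delta / math.sqrt(len(sigma)) for f in f_true]  # noise of norm delta

chain = []
for p in (0.5, 1.0, 1.5):
    alpha = delta ** p
    phi = [s * f / (s * s + alpha) for s, f in zip(sigma, f_delta)]  # minimizer
    diff = [a - b for a, b in zip(phi, phi_true)]
    lower = alpha * dot(phi_true, diff)              # uses grad J(phi_true) = phi_true
    middle = alpha * (J(phi) - J(phi_true))
    upper = 0.5 * delta ** 2
    chain.append((p, lower, middle, upper))
    print(p, lower, middle, upper)

chain_holds = all(lo <= mid + 1e-12 and mid <= up + 1e-12
                  for _, lo, mid, up in chain)
```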

###### Remark 3.2.

Note that the result of the theorem is true for any smooth and convex penalty in the problem (2.2).

## 4 Convergence Results for ||φα(δ)−φ†||

We now analyse each of the cases in which $J(\cdot)$ is defined over $C^{k}(\Omega,\mathcal{H})$ for $k\in\{1,2+\}$. In each case, we will consider the discrepancy principle for the choice of the regularization parameter while providing the norm convergence.

### 4.1 When the penalty J(⋅) is defined over C1(Ω,H)

The first part of the following formulation has been studied in [6, Theorem 5.]. There, the authors obtain convergence in terms of a Lagrange multiplier instead of a regularization parameter $\alpha$. According to the theoretical setup given by the authors, their convergence rate explicitly contains the Lagrange multiplier. The second part, on the other hand, has been motivated by [16, Theorem 4.4]. All convergence results are obtained under the assumption that the penalizer $J(\cdot)$ is 2-convex according to (2.5).

###### Theorem 4.1 (Upper bound for the Bregman divergence associated with the penalty).

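Let $J(\cdot)$ be the smooth and convex penalization term of the cost functional $F_\alpha$ given in the problem (2.2), and denote by $\varphi_{\alpha(\delta)}$ the regularized solution of the same problem satisfying $\|T\varphi_{\alpha(\delta)}-f^\delta\|\le\tau\delta$, where $\alpha$ is as in (2.2). Then the choice of regularization parameter $\alpha(\delta):=(\tau+1)\sqrt{\delta}\,\|T^{*}\|$ yields

$$D_J(\varphi_{\alpha(\delta)},\varphi^\dagger)\le\sqrt{\delta}\,\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|, \tag{4.1}$$

and

$$D^{\mathrm{sym}}_J(\varphi_{\alpha(\delta)},\varphi^\dagger)\le\sqrt{\delta}\,\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|, \tag{4.2}$$

both of which imply

$$\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|\le\sqrt{\delta}. \tag{4.3}$$

###### Proof.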

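First recall the formulation of the Bregman divergence associated with the penalty $J(\cdot)$ in (2.7). Convexity of the penalizer brings the following estimate by the second part of (3.5),

$$J(\varphi_{\alpha(\delta)})-J(\varphi^\dagger)\le\langle\nabla J(\varphi_{\alpha(\delta)}),\varphi_{\alpha(\delta)}-\varphi^\dagger\rangle.$$

Then (2.7) can in fact be bounded by

$$\begin{aligned} D_J(\varphi_{\alpha(\delta)},\varphi^\dagger) &\le \langle\nabla J(\varphi_{\alpha(\delta)})-\nabla J(\varphi^\dagger),\varphi_{\alpha(\delta)}-\varphi^\dagger\rangle \\ &= \frac{1}{\alpha(\delta)}\langle T^{*}(f^\delta-T\varphi_{\alpha(\delta)})-T^{*}(f^\delta-T\varphi^\dagger),\varphi_{\alpha(\delta)}-\varphi^\dagger\rangle, \end{aligned}$$

due to the first order optimality conditions in (2.4), i.e. $\nabla J(\varphi_{\alpha(\delta)})=\frac{1}{\alpha(\delta)}T^{*}(f^\delta-T\varphi_{\alpha(\delta)})$. The inner product can also be written in the composite form

$$D_J(\varphi_{\alpha(\delta)},\varphi^\dagger)\le\frac{1}{\alpha(\delta)}\langle T^{*}(f^\delta-T\varphi_{\alpha(\delta)}),\varphi_{\alpha(\delta)}-\varphi^\dagger\rangle-\frac{1}{\alpha(\delta)}\langle T^{*}(f^\delta-f^\dagger),\varphi_{\alpha(\delta)}-\varphi^\dagger\rangle,$$

where the true solution $\varphi^\dagger$ satisfies $T\varphi^\dagger=f^\dagger$. Taking the absolute value of the right hand side, applying the Cauchy-Schwarz inequality and recalling that $\|T\varphi_{\alpha(\delta)}-f^\delta\|\le\tau\delta$ by (2.18) brings

$$\begin{aligned} D_J(\varphi_{\alpha(\delta)},\varphi^\dagger) &\le \frac{\tau\delta}{\alpha(\delta)}\|T^{*}\|\,\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|+\frac{\delta}{\alpha(\delta)}\|T^{*}\|\,\|\varphi_{\alpha(\delta)}-\varphi^\dagger\| \\ &= \frac{(\tau+1)\delta}{\alpha(\delta)}\|T^{*}\|\,\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|. \end{aligned} \tag{4.4}$$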

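As for the upper bound for $D^{\mathrm{sym}}_J(\varphi_{\alpha(\delta)},\varphi^\dagger)$, we adapt (2.15) in the following way,

$$\begin{aligned} D^{\mathrm{sym}}_J(\varphi_{\alpha(\delta)},\varphi^\dagger) &= D_J(\varphi_{\alpha(\delta)},\varphi^\dagger)+D_J(\varphi^\dagger,\varphi_{\alpha(\delta)}) \\ &= \langle\nabla J(\varphi_{\alpha(\delta)})-\nabla J(\varphi^\dagger),\varphi_{\alpha(\delta)}-\varphi^\dagger\rangle. \end{aligned}$$

Again by the first order optimality conditions in (2.4),

$$D^{\mathrm{sym}}_J(\varphi_{\alpha(\delta)},\varphi^\dagger)=\frac{1}{\alpha(\delta)}\langle T^{*}(f^\delta-T\varphi_{\alpha(\delta)})-T^{*}(f^\delta-f^\dagger),\varphi_{\alpha(\delta)}-\varphi^\dagger\rangle.$$

We split this inner product into its two terms and take the absolute value of each part,

$$\begin{aligned} D^{\mathrm{sym}}_J(\varphi_{\alpha(\delta)},\varphi^\dagger) &\le \frac{1}{\alpha(\delta)}\left\{|\langle T^{*}(f^\delta-f^\dagger),\varphi_{\alpha(\delta)}-\varphi^\dagger\rangle|+|\langle T^{*}(f^\delta-T\varphi_{\alpha(\delta)}),\varphi_{\alpha(\delta)}-\varphi^\dagger\rangle|\right\} \\ &\le \frac{1}{\alpha(\delta)}\left\{\delta\|T^{*}\|\,\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|+\|T^{*}\|\,\|T\varphi_{\alpha(\delta)}-f^\delta\|\,\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|\right\}, \end{aligned}$$

which is a consequence of the Cauchy-Schwarz inequality. Now again by the condition in (2.18),

$$D^{\mathrm{sym}}_J(\varphi_{\alpha(\delta)},\varphi^\dagger)\le\frac{1}{\alpha(\delta)}\left\{\delta\|T^{*}\|\,\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|+\tau\delta\|T^{*}\|\,\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|\right\}. \tag{4.5}$$

Inserting the defined regularization parameter into both (4.4) and (4.5) yields the desired upper bounds for $D_J$ and $D^{\mathrm{sym}}_J$, respectively. Since $J(\cdot)$ is 2-convex, the norm convergence of $\varphi_{\alpha(\delta)}$ is obtained due to (2.5). ∎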

In fact, those rates also imply another, faster convergence rate for the same choice of regularization parameter. To observe this, a different formulation of the Bregman divergence is necessary. In Definition 2.1, take $\Phi:=G_\delta$ to formulate the following. However, we need to recall the assumptions about the convexity of $G_\delta$ in (2.12) and (2.13).

###### Theorem 4.2.

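Let $T$ be a compact forward operator in the problem (2.2) and assume that the conditions in (2.12) and (2.13) are satisfied. We formulate a Bregman divergence associated with the misfit term $G_\delta$,

$$D_{G_\delta}(\varphi_{\alpha(\delta)},\varphi^\dagger)=\frac{1}{2}\|T\varphi_{\alpha(\delta)}-f^\delta\|^{2}-\frac{1}{2}\|T\varphi^\dagger-f^\delta\|^{2}-\langle\nabla G_\delta(\varphi^\dagger,f^\delta),\varphi_{\alpha(\delta)}-\varphi^\dagger\rangle. \tag{4.6}$$

If $\varphi_{\alpha(\delta)}$ is the regularized minimizer for the problem (2.2), then with the choice of regularization parameter of Theorem 4.1, for sufficiently small $\delta$,

$$D_{G_\delta}(\varphi_{\alpha(\delta)},\varphi^\dagger)\le O(\delta^{3/2}). \tag{4.7}$$

As expected, this rate also implies the following,

$$\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|\le O(\delta^{3/4}). \tag{4.8}$$

###### Proof.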

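As given by (2.18), $\|T\varphi_{\alpha(\delta)}-f^\delta\|\le\tau\delta$. Additionally, the noisy measurement $f^\delta$ and the true measurement $f^\dagger$ satisfy $\|f^\dagger-f^\delta\|\le\delta$. In Theorem 4.1 above, we have estimated a pair of convergence rates with the same regularization parameter $\alpha(\delta)$. So $D_{G_\delta}$ defined by (4.6), together with $\nabla G_\delta(\varphi^\dagger,f^\delta)=T^{*}(f^\dagger-f^\delta)$, provides the result below:

$$\begin{aligned} D_{G_\delta}(\varphi_{\alpha(\delta)},\varphi^\dagger) &\le \frac{1}{2}(\tau\delta)^{2}+\frac{1}{2}\delta^{2}-\langle T^{*}(f^\dagger-f^\delta),\varphi_{\alpha(\delta)}-\varphi^\dagger\rangle \\ &= \frac{1}{2}(\tau\delta)^{2}+\frac{1}{2}\delta^{2}-\langle f^\dagger-f^\delta,T(\varphi_{\alpha(\delta)}-\varphi^\dagger)\rangle \\ &\le \frac{1}{2}\delta^{2}(\tau^{2}+1)+\delta\|T^{*}\|\,\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|. \end{aligned}$$

Recall that $\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|\le\sqrt{\delta}$ has been estimated in Theorem 4.1 for this choice of $\alpha(\delta)$. Hence,

$$\begin{aligned} D_{G_\delta}(\varphi_{\alpha(\delta)},\varphi^\dagger) &\le \frac{1}{2}\delta^{2}(\tau^{2}+1)+\delta^{3/2}\|T^{*}\| \\ &\le \delta^{3/2}\left(\frac{1}{2}(\tau^{2}+1)+\|T^{*}\|\right), \end{aligned} \tag{4.9}$$

where the last step uses $\delta^{2}\le\delta^{3/2}$ for $\delta\le 1$.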

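Now, since $G_\delta$ is 2-convex (see Def. 2.1), by (2.5) and by the assumptions (2.12) and (2.13), combining (2.14) with (4.9) and taking square roots gives

$$\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|\le\frac{\delta^{3/4}}{\sqrt{c_{f^\delta}}}\left(\frac{1}{2}(\tau^{2}+1)+\|T^{*}\|\right)^{1/2}. \tag{4.10}$$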

### 4.2 When the penalty J(⋅) is defined over C2+(Ω,H)

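The convergence rates above are still preserved when the penalty $J(\cdot)$ is defined over $C^{2+}(\Omega,\mathcal{H})$, since $C^{2+}\subset C^{1}$. However, one may be interested in the discrepancy principle in this more specific case. Above, we have formulated those convergence rates under the assumption that $J(\cdot)$ is defined over $C^{1}(\Omega,\mathcal{H})$. We will now analyse the convergence assuming that $J(\cdot)$ is defined over $C^{2+}(\Omega,\mathcal{H})$. Here we will define the regularization parameter also as a function of the Hessian Lipschitz constant $L_H$, [14]. We begin by estimating the discrepancy $\|T\varphi_{\alpha(\delta)}-f^\delta\|$.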

###### Theorem 4.3.

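Let $F_\alpha$ be the smooth and convex cost functional as defined in the problem (2.2). If the penalty $J(\cdot)$ is strongly convex, then

$$\|T\varphi_{\alpha(\delta)}-f^\delta\|\le\delta\left(1+\frac{1}{L_H}\|T^{*}\|^{2}\right)^{1/2}+\sqrt{\tilde{O}(\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|^{2})}$$

where $L_H$ is the Hessian Lipschitz constant of the functional.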

###### Proof.

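Let us consider the following second order Taylor expansion,

$$\begin{aligned} F_\alpha(\varphi_{\alpha(\delta)}) = F_\alpha(\varphi^\dagger) &+ \langle\varphi_{\alpha(\delta)}-\varphi^\dagger,\nabla F_\alpha(\varphi^\dagger)\rangle \\ &+ \frac{1}{2}\langle\varphi_{\alpha(\delta)}-\varphi^\dagger,\nabla^{2}F_\alpha(\varphi^\dagger)(\varphi_{\alpha(\delta)}-\varphi^\dagger)\rangle+O(\|\varphi_{\alpha(\delta)}-\varphi^\dagger\|^{2}). \end{aligned}$$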

Obviously, this Taylor expansion is bounded by

 Fα(φα(δ))≤Fα(φ†)+⟨φα(δ)−φ