Image denoising: learning noise distribution via PDE-constrained optimization

# Image denoising: learning noise distribution via PDE-constrained optimization

Juan Carlos De los Reyes111Departamento de Matemática, Escuela Politécnica Nacional de Quito, Ecuador (juan.delosreyes@epn.edu.ec)    Carola-Bibiane Schönlieb222Department of Applied Mathematics and Theoretical Physics (DAMTP), University of Cambridge (C.B.Schoenlieb@damtp.cam.ac.uk)
###### Abstract

We propose a PDE-constrained optimization approach for the determination of noise distribution in total variation (TV) image denoising. An optimization problem for the determination of the weights correspondent to different types of noise distributions is stated and existence of an optimal solution is proved. A tailored regularization approach for the approximation of the optimal parameter values is proposed thereafter and its consistency studied. Additionally, the differentiability of the solution operator is proved and an optimality system characterizing the optimal solutions of each regularized problem is derived. The optimal parameter values are numerically computed by using a quasi-Newton method, together with semismooth Newton type algorithms for the solution of the TV-subproblems.

I
33footnotetext: Research partially supported by the Alexander von Humboldt Foundation. Moreover, CBS acknowledges the financial support provided by the Cambridge Centre for Analysis (CCA) and the Royal Society International Exchanges Award IE110314 for the project High-order Compressed Sensing for Medical Imaging. Further, this publication is based on work supported by Award No. KUK-I1-007-43 , made by King Abdullah University of Science and Technology (KAUST).footnotetext: Date: 14. July 2012

mage denoising, noise distribution, PDE-constrained optimization, Huber regularization.

## 1 Introduction

Let , or with , be a given noisy image. Depending on the application at hand the type of noise, i.e., the noise distribution, changes [5]. Examples for noise distributions are Gaussian noise, which typically appears in, e.g. MRI (Magnetic Resonance Tomography), Poisson noise in, e.g. radar measurements or PET (Positron Emission Tomography), and impulse noise usually due to transmission errors or malfunctioning pixel elements in camera sensors. To remove the noise a total variation (TV) regularization is frequently considered [3, 11, 12, 13, 20, 37] that amounts to reconstruct a denoised version of as a minimiser of the generic functional

 J(u)=|Du|(Ω)+λϕ(u,f), (1)

with

 |Du|(Ω)=supg∈C∞0(Ω;R2),∥g∥∞≤1∫Ωu ∇⋅g dx (2)

the total variation of in , a positive parameter and a suitable distance function called the data fidelity term. The latter depends on the statistics of the data , which can be either estimated or approximated by a noise model known from the physics behind the acquisition of . For normally distributed , i.e. the interferences in are Gaussian noise, this distance function is the squared norm of . If a Poisson noise distribution is present, , which corresponds to the Kullback-Leibler distance between and [29, 33]. In the presence of impulse noise, the correct data fidelity term turns out to be the norm of [32, 21]. Other noise models have been considered as well, cf. e.g. [2]. The size of the parameter depends on the strength of the noise, i.e. it models the trade-off between regularisation and fidelity to the measured datum .

A key issue in total variation denoising is an adequate choice of the correct noise model, i.e. the choice of , and of the size of the parameter . Depending on this choice, different results are obtained. The term is usually modelled from the physics behind the acquisition process. Several strategies, both heuristic and statistically grounded, have been considered for choosing the weight , cf. e.g. [11, 22, 23, 24, 25, 35]. In this paper we propose an optimal control strategy for choosing both and . To do so we extend model (1) to a more general model, that allows for mixed noise distributions in the data. Namely, instead of (1) we consider

 minu(|Du|(Ω)+d∑i=1∫Ωλi ϕi(u,f)dx). (3)

where , are convex differentiable functions in , and are positive parameters. The functions model different choices of data fidelities. In the case of mixed Gaussian and impulse noise , and . The parameters weight the different noise models and the regularising term against each other. As such, the choice of these parameters depends on the amount and strength of noise of different distributions in . Typically, the are chosen to be real parameters. However, in some applications, it may be more favourable to choose them to be spatially dependent functions , cf. e.g. [1, 4, 23, 25, 35].

We propose a PDE-constrained optimization approach to determine the weights of the noise distribution and, in that manner, learn the noise distribution present in the measured datum for both and mixed noise models . To do so, we treat (3) as a constraint and state an optimization problem governed by (3) for the optimal determination of weights. When possible, we replace the optimization problem by a necessary and sufficient optimality condition (in form of a variational inequality (VI)) as a constraint.

Schematically, we proceed in the following way:

1. We consider a training set of pairs , . Here, ’s are noisy images, which have been measured with a fixed device with fixed settings, and the images represent the ground truth or images that approximate the ground truth within a desirable tolerance.

2. We determine the optimal choice of functions by solving the following problem for

 minλi≥0, i=1,...,d ∥~u−uk∥2L2(Ω)+βd∑i=1∥λi∥2X, (4)

where solves the minimization problem (3) for a given , corresponds to in the case of scalar parameters or to, e.g., in the case of distributed functions, and is a given weight.

The reasonability of assuming to have a such a training set is motivated by certain applications, where the accuracy and as such the noise level in the measurements can be tuned to a certain extent. In MRI or PET, for example, the accuracy of the measurements depends on the setup of the experiment, e.g., the acquisition time. Hence, such a training set can be provided by a series of measurements using phantoms. Then, the ’s are measured with the maximal accuracy practically possible and the ’s are measured within a usual clinical setup. For instance, dictionary based image reconstruction methods are already used in the medical imaging community. There, good quality measurements or template shapes are used as priors for reconstructing , cf. e.g. [36], or for image segmentation, cf. e.g. [34] and references therein.

Up to our knowledge this paper is the first one to approach the estimation of the noise distribution as an optimal control problem. By incorporating more than one into the model (3) our approach automatically chooses the correct one(s) through an optimal choice of the weights in terms of (4).

#### Organisation of the paper:

We continue with the analysis of the optimization problem (8)–(8b) in Section 2. After proving existence of an optimal solution and convergence of the Huber-regularized minimisers to a minimiser of the original total variation problem, the optimization problem is transferred to a Hilbert space setting where the rest of our analysis takes place in Section 3. This further smoothing of the regularizer turns out to be necessary in order to prove continuity of the solution map in a strong enough topology and to verify convergence of our procedure. Moreover, differentiability of the regularized solution operator is thereafter proved, which leads to a first order optimality system characterization of the regularized minimisers. The paper ends with three detailed numerical experiments where the suitability of our approach is computationally verified.

## 2 Optimization problem in BV(Ω)

We are interested in the solution of the following bilevel optimization problem

 minλi≥0, i=1,...,d g(~u)+βd∑i=1∥λi∥2X (5a) subject to (5b)

where the space corresponds to in the case of scalar parameters or to a function space such that (where stands for continuous injection) in the case of distributed functions, is a functional to be minimised and . The admissible set of functions is chosen according to the data fidelities . In particular, restricts the set of functions on to those for which the ’s are well defined, cf. examples below. Moreover, we assume that the functions are differentiable and convex in , are bounded from below, and fulfil the following coercivity assumption

 ∫Ωϕi(u,f) dx≥C1∥u∥pLp−C2,∀u∈Lp(Ω)∩A (6)

for nonnegative constants and at least one or . Examples of ’s that fulfill these assumptions and that are considered in the paper are

• The Gaussian noise model, where fulfills the coercivity constraint for and the admissible set .

• The Poisson noise model, where and . This is convex and differentiable and fulfils the coercivity condition for . More precisely, we have for

 ∫Ω(u−flogu) dx≥∥u∥L1(Ω)−∥f∥L∞(Ω)⋅log∥u∥L1(Ω),

where we have used Jensen’s inequality, i.e., for

 log(∫Ωu dx)≥∫Ωlogu dx.
• The impulse noise model, where fulfills the coercivity constraint for .

For the numerical solution of (3) we want to use derivative-based iterative methods. To do so, the gradient of the total variation denoising model has to be uniquely defined. That is, a minimiser of (3) is uniquely characterised by the solution of the corresponding Euler-Lagrange equation. Since the total variation regulariser is not differentiable but its ”derivative” can be only characterised by a set of subgradients (the subdifferential), we (from now on) shall use a regularised version of the total variation. More precisely, we consider for the Huber-type regularisation of the total variation with

 |∇u|γ={|∇u|−12γif |∇u|≥1γ|∇u|2γ2if |∇u|<1γ (7)

and the following regularised version of (5)-(5b)

 minλi≥0, i=1,...,d g(~u)+βd∑i=1∥λi∥2X (8a) subject to ~u=argminu∈W1,1∩A{Jγ(u)=∫Ω|∇u|γ dx+d∑i=1∫Ωλiϕi(u,f) dx}, (8b)

where the space , , ’s and are defined as before. The admissible set of functions is assumed to be convex and closed subset of and is chosen according to the data fidelities , cf. examples above. The existence of an optimal solution for (8b) is proven by the method of relaxation. To do so we extend the definition of to as

 Jγext(u)={Jγ(u)u∈W1,1(Ω)∩A+∞u∈BV(Ω)∖(W1,1∩A)

and prove the existence of a minimiser for the lower-semicontinuous envelope of as follows. We have the following existence result. {theorem} Let and fixed. Then there exists a unique solution of the minimisation problem

 minu∈BV(Ω)∩AJγrelax(u),

where

 Jγrelax(u)=∫Ω|∇u|γ dx+C∫Ω|Dsu|+d∑i=1∫Ωλiϕi(u,f) dx. (9)

is the relaxed functional of on .

###### Remark 2.1

Note that

 Jγrelax(u)≤Jγext(u),u∈BV(Ω)

and for . Moreover, the relaxation result from Theorem 2 means that

 Jγrelax(u)=inf{liminfnJγext(un):un∈BV(Ω),un→u in BV−w∗},

i.e. is the greatest lower semicontinuous functional less than or equal to .

{proof}

Let be a minimising sequence for . We start by stating the fact that is coercive and at most linear. That is

 For |x|≥1γ:A|x|−B≤|x|γ=|x|−12γ≤|x|+1 For |x|<1γ<1:A|x|−B≤=|x|γ=|x|2γ2<|x|γ2.

Hence,

 |Dun|(Ω)=∫Ω|∇un| dx+∫Ω|Dsu|≤∫Ω|∇u|γ dx+C∫Ω|Dsu|≤M,∀n≥1.

Moreover, is uniformly bounded in for or because of the coercivity assumption (6) on and therefore is uniformly bounded in . Because can be compactly embedded in this gives that converges weak to a function in and (by passing to another subsequence) strongly converges in . From the convergence in , bounded, we get that (up to a subsequence) converges pointwise a.e. in . Moreover, since is continuous, also converges pointwise to . Then, lower-semicontinuity of w.r.t. strong convergence in [19] and Fatou’s lemma together with pointwise convergence applied to gives that

 Jγrelax(u)=∫Ω|∇u|γ dx+C∫Ω|Dsu|+d∑i=1∫Ωλiϕi(u,f) dx≤liminfnJγrelax(un)=liminfn(∫Ω|∇un|γ dx+C∫Ω|Dsun|+d∑i=1∫Ωλiϕi(un,f) dx).

To see that the minimiser lies in the admissible set it is enough to observe that the set is a convex and closed subset of and hence it is weakly closed by Mazur’s Theorem. This gives that . To see that in fact is the greatest lower-semicontinuous envelope of see [19, 7, 8, 9, 6].

{theorem}

There exists an optimal solution to

 minλi≥0, i=1,...,d g(~u)+βd∑i=1∥λi∥2X (10a) subject to ~u=argminu∈BV(Ω)∩AJγrelax(u). (10b)
{proof}

Since the cost functional is bounded from below, there exists a minimizing sequence . Due to the Tikhonov term in the cost functional, we get that is bounded in . Let be a minimiser of for a corresponding . Such a minimiser exists because of Theorem 2. Hence,

 Jγrelax(un) ≤Jγrelax(0) ∫Ω|∇un|γ dx+C∫Ω|Dsun|+d∑i=1∫Ω(λi)nϕi(un,f) dx ≤d∑i=1∫Ω(λi)nϕi(0,f) dx

As before, from the coercivity condition on and the uniform bound on , we deduce that

 C|Dun|(Ω)≤C|Dun|(Ω)+d∑i=1∫Ω(λi)nϕi(un,f) dx ≤d∑i=1∫Ω(λi)nϕi(0,f) dx ≤12(d∑i=1∥(λi)n∥X+∥ϕi(0,f)∥2L2) ≤C.

Moreover, from the coercivity of in we get with a similar calculation that is uniformly bounded in for or , and hence in particular in . In sum, is uniformly bounded in and hence, converges weakly in and strongly in . The latter also gives pointwise convergence of and consequently a.e. and hence we have

 Jγrelax(^u,^λ):=∫Ω|∇^u|γ dx+C∫Ω|Ds^u|+d∑i=1∫Ω^λiϕi(^u,f) dx≤liminfn(∫Ω|∇un|γ dx+C∫Ω|Dsun|+d∑i=1∫Ω(λi)nϕi(un,f) dx)=liminfnJγrelax(un,λn).

Since the cost functional is w.l.s.c., it follows, together with the fact that is weakly closed, that is optimal for (8).

{theorem}

The sequence of functionals in (9) converges in the - sense to the functional

 Jrelax(u)={∫Ω|∇u|+∑di=1∫Ωλiϕi(u) dxu∈W1,1(Ω)∩A+∞u∈BV(Ω)∖(W1,1(Ω)∩A)

as . Therefore, the unique minimiser of converges to the unique minimiser of as goes to infinity.

{proof}

The proof is a standard result that follows from the fact that a decreasing point wise converging sequence of functionals - converges to the lower semicontinuous envelope of the point wise limit [16, Propostion 5.7]. In fact, decreases in and converges pointwise to . Then, for the functional (being the lower-semicontinuous envelope of ) - converges to the functional in Theorem 2. The latter is the lower-semicontinous envelope of the functional in (3).

Although Theorem 2 provides a convergence result for the regularized TV subproblems, it is not sufficient to conclude convergence of the optimal regularized weights. For this we need the continuity of the solution map . Up to our knowledge, no sufficient continuity results for the control-to-state map in the case of a total variation minimiser as the state are known. There are various contributions in this directions [14, 31, 38, 39] which are – as they stand – not strong enough to prove the desired result in our case. Indeed, this is a matter of future research.

## 3 Optimization problem in H10(Ω)

In order to obtain continuity of the solution map and, hence, convergence of the regularized optimal parameters, we proceed in an alternative way and move, from now on, to a Hilbert space setting. Specifically, we replace the minimisation problem (3) by the following elliptic-regularized version of it:

 minu(ε2∥Du∥2L2+|Du|(Ω)+d∑i=1∫Ωλi ϕi(u)dx). (11)

where is an artificial diffusion parameter.

A necessary and sufficient optimality condition for (11) is given by the following elliptic variational inequality:

 ε(Du,D(v−u))L2+d∑i=1∫Ωλiϕ′i(u)(v−u) dx+∫Ω|Dv| dx−∫Ω|Du| dx≥0 % for all v∈H10(Ω). (12)

Note that by adding the coercive term, we implicitely impose the solution space (see [26]).

Our aim is to determine the optimal choice of parameters by solving the following optimization problem:

 minλi≥0, i=1,...,d g(u)+βd∑i=1∥λi∥2X (13a) subject to ε(Du,D(v−u))L2+d∑i=1∫Ωλi ϕ′i(u)(v−u) dx+∫Ω|Dv| dx−∫Ω|Du| dx≥0 % for all v∈H10(Ω), (13b)

where the space corresponds to in the case of scalar parameters or to a Hilbert function space in the case of distributed functions. Problem 13 corresponds to an optimization problem governed by a variational inequality of the second kind (see [17] and the references therein).

Next, we perform the analysis of the optimization problem (13). After proving existence of an optimal solution, a regularization approach will be also proposed in this context. We will prove the continuity of the control-to-state map and, based on it, convergence of the regularized images and the optimal regularized parameters. In the case of a smoother regularization of the TV term, also differentiablity of the solution operator will be verified, which will lead us afterwards to a first order optimality system characterizing the optimal solution to (13).

We start with the following existence theorem. {theorem} There exists an optimal solution for problem (13). {proof} Let be a minimizing sequence. Due to the Tikhonov term in the cost functional, we get that is bounded in . From (13b) we additionally get that the sequence of images satisfy

 ε∥un∥2H10+d∑i=1∫Ωλni [ϕ′i(un)−ϕ′i(0)]un dx+∫Ω|Dun| dx≤−d∑i=1∫Ωλni ϕ′i(0)un dx, (14)

which, due to the monotonicity of the operators on the left hand side, implies that

 ε∥un∥2H10≤d∑i=1∥λni∥X∥ϕ′i(0)∥Lr∥un∥Lp, (15)

for and . Thanks to the embedding , for all , we get that

 ∥un∥H10≤Cd∑i=1∥λni∥X∥ϕ′i(0)∥Lr, (16)

for some constant . Consequently,

 {un} is uniformly bounded in H10(Ω). (17)

Therefore, there exists a subsequence which converges weakly in to a limit point . Moreover, strongly in and, thanks to the continuity of , also

 (18)

Consequently, thanks to the continuity of and the properties of , we get

 a(^u,^u)+d∑i=1∫Ω^λiϕ′i(^u)^u +∫Ω|D^u| ≤liminfa(un,un)+d∑i=1∫Ωλniϕ′i(un)un+∫Ω|Dun| ≤liminfa(un,v)+d∑i=1∫Ωλniϕ′i(un)v+∫Ω|Dv| =a(^u,v)+d∑i=1∫Ω^λiϕ′i(^u)v+∫Ω|Dv|.

Since the cost functional is w.l.s.c., it follows, together with the fact that is weakly closed, that is optimal for (13).

Next, we consider the following family of regularized problems:

 ϵ(Duγ,Dv)+(hγ(Duγ),Dv)+d∑i=1∫Ωλi ϕ′i(uγ)v=0,∀v∈H10(Ω), (19)

where corresponds to an active-inactive-set approximation of the subdifferential of , i.e., coincides with an element of the subdifferential up to a small neighborhood of 0. The most natural choice is the function

 hγ(z)=γzmax(1,γ|z|), (20)

which corresponds to the derivative of the Huber function defined in (7). An alternative regularization is given by the function

 hγ(z)=⎧⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪⎩gz|z| if  γ|z|≥g+12γz|z|(g−γ2(g−γ|z|+12γ)2) if%  g−12γ≤γ|z|≤g+12γγz if  γ|z|≤g−12γ, (21)

where is a positive parameter. The latter smoothing for the total variation is going to be used in Proposition 3 where differentiability of the regulariser is needed.

###### Remark 3.1

For a fixed , it can be verified that (19) has a unique solution. Moreover, the sequence of regularized solutions converges strongly in to the solution of (12) (cf. [17]).

Based on the regularized problems (19), we now focus on the following optimization problem:

 minλi≥0, i=1,...,d g(u)+βd∑i=1∥λi∥2X (22a) subject to ϵ(Duγ,Dv)+(hγ(Duγ),Dv)+d∑i=1∫Ωλi ϕ′i(uγ)v=0,∀v∈H10(Ω), (22b)

where . In this setting we can prove the following continuity and convergence results.

{proposition}

Let be a sequence in such that weakly in as . Further, let denote the solution to (22b) associated with and . Then

 un→^u strongly in H10(Ω).
{proof}

Since weakly in , it follows, by the principle of uniform boundedness, that is bounded in . From (22b) we additionally get that

 ε∥un∥2H10+d∑i=1∫Ωλni [ϕ′i(un)−ϕ′i(0)]un dx+(hγ(Dun),Dun)≤−d∑i=1∫Ωλni ϕ′i(0)un dx, (23)

which, proceeding as in the proof of Theorem 3, implies that

 ∥un∥H10≤Cd∑i=1∥λni∥X∥ϕ′i(0)∥Lr, (24)

for some constant . Hence,

Consequently, there exists a subsequence (denoted the same) and a limit such that

 un⇀^u in H10(Ω)%andun→^u in Lp(Ω), 1≤p<+∞.

Thanks to the structure of the regularized VI (22b) it also follows (as in the proof of Theorem 3) that is solution of the regularized VI associated with . Since the solution to (22b) is unique, it additionally follows that the whole sequence converges weakly towards .

To verify strong convergence, we take the difference of the variational equations satisfied by and and obtain that

 ε(Dun−D^u,Dv)+(hγ(Dun)−hγ(D^u),Dv)=∫Ω[^λ ϕ′(^u)−λn ϕ′(un)]v dx,∀v∈H10(Ω).

Adding the term on both sides of the latter yields

 ε(Dun−D^u,Dv)+(hγ(Dun)−hγ(D^u),Dv)∫Ω[λn ϕ′(un)−λn ϕ′(^u)]v dx=∫Ω[^λ ϕ′(^u)−λn ϕ′(^u)]v dx,∀v∈H10(Ω).

Choosing and thanks to the monotonicity of the operator on the left hand side, we then obtain that

 ε∥Dun−D^u∥2≤∣∣∣∫Ω[^λ ϕ′(^u)−λn ϕ′(^u)](un−^u) dx∣∣∣,

which thanks to the strong convergence and the regularity , implies the result.

{theorem}

There exists an optimal solution for each regularized problem (22). Moreover, the sequence of regularized optimal parameters is bounded in and every weakly convergent subsequence converges towards an optimal solution of (13). {proof} Let be a minimizing sequence. From the structure of the cost functional and the properties of (22b) it follows that the sequence is bounded. Consequently, there exists a subsequence (denoted the same) and a limit such that weakly in . From Proposition 3 and the weakly lower semicontinuity of the cost functional, optimality of follows, and, therefore, existence of an optimal solution.

Let now be a sequence of optimal solutions to (22). Since is feasible for each , it follows that

 J(uγ(λγ),λγ)≤J(uγ(0),0)=J(0,0).

Thanks to the Tikhonov term in the cost functional it then follows that is bounded.

Let be the limit point of a weakly convergent subsequence (also denoted by ). From Remark 3.1 and Proposition 3 it follows, by using the triangle inequality, that

 uγ(λγ)→^u strongly in H10(Ω) as γ→∞,

where denotes the solution to (12) associated with .

From the weakly lower semicontinuity of the cost functional, we finally get that

 J(^u,^λ)≤liminfγ→∞J(uγ(λγ),λγ)≤liminfγ→∞J(uγ(¯λ),¯λ)=J(¯u,¯λ),

where is an optimal solution to (22).

The next proposition is concerned with the differentiability of the solution operator. This result will lead us thereafter (see Theorem 3) to get an expression for the gradient of the cost functional and also to obtain an optimality system for the characterization of the optimal solutions to (13). {proposition} Let be the solution operator, which assigns to each parameter the corresponding solution to the regularized VI (19), with the function given by (21). Then the operator is Gâteaux differentiable and its derivative at , in direction , is given by the unique solution of the following linearized equation:

 ϵ(Dz,Dv)+(h′γ(D¯u)Dz,Dv)+d∑i=1∫Ωλi ϕ′′i(¯u) z v dx+d∑i=1∫Ωξi ϕ′i(¯u) v dx=0,  for all v∈H10(Ω). (25)
{proof}

Existence and uniqueness of a solution to (25) follows from Lax-Milgram theorem by making use of the monotonicity properties of and .

Let , and let and be the unique solutions to (19) correspondent to and , respectively. By taking the difference between both equations, it follows that

 ϵ(D(ut−u),D(ut−u))+(hγ(Dut)−hγ(Du),D(ut−u))+λt(ϕ′(ut)−ϕ′(u),ut−u)=−t(ξϕ′(u),ut−u), (26)

which, by the monotonicity of and yields that

 κ∥ut−u∥2H10≤t ∥ξ∥Xd∥ϕ′(u)∥Lr∥ut−u∥Lp. (27)

Therefore, the sequence , with , is bounded and there exists a subsequence (denoted the same) such that weakly in .

Using the mean value theorem in integral form we get that

 a(zt,w) +1t(hγ(Dut)−hγ(Du),Dw)+1t(λt(ϕ′(ut)−ϕ′(u)),w) (28) =a(zt,w)+∫Ω⟨h′γ(ϑt)Dzt,Dw⟩ dx+∫Ωλtϕ′′(ζt)w dx (29) =−(ξϕ′(u),w), for all w∈V, (30)

where , with , and , with .

From the continuity of the bilinear form it follows that