1 Introduction
###### Abstract

We consider the problem of image denoising in the presence of noise whose statistical properties are a combination of two different distributions. We focus on noise distributions that are frequently considered in applications, in particular mixtures of salt & pepper and Gaussian noise, and Gaussian and Poisson noise. We derive a variational image denoising model that features a total variation regularisation term and a data discrepancy that features the mixed noise as an infimal convolution of discrepancy terms of the single-noise distributions. We give a statistical derivation of this model by joint Maximum A-Posteriori (MAP) estimation, and discuss in particular its interpretation as the MAP of a so-called infinity convolution of two noise distributions. Moreover, classical single-noise models are recovered asymptotically as the weighting parameters go to infinity. The numerical solution of the model is computed using second order Newton-type methods. Numerical results show the decomposition of the noise into its constituting components. The paper is furnished with several numerical experiments and comparisons with other existing methods dealing with the mixed noise case are shown.

Infimal convolution of data discrepancies for mixed noise removal

Luca Calatroni

Centre de Mathématiques Appliquées (CMAP)

École Polytechnique CNRS,

Route de Saclay, 91128 Palaiseau Cedex, France

(luca.calatroni@polytechnique.edu)

[0.4cm] Juan Carlos De Los Reyes

Research Center on Mathematical Modelling (MODEMAT), Escuela Politécnica Nacional

(juan.delosreyes@epn.edu.ec)

[0.4cm] Carola-Bibiane Schönlieb

Department of Applied Mathematics and Theoretical Physics (DAMTP)

University of Cambridge, Wilberforce Road, CB3 0WA, Cambridge, UK

(cbs31@cam.ac.uk)

## 1 Introduction

Let open and bounded with Lipschitz boundary and be a given noisy image. The image denoising problem can be formulated as the task of retrieving a denoised image from . In its general expression, assuming no blurring effect on the observed image, the denoising inverse problem assumes the following form:

 find us.t.f=T(u), (1.1)

where stands for the degradation process generating random noise that follows some statistical distribution. The problem (1.1) is often regularised and reformulated as the following minimisation problem

 find u s.t. u∈argminv∈V{J(v):=R(v)+λ\leavevmode\nobreak Φ(v,f)}, (1.2)

where and are elements in suitable function spaces, and the desired reconstructed image is computed as a minimiser of the energy functional in the Banach space . The functional is the sum of two different terms: the regularising energy which drives the reconstruction process encoding a-priori information about the desired image and the data fidelity function modelling the relation between the data and the reconstructed image . The positive weighting parameter balances the action of the regularisation against the trust in the data. In this paper we consider

 (1.3)

the Total Variation (TV) as image regulariser, which is a popular choice since the seminal works of Rudin, Osher and Fatemi [50], Chambolle and Lions [17] and Vese [58] due to its edge-preserving and smoothing properties. Under this choice, the minimisation problem (1.2) is formulated in a subspace of , the space of functions of bounded variation, [1].

Having fixed the regularisation term, depending on the imaging task and the specific application considered, several choices for the fidelity term and the balancing parameter can be made. The correct mathematical modelling of the data fidelity term in (1.2) is crucial for the design of an image reconstruction model fitting appropriately the given data. Its choice is typically driven by physical and statistical considerations on the noise corrupting the data in (1.1), cf. [33, 56, 28]. In this spirit, the general model (1.2) is derived as the MAP estimate of the likelihood distribution. In the simplest scenario, the noise is assumed to be an additive random component , Gaussian-distributed with zero mean and variance determining the noise intensity. In this case (1.1) can be simply written as

 f=u+w.

Gaussian noise is often used to model the noise statistics in many applications such as in medical imaging, as a simple approximation of more complicated noise models. Other additive noise distributions, such as the more heavy-tailed Laplace distribution can alternatively be considered. Another possibility – which is appropriate for modelling transmission errors affecting only a percentage of the pixels in the image – is to consider a type of noise where the intensity value of only a fraction of pixels in the image is switched to either the maximum/minimum value of its dynamic range or to a random value within it with positive probability. This type of noise is called “salt & pepper” noise or “impulse” noise, respectively. In some other cases, a different, signal-dependent property is assumed to conform to the actual physical application considered. For instance, a Poisson distribution of the noise is used for astronomical and microscopy imaging applications.

When Gaussian noise is assumed, an -type data fidelity

 Φ(u,f)=∫Ω(u−f)2\leavevmode\nobreak dx, (1.4)

can be derived for as the MAP estimate of the Gaussian likelihood function [28, 50, 17]. Similarly, in the case of additive Laplace noise, the statistically consistent data fidelity term for reads

 Φ(u,f)=∫Ω|u−f|\leavevmode\nobreak dx, (1.5)

see, e.g. [42, 3]. Furthermore, the same data fidelity is considered in [44, 25] to model the sparse structure of the salt & pepper and impulse noise distributions. Variational models where a Poisson noise distribution is assumed are often approximated by weighted-Gaussian distributions through variance-stabilising techniques [55, 11]. In [53, 52] a statistically-consistent analytical modelling has been derived: for the resulting data fidelity term is a Kullback-Leibler-type functional of the form

 Φ(u,f)=∫Ω(u−flogu)\leavevmode\nobreak dx. (1.6)

As a result of different image acquisition and transmission factors, very often in applications the observed image is corrupted by a mixture of noise statistics. For instance, mixed noise distributions can be observed when faults in the acquisition of the image are combined with transmission errors to the receiving sensors. In this case, the modelling of (1.2) shall encode a combination of salt & pepper and Gaussian noise in the choice of . In other applications, specific tools (such as fluorescence and/or high-energy beams) are used before the signal is actually acquired. This process is typical, for instance, in microscopy and astronomy and may result in a combination of a Poisson noise with an additive Gaussian noise component [51, 54].

From a modelling point of view, the presence of multiple noise distributions has been translated in the relevant literature in the combination of the data fidelity terms (1.4), (1.5) and (1.6) in different ways. In [30], for instance, a combined data fidelity with TV regularisation model is considered for joint impulsive and Gaussian noise removal. There, the image is spatially decomposed in two disjoint domains in which the image is separetely corrupted by pure salt & pepper and Gaussian noise, respectively. A two-phase approach is considered in [13] where two sequential steps with and data fidelity are performed to remove the impulsive and the Gaussian component of the mixed noise, respectively. Mixtures of Gaussian and Poisson noise have also been considered. In [36, 35], for instance, the exact log-likelihood estimator of the mixed noise model is derived and its numerical solution is computed via a primal-dual splitting. A similar model has been considered in [4] where a scaled gradient semi-convergent algorithm is used to solve the combined model. In [27] the discrete-continuous nature of the model (due to the different support of the Poisson-Gaussian distribution on the set of natural and real numbers, respectively) is approximated by an additive model, using homomorphic variance-stabilising transformations and weighted- approximations. In [41] a non-Bayesian framework to differentiate Poisson intensities from the Gaussian ones in the Haar wavelet domain is considered. In [39] a Gaussian-Poisson model similar to the one we consider in this paper is derived in a discrete setting. A more recent approach featuring a linear combination of different data fidelities of the type (1.4), (1.5) and (1.6) has been considered in [38] for a combination of Gaussian and impulse noise and in [22, 15, 14] in the context of bilevel learning approaches for the design of optimal denoising models.

In this work, we present an alternative variational model for denoising of images corrupted by mixed noise distributions that is based on an infimal convolution of the data fidelity terms (1.4), (1.5) and (1.6). For the case of Gaussian-Poisson noise a similar model is discussed in [39] in a finite-dimensional setting. Our model is derived from statistical assumptions on the data and can be studied rigorously in function spaces using standard tools in calculus of variations and functional analysis. The simple infimal convolution nature of the model derived makes its numerical solution amenable for standard first or second order optimisation methods. In this paper we use a semi-smooth Newton (SSN) method for the numerical realisation of our model. By using the classical operation of infimal convolution, our variational model combines different data fidelities , each associated to the corresponding noise component in the data, allowing for splitting the noise into its constituting elements. In what follows, we fix the regularisation term in (1.2) to be the TV energy. This is of course just a toy example and extensions to higher-order regularisations such as TV-TV [46] or TGV [9] can be considered as well. We refer to our model as TV-IC to highlight the infimal convolution combination of data fidelities. This should not be confused with the ICTV model [17, 32] where the same operation is used to combine TV regularisation with second-order regularisation terms.

### 1.1 The reference models

In the following, we consider two exemplar problems which extend (1.1) to the case of multiple noise distributions. We consider the following problem encoding noise mixtures

 findusuch thatf=T(u)+w, (1.7)

where models a general noising process possibly depending on in a non-linear way, while is an additive, Gaussian-distributed, noise component independent of . We will focus in the following on two particular cases of (1.7).

#### Salt & pepper and Gaussian noise

For this case, we suppose that the observed noisy image is corrupted by a mixture of salt & pepper and Gaussian noise, i.e. we consider the following instance of the noising model (1.7),

 f=(1−s)u+sc+w (1.8)

where, following [19, Section 1.2.2], the salt & pepper component is modelled by considering two independent random fields and defined, for every pixel defined as

 c(x)={0,with probability p=1/21,with probability q=1/2,s(x)={0,with probability p1,with probability 1−p.

Following [49, Section 2.2] and [42], we will approximate the nonlinear model (1.8) by the following additive model:

 f=u+v+w, (1.9)

where is now the realisation of a Laplace distributed random variable, independent of . This approximation will be made more clear in Section 3.

#### Poisson and Gaussian noise

In this case, we consider the problem of a mixture of noise distributions where a signal-dependent component is combined with an additive Gaussian noise component. In particular, we will focus on a Poisson distributed component having as Poisson parameter. In other words, we will consider the noising model (1.7) with

 f=z+w,where z∼Pois(u). (1.10)

#### Organisation of the paper

In Section 2 we present the infimal-convolution data fidelity terms considered for the reference models above and show that they are well-defined. Motivations for the TV-IC denoising model are given in Section 3 where some considerations based on the use of Bayes’ formula are given to support our modelling. In particular, we make precise how the mixed data fidelity terms can be derived in two different ways, using a joint MAP estimation strategy as in [39], and by deriving a joint posterior distribution using and variant of the classical convolution operation of two probability distributions. In Section 4 well-posedness results of the TV-IC model by means of classical tools of calculus of variations and functional analysis are given. In Section 5 we show how classical single noise models can be recovered asymptotically from our model by letting the fidelity parameters go to infinity. In Section 6 we confirm the validity of our model with several different numerical experiments using a semi-smooth Newton method for its efficient numerical solution. Finally, in Section 7.1 we report some preliminary results in the spirit of recent advances in noise model learning via bilevel optimisation á la [14].

## 2 The variational model

We consider a open and bounded image domain with Lipschitz boundary and denote by the given, noisy image. Further assumptions on the function spaces where lies will be specified in the following.

Let the total variation semi-norm defined in (1.3) and let denote the space of functions of bounded variation, [1] . For positive weights and admissible sets of functions and , the general proposed TV-IC denoising model for mixed noise distributions reads:

 minu∈BV(Ω)∩A{|Du|(Ω)+Φλ1,λ2(u,f)} (TV-ICa) where the data fidelity Φλ1,λ2(u,f) has the following infimal convolution-type structure: Φλ1,λ2(u,f):=infv∈L2(Ω)∩B{Fλ1,λ2(f,u,v):=λ1\leavevmode\nobreak Φ1(v)+λ2\leavevmode\nobreak Φ2(u,f−v)}, (TV-ICb)

for two data fidelity functions and defined over . The size of the weighting parameters and in (TV-ICb) balances the trust in the data against the smoothing effect of the regularisation as well as the fitting with respect to the intensity of each single noise distribution in .

We now consider the two particular noise combinations introduced in Section 1.1 and demonstrate how these fit into the general model above.

### 2.1 Salt & pepper-Gaussian fidelity

For the case of mixed salt & pepper and Gaussian noise (1.9), we specify the assumptions for the model in (TV-ICa)-(TV-ICb) as follows. We consider the noisy image , for the impulsive noise component and for the Gaussian noise component. The admissible sets are simply . In this case, the infimal convolution data fidelity in (TV-ICb) reads

 Φλ1,λ2(u,f)=infv∈L2(Ω){Fλ1,λ2(f,u,v)=λ1∥v∥L1(Ω)+λ22∥f−u−v∥2L2(Ω)}. (2.2)

Since the set is bounded, and both terms in (2.2) are well-defined.

###### Remark 2.1.

Note that (2.2) can be rewritten as the Moreau envelope or as a weighted proximal map of the -norm in terms of the ratio between the parameters and (see [2, Section 12.4]). Namely, we have:

 Φλ1,λ2(u,f)=proxλ1λ2Φ1(f−u)=λ1λ2\leavevmode\nobreak Mλ1λ2Φ1,

where denotes the proximal map of a generic function while indicates the Moreau envelope of with parameter . Therefore, inherits in this case several good properties of proximal maps such the firmly non-expansiveness ([2, Proposition 12.27]) which will be useful in the following analysis.

The following proposition asserts that the minimisation problem (2.2) is well-posed.

###### Proposition 2.2.

Let , and . Then the minimum in the minimisation problem (2.2) is uniquely attained.

###### Proof.

The proof of this proposition is based on the use of standard tools of calculus of variations. We report it in Appendix A. ∎

### 2.2 Gaussian-Poisson fidelity

For the case of mixed Gaussian and Poisson noise (1.10) some additional assumptions need to be specified. The given image is assumed to be bounded, that is and the data fidelities and , encoding the Gaussian and the Poisson component of the noise, respectively, are then chosen as and as the Kullback-Leibler (KL) divergence between and the “residual” , that is

 Φ2(u,f−v)=DKL(f−v,u)=∫Ω(u−(f−v)+(f−v)log(f−vu))\leavevmode\nobreak dμ. (2.3)

We refer the reader to Appendix B where more properties of the KL functional are given. The variational fidelity in (TV-ICb) then reads

 Φλ1,λ2(u,f)=infv∈L2(Ω)∩B{Fλ1,λ2(f,u,v)=λ12∥v∥2L2(Ω)+λ2\leavevmode\nobreak DKL(f−v,u)} (2.4)

with the following choice of the admissible sets:

 A={u∈L1(Ω),\leavevmode\nobreak logu∈L1(Ω)} and B={v∈L2(Ω):v≤f  a. e.}. (2.5)

We note that for we have almost everywhere. Together with the definition of , this ensures that the functional is well defined (using the standard convention ).

###### Remark 2.3.

Our choice for the Poisson data fidelity (2.3) differs from previous works on Poisson noise modelling such as [52, 22] where a reduced KL-type data fidelity

 ~Φ2(u,f)=∫Ω(u−f\leavevmode\nobreak logu)\leavevmode\nobreak dμ

is preferred. This choice is motivated through Bayesian derivation and MAP estimation (compare [52, 11, 3]), where the terms that do not depend on the de-noised image are neglected since they are not part of the optimisation argument. In our combined modelling, however, those quantities have to be taken into account to incorporate the additional variable encoding the Gaussian noise component.

Similarly as in Proposition 2.2, we guarantee that the minimisation problem (2.4) is well posed in the following proposition.

###### Proposition 2.4.

Let , and . Let and be as in (2.5). Then, the minimum in the minimisation problem (2.4) is uniquely attained.

###### Proof.

We report the proof in Appendix A It is based on standard tools of calculus of variations and on the properties of Kullback-Leibler divergence recalled in Appendix B. ∎

## 3 Statistical derivation and interpretation

In this section we want to give a statistical motivation for the choice of the IC data fidelity term introduced in the previous section. We do this by switching from the continuous setting with in an infinite dimensional function space, to the discrete setting, where is a vector in and the noise distribution of each is independent of the others. Correspondingly, we consider an appropriate discretisation of the continuous TV-IC variational model(TV-ICa)-(TV-ICb). In this setting, we will show, how the IC data fidelity can be derived in two ways: in terms of a joint MAP estimate for and in Section 3.1, and by considering a modified likelihood for the mixed noise distribution – which consists of an infinity convolution of the two noise distributions – and computing its MAP estimate with respect to in Section 3.2.

### 3.1 Joint maximum-a-posterior estimation

A standard approach used to derive variational regularisation methods for inverse problems which reflect the statistical assumptions on the data is the Bayesian approach [19, 56]. In this framework, the problem is formulated in terms of the maximisation of the posterior probability density , i.e. the probability of observing the desired de-noised image given the noisy image . This approach is commonly known as Maximum A Posteriori (MAP) estimation and relies on the simple application of the Bayes’ rule

 P(u|f)=P(f|u)P(u)P(f), (3.1)

by which maximising for translates in maximising the ratio on the right hand side of (3.1). Classically, the probability density is called prior probability density since it encodes statistical a priori information on . Frequently, a Gibbs model of the form

 P(u)=e−αR(u),\leavevmode\nobreak α>0, (3.2)

where is a convex regularising energy functional is assumed. In what follows, we take to be a discretisation of the total variation (1.3), that is denoting by the pixel-coordinates and reordering within a one-dimensional vector with index , thus considering , we have

 R(u)=∥∇u∥2,1=∑i|∇ui|=∑k,l√(uk+1,l−uk,l)2+(uk,l+1−uk,l)2,

where denotes the forward-difference approximation of the two-dimensional gradient. The conditional probability density , also called likelihood function, relates to the statistical assumptions on the noise corrupting the data . Its expression depends on the probability distribution of the noise assumed on the data. The denominator of (3.1) plays the role of a normalisation constant for to be a probability density.

Clearly, maximising (3.1) in our setting of a Gibbs prior is equivalent to minimising the negative logarithm of . Thus, the MAP estimation problem equivalently corresponds to the problem of finding

 uMAP∈argminu{−logP(u|f)}=argminu{−logP(f|u)+αR(u)},

where the term has been neglected since it does not affect the minimisation over .

We now want to use MAP estimation to derive a discretisation of the TV-IC variational model (TV-ICa)-(TV-ICb) for the mixed noise scenario. From a Bayesian point of view, the general model (1.7) can be written in terms of random variables and (for which the noisy image and the noise-free image are realisations, respectively) as follows

 F=Z+WZ∼PUZ,U∼PU,W∼PW:=N(0,σ2), (3.3)

Here, is the sum of the two random variables and . The random variable is distributed according to a probability distribution which may in general depend on (for instance, could be one of its parameters), whereas is fixed to be a Gaussian random variable independent of and , which combined with through an additive modelling. Here, we denote by the Gibbs model (3.2) on , i.e. the prior distribution on , the quantity of interest, so that .

For our derivation we consider a joint MAP estimation for a pair of realisations of the corresponding random variables and . Using Bayes’ rule and mutual independence between and , we have:

 (^u,^w)=argmax(u,w)\leavevmode\nobreak P(u,w|f)=argmax(u,w)\leavevmode\nobreak P(f|u,w)P(u,w)=argmax(u,w)\leavevmode\nobreak PUZ(f−w)PW(w)PU(u) (3.4)

which holds for general noise distributions . However, in the particular case where with being a random variable distributed as and independent of (in particular, when is not a parameter of the probability distribution of and is combined additively with another random variable ), (3.3) becomes

 F=U+V+W. (3.5)

In this case , and (3.4) reduces to:

 (^u,^w)=\leavevmode\nobreak argmax(u,w)\leavevmode\nobreak PUZ(f−w)PW(w)PU(u)=argmax(u,w)PV(f−u−w)PW(w)PU(u). (3.6)

After these general considerations, let us now carry on with the detailed MAP derivation, first for the additive mixture of Laplace noise noistribution (as an approximation for salt & pepper distribution) and Gaussian noise distribution (1.9), and then for the mixed Gaussian-Poisson case (1.10).

Let us focus first on the additive model (3.5) in the case of a mixture of Laplace- and Gaussian-distributed noise. Following [49], we will use the Laplace distribution as an approximation for the one describing salt & pepper noise. More precisely, such distribution is an example of heavy-tailed distribution whose behaviour can be well approximated by the heavy-tailed Laplace noise distribution [42, 59]. We start from formulating the problem (3.5) in a finite-dimensional statistical setting. For each image pixel , our problem is

 finduis.t.fi=ui+vi+wi, (3.7)

where and are realisations of two mutually independent random variables distributed according to Laplace and Gaussian distribution, respectively. More precisely, the equation above describes the structure of the realisations of the components of the random vector in terms of the ones of the random vectors and . The quantities and are, for every , independent realisations of identically distributed Laplace and Gaussian random variables and , respectively, i.e. and .

We recall that the Laplace probability density with parameter is defined as

 PVi(s)=e−|s|/τ2τ,s∈R, (3.8)

and the Gaussian probability density with zero-mean and variance is defined as

 PWi(s)=e−|s|2/2σ2√2πσ2,s∈R. (3.9)

Let us now take the negative logarithm in the joint MAP estimate (3.6) for all the realisations. By independence, we have that:

 (^u,^w) =\leavevmode\nobreak argminu=(u1,…,uM),w=(w1,…,wM)\leavevmode\nobreak −log(M∏i=1PVi(fi−ui−wi)PWi(wi)PUi(ui)) =\leavevmode\nobreak argminu=(u1,…,uM),w=(w1,…,wM)−M∑i=1\leavevmode\nobreak log(PVi(fi−ui−wi)PWi(wi)PUi(ui)) (3.10) =\leavevmode\nobreak argminu=(u1,…,uM),w=(w1,…,wM)M∑i=1(|fi−ui−wi|τ+|wi|22σ2+α|∇ui|).

where the constant terms which do not affect the minimisation over and have been neglected.

To pass from a discrete to a continuous expression of the model we follow [52]. We interprete the elements in the space as samples of functions defined on the whole image domain . For convenience, we use in the following the same notation to indicate the corresponding, continuous quantities. Introducing the indicator function

 χDi(x)={1\leavevmode\nobreak ,if x∈Di,0\leavevmode\nobreak ,else,

where is the region in the image occupied by the -th detector, we have that any discrete data can be interpreted as the mean value of a function over the region as follows

 hi=∫Dih(x)\leavevmode\nobreak dx=∫ΩχDi(x)h(x)\leavevmode\nobreak dx.

In this way, we can then express (3.10) as the following continuous model

 minu,w:\leavevmode\nobreak Ω→R\leavevmode\nobreak ∫Ω(|f(x)−u(x)−w(x)|τ+|w(x)|22σ2+α|∇u(x)|)\leavevmode\nobreak dμ(x)

where with being the usual Lebesgue measure in . We observe that at this level, the function spaces (i.e. the regularity) where the minimisation problem is posed still need to be specified. Defining and and replacing by , we derive from (3.21) the following variational model for noise removal of Laplace and Gaussian noise mixture

 minu∈BV(Ω)w∈L2(Ω)\leavevmode\nobreak |Du|(Ω)+λ1∥f−u−w∥L1(Ω)+λ2∥w∥2L2(Ω),

where the function spaces for and have been chosen so that all the terms are well defined. By a simple change of variables, we can equivalently write the model above as

 minu∈BV(Ω)v∈L2(Ω)\leavevmode\nobreak |Du|(Ω)+λ1∥v∥L1(Ω)+λ2∥f−u−v∥2L2(Ω), (3.11)

where the minimisation is taken over the de-noised image and the salt & pepper noise component .

#### The signal-dependent case

Let us now consider (3.3) in the case when the non-Gaussian noise component in the data follows a Poisson probability distribution with parameter . In this case, the model (3.3) can be formulated in a finite-dimensional setting as

 fi=zi+wi,with zi∼Poiss(ui) (3.12)

at every pixel . Similarly as before, the equation above describes the structure of the realisations of the random vector in terms of the sum of two mutually independent random vectors and , having as components independent and identically distributed Poisson and Gaussian random variables and , respectively, for every . Again, the values are random realisations of the random variables which are distributed according to the Gibbs model (3.2) introduced before.

The Poisson noise density with parameter is defined as

 PUiZi(zi)=uizie−uizi!,zi∈R, (3.13)

where the factorial function is classically extended to the whole real line by using the Gamma function. We note that (3.13) can be rewritten equivalently as:

 PUiZi(zi)=exp(−ui+log(uziizi!))zi∈R.

Similarly as above, let us now take the negative logarithm in the general joint MAP estimate (3.6), for all the realisations. By independence, we have:

 (^u,^w) =\leavevmode\nobreak argminu=(u1,…,uM),w=(w1,…,wM)\leavevmode\nobreak −log(M∏i=1PUiZi(fi−wi)PWi(wi)PUi(ui)) =\leavevmode\nobreak argminu=(u1,…,uM),w=(w1,…,wM)−M∑i=1log(PUiZi(fi−wi)PWi(wi)PUi(ui)) =\leavevmode\nobreak argminu=(u1,…,uM),w=(w1,…,wM)M∑i=1(ui−log(ufi−wiizi!)+|wi|22σ2+α|∇ui|),

where the constant terms which do not affect the minimisation over and have been neglected. Passing from a discrete to a continuous modelling similarly as described in the discussion for the additive case, we obtain the following continuous model

 minu,w\leavevmode\nobreak |Du|(Ω)+λ12∥w∥2L2(Ω)+λ2∫Ω(u(x)−log(u(x)f(x)−w(x)(f(x)−w(x))!))\leavevmode\nobreak dμ(x), (3.14)

where we have set and and we have still to specify the function spaces where the minimisation takes place. By a closer inspection on the third term in the model above, we have:

 ∫Ω(u(x)−log(u(x)f(x)−w(x)(f(x)−w(x))!))\leavevmode\nobreak dμ(x) =∫Ω(u(x)−log(u(x)f(x)−w(x))+log((f(x)−w(x))!))\leavevmode\nobreak dμ(x) =∫Ω(u(x)−(f(x)−w(x))log(u(x))+log((f(x)−w(x))!))\leavevmode\nobreak dμ(x) ≈∫Ω(u(x)−(f(x)−w(x))log(f(x)−w(x)u(x))−(f(x)−w(x)))\leavevmode\nobreak dμ(x)=DKL(f−w,u),

where we have used the standard Stirling approximation of the logarithm of the factorial function. The functional we end up with is the well-known Kullback-Leibler (KL) functional which has been used for the design of imaging models used for Poisson noise removal in previous works, such as [53, 52, 40]. We refer the reader also to Appendix B where some properties of are recalled. The variational Poisson-Gaussian data fidelity (3.14) can then be written in a more compact form as

 minu∈BV(Ω)Aw∈L2(Ω)∩B\leavevmode\nobreak |Du|(Ω)+λ12\leavevmode\nobreak ∥w∥2L2(Ω)+λ2\leavevmode\nobreak DKL(f−w,u), (3.15)

where the admissible sets and are chosen as specified in (2.5) to guarantee that all the terms in (3.15) are well-defined. We point out here again that in [39] a similar joint MAP estimation for a mixed Gaussian and Poisson modelling was considered.

In both the general model (3.3) and in its simplified additive version (3.5) we have derived through joint MAP estimation the two variational models (3.11) and (3.15) which, consistently with our statistical assumptions, model the single noise components by norms classically used for the corresponding single noise models and combined together for the mixed noise case through a joint minimisation procedure. The fidelity terms are weighted against each other by the parameters and , whose size depends on the intensity of each noise component, and weight the fidelity against the TV regularisation.

In the next section, we discuss a different interpretation of the TV-IC model, as the MAP estimate for a posterior distribution in which the two noise distributions are combined by a so-called infinity convolution, instead of the classical convolution of probability distributions.

### 3.2 Infinity convolution

From a probabilistic point of view, convolution-type data fidelities (TV-ICb) can be alternatively derived via a modification of the expression of the probability density of a sum of random variables. Given two independent real-valued random variables and with associated probability densities and , it is well-known that the random variable has probability density given by the convolution of and , that is

 fZ(z)=∫RfV(v)fW(z−v)\leavevmode\nobreak dv=(fV∗fW)(z). (3.16)

Following [31, Remark 2.3.2], since and are nonnegative, we can define for the convolution of order p as follows:

 fpZ(z):=(fV∗pfW)(z)=(∫R(fV(v)fW(z−v))p\leavevmode\nobreak dv)1/p, (3.17)

which clearly corresponds to the classical convolution (3.16) if . By letting it is easy to show that (3.17) converges to the infinity convolution of and

 f∞Z(z):=((fV∗∞fW)(z)=supv∈R\leavevmode\nobreak fV(v)fW(z−v). (3.18)

We note that in (3.16) and have the same domain. Also, note that in order for to be a probability density the following normalisation

 ~f∞Z(z):=1∫f∞Z(z)dz\leavevmode\nobreak f∞Z(z)

is needed. Then, is a probability density, having the same domain of , but potentially featuring heavier tails.

In the special case when the probability densities and have an exponential form of the type and , with and being positive constants and continuous and convex positive functions, (3.18) can be equivalently rewritten as:

 (fV∗∞fW)(z)=cVcW\leavevmode\nobreak e−infv∈R(gV(v)+gW(z−v)). (3.19)

Hence, the infinity convolution of and corresponds to the infimal convolution of and through a negative exponentiation and up to multiplication by positive constants.

Let us now recall the additive model in the finite dimensional setting (3.7). At every pixel, the random variable is the sum of the two independent random variables and having Laplace (3.8) and Gaussian (3.9) probability densities, respectively. In contrast to the previous Section 3.1 where we have computed a joint MAP estimate for and , here we want to compute the MAP estimate for the modified, infinity convolution likelihood (3.19), optimising over only. We will see that doing so we derive the same infimal-convolution models (3.11) and (3.15) as before. By independence of the realisations, we have that the probability density of the random vector in correspondence with the realisation reads

 fY(y)=M∏i=1fYi(yi). (3.20)

Each probability density is formally given by the convolution between the Laplace and Gaussian probability distribution. However, if we replace this convolution with the infinity convolution defined in (3.18) and we normalise appropriately in order to get a probability distribution, we get that the probability density of a single evaluated in correspondence of one realisation can be expressed as

 fYi(yi)=12τ√2πσ2e−infvi∈R\leavevmode\nobreak (|vi|/τ\leavevmode\nobreak +\leavevmode\nobreak |yi−vi|2/2σ2), i=1,…,M. ~fYi(yi):=1∑Mi=1fYi(yi)fYi(yi), i=1,…,M.

Plugging this expression in (3.20) and computing the negative log-likelihood of , we get:

 −logP(y|u)=−logM∏i=1~fYi(yi)=−M∑i=1log(~fYi(yi))=−M∑i=1log(~fYi(fi−ui)).

Thus, we have that the log-likelihood we intend to minimise is

 M∑i=1\leavevmode\nobreak infvi∈R\leavevmode\nobreak (|vi|τ+|fi−ui−vi|22σ2),

where the constant terms which do not affect the minimisation over have been neglected.

Similar to before, passing from a discrete to a continuous representation we get

 ∫Ωinfv(x)∈R(|v(x)|τ+|f(x)−u(x)−v(x)|22σ2)\leavevmode\nobreak dμ(x)=infv∈L(Ω)∫Ω(|v(x)|τ+|f(x)−u(x)−v(x)|22σ2)\leavevmode\nobreak dμ(x), (3.21)

where the infimum and the integral operators commute by assuming that the function space for is as stated in [48, Theorem 3A] (see also [29] for further details). This also ensures that the integrand terms are both well defined. Similarly as above, we now define and and derive from (3.21) the same following data fidelity term as in (3.11).

The same computation can be done for the signal dependent case of the Poisson-Gaussian noise mixture (3.12) with (3.13), thus obtaining the same model (3.15).

### 3.3 Connection with existing approaches

Several variational models for image denoising in the presence of combined noise distributions have been considered in the literature.

In the case of a mixture of impulsive and Gaussian noise these models can be roughly divided in two categories. The first category considers additive, + data fidelities. In [30], for instance, the image domain is decomposed in two parts, with impulsive noise in one and Gaussian noise in the other, modelled by the sum of an and data fidelity supported on the respective domain with Gaussian or impulse noise. For the numerical solution an efficient domain decomposition approach is used. In [22, 15, 14] semi-smooth Newton’s methods are employed to solve a denoising model where and data fidelities are combined in an additive fashion. A second category of methods renders the removal of impulsive and Gaussian noise in a two-step procedure (see, e.g. [13]). In the first phase the pixels damaged by the impulsive noise component are identified via an outlier-removal procedure and removed making use of a variational regularisation model with data fitting. Then a variational denoising model with an data fidelity is employed for removing the Gaussian noise in all other pixels. These methods are often presented as image inpainting strategies where the pixels corrupted by impulsive noise are first identified and then filled in using the information nearby with a variational regularisation approach.

When a mixture of Gaussian and Poisson noise is assumed, an exact log-likelihood model has been considered in [35, 36]. In the same discrete setting described above, still denoting by the total number of pixels, the expression of the negative log-likelihood reads

 (3.22)

In comparison with our derivation presented in Section 3.2, here the log-likelihood is derived via MAP estimation for the logarithm of the convolution (3.16) of the Gaussian probability density (3.9) with the Poisson probability density (3.13). Instead, the model we propose is derived equivalently either by replacing the convolution with the infinity convolution or by a joint MAP estimation. The resulting variational model (3.15) is much simpler since, in particular, it makes easier dealing with the infinite sum appearing in (3.22) which is related to the discrete Poisson density function. In order to design an efficient optimisation strategy, the authors first split the expression above into the sum of two different terms, the former being a convex Lipschitz-differentiable function, the latter being a proper, convex and lower semi-continuous function. In particular, the authors are able to design a competitive first-order optimisation method based on the use of primal-dual splitting algorithms exploiting the closed-form of the proximal operators associated to the convex Lipschitz-differentiable function. The advantage of such algorithms is that it is only based on the iterative application of the proximal mapping operations, without requiring any matrix inversion. Furthermore, the approximation error accumulating throughout the iterations of the algorithm are shown to be absolutely summable sequences, a property which is essential in this framework due to the presence of infinite sums.

Another approach for mixed Gaussian and Poisson noise is considered in [39], where the authors design a data fidelity term similar to (3.15) combining it with TV regularisation for image denoising. Despite the analogies between the joint MAP estimation of their and our model, in their work no well-posedness results in fuction species nor properties of noise decompositions are discussed. Nonetheless, in [39] the good performance of the combined model is observed in terms of improvements of the Peak Signal to Noise Ratio (PSNR) for several synthetic and microscopy images.

Differently from most aforementioned works, the simple structure of our models (3.11) and (3.15) allows for the design of efficient first and second-order numerical schemes. In this paper we focus on the latter. Our approach is further able to ‘decompose’ the noise into its different statistical components, each corresponding to one particular noise distribution in the data. The authors are not aware of any existing method dealing with such a feature.

## 4 Well-posedness of the TV-IC model

Thanks to Proposition 2.2 and Proposition 2.4, we can conclude that in both cases (2.2) and (2.4), the function is well-defined and that for every there exists a unique element minimising the functional . Problem (TV-ICa) can then be rewritten as

 minu∈BVΩ)∩A{J(u):=|Du|(Ω)+λ1\leavevmode\nobreak Φ1(v∗(u))+λ2\leavevmode\nobreak Φ2(u,f−v∗(u))}, (4.1)

where is the unique solution of (TV-ICb) in one of the two cases (2.2) and (2.4). In particular, for every , there is a positive finite constant such that

 ∥v∗(u)∥L2(Ω)≤C(u). (4.2)

Note that the constant in (4.2) may depend on and hence does not necessarily bound uniformly. For the following existence proof, therefore, we restrict the admissible set of solutions for by intersecting it with the closed ball in with fixed radius , centred in . That is, in (TV-ICb) we consider a new admissible set

 ^B:=L2(Ω)∩B∩BR(f), (4.3)

where for some and is defined as before in each case. Since is compact and convex, the well-posedness properties studied in Section 2 still hold true. In addition, one can now easily compute the constant of (4.2) by Young’s inequality. Namely, for every one has now

 ∥v∗(u)∥L2(Ω)≤C, (4.4)

with . This additional assumption is reasonable for our applications since we want the noise component to preserve some similarities with the given image in terms of its noise features.

After this modification, we can now state and prove the main well-posedness result or the salt & pepper-Gaussian noise combination described in Section 2.1 and the Gaussian-Poisson model in Section 2.2 by means of standard tools of calculus of variations. We refer the reader also to [58] and [10] where similar results are proved for and Kullback-Leibler-type data fidelities, respectively.

###### Theorem 4.1.

Let and let us denote with defined in (4.3) the solution of the minimisation problem (TV-ICb) in one of the two cases (2.2) and (2.4) provided by Propositions 2.2 and 2.4, respectively, for every . Then, there exists a unique solution of the minimisation problem (4.1).

###### Proof.

Let be a minimising sequence for . Such sequence exists since the functional is nonnegative. Neglecting the positive contribution given by , we have:

 |Dun|(Ω)+λ2\leavevmode\nobreak Φ2(un,f−v∗(un))≤|Dun|(Ω)+Φλ1,λ2(un,f)≤M,%foralln (4.5)

for some finite constant . To show the uniform boundedness of the sequence in we first observe that using the positivity of , (4.5) we have

 |Dun|(Ω)≤M, for all n. (4.6)

Next, in order to get appropriate bounds for in we need to differentiate the two cases considered.

• Gaussian-salt & pepper case: For , we get from (4.5) that by Young’s inequality:

 M≥Φ2(