Regularized inversion and white Gaussian noise

# Analysis of regularized inversion of data corrupted by white Gaussian noise

Hanne Kekkonen, Matti Lassas and Samuli Siltanen Department of Mathematics and Statistics P.O. Box 68, 00014 University of Helsinki FINLAND
###### Abstract.

Tikhonov regularization is studied in the case of linear pseudodifferential operator as the forward map and additive white Gaussian noise as the measurement error. The measurement model for an unknown function is

 m(x)=Au(x)+δε(x),

where is the noise magnitude. If was an -function, Tikhonov regularization gives an estimate

 Tα(m)=argminu∈Hr{∥Au−m∥2L2+α∥u∥2Hr}

for where is the regularization parameter. Here penalization of the Sobolev norm covers the cases of standard Tikhonov regularization () and first derivative penalty ().

Realizations of white Gaussian noise are almost never in , but do belong to with probability one if is small enough. A modification of Tikhonov regularization theory is presented, covering the case of white Gaussian measurement noise. Furthermore, the convergence of regularized reconstructions to the correct solution as is proven in appropriate function spaces using microlocal analysis. The convergence of the related finite-dimensional problems to the infinite-dimensional problem is also analysed.

###### Key words and phrases:
Keywords: Regularization, inverse poblem, white noise, pseudodifferential operator

## 1. Introduction

### 1.1. Discrete and continuous regularization

Consider the following continuous model for indirect measurements:

 (1.1) m=Au+noise,

where the data and the quantity of interest are real-valued functions of real variables and is a bounded linear operator. A large class of practical measurements can be modelled by operators arising from partial differential equations of mathematical physics. We focus on ill-posed inverse problems where does not have a continuous inverse.

Physical measurement devices produce a discrete data vector , which we model by adding a linear operator to (1.1):

 (1.2) m:=Pk(Au)+Pk(noise).

Furthermore, practical solution of the inverse problem calls for a discrete representation of the unknown . This can be done using some computationally feasible approximation of the form , for example Fourier series truncated to terms. The practical inverse problem is now

 (1.3)

We study the most common computational appoach to (1.3), namely classical Tikhonov regularization defined by

 (1.4) Tα(m):=argminu∈Rn{∥Au−m∥22+α∥Lu∥22}.

Here is a matrix approximation to the operator , and is the regularization parameter. The matrix is used to introduce a priori information to the inversion. For example,

• , the identity matrix, models the a priori information that is not very large in norm.

• , where is a finite-difference first-order derivative matrix, models the a priori information that is continuously differentiable and or its derivative are not very large in square norm.

Our aim is to provide new analytic insight to the relationship between the continuous model (1.1) and practical inversion based on (1.4) in the case of Gaussian noise.

Note that the reconstruction given by (1.4) depends on both and . Practical computational inversion may involve modifying both of them: updating the measurement device changes the number of data points, and refining the computational grid in the hope of extra accuracy will increase . Furthermore, sometimes the most efficient numerical algorithm is based on a multigrid strategy, involving the computation of with several different .

Since there is a common continuous inverse problem behind the discrete model, it is desirable that the reconstruction converges to a meaningful limit as . Such convergence would also ensure that the dependency of on and is stable, at least for large enough values. Therefore, we discuss a continuous version of (1.4) based directly on the ideal model (1.1).

Under certain assumptions (including that should be an -function) the finite-dimensional problem (1.4) -converges as to the following infinite-dimensional minimization problem in a Sobolev space :

 (1.5) argminu∈Hr{∥m−Au∥2L2+α∥u∥2Hr}.

See Section 5 below for a proof. In (1.5) the case corresponds to (a) and corresponds, roughly, to (b) above. However, formula (1.5) only makes sense if the noise in (1.1) is square integrable. This brings us to the main topic of the paper: noise modeling.

### 1.2. Properties of white noise

Next we will give the definitions for the discrete and continuous white noise and describe the ’white noise paradox’ arising from the infinite -norm of the natural limit of white Gaussian noise in when .

We model the -dimensional noise in (1.2) as , where plays the role of noise amplitude. The vector is a realization of a -valued Gaussian random variable having mean zero and unit variance: . In terms of a probability density function we have

 (1.6) πE(k)(E)=cexp(−12∥E∥22),E∈Rk,∥E∥2=(k∑j=1E2j)1/2.

The appearance of in (1.6) is the reason why square norm is used in the data fidelity term of (1.4). The above noise model is appropriate for example for photon counting under high radiation intensity, see e.g. [27, 48].

Let us relate the above to the continuous model (1.1). We take and to be functions defined on a closed, compact -dimensional manifold , and the operator to be a pseudodifferential operator (DO). Furthermore, the noise in (1.1) is modelled as , where is the noise amplitude and is a realization of normalised Gaussian white noise .

Rigorous treatment of white noise on is based on generalized functions (distributions). We denote the pairing of a distribution and a test function by . Let be a probability space. A random generalized function , where and , on is a measurable map . Below, following the tradition used in study of stochastical processes, we often omit the variable and just denote a random generalized function by .

White noise is a random generalized function on such that the inner products are Gaussian random variables for all , , and

 (1.7) E(⟨W,ϕ⟩⟨W,ψ⟩)=⟨ϕ,CWψ⟩L2(N)for ϕ,ψ∈C∞(N).

The covariance operator of Gaussian white noise is the identity operator. Then can be considered as a function where is the probability space. A realization of is the generalized function on with a fixed .

Below, we consider the case when

 (1.8) Pk(f)=(⟨f,ϕj⟩)kj=1,

where are such that is an orthogonal basis in . Then with e as above. For example, when is a -dimensional torus, can be the truncation of the Fourier series. See Section 5 for a detailed discussion on discrete and continuous noise models.

Now we can state the main motivation behind this study. The probability density function of is often formally written in the form

 (1.9) πW(w)=cexp(−∥w∥2L2(N)/2).

However, despite formula (1.9), the realizations of the white Gaussian noise are almost surely not in . Thus we cannot use formula (1.5) when the error in the measurement is white Gaussian noise. Let us illustrate this “white noise paradox” by a simple example.

Example 1. Let be normalized Gaussian white noise defined on the -dimensional torus . The Fourier coefficients of are normally distributed with variance one, that is, , where and . Hence

 E∥W∥2L2(Td)=∑→ℓ∈ZdE|⟨W,e→ℓ⟩|2=∑→ℓ∈Zd1=∞.

This implies that with probability zero. However, when

 (1.10) E∥W∥2Hs(Td)=∑→k∈Zd(1+|→ℓ|2)sE|⟨W,eℓ⟩|2<∞

and hence takes values in almost surely (that is, with probability one).

On the other hand [46, Theorem 2] implies that if almost surely then which yields . This concludes that the realisations of white noise are almost surely in the space if and only if . In particular for the function is in only when where .

Even though the previous example is proven in we note that the same result is valid in all open bounded subsets .

### 1.3. Main result

Let us again consider a general closed -dimensional Riemannian manifold and let be the Laplace operator on . Furthermore, let be a pseudodifferential operator. Consider the following measurement model:

 (1.11) m=Au+δε,

where with is a realization of white noise.

The pseudodifferential operator can be, for example,

 Au(x)=∫NA(x,z)u(z)dz

where and in an open neighbourhood of the diag, we have

 A(x,z)=b(x,z)dg(x,z)p,(x,z)∈U

where is a distance function, , and . In this case is a pseudodifferential operator of order .

Let us now modify formula (1.5) to arrive at something useful for white Gaussian noise. Expand the data fidelity term like this: . Simply omitting the “constant term” leads to the definition

 (1.12) Tα(m):=argminu∈Hr(N){∥Au∥2L2(N)−2⟨m,Au⟩+α∥u∥2Hr(N)},

where we can interpret as a suitable duality pairing instead of inner product. When is a pseudodifferential operator of order , we can define .

It is well-known that the solution of the finite-dimensional problem (1.4) can be calculated using the following formula:

 (1.13) Tα(m)=(ATA+αLTL)−1ATm.

The regularized solution of the continuous problem (1.12) is

 (1.14) Tα(m)=(A∗A+α(I−Δ)r)−1A∗m.

The regularization parameter is chosen to be a function of the noise amplitude: where is a constant and . We will now formulate the main theorem of this paper, concerning the continuous regularized solution (1.14).

###### Theorem 1.

Let be a -dimensional closed manifold and with . Here . Let with some and consider the measurement

 (1.15) mδ=Au+δε,

where , is an elliptic pseudodifferential operator of order on the manifold with and . Assume that is injective. The regularization parameter is chosen to be where is a constant and .

Take . Then the following convergence takes place in norm:

 limδ→0Tα(δ)(mδ)=u.

Furthermore, we have the following estimates for the speed of convergence:

• If then

 ∥Tα(δ)(mδ)−u∥Hs1≤Cmax{δκ(r−ζ)2(t+r),δ}.
• If then

 ∥Tα(δ)(mδ)−u∥Hs1≤Cmax{δκ(r−ζ)2(t+r),δ1+κ(s−t−s1)2(t+r)}.

Above we have .

Notice that in case (i) and in case (ii) . The different convergence speeds (i) and (ii) show the trade-off between smoothness of the space and the speed of convergence. In case (i) we get better convergence rates but in case (ii) we can use a stronger norm. In section 4 we give two counterexamples to show that even though and the regularized solution does not converge to the real solution in norm.

### 1.4. Literature review

There are two main ways in inverse problems literature for modelling noise. The first approach based on the deterministic regularization techniques is to assume that the noise is deterministic and small. In that case one has a norm estimate of the noise and can study what happens when . This approach was originated by Tikhonov [51, 52], and studied in depth in [5, 11, 17, 24, 42, 40, 53]. The second approach to handling the noise is based on statistical point of view. The statistical modeling of noise in the inverse problems started in the early papers of [14, 15, 49, 50] and it is notable that with this approach one needs not assume smallness of the noise. For some recent references of the frequentist view of statistical problems see [3, 19, 35, 39]. Another statistical way to study inverse problems with random noise is based on Bayesian approach where and are considered to be realizations of random variables, see [6, 18, 22, 28, 29, 30, 31, 32, 33, 43, 44, 45].

The deterministic regularization and statistical approaches differ both in assumptions and techniques. This paper aims to bridge the gap between them. Our results are closely related to earlier studies of Eggermont, LaRiccia, and Nashed [8, 9, 10], who studied weakly bounded noise. They assume that the noise is a -function and discuss regularization techniques when the noise tends to zero in the weak topology of . This kind of relaxed assumption of noise covers small low frequency noise and large high frequency noise. However, even though tends to zero in weak sense as when is a realization of the normalized white noise, this type of noise lies outside the definition of the weakly bounded noise as is not almost surely -valued.

A related approach of smoothing the noise before the analysis is described in [37, 38]. A similar regularization method where no smoothness of the operator is assumed, but instead the regularization method is modified, is studied in [7]. Another possible approach to deal with white noise is to first perform a data projection step and then proceed to Tikhonov regularization [26, 25]. Also, Hohage and Werner have earlier studied inverse problems taking into account the fact that white noise is not square-integrable in [20].

Our new results are different from all of those previous studies. Our approach aims to study the effect of the continuous white noise not being an function in Tikhonov regularisation (1.5) instead of modifying the problem by altering the regularisation method or assumptions.

## 2. Analysis of the translation-invariant case

Before giving the general proof of Theorem 1 in Section 3 we motivate the proof by proving a similar kind of lemma for translation-invariant case.

The regularized solution we are studying is of the form

 Tα(δ)(m):=argminu∈Hr(Td){∥Au∥2L2(Td)−2⟨m,Au⟩+α∥(I−Δ)r/2u∥2L2(Td)},

where , for some constant and . As mentioned before solution to this is

 (2.1) Tα(δ)(m)=(A∗A+α(I−Δ)r)−1A∗m.

Let us consider the case when , where is a constant, and is an elliptic pseudodifferential operator of order that commutes with translations. Then, in we have that . As and commute with translations they are Fourier multipliers,

 ˆAu(n)=a(n)ˆu(n)

and since is elliptic there is so that

 c1|n|−t≤|a(n)|≤c2|n|−t,for |n|>n0.

The symbol of is

 zδ(n)=|a(n)|2+α0δ2(1+n2)r

and thus

 zδ(n)≥max(|a(n)|2,α0δ2(1+n2)r).

If and

 zδ(n) ≥ |a(n)|2(1−β)(α0δ2(1+n2)r)β ≥ c3|n|−2(1−β)t+2rβδ2βαβ0.

Now when we have

 ε∈Hs(Td).

Thus writing

 (2.2)

we see that

 Tα(δ)(mδ) = vδ+wδ

where

 ˆvδ(n)=1zδ(n)|a(n)|2ˆu(n), ˆwδ(n)=1zδ(n)¯¯¯¯¯¯¯¯¯¯a(n)ˆε(n)δ.

Here,

 ∣∣∣1zδ(n)|a(n)|2∣∣∣≤1,limδ→01zδ(n)|a(n)|2=1

and thus if by dominated convergence theorem

 limδ→0vδ=u,in Hr(Td).

Above the limit speed of convergence can be analysed using the standard regularization theory [11] and the fact that

 vδ = (A∗A+α0δ2(I−Δ)r)−1A∗Au = u−α0δ2(A∗A+α0δ2(I−Δ)r)−1(I−Δ)ru.

We can use the fact that and write

 ∥Z−1/2δ(I−Δ)ru∥L2 ≤ (α0δ2)−1/2∥(I−Δ)r/2u∥L2 ≤ (α0δ2)−1/2∥u∥Hr.

We also have the inequality . When we can define and so that, , . We get

 ∥Z−1δ (I−Δ)ru∥L2=∥Z−γ−η−1/2δ(I−Δ)ru∥L2 ≤(α0δ2)−1/2∥Z−γ−ηδ(I−Δ)r/2u∥L2 ≤(α0δ2)−1/2∥(c1(I−Δ)−t)−γ(α0δ2(I−Δ)r)−η(I−Δ)r/2u∥L2 ≤c−γ1(α0δ2)−η−1/2∥(I−Δ)r/2u∥L2 ≤c−γ1(α0δ2)−η−1/2∥u∥Hr.

Hence we obtain

 ∥α0δ2Z−1δ(I−Δ)ru∥L2≤c−γ1(α0δ2)1/2−η∥u∥Hr=c−γ1δrt+r∥u∥Hr.

On the other hand,

 ∣∣∣1zδ(n)¯¯¯¯¯¯¯¯¯¯a(n)δ∣∣∣ ≤ 1c3|n|−2(1−β)t+2rβδ2βαβ0c2|n|−tδ ≤ c4|n|(1−2β)t−2rβδ1−2βα−β0.

Hence

 ∥wδ∥Hs1(Td)≤c5δ1−2β

where . Because we proved the convergence of in we have to have . This is true at least when . Thus adding the above results together we can formulate the next lemma.

###### Lemma 2.

Let , , be Gaussian distributed, , , and

 mδ=Au+δε

where , , is an elliptic pseudodifferential operator of order that commutes with translations. We assume that . Then for the regularized solution

 Tα(δ)(m)=(A∗A+α(I−Δ)r)−1A∗m

of we have

 limδ→0Tα(δ)(mδ)=u,in % Hs1(Td)

where and . Furthermore we have the following estimate of the speed of convergence

 ∥Tα(δ)(mδ)−u∥Hs1≤Cmax{δrt+r,δ1−2β}.

Proof. The convergence is immediate consequence of the above results. For the convergence speed we get

 ∥Tα(δ)(mδ)−u∥Hs1 = ∥−α0δ2Z−1δ(I−Δ)ru+wδ∥Hs1 ≤ ∥α0δ2Z−1δ(I−Δ)ru∥L2+∥wδ∥Hs1 ≤ C1(α0δ2)1/2−η+C2δ1−2βα−β0 ≤ C3max{δrt+r,δ1−2β}

where and .

## 3. Proof of the main theorem

Here we study the general case where is an elliptic pseudodifferential operator of order . We denote and where is a closed manifold and . As in the previous example we have

 (3.1) Tα(δ)(mδ)=Z−1δA∗Au+Z−1δA∗(δε)=u−αZ−1δ(I−Δ)ru+Z−1δA∗(δε)

where .

First we will show that is invertible. We define as the adjoint of an operator . We assume that is one-to-one. If then

 0=⟨A∗Au,u⟩L2=⟨Au,Au⟩L2=∥Au∥2L2

which implies and furthermore . Thus the operator is one-to-one.

Next we recall the fact that an elliptic operator is a Fredholm operator and ([21] Theorem 19.2.1). Indeed index of a Fredholm operator is

 (3.2) index(B)=dim(Ker B)−dim(Coker B).

If is compact and is the adjoint of the operator then

Define as an extension of and show that for all . Define

 P=(I−Δg)s/2Bs(I−Δg)−s/2:L2(N)→H2t(N).

We can write where is compact. Now

 index(B0) = index(P) = index(I−Δg)s/2+index(Bs)+index(I−Δg)−s/2 = index(Bs).

Because is compact we can write

and hence we see that . Using this, the knowledge that is one-to-one and (3.2) we get

 0=dim(Ker B)=dim(Coker B)

which means that is also onto. Thus we have shown that there exist .

Next we will examine DOs that depend on spectral variable . For the general theory see [47]. The symbol class consist of the functions such that

1. for every fixed and

2. for arbitrary multi-indices and and for any compact set there exist constants such that

 |∂αξ∂βxa(x,ξ,λ)|≤Cα,β,K(1+|ξ|+|λ|1/p)m−|α|

for , and .

We consider the pseudodifferential operators depending on the parameter . To define such operators, one considers local coordinates of the manifold , where we emphasize that the set does not need to be connected (see [47, Sect. I.4.3]). A bounded linear operator , depending on the parameter , is a pseudodifferential operator with spectral variable if for any local coordinates of manifold , , there is a symbol such that for we have

 (Aλu)(Y−1(x))=∫V×Rdei(x−y)⋅ξa(x,ξ,λ)u(Y−1(y))dydξ,x∈V,

where . In this case we will write

 Aλ∈Ψmp(N,R+),

and say that in local coordinates the operator has the symbol . If for all compact sets there are constants such that the symbol satisfies

 C1(|ξ|+|λ|1/p)m≤|a(x,ξ,λ)|≤C2(|ξ|+|λ|1/p)m,

for and , we say that is hypoelliptic with parameter and denote . We will denote by the class of DOs depending on the parameter whose symbol in all local coordinates belongs in

We want to prove that

 Fλ=(A∗A)−1(I−Δ)r+λI

is invertible. Operator is elliptic since is elliptic and . Denote and its symbol . Then for the symbol of the operator we have in compact subsets of any local coordinates

 |∂αξ∂βx(q(x,ξ)+λ)|≤Cα,β,K(1+|ξ|+|λ|1/(2(t+r)))2(t+r)−|α|, x∈K.

By ([47] Theorem 9.2.) there exist such that for the operator is invertible with

 F−1λ∈HΨ−2(t+r)2(t+r)(N,[R,∞)).

Now we have shown that the operator can be rewritten

 (3.3) Z−1δ=λ((A∗A)−1(I−Δ)r+λ)−1(A∗A)−1

where .

We denote by the norm of where . We have the following norm estimates for when and large enough

 (3.4) ∥Fλ∥s,s−ℓ≤Cs,l(1+