On the lifting of deterministic convergence rates for inverse problems with stochastic noise

Abstract

For both the theoretical and the practical treatment of Inverse Problems, the modeling of the noise is crucial. One either models the measurement error via a deterministic worst-case assumption or assumes a certain stochastic behavior of the noise. Although some connections between both models are known, the communities have developed rather independently. In this paper we seek to bridge the gap between the deterministic and the stochastic approach and show convergence and convergence rates for Inverse Problems with stochastic noise by lifting the theory established in the deterministic setting into the stochastic one. This opens the wide field of deterministic regularization methods to stochastic problems without requiring an individual stochastic analysis for each problem.

In Inverse Problems, the model of the inevitable data noise is of utmost importance. In most cases, an additive noise model

(1)

is assumed. In (1), is the true data of the unknown under the action of the (in general) nonlinear operator ,

(2)

and in (1) corresponds to the noise. The spaces are assumed to be Banach or Hilbert spaces. When speaking of Inverse Problems, we assume that (2) is ill-posed. In particular, this means that solving (2) for the unknown with noisy data (1) is unstable in the sense that “small” errors in the data may lead to arbitrarily large errors in the solution. Hence, (1) alone is not a sufficient description of the noise; more information is needed in order to compute solutions from the data in a stable way. In the deterministic setting, one assumes

(3)

for some where is an appropriate distance functional. Typically, is induced by a norm such that (3) reads . Here and further on we use the superscript to indicate the deterministic setting. Solutions of (2) under the assumptions (1), (3) are often computed via a Tikhonov-type variational approach

(4)

where again is a distance function and is the penalty term used to stabilize the problem and to incorporate a-priori knowledge into the solution. The regularization parameter is used to balance between data misfit and the penalty and has to be chosen appropriately. The literature in the deterministic setting is rich; at this point we only refer to the monographs [30, 12, 23] for an overview.
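For concreteness, a minimal instance of the setup (1)–(4) can be sketched as follows; the notation (operator $F$, exact solution $x^{\dagger}$, noise level $\delta$, penalty $\mathcal{R}$, regularization parameter $\alpha$) is introduced here only for illustration and uses Hilbert-space norms:

$$y^{\delta} = F(x^{\dagger}) + \xi, \qquad \|y^{\delta} - F(x^{\dagger})\| \le \delta, \qquad x_{\alpha}^{\delta} \in \operatorname*{arg\,min}_{x}\ \tfrac{1}{2}\,\|F(x) - y^{\delta}\|^{2} + \alpha\,\mathcal{R}(x).$$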

The deterministic worst-case error stands in contrast to stochastic noise models where a certain distribution of the noise in (1) is assumed. We shall indicate the stochastic setting by the superscript . In this paper, will be the parameter controlling the variance of the noise. Depending on the actual distribution of , the noise may be arbitrarily large, albeit only with low probability. A very popular approach to finding a solution of (2) is the Bayesian method. For more detailed information, we refer to [25, 33, 31, 34, 7]. In the Bayesian setting, the solution of the Inverse Problem is given as a distribution of the random variable of interest, the posterior distribution , determined by Bayes' formula

(5)

That is, roughly speaking, all values are assigned a probability of being a solution to (2) given the noisy data . In (5), the likelihood function represents the model for the measurement noise, whereas the prior distribution represents a-priori information about the unknown. The data distribution as well as the normalization constants are usually neglected since they only influence the normalization of the posterior distribution. In practice one is often more interested in finding a single representative as solution instead of the whole distribution. Popular point estimates are the conditional expectation (conditional mean, CM)

(6)

and the maximum a-posteriori (MAP) solution

(7)

i.e., the most likely value for under the prior distribution given the data . Both point estimators are widely used. The computation of the CM-solution is often slow since it requires repeated sampling of stochastic quantities and the evaluation of high-dimensional integrals. The MAP-solution, however, essentially leads to a Tikhonov-type problem. Namely, assuming and , one has

analogously to (4).
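To make the connection explicit in a simple case, assume for illustration a finite dimensional setting with Gaussian noise of variance $\sigma^{2}$, a likelihood $\pi(y\,|\,x) \propto \exp\bigl(-\tfrac{1}{2\sigma^{2}}\|F(x)-y\|^{2}\bigr)$ and a prior $\pi(x) \propto \exp\bigl(-\mathcal{R}(x)\bigr)$ (notation chosen here, not taken from the text). Maximizing the posterior is then equivalent to minimizing its negative logarithm,

$$x_{\mathrm{MAP}} \in \operatorname*{arg\,min}_{x}\ \tfrac{1}{2}\,\|F(x)-y\|^{2} + \sigma^{2}\,\mathcal{R}(x),$$

which is a Tikhonov-type functional with regularization parameter $\alpha = \sigma^{2}$.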

Non-Bayesian approaches to Inverse Problems also often seek to minimize a functional of the form (4), see e.g. [24, 2], or use techniques known from the deterministic theory such as filter methods [5, 3]. Finally, Inverse Problems appear in the context of statistics; hence, the statistics community has developed methods to solve (2), partly again based on the minimization of (4). We refer to [13] for an overview.

In summary, Tikhonov-type functionals (4) and other deterministic methods frequently appear also in the stochastic setting. From a practical point of view, one would expect to be able to use deterministic regularization methods for (2) even when the noise is stochastic. Indeed, the main question for the actual computation of the solution, given a particular sample of noisy data , is the choice of the regularization parameter. A second question, mostly coming from the deterministic point of view, is that of convergence of the solutions when the noise approaches zero. In the stochastic setting these questions are often answered by a full stochastic analysis of the problem. In this paper we present a framework that allows one to find appropriate regularization parameters, to prove convergence of regularization methods, and to derive convergence rates for Inverse Problems with a stochastic noise model by directly using existing results from the deterministic theory.

The paper takes several ideas from the dissertation [20], which is publicly available only as the book [21] and not published elsewhere. It is organized as follows. In Section 1 we discuss an issue occurring in the transition from deterministic to stochastic noise for infinite dimensional problems. The Ky Fan metric, which will be the main ingredient of our analysis, and its relation to the expectation will be introduced in Section 2. We present our framework to lift convergence results from the deterministic setting into the stochastic setting in Section 3. Examples for the lifting strategy are given in Section 4.

1 On the noise model

Before addressing the convergence theory, we would like to discuss stochastic noise modeling and its intrinsic conflict with the deterministic model. Here and throughout the rest of the paper, assume

(8)

to be a complete probability space with a set of outcomes of the stochastic event, the corresponding -algebra and a probability measure, . We restrict ourselves here to probability measures for the sake of simplicity. Extensions to more general measures are straightforward. In the Hilbert-space setting, the noise is typically modeled as follows, see for example [29, 30, 3, 27]. Let be a stochastic process. Then for

(9)

defines a real-valued random variable. Assuming that

(10)

for all and that this expectation is continuous in ,

defines a continuous, symmetric bilinear form. In particular, there exists the covariance operator

with

For the stochastic analysis of infinite dimensional problems via deterministic results, (9) is problematic. Namely, if is an orthonormal basis in , the set consists of infinitely many identically distributed random variables with [30]. Thus

(11)

is almost surely infinite and a realization of the noise is an element of the Hilbert space with probability zero. Let us take the common example of Gaussian white noise which can be modeled via the above construction. Namely, with

and the covariance operator

where is the identity and the variance parameter, describes Gaussian white noise [30, 27]. As a consequence of (11), and as explained for example in [27], a realization of such a Gaussian random variable is with probability zero an element of an infinite dimensional -space. It is therefore inappropriate to use an -norm for the residual in the case of an infinite dimensional problem. Since a realization of Gaussian white noise then only lies (almost surely) in Sobolev spaces with where is the dimension of the domain, one should adjust the norm for the residual accordingly. Except for the paper [27] this issue seems not to have been addressed in the literature. A main reason might be that for the practical solution of the Inverse Problem this is not a severe issue: in reality the measurements are finite dimensional and, in order to use a computer to solve the problem, a finite dimensional approximation of the unknown object has to be used. In this case the sum in (11) is finite and the noise lies almost surely in the finite dimensional space. However, difficulties arise whenever one seeks to investigate convergence of the discretized problem to its underlying infinite dimensional problem.

We will not address this issue and assume throughout the whole work that or use the slightly weaker bound on the Ky Fan metric (see Section 2). In order to handle the Ky Fan metric we need to be able to evaluate probabilities , , which is only meaningful if . If is finite dimensional, this is clear. For infinite dimensional problems, however, we have to assume that the noise is smooth enough for the sum in (11) to converge. Examples for this are Brownian noise (-noise) or pink noise (-noise), see e.g. [14, 26]. Finally, as a consequence of our rather generic noise model we may not be able to exploit specific properties of the noise that would be available when focusing on a particular distribution. However, we are able to show convergence for a large variety of regularization methods.
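To make the divergence in (11) concrete for the white noise example, consider (as an illustration, with notation chosen here) an orthonormal basis $\{u_k\}_{k\in\mathbb{N}}$ of the Hilbert space and the coefficients $\xi_k := \langle \xi, u_k\rangle \sim \mathcal{N}(0,\sigma^{2})$, independent and identically distributed. Then, by the strong law of large numbers,

$$\frac{1}{n}\sum_{k=1}^{n} \xi_k^{2} \longrightarrow \sigma^{2} \quad \text{almost surely}, \qquad\text{hence}\qquad \|\xi\|^{2} = \sum_{k=1}^{\infty} \xi_k^{2} = \infty \quad\text{almost surely},$$

whereas any truncation to the first $n$ coefficients has finite squared norm with expectation $n\sigma^{2}$.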

2 The Ky Fan metric

In order to measure the magnitude of the stochastic noise and the quality of the reconstructions, we need metrics that incorporate the stochastic nature of the problem. One such metric, which will be the main tool for our stochastic convergence analysis, is the Ky Fan metric (cf. [28]). It is defined as follows.

Definition 2.1.

Let $\xi_1$ and $\xi_2$ be random variables in a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with values in a metric space $(\mathcal{X}, d)$. The distance between $\xi_1$ and $\xi_2$ in the Ky Fan metric is defined as

$$\rho_K(\xi_1, \xi_2) := \inf\bigl\{\varepsilon > 0 : \mathbb{P}\bigl(d(\xi_1(\omega), \xi_2(\omega)) > \varepsilon\bigr) \le \varepsilon\bigr\}.$$

(12)

We will often drop the explicit reference to . This metric essentially allows one to lift results from a metric space to the space of random variables, as the connection to the deterministic setting is inherent via the metric used in its definition. The deterministic metric is often induced by a norm . We will implicitly assume that equation (2) is scaled appropriately, since the Ky Fan distance never exceeds one by definition. Note that one can use definition (12) also if is a more general distance function than a metric. Then the construction (12) itself is no longer a metric; however, the techniques used in later parts of the paper can readily be extended to this setting.

An immediate consequence of (12) is that if and only if almost surely. Convergence in the Ky Fan metric is equivalent to convergence in probability, i.e., for a sequence and one has

Hence convergence in the Ky Fan metric also leads to pointwise (almost sure) convergence of certain subsequences in the metric [10].
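Spelled out in the notation of Definition 2.1, the equivalence used here is the standard one:

$$\rho_K(\xi_n, \xi) \to 0 \quad\Longleftrightarrow\quad \mathbb{P}\bigl(d(\xi_n, \xi) > \varepsilon\bigr) \to 0 \ \text{ for every } \varepsilon > 0.$$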

A somewhat more intuitive and more frequently used stochastic metric is the expectation or, more generally, a (stochastic) metric. For random variables and with values in a metric space ,

defines the -th moment of for , assuming the existence of the integral. We will use and refer to it as convergence in expectation. Note that since the variance is defined as

one always has

(13)

We will show later that for parameter choice rules the expectation of the noise has to be slightly overestimated; hence, estimating it via the popular and often easier to compute -norm with (13) is not problematic.
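For reference, one standard comparison of this kind, written in the notation of Definition 2.1 and using only Jensen's inequality together with the definition of the variance, is

$$\mathbb{E}\bigl[d(\xi_1,\xi_2)\bigr] \le \Bigl(\mathbb{E}\bigl[d(\xi_1,\xi_2)^{2}\bigr]\Bigr)^{1/2} = \Bigl(\operatorname{Var}\bigl[d(\xi_1,\xi_2)\bigr] + \bigl(\mathbb{E}\bigl[d(\xi_1,\xi_2)\bigr]\bigr)^{2}\Bigr)^{1/2}.$$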

While the main part of our analysis is based on the description of the noise and the reconstruction quality in the Ky Fan metric, we will also allow the expectation as measure of the stochastic noise and partially show convergence of the reconstructed solutions in expectation. To this end, we comment in the following on the connection between those two metrics.

It is well-known that convergence in expectation implies convergence in probability, see for example [10]. Hence, convergence in the Ky Fan metric is implied by convergence in expectation (and also by convergence of higher moments). Namely, by Markov's inequality one has, for an arbitrary nonnegative real-valued random variable $X$ with $\mathbb{E}[X] < \infty$ and any $\varepsilon > 0$,

$$\mathbb{P}\bigl(X \ge \varepsilon\bigr) \le \frac{\mathbb{E}[X]}{\varepsilon}.$$

(14)

Under an additional assumption one can show conversely that convergence in probability implies convergence in expectation. We have the following definition.

Definition 2.2 ([6], Definition A.3.1.).

Let be a complete probability space. A family is called uniformly integrable if

Theorem 2.1 ([6], Theorem A.3.2.).

Let be a sequence convergent almost everywhere (or in probability) to a function . If the sequence is uniformly integrable, then it converges to in the norm of .

From a practical point of view, uniform integrability of a sequence of regularized solutions to an Inverse Problem is a rather natural condition. Since Inverse Problems typically arise from some real-world application, it is to be expected that the true solution is bounded. For example, in Computed Tomography, the density of the tissue inside the body cannot be arbitrarily high. Although for an Inverse Problem with a stochastic noise model boundedness of the regularized solutions cannot be guaranteed due to the possibly large measurement error, one can enforce the condition using a priori knowledge of the solution.

Assume that the true solution fulfills and globally for some fixed . Under this assumption, let be a sequence of regularized solutions for noisy data with variance . Let and define

(15)

Then the sequence is uniformly integrable. In other words, by discarding solutions that must be far away from the true solution in view of the a priori knowledge, convergence in the Ky Fan metric implies convergence in expectation.
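One way such a truncation can be realized (a sketch under assumptions made here, not necessarily the exact definition (15)): suppose a reference point $x_0$ and a constant $c$ with $d(x^{\dagger}, x_0) \le c$ are known a priori, and set

$$\tilde{x}_n := \begin{cases} x_n, & d(x_n, x_0) \le 2c,\\ x_0, & \text{otherwise.}\end{cases}$$

Then $d(\tilde{x}_n, x^{\dagger}) \le 3c$ for all $n$, so the family $\{d(\tilde{x}_n, x^{\dagger})\}$ is bounded and hence uniformly integrable, while for any $\varepsilon < c$ the event $\{d(x_n, x^{\dagger}) \le \varepsilon\}$ implies $\tilde{x}_n = x_n$, so Ky Fan convergence of $x_n$ carries over to $\tilde{x}_n$.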

To close this section, let us remark on the computation of the Ky Fan distance. It can be estimated via the moments of the noise.

Theorem 2.2.

Let be random variables in a complete probability space and for some . Then

(16)
Proof.

One has, due to Markov’s inequality (14) and the monotonicity of the mapping for ,

for . Solving for yields the claim. ∎
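Writing $d := d(\xi_1,\xi_2)$ and assuming $\mathbb{E}[d^{\,p}] < \infty$ for some $p > 0$, the standard computation behind a bound of the form (16) is the following (our notation): by Markov's inequality (14) applied to $d^{\,p}$,

$$\mathbb{P}\bigl(d > \varepsilon\bigr) \le \frac{\mathbb{E}[d^{\,p}]}{\varepsilon^{p}} \le \varepsilon \qquad\text{whenever}\qquad \varepsilon \ge \bigl(\mathbb{E}[d^{\,p}]\bigr)^{\frac{1}{p+1}},$$

so that, by Definition 2.1, $\rho_K(\xi_1,\xi_2) \le \bigl(\mathbb{E}[d^{\,p}]\bigr)^{1/(p+1)}$.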

Note that even if moments exist for all

see [20, 15], due to the tail of the distributions. In the Gaussian case, a direct estimate has been derived in [32, 22]. We present it in Proposition 3.6.

3 Convergence in the stochastic setting

3.1 Deterministic Inverse Problems with stochastic noise

As mentioned previously, the intention of this paper is to show convergence for Inverse Problems under a stochastic noise model using results from the deterministic setting. Assume we have at hand a deterministic regularization method of our liking for the solution of (2) under the noisy data (1) where now for some . By regularization method we understand (possibly nonlinear) mappings

(17)

where is the regularized solution to the regularization parameter given the data . Often, is obtained via the minimization of functionals of the type (4). In order to deserve the name regularization we require to fulfill

(18)

under a certain choice of the regularization parameter chosen either a priori or a posteriori . In our notation is the true solution, usually the minimum norm solution with respect to the penalty in (4), i.e.,

Note that, in particular for nonlinear problems, does not need to be unique. In [20, 21] it was pointed out that this is problematic for the lifting arguments. A standard argument in the deterministic theory is to prove convergence of subsequences to the desired solution and then to deduce, if possible, convergence of the whole sequence of regularized solutions. In the stochastic setting, this is not possible in general since subsequences for different do not have to be related. A constructed example for this behavior can be found in Section 4.1 of [20, 21]. In order to lift general deterministic regularization methods into the stochastic setting we must therefore require that is unique.

We formulate our convergence results assuming the noise to be bounded with respect to the Ky Fan metric or in expectation. As we will see, in the latter case we have to “inflate” the expectation for decreasing variance in order to obtain convergence. For the analysis we mainly use a lifting argument based on the deterministic theory. In [20, 21, Theorem 4.1], it was shown how, by means of the Ky Fan metric, deterministic results can be lifted to the space of random variables for nonlinear Tikhonov regularization. Since that theorem relies solely on the fact that there is a deterministic regularization theory and that the probability space can be decomposed into a part where the deterministic theory holds and a small part where it does not, it is easily generalized. Before we state the theorem, we need the following lemmata.

Lemma 3.1.

([11], see also [10]) Let be a complete probability space. Let and be measurable functions from into a metric space with metric . Suppose for -almost all . Then for any there is a set with such that uniformly on , that is

Lemma 3.2 ([20, 21], Proposition 1.10).

Let be a sequence of random variables that converges to in the Ky Fan metric. Then for any and there exist , , and a subsequence with

Furthermore there exists a subsequence that converges to almost surely.

Proof.

We give a sketch of the proof for the first statement taken from [20, 21].
Set . By definition of the Ky Fan metric (12), for given , there exists a set with and such that . For arbitrary and we pick a subsequence with and introduce the set . One can check that . Since is a subset of every we have

which proves the first statement. The second one follows since convergence in the Ky Fan metric is equivalent to convergence in probability, which itself implies almost sure convergence of a subsequence, cf. [10]. ∎

With this, we are ready for the convergence theorem, which we shall split into two parts: one for the Ky Fan metric as error measure and one for the expectation.

Theorem 3.3.

Let be a regularization method for the solution of (2) in the deterministic setting under a suitable choice of the regularization parameter. Let now where is a stochastic error such that as . Then, assuming (2) has a unique solution and all necessary assumptions for the deterministic theory (except the bound on the noise) hold with probability one, the regularization method fulfills

under the same parameter choice rule as in the deterministic setting with replaced by . If the regularized solutions are defined by (15), then it holds that

Proof.

Denote . Define . (Note that due to the properties of the Ky Fan metric). We show in the following that for arbitrary we have and hence

As a first step we pick a “worst case” subsequence of , a subsequence for which the corresponding solutions satisfy . We now show that even from this “worst case” sequence we can pick a subsequence for which we have for arbitrary .
Let . According to Lemma 3.2 we can pick a subsequence and a set with as well as , arbitrarily small, on . For all , the noise tends to zero. We can therefore use the deterministic result with and deduce that converges to the unique solution for , where in the choice of the regularization parameter is substituted by . The convergence is not uniform in ; nevertheless, pointwise convergence implies uniform convergence except on sets of small measure according to Lemma 3.1. Therefore there exist , and such that and . We thus have

Since we split with , we have shown existence of a subsequence such that

for sufficiently small. This is by definition of the Ky Fan metric an upper bound for the distance between and . Therefore we have

On the other hand, the original sequence satisfied . Since it follows . Because was arbitrary, this implies , which concludes the proof of convergence in the Ky Fan metric. Convergence in expectation follows from Theorem 2.1 noting that by (15) the sequence of regularized solutions is uniformly integrable. ∎

Theorem 3.4.

Let be a regularization method for the solution of (2) in the deterministic setting under a suitable choice of the regularization parameter. Let now where is a stochastic error such that as . Then, assuming (2) has a unique solution and all necessary assumptions for the deterministic theory (except the bound on the noise) hold with probability one, the regularization method fulfills

under the same parameter choice rule as in the deterministic setting with replaced by where fulfills

(19)

If the regularized solutions are defined by (15) then it holds that

Proof.

As previously we pick a “worst case” subsequence of , a subsequence for which the corresponding solutions satisfy .
Let . We can now pick a subsequence which we again denote by fulfilling , where without loss of generality , such that

This again defines, via the complement in , with on which . As before, we can now apply the deterministic theory by substituting with . The remainder of the proof is identical to that of Theorem 3.3. ∎

The theorems justify the use of deterministic algorithms under a stochastic noise model. Since the proof is based solely on relating the stochastic noise to a deterministic one on subsets of and does not use any specific properties of the regularization methods or the underlying spaces, it opens up most of the deterministic methods for a stochastic noise model. In particular, the parameter choice rules from the deterministic setting are easily adapted.
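As an illustration of how a rule is adapted (notation chosen here): if the deterministic theory prescribes, say, an a priori choice $\alpha \sim \delta$ for noise level $\delta$, then Theorems 3.3 and 3.4 use the same rule with

$$\delta \ \rightsquigarrow\ \rho_K(\xi, 0) \quad\text{(noise bounded in the Ky Fan metric)} \qquad\text{or}\qquad \delta \ \rightsquigarrow\ \varepsilon_\sigma \ \text{ fulfilling (19)} \quad\text{(noise bounded in expectation)},$$

where $\xi$ denotes the stochastic noise and $\varepsilon_\sigma$ the inflated expectation.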

As is usual in the deterministic literature, the general convergence theorem is followed by convergence rates, which are obtained under additional assumptions. Often these conditions ensure at least local uniqueness of the true solution. If not, we have to require such a property for the same reason as previously.

Theorem 3.5.

Let be a regularization method for the solution of (2) in the deterministic setting such that, under a set of assumptions on the operator and the solutions and a suitable choice of the regularization parameter,

with a monotonically increasing right-continuous function .

Let now where is a stochastic error such that

  • or

as . Then, assuming all necessary assumptions for the deterministic theory (except the bound on the noise) hold with probability one and that there is (either by the deterministic conditions or by additional assumption) a (locally) unique solution to (2), the regularization method fulfills

in case a) or, respectively, in case b),

under the same parameter choice rule as in the deterministic setting with replaced by (case a)) or where fulfills (19) (case b)).

Proof.

We start again with the Ky Fan distance as noise measure. Since we have the deterministic theory at hand, we know that whenever . With we have, since is monotonically increasing and right continuous,

and hence by definition .

If the expectation is used as measure for the data error, we have

by Markov's inequality. Hence, with probability we are in the deterministic setting with and

The convergence rate follows by the definition of the Ky Fan metric. ∎

For Inverse Problems, the convergence rates are most often given by functions which decay at most linearly fast, i.e.,

Hence in this case the convergence rates are preserved in the Ky Fan metric. For the expectation this is not the case: we have to gradually inflate the expectation by the parameter in order to obtain convergence (and rates). Let us discuss the simple example of Gaussian noise in the finite dimensional setting, i.e., the noise in (1) consists of i.i.d. random variables with zero mean and variance . Then it has been shown in [15] that for any

(20)

with the gamma functions and defined as

In particular, (20) is independent of the variance . In order to decrease the probability to zero, we therefore have to link with the variance. For Gaussian noise of the above kind the following estimate for the Ky Fan distance between true and noisy data has been given in [32].

Proposition 3.6.

Let be a random variable with values in . Assume that the distribution of is with . Then it holds in that

(21)

Recall that

(22)

see e.g. [15]. Comparing (21) and (22), one sees that and in particular the decay of slows down with decreasing . In other words, the artificial inflation we had to impose on the expectation is automatically included in the Ky Fan distance, which we suppose is the reason why the convergence theory carries over in such a direct fashion for the Ky Fan metric.

For many nonlinear Inverse Problems the requirement of a unique solution is too strong. Often one has several solutions of the same quality; in particular, there may exist more than one minimum norm solution. In this case, Theorem 3.3 is not applicable. In the examples [20, 21, Examples 4.3 and 4.5] with two minimum norm solutions the noise was constructed such that, while the error in the data converges to zero, for each fixed the regularized solutions jump between both solutions so that no converging subsequence can be found. The main problem there is that the Ky Fan distance cannot incorporate the concept that all minimum norm solutions are equally acceptable. We will now define a pseudometric that resolves this issue.

Definition 3.1.

Let be a metric space. Denote with the set of minimum-norm solutions to (2). Then

(23)

measures the distance between an element and the set , in particular it is

With this, one can define a pseudometric on via

(24)

Obviously (24) is positive, symmetric and fulfills the triangle inequality. However, does not imply a.e. but instead which fixes the aforementioned issue of the Ky Fan metric and allows the following theorems.
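One natural way to realize such a construction (a sketch in our notation, not necessarily the precise form of (23)–(24)): with $\mathcal{L}$ denoting the set of minimum norm solutions, the distance of a point to the set is

$$d(x, \mathcal{L}) := \inf_{z \in \mathcal{L}} d(x, z),$$

and applying the Ky Fan construction (12) to this quantity,

$$\rho_{K,\mathcal{L}}(\xi) := \inf\bigl\{\varepsilon > 0 : \mathbb{P}\bigl(d(\xi(\omega), \mathcal{L}) > \varepsilon\bigr) \le \varepsilon\bigr\},$$

measures convergence of a sequence of random variables to the whole solution set rather than to one fixed solution.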

Theorem 3.7.

Let be a regularization method for the solution of (2) in the deterministic setting under a suitable choice of the regularization parameter. Let now where is a stochastic error such that

  • or

as . Then, assuming all necessary assumptions for the deterministic theory (except the bound on the noise) hold with probability one, the regularization method fulfills

under the same parameter choice rule as in the deterministic setting with replaced by (case a)) or where fulfills (19) (case b)). In particular, the sequence of regularized solutions fulfills

Proof.

The proof follows the lines of the one of Theorem 3.3 with replaced by . Also Lemma 3.1 is easily adjusted to incorporate multiple solutions. ∎

So far we have assumed that only the noise is stochastic, whereas the operator and the unknown were deterministic. In [20, 21] general stochastic Inverse Problems

were considered. It was shown how deterministic conditions such as source conditions can be incorporated into the stochastic setting by assuming that the deterministic conditions hold with a certain probability. However, additional conditions may occur when lifting these in order to ensure the deterministic requirements up to a certain probability. Since this is easier to see by example, we postpone the discussion of the complete stochastic formulation to the next section. Although we will address only one particular example, the technique can be applied to general approaches.

3.2 Fully stochastic Inverse Problems

Due to the possible multiplicity of stochastic conditions which might appear in this context, it does not seem possible to develop a lifting strategy in as general a fashion as in the previous section. We will therefore consider two classical examples, namely nonlinear Tikhonov regularization and Landweber's method for nonlinear Inverse Problems. The theory is taken completely from [20, 21].

Nonlinear Tikhonov Regularization

We seek the solution of a nonlinear ill-posed problem (2) via the variational problem

with a reference point and given noisy data according to (1) where the stochastic distribution of the noise is assumed to be known. We shall skip the general convergence theorem (which follows as in the previous section) and move to convergence rates directly. In the deterministic theory, i.e. when is the noisy data with , we have the following theorem from [12].

Theorem 3.8.

Let be convex, such that and denote the -minimum norm solution of (2). Furthermore let the following conditions hold.

  • is Fréchet-differentiable

  • There exists such that in a sufficiently large ball

  • satisfies the source condition for some .

  • The source element satisfies .

Then for the choice with some fixed we obtain

(25)
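For orientation, in the classical Hilbert space result of [12] behind this theorem, the listed conditions yield, for the a priori choice $\alpha \sim \delta$, a rate of the form

$$\|x_{\alpha}^{\delta} - x^{\dagger}\| = \mathcal{O}\bigl(\sqrt{\delta}\bigr) \qquad (\delta \to 0),$$

where $x_{\alpha}^{\delta}$ denotes the minimizer of the Tikhonov functional (notation ours).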

As given in Theorem 4.6 of [20], the following stochastic formulation of Theorem 3.8 holds.

Theorem 3.9.

Let be convex, let be such that and denote the -minimum norm solution of (2) for almost all . Furthermore let the following conditions hold.

is Fréchet-differentiable for almost all

  • satisfies

    in a sufficiently large ball