Blockwise SVD with error in the operator and application to blind deconvolution

Blockwise SVD with error in the operator and application to blind deconvolution

S. Delattre 111Université Denis Diderot Paris 7 and CNRS-UMR 7099, 175 rue du Chevaleret 75013 Paris, France. E-mail: sylvain.delattre@univ-paris-diderot.fr, dominique.picard@univ-paris-diderot.fr, thomas.vareschi@univ-paris-diderot.fr M. Hoffmann222ENSAE and CNRS-UMR 8050, 3 avenue Pierre Larousse, 92245 Malakoff Cedex, France. E-mail: marc.hoffmann@ensae.fr      D. Picard footnotemark: and T. Vareschi footnotemark:
Abstract

We consider linear inverse problems in a nonparametric statistical framework. Both the signal and the operator are unknown and subject to error measurements. We establish minimax rates of convergence under squared error loss when the operator admits a blockwise singular value decomposition (blockwise SVD) and the smoothness of the signal is measured in a Sobolev sense. We construct a nonlinear procedure adapting simultaneously to the unknown smoothness of both the signal and the operator and achieving the optimal rate of convergence to within logarithmic terms. When the noise level in the operator is dominant, by taking full advantage of the blockwise SVD property, we demonstrate that the block SVD procedure overperforms classical methods based on Galerkin projection [14] or nonlinear wavelet thresholding [18]. We subsequently apply our abstract framework to the specific case of blind deconvolution on the torus and on the sphere.

Keywords: Blind deconvolution; blockwise SVD; circular and spherical deconvolution; nonparametric adaptive estimation; linear inverse problems; error in the operator.
Mathematical Subject Classification: 62G05, 62G99, 65J20, 65J22.

1 Introduction

1.1 Motivation

Consider the following idealised statistical problem: estimate a function (a signal, an image) from data

(1.1)

where

is a linear operator between two Hilbert spaces and . The observation of the unknown is challenged by the action of the linear degradation as well as contaminated by an experimental Gaussian white noise on with vanishing noise level as . Alternatively, in a density estimation setting, we observe a random sample drawn from a probability distribution333In that setting, must therefore also be a probability density . with density . In each case, we do not know the operator exactly, but we have access to

(1.2)

where is a Gaussian white noise on thanks to preliminary experiments or calibration through trial functions. This setting has been discussed in details in [14, 18]. In this paper, we are interested in operators admitting a singular value decomposition (SVD) or a blockwise SVD. In essence, we know the typical eigenfunctions of but not the eigenvalues. We cover two specific examples of interest: spherical and circular deconvolution.

Spherical deconvolution.

Used for the analysis of data distributed on the celestial sphere, see Section 4.1 below. One observes a random sample with

where the are random elements in , the group of rotation matrices, and the are independent and identically distributed on the sphere , with common density with respect to the uniform probability distribution on . In this setting, if the have common density with respect to the Haar measure on , we have

We are interested in the case where the exact form is unknown. However, is block-diagonal in the spherical harmonic basis. ∎

Circular deconvolution.

Used for restoring signal or images, see Section 4.2 below. We take the space of square integrable functions on the torus (or ) appended with periodic boundary conditions. We have

The degradation process is characterised by the impulse response function which we do not know exactly. However, is diagonal in the Fourier basis. ∎

Although the problem of estimating is fairly classical and well understood when is known (a selected literature is [32, 8, 12, 1, 17, 29, 28] and the references therein), only moderate attention has been paid in the case of an unknown despite its relevance in practice. When the eigenfunctions of are known solely, we have the results of Cavalier and Hengartner [5], Cavalier and Raimondo [6] but they are confined to the case where the error in the operator is negligible . In a general setting with error in the operator, Efromovitch and Kolchinskii [14] and later Hoffmann and Reiß  [18] studied the recovery of when the eigenfunctions and eigenvalues of are unknown. In both contributions, a marginal attention is paid to the case of sparse or diagonal operators, but it is showed in both papers that unusual rates of convergence can be obtained when . In a univariate setting, Neumann [24] and Comte and Lacour [9] consider the case of deconvolution with an error density, known only through an auxiliary set of learning data. This formally corresponds to having in our setting. Minimax rates and adaptive estimators are derived in both regimes and . We address in the paper the following program:

  • Construction of a feasible procedure estimating from data (1.1) and (1.2) that achieves optimal rates of convergence (up to inessential logarithmic terms). We require to be adaptive with respect to smoothness constraints on and .

  • Identification of best achievable accuracy for under smoothness constraints on and so that the interplay between and can be explicitly related in the asymptotic and ; this includes the comparison with earlier results of [24, 14, 18] in the context of blockwise SVD.

  • Application to spherical deconvolution on or circular deconvolution on the torus; this includes the discussion of our findings in terms of the existing literature on the topic [7, 25, 22] and some practical aspects of numerical implementation.

1.2 Main results and organisation of the paper

In Section 2, we present an abstract framework that allows for operators to admit a so-called blockwise SVD. This property is simply turned into the existence of pairs of increasing finite dimensional spaces ( that are stable under the action of . The blockwise SVD property is further appended with a smoothness condition quantified by the arithmetic decay of the operator norm of and its inverse on (resp. ) (the so-called ordinary smooth assumption, see e.g. [32]). By means of a reconstruction formula, we obtain in Section 2.2 an estimator of by first inverting on with a thresholding tuned with and then filter the resulting signal by a block thresholding tuned with . As for i) and ii), we establish in Theorems 3.1 and 3.4 of Section 3 the minimax rates of convergence for Sobolev constraints on under squared error loss and we demonstrate that is optimal and adaptive to within logarithmic terms. The explicit interplay between and is revealed and discussed in the case of sparse operator when , completing earlier findings in [14, 18] and to some extent [24] in the univariate case for density deconvolution. In particular, we demonstrate that a certain parametric regime dominates when the smoothness of the signal dominates the smoothing properties of the operator. Concerning iii), the method is applied to the case of spherical and circular deconvolution in Section 4 where harmonic Fourier analysis enables to provide explicit blockwise SVD for the convolution operator. We illustrate the numerical feasability of and the phenomena that appear in the case . Section 5 is devoted to the proofs.

We choose to state and prove our results in the white Gaussian model generated by the observation of and defined by (1.1) and (1.2). The extension to the case of density estimation, when is replaced by the observation of a random sample of size drawn from the distribution , like for instance in [24, 9] is a bit more involved, due to the intrinsic heteroscedasticity that appears when enforcing a formal analogy with the Gaussian setting (1.1). It is briefly addressed in the discussion Section 3.2

2 Estimation by blockwise SVD

2.1 The blockwise SVD property

Let denote a family of linear operators

between two Hilbert spaces and that shall represent our parameter set of unknown .

A fundamental property (Assumption 2.1 below) is that an explicit singular value decomposition (SVD) or blockwise SVD is known for all simultaneously. More specifically, we suppose that there exist two explicitly known bases of and of , as well as a partition of with if , and a constant such that:

where means inequality up to a multiplicative constant that does not depend on . Here .

It is worthwhile to notice that in our examples as well as in the rates of convergence that we will exhibit later, plays the role of a dimension. In particular, corresponds to a ’standard SVD’, whereas creates blocks and deserves the name of ’blockwise’ SVD. However, there is no need in the paper to assume that is in . Set

The Galerkin projection of an operator onto is defined by , where is the orthogonal projector onto .

Assumption 2.1 (Blockwise SVD).

We further need to quantify the action of on . We denote by the operator norm of .

Assumption 2.2 (Spectral behaviour of ).

For every , is invertible and there exists such that

and

for every .

We associate with the bases and the following decompositions

where denotes the inner product either in or and the scale of Sobolev spaces

(2.1)

For , Assumption 2.2 implies that is continuous. In particular, when , the operator is ill-posed with degree , see for instance [26].

2.2 Blockwise SVD reconstruction with noisy data

Under Assumption 2.1 and 2.2, we have the reconstruction formula

(2.2)

By the observed blurred version of in (1.2), we obtain a family of estimators of from data (1.2) by considering the operator

(2.3)

where is a cutoff level, possibly depending on . Likewise, the coefficient can be estimated by

(2.4)

Mimicking the reconstruction formula (2.2) with the estimates (2.4) and (2.3), we finally obtain a (family of) estimator(s) of by setting

where

The procedure is specified by the maximal frequency level and the threshold levels

(2.5)

and

(2.6)

for some prefactors . The threshold rule we introduce in both the signal (with level ) and the operator (with level ) is inspired by classical block thresholding [21, 4, 3] and will enable to adapt with respect to the smoothness properties of both the signal and the operator , see below.

3 Main results

3.1 Minimax rates of convergence

We assess the performance of the estimator defined in Section 2.2 over the Sobolev spaces linked to the basis defined in (2.1). Define the Sobolev balls for and let

(3.1)

for with , , where the mapping constants are defined in Assumption 2.2.

Theorem 3.1 (Upper bounds).

Let be a class of operators satisfying Assumptions 2.1 and 2.2. Assume we observe given by (1.1) and (1.2), with and . Specify with

(3.2)

and as in (2.5) and (2.6). For sufficiently small and sufficiently large , for every , with and such that , we have

(3.3)

where means inequality up to a multiplicative constant that depends on and only.

The bounds for and are explicitly computable. In the model generated by in (1.1) and in (1.2), they depend on the dimension and on the absolute constants and of the concentration lemmas 5.3 and 5.6 below. However, they are in practice much too conservative, as is well known in the signal detection case [13] or the classical inverse problem case [1], see the numerical implementation Section 4.

Our next result shows that the rate achieved by is indeed optimal, up to logarithmic terms. The lower bound in the case is classical (Nussbaum and Pereverzev [26]) and will not decrease for increasing noise levels or whence it suffices to provide the case which formally corresponds to observing without noise while remains unknown.

Theorem 3.2 (Lower bounds).

In the same setting as Theorem 3.1, with in addition , assume we observe exactly and given by (1.2). For sufficiently small , we have

(3.4)

where means inequality up to a positive multiplicative constant that depends on and only.

Combining (3.3) together with (3.4) and the results of [26], we conclude that is minimax over to within logarithmic terms in and , and that this result is uniform over the nuisance parameter .

3.2 Discussion

The case of diagonal operators

It is interesting to notice that the condition in Theorem 3.2 excludes the case where is diagonal. In this particular case, considered especially in the deconvolution example of Section 4.2 below, a closer inspection of the proof of the upper bound shows that the rate

can be obtained (up to some extra logarithmic factors) as in the case where , which improves on the rate

provided by Theorem 3.1. This sheds some light on the role of the number . It is in fact twofolds: it acts as a ’dimension’ in the term ; in the term involving error in the operator , it reflects the distance to the diagonal case expanding from in the diagonal case, to in the case . It is very plausible, though beyond the scope of this paper, to express conditions on leading to rates of the form , with continuously varying from 0 to . Note that in the case , we recover the minimax rate of density deconvolution with unknown error as proved by Neumann [24], see also [9].

Relation to other works in the case of sparse operators

For an unknown signal with smoothness and unknown operator with degree of ill-posedness , the optimal rates of convergence are

(3.5)

up to inessential logarithmic terms. The exponents and are linked respectively to the error in the signal and the error in the operator . Efromovitch and Kolchinskii [14] established that under fairly general conditions on the operator , the optimal exponents are given by

They noted however that if certain sparsity properties on are moreover assumed (and that we shall not describe here, for instance if is diagonal in an appropriate basis) then the exponent is no longer optimal, while remains unchanged.

In the related context of operators acting on Besov spaces of functions with smoothness measured in -norm, Hoffmann and Reiß  [18] introduce an ad hoc hypothesis on the sparsity of the unknown operator (that we shall not describe here either), expressed in terms of the wavelet discretization of . They subsequently obtain new rates of convergence for a certain nonlinear wavelet procedure, and these rates overperform (3.5) as expected from the results by [14]. In particular, if one considers the estimation of , in the extreme case where the operator is diagonal in a wavelet basis, the procedure in [18] achieves the rate

(3.6)

up to extra logarithmic terms. We may compare our results with the rate (3.6). In our setting, if we pick as the Fourier basis described in Section 4.2, then we have . Assuming to be diagonal in the basis which is the exact counterpart of the approach of Hoffmann and Reiß  with being diagonal in a wavelet basis, then by Theorem 3.1, our estimator (nearly) achieves the rate

which already outperforms the rate (3.6) whenever the error in the signal is dominated by the error in the operator and is small compared to , as follows from the elementary inequality

The superiority of the blockwise SVD in this setting is explained by the fact that the wavelet procedure in [18] is agnostic to the diagonal structure of in the wavelet basis, in contrast to that takes full advantage of the block structure of . As already explained in the preceding section, one could actually improve further this result in the specific case of being diagonal in and show that (nearly) achieves the rate , thus deleting the ‘dimensional effect’ of for the error in the operator.

Adaptation over the scales and

The estimator is fully adaptive over the family of Sobolev balls (in the sense that does not require the knowledge of nor ). However, the knowledge of the degree of ill-posedness of is required through the choice of the maximal frequency in (3.2). This restriction can actually be relaxed further in dimension . Indeed, setting formally in (3.2), one readily checks that becomes adaptive over and simultaneously. In dimension however, setting in (3.2) is forbidden, but an alternative adaptivity result can be obtained by taking for some , in which case is fully adaptive over the scale and the restricted family .

Extension to density estimation

We briefly show a line of proof for extending Theorem 3.1 to the framework of density estimation. Suppose that instead of we observe a random sample drawn from assumed to a probability density. By analogy to (2.4), we have an estimator of replacing with

Writing

with , an inspection of the proof of Theorem 3.1 reveals that an extension to the density estimation setting carries over as soon as the vector satisfies a concentration inequality, namely

see (5.6) in Lemma 5.2. To that end, we may apply a concentration inequality by Bousquet [2] as developed for instance in Massart [23], Eq (5.51) p. 171. The precise control of this extension requires further properties on the basis and on the density via the behaviour of , see Eq. (5.52) p. 171 in [23]. We do not pursue that here.

4 Application to blind deconvolution

4.1 Spherical deconvolution

Scientific context.

A common challenge in astrophysics is the analysis of complex data sets consisting of a number of objects or events such as galaxies of a particular type or ultra high energy cosmic rays (UHECR) and that are genuinely distributed over the celestial sphere. Such objects or events are distributed according to a probability density distribution on the sphere, depending itself on the physics that governs the production of these objects or events. For instance, UHECR are particles of unknown nature arriving at the earth from apparently random directions of the sky. They could originate from long-lived relic particles from the Big Bang. Alternatively, they could be generated by the acceleration of standard particles, such as protons, in extremely violent astrophysical phenomena. They could also originate from Active Galactic Nuclei (AGN), or from neutron stars surrounded by extremely high magnetic fields. As a consequence, in some hypotheses, the underlying probability distribution for observed UHECRs would be a finite sum of point-like sources. In other hypotheses, the distribution could be uniform, or smooth and correlated with the local distribution of matter in the universe. The distribution could also be a superposition of the above. Identifying between these hypotheses is of primordial importance for understanding the origin and mechanism of production of UHECRs. The observations, denoted by , are often perturbated by an experimental noise, say , that lead to the deconvolution problem described in Section 1.1. Following van Rooj and Ruymgart [33], Healy et al. [16], Kim and Koo [22] and Kerkyacharian et al. [25], we assume the following model: we observe an -sample with

where the are distributed on the sphere , with common density with respect to the uniform probability distribution on and independent of the that have a common density with respect to the Haar probability measure on the group of rotation matrices. One proves in [16, 22] that the density of the is

(4.1)

and we are interested in the case where the exact form of the convolution operator is unknown, due for instance to insufficient knowledge of the device that is used to measure the observations. However, we assume approximate knowledge of through as defined in (1.2). ∎

Checking the blockwise SVD Assumptions 2.1 and 2.1.

We closely follow the exposition of [16, 22, 25] for an overview of Fourier theory on and in order to establish rigorously the connection to Theorem 3.1 and 3.4. Define

where . Every rotation has representation for some . Define the rotational harmonics

for where are the second type Legendre functions described in details in [34]. The are the eigenfunctions of the Laplace-Beltrami operator on hence the family forms a complete orthonormal basis of on , where is the Haar probability measure. Every has a rotational Fourier transform

and for every we have a reconstruction formula

An analogous analysis is available on . Any point is determined by its spherical coordinates for some . Define

(4.2)

for where are the Legendre functions. We have and the constitute an orthonormal basis of on , generally referred to as the spherical harmonic basis. Any has a spherical Fourier transform

and a reconstruction formula

If the spherical convolution operator defined in (4.1) satisfies

(4.3)

and we retrieve the blockwise SVD formalism of Section 2.1 in dimension by setting , where the probability Haar measure on and

We have and by (4.3), is the finite dimension operator stable on with matrix having entries

Hence Assumption 2.1 is satisfied. Notice also that in this case is generally not diagonal. Assumption 2.2 is satisfied as we assume that is ordinary smooth in the terminology of Kim and Koo [22]. Our Assumption 2.2 exactly matches the constraint (3.6) in their paper with examples given by the Laplace distribution on the sphere () or the Rosenthal distribution ( arbitrary). ∎

Numerical implementation.

Following Kerkyacharian, Pham Ngoc and Picard [25] in their Example 2, we take with and . We have .
is the density of a Laplace distribution on , defined through . Hence, the matrices are homotheties whose ratios behave as . We have .
We plot in Figures 1 a -sample of with density on the sphere, and the action by on the , where the are distributed according to in Figure 2. Note that for the estimation of , we have access to a noisy version of with noise level only.

Figure 1: Data from . Plot of data with common distribution on the sphere (planar representation).
Figure 2: Data from . Plot of data on the sphere with common distribution . The are the data pictured in Figure 1 and the are sampled according to (planar representation).

We display below the (renormalised) empirical squared error of (oracle choice ) for Monte-Carlo for several values of . The noise level is to to be compared with the noise level . The latter is chosen non-negative, in order to show the interaction between the two types of error, and sufficiently small to emphasize the influence of on the process of estimation.

Noise level
Mean error 0.0466 0.0542 0.1732 0.2784 0.4335
Standard dev. 0.0011 0.0022 0.0126 0.0355 0.0466

Finally, on a specific sample of data, we plot the target density (Figure 3) and its reconstruction for data with (Figure 4) and (Figure 5). At a visual level, we oversimplify the representation by plotting and its reconstruction with a view from above the sphere through the axis. We see that the contour in Figure 5 is not well recovered in the regions where is small (on the right side of the graph in Figure 5). The choice of remains unchanged.

Figure 3: Target density . The representation is simplified through a view from above the sphere through the -axis.
Figure 4: Reconstruction for and .
Figure 5: Reconstruction for and . The reconstruction is polluted simultaneously by the limited number of observations and the noise level in the blurring .

4.2 Circular deconvolution

Scientific context.

In many engineering problems, the observation of a signal or image is distorted by the action of a linear operator . We assume for simplicity that lives on the torus (or ) appended with periodic boundary conditions. In many instances, the restoration of from the noisy observation of is challenged by the additional uncertainty about the operator . This is the case for instance in electronic microscopy [27] for the restoration of fluorescence Confocal Laser Scanning Microscope (CLSM) images. In other words, the quality of the image suffers from two physical limitations: error measurements or limited accuracy, and the fact that the exact PSF (the incoherent point spread function) that accounts for the blurring of (mathematically the action of ) is not precisely known. This is a classical issue that goes back to [31, 15]. An idealised additive Gaussian model for the noise contamination yields the observation (1.1) with

The degradation process is characterised by the impulse response function . In most cases of interest, we do not know the exact form of . In a condensed idealised statistical setup, we have access to

(4.4)

where is another Gaussian white noise defined on and independent of . Experimental approaches that justify representation (4.4) are described in [10, 20, 30]. ∎

Checking Assumptions 2.1 and 2.1.

We obviously have and the bases and will coincide with the -dimensional extension of the circular trigonometric basis if we set:

where we put

Any has a Fourier transform and moreover, if , we have

Therefore, is diagonal in the basis henceforth stable. Moreover, with , we have . Moreover and Assumption 2.1 follows. Assuming that satisfies for some and constants , we readily obtain Assumption 2.2. Note also that since is diagonal in the basis observing in the representation (4.4) is equivalent to observing in (1.2). ∎

Numerical implementation.

We numerically implement in dimension in the case where there is no noise in the signal (formally ) in order to illustrate the parametric effect that dominates in the optimal rate of convergence in Theorems 3.1 and 3.4 that becomes in that case. We take as target function belonging to for all and defined by its Fourier coefficients

We pick a family of blurring functions defined in the same manner by the formula

Figure 6: Estimation of the rate exponent when . Empirical squared-error versus in log-log scale. Top-to-bottom: . The target function has smoothness for all . For , the slope of the curve is constant and close to , confirming the parametric rate predicted by the theory when the smoothess of the signal dominates the degree of ill-posedness of the operator. The empirical errors were computed using Monte-Carlo simulations.

We show in Figure 6 in a log-log plot the mean-squared error of for the oracle choice over 1000 Monte-Carlo simulations for and . For small values of the predicted slope of the curve gives a rough estimate of the rate of convergence. We visually see that for the critical case with and below, the slope is close to confirming the parametric rate that is obtained whenever . ∎

5 Proofs

5.1 Preliminary estimates

Preparation.

Recall that , and that (resp. ) denotes the orthogonal projector onto (resp. ). For , we have

Using Assumption 2.1, we have and therefore . As a consequence

(5.1)

In turn, we have a convenient description of the observation defined in (1.2) and defined in (1.1) and in terms of a sequence space model that we shall now describe. ∎

Notation.

If , we denote by the (column) vector of coordinates of in the basis . If is a linear operator, we write for the matrix of the Galerkin projection of . ∎

Sequence model for error in the operator.

The observation of in (1.2) leads to the representation , or equivalently, in matrix notation

(5.2)

where is a matrix with entries that are independent centred Gaussian random variables, with unit variance. The following estimate is a classical concentration property of random matrices. For , denotes the operator norm for matrices (we shall skip the dependence upon in the notation).

Lemma 5.1 ([11], Theorem II.4).

There are positive constants such that

(5.3)

An immediate consequence of Lemma 5.1 is the following moment bound:

(5.4)

Sequence model for error in the signal.

From (1.1), we observe the Gaussian measure , or equivalently, thanks to (5.1)