Blockwise SVD with error in the operator and application to blind deconvolution
We consider linear inverse problems in a nonparametric statistical framework. Both the signal and the operator are unknown and subject to measurement errors. We establish minimax rates of convergence under squared error loss when the operator admits a blockwise singular value decomposition (blockwise SVD) and the smoothness of the signal is measured in a Sobolev sense. We construct a nonlinear procedure that adapts simultaneously to the unknown smoothness of both the signal and the operator and achieves the optimal rate of convergence to within logarithmic terms. When the noise level in the operator is dominant, by taking full advantage of the blockwise SVD property, we demonstrate that the block SVD procedure outperforms classical methods based on Galerkin projection  or nonlinear wavelet thresholding . We subsequently apply our abstract framework to the specific case of blind deconvolution on the torus and on the sphere.
Keywords: Blind deconvolution; blockwise SVD; circular and spherical deconvolution; nonparametric adaptive estimation; linear inverse problems; error in the operator.
Mathematical Subject Classification: 62G05, 62G99, 65J20, 65J22.
Consider the following idealised statistical problem: estimate a function (a signal, an image) from data
is a linear operator between two Hilbert spaces and . The observation of the unknown is challenged by the action of the linear degradation and is contaminated by an experimental Gaussian white noise on with vanishing noise level as . Alternatively, in a density estimation setting, we observe a random sample drawn from a probability distribution with density (in that setting, must therefore also be a probability density). In each case, we do not know the operator exactly, but we have access to
where is a Gaussian white noise on , obtained thanks to preliminary experiments or calibration through trial functions. This setting has been discussed in detail in [14, 18]. In this paper, we are interested in operators admitting a singular value decomposition (SVD) or a blockwise SVD. In essence, we know the typical eigenfunctions of but not the eigenvalues. We cover two specific examples of interest: spherical and circular deconvolution.
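The two-noise observation scheme just described can be sketched numerically in a truncated sequence space. The following minimal simulation is purely illustrative: the truncation level, the diagonal toy operator with polynomially decaying singular values, the smooth test signal, and the noise levels are all assumptions for the sake of the example, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 64                   # truncation level (illustrative)
eps, delta = 1e-2, 1e-3  # noise levels in the signal and in the operator

# Toy diagonal operator with polynomially decaying singular values
# (mildly ill-posed), and a smooth signal given by its coefficients.
k = np.arange(1, N + 1)
K = np.diag(k ** -1.0)
f = k ** -2.0

# Noisy observation of the degraded signal: coefficients of K f plus white noise.
g = K @ f + eps * rng.standard_normal(N)

# Noisy observation of the operator: K plus a scaled Gaussian matrix.
K_delta = K + delta * rng.standard_normal((N, N))
```

The point of the sketch is that both objects handed to the statistician, `g` and `K_delta`, are corrupted, at possibly different noise levels.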
Used for the analysis of data distributed on the celestial sphere, see Section 4.1 below. One observes a random sample with
where the are random elements in , the group of rotation matrices, and the are independent and identically distributed on the sphere , with common density with respect to the uniform probability distribution on . In this setting, if the have common density with respect to the Haar measure on , we have
We are interested in the case where the exact form is unknown. However, is block-diagonal in the spherical harmonic basis. ∎
Used for restoring signals or images, see Section 4.2 below. We take the space of square-integrable functions on the torus (or ) with periodic boundary conditions. We have
The degradation process is characterised by the impulse response function which we do not know exactly. However, is diagonal in the Fourier basis. ∎
Although the problem of estimating is fairly classical and well understood when is known (a selected literature is [32, 8, 12, 1, 17, 29, 28] and the references therein), only moderate attention has been paid to the case of an unknown operator, despite its relevance in practice. When only the eigenfunctions of are known, we have the results of Cavalier and Hengartner , Cavalier and Raimondo , but they are confined to the case where the error in the operator is negligible . In a general setting with error in the operator, Efromovich and Koltchinskii  and later Hoffmann and Reiß  studied the recovery of when the eigenfunctions and eigenvalues of are unknown. In both contributions, marginal attention is paid to the case of sparse or diagonal operators, but it is shown in both papers that unusual rates of convergence can be obtained when . In a univariate setting, Neumann  and Comte and Lacour  consider the case of deconvolution with an error density, known only through an auxiliary set of learning data. This formally corresponds to having in our setting. Minimax rates and adaptive estimators are derived in both regimes and . In this paper, we address the following program:
1.2 Main results and organisation of the paper
In Section 2, we present an abstract framework that allows for operators admitting a so-called blockwise SVD. This property simply translates into the existence of pairs of increasing finite-dimensional spaces that are stable under the action of . The blockwise SVD property is further supplemented with a smoothness condition quantified by the arithmetic decay of the operator norm of and its inverse on (resp. ) (the so-called ordinary smooth assumption, see e.g. ). By means of a reconstruction formula, we obtain in Section 2.2 an estimator of by first inverting on with a thresholding tuned with and then filtering the resulting signal by a block thresholding tuned with . As for i) and ii), we establish in Theorems 3.1 and 3.4 of Section 3 the minimax rates of convergence for Sobolev constraints on under squared error loss, and we demonstrate that is optimal and adaptive to within logarithmic terms. The explicit interplay between and is revealed and discussed in the case of sparse operators when , completing earlier findings in [14, 18] and to some extent  in the univariate case for density deconvolution. In particular, we demonstrate that a certain parametric regime dominates when the smoothness of the signal dominates the smoothing properties of the operator. Concerning iii), the method is applied to the case of spherical and circular deconvolution in Section 4, where harmonic Fourier analysis provides an explicit blockwise SVD for the convolution operator. We illustrate the numerical feasibility of and the phenomena that appear in the case . Section 5 is devoted to the proofs.
We choose to state and prove our results in the white Gaussian model generated by the observation of and defined by (1.1) and (1.2). The extension to the case of density estimation, when is replaced by the observation of a random sample of size drawn from the distribution , as for instance in [24, 9], is a bit more involved, due to the intrinsic heteroscedasticity that appears when enforcing a formal analogy with the Gaussian setting (1.1). It is briefly addressed in the discussion in Section 3.2.
2 Estimation by blockwise SVD
2.1 The blockwise SVD property
Let denote a family of linear operators
between two Hilbert spaces and that shall represent our parameter set of unknown .
A fundamental property (Assumption 2.1 below) is that an explicit singular value decomposition (SVD) or blockwise SVD is known for all simultaneously. More specifically, we suppose that there exist two explicitly known bases of and of , as well as a partition of with if , and a constant such that:
where means inequality up to a multiplicative constant that does not depend on . Here .
It is worth noticing that in our examples, as well as in the rates of convergence that we will exhibit later, plays the role of a dimension. In particular, corresponds to a 'standard SVD', whereas creates blocks and deserves the name of 'blockwise' SVD. However, there is no need in the paper to assume that is in . Set
The Galerkin projection of an operator onto is defined by , where is the orthogonal projector onto .
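The Galerkin projection just defined can be computed concretely once an orthonormal basis of the subspace is fixed: its matrix collects the inner products of images of basis vectors against the basis. A minimal numerical sketch, where the function name, the toy operator, and the subspace dimension are all illustrative assumptions:

```python
import numpy as np

def galerkin_matrix(apply_K, basis):
    """Matrix of the Galerkin projection P_V K P_V in an orthonormal basis of V.

    apply_K : function applying the operator K to a coefficient vector
    basis   : array of shape (dim V, n) whose rows form an orthonormal basis of V
    """
    images = np.array([apply_K(e) for e in basis])  # row j holds K e_j
    # entry (i, j) = <K e_j, e_i>
    return basis @ images.T

# sanity check: restricting the operator 2*Id to a 2-dimensional subspace
V = np.eye(4)[:2]
M = galerkin_matrix(lambda v: 2.0 * v, V)
```

For the homothety `2*Id`, the Galerkin matrix is simply twice the identity of the subspace, which makes the construction easy to verify.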
Assumption 2.1 (Blockwise SVD).
We further need to quantify the action of on . We denote by the operator norm of .
Assumption 2.2 (Spectral behaviour of ).
For every , is invertible and there exists such that
for every .
2.2 Blockwise SVD reconstruction with noisy data
where is a cutoff level, possibly depending on . Likewise, the coefficient can be estimated by
The procedure is specified by the maximal frequency level and the threshold levels
for some prefactors . The threshold rule we introduce in both the signal (with level ) and the operator (with level ) is inspired by classical block thresholding [21, 4, 3] and will enable adaptation with respect to the smoothness properties of both the signal and the operator , see below.
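The two-level thresholding rule can be sketched as follows: on each invariant block, invert the noisy Galerkin block only when it is well conditioned relative to the operator noise level, then keep the resulting block of coefficients only when its norm exceeds the signal threshold. This is a hedged illustration only: the precise threshold shapes and prefactors below are stand-ins, not the paper's exact tuning.

```python
import numpy as np

def block_svd_estimator(blocks, eps, delta, c_op=1.0, c_sig=1.0):
    """Illustrative two-level block thresholding.

    blocks : list of (g_k, K_k) pairs, the noisy coefficient vector and the
             noisy Galerkin block on the k-th invariant subspace
    eps    : noise level in the signal; delta : noise level in the operator
    """
    estimate = []
    for g_k, K_k in blocks:
        m = len(g_k)
        smin = np.linalg.svd(K_k, compute_uv=False).min()
        # operator-level test: kill blocks whose inversion would be unstable
        if smin <= c_op * delta * (np.sqrt(m) + np.sqrt(np.log(1.0 / delta))):
            estimate.append(np.zeros(m))
            continue
        cand = np.linalg.solve(K_k, g_k)  # invert the block
        # signal-level block threshold: keep only energetic blocks
        if np.linalg.norm(cand) <= c_sig * (eps / smin) * np.sqrt(m):
            cand = np.zeros(m)
        estimate.append(cand)
    return estimate
```

A well-conditioned block with a strong signal is inverted and kept, while a block whose observed operator part is drowned in noise is set to zero wholesale.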
3 Main results
3.1 Minimax rates of convergence
for with , , where the mapping constants are defined in Assumption 2.2.
Theorem 3.1 (Upper bounds).
where means inequality up to a multiplicative constant that depends on and only.
The bounds for and are explicitly computable. In the model generated by in (1.1) and in (1.2), they depend on the dimension and on the absolute constants and of the concentration Lemmas 5.3 and 5.6 below. However, they are in practice much too conservative, as is well known in the signal detection case  or the classical inverse problem case ; see the numerical implementation in Section 4.
Our next result shows that the rate achieved by is indeed optimal, up to logarithmic terms. The lower bound in the case is classical (Nussbaum and Pereverzev ) and will not decrease for increasing noise levels or , hence it suffices to treat the case , which formally corresponds to observing without noise while remains unknown.
Theorem 3.2 (Lower bounds).
The case of diagonal operators
It is interesting to notice that the condition in Theorem 3.2 excludes the case where is diagonal. In this particular case, considered especially in the deconvolution example of Section 4.2 below, a closer inspection of the proof of the upper bound shows that the rate
can be obtained (up to some extra logarithmic factors) as in the case where , which improves on the rate
provided by Theorem 3.1. This sheds some light on the role of the number . It is in fact twofold: it acts as a 'dimension' in the term ; in the term involving the error in the operator , it reflects the distance to the diagonal case, expanding from in the diagonal case to in the case . It is very plausible, though beyond the scope of this paper, that one could express conditions on leading to rates of the form , with varying continuously from 0 to . Note that in the case , we recover the minimax rate of density deconvolution with unknown error as proved by Neumann , see also .
Relation to other works in the case of sparse operators
For an unknown signal with smoothness and unknown operator with degree of ill-posedness , the optimal rates of convergence are
up to inessential logarithmic terms. The exponents and are linked respectively to the error in the signal and the error in the operator . Efromovich and Koltchinskii  established that under fairly general conditions on the operator , the optimal exponents are given by
They noted, however, that if certain sparsity properties of are moreover assumed (which we shall not describe here; for instance, if is diagonal in an appropriate basis), then the exponent is no longer optimal, while remains unchanged.
In the related context of operators acting on Besov spaces of functions with smoothness measured in -norm, Hoffmann and Reiß  introduce an ad hoc hypothesis on the sparsity of the unknown operator (which we shall not describe here either), expressed in terms of the wavelet discretisation of . They subsequently obtain new rates of convergence for a certain nonlinear wavelet procedure, and these rates outperform (3.5), as expected from the results of . In particular, for the estimation of , in the extreme case where the operator is diagonal in a wavelet basis, the procedure in  achieves the rate
up to extra logarithmic terms. We may compare our results with the rate (3.6). In our setting, if we pick as the Fourier basis described in Section 4.2, then we have . Assuming to be diagonal in the basis which is the exact counterpart of the approach of Hoffmann and Reiß with being diagonal in a wavelet basis, then by Theorem 3.1, our estimator (nearly) achieves the rate
which already outperforms the rate (3.6) whenever the error in the signal is dominated by the error in the operator and is small compared to , as follows from the elementary inequality
The superiority of the blockwise SVD in this setting is explained by the fact that the wavelet procedure in  is agnostic to the diagonal structure of in the wavelet basis, in contrast to , which takes full advantage of the block structure of . As already explained in the preceding section, one could actually improve this result further in the specific case of being diagonal in and show that (nearly) achieves the rate , thus removing the 'dimensional effect' of for the error in the operator.
Adaptation over the scales and
The estimator is fully adaptive over the family of Sobolev balls (in the sense that does not require the knowledge of nor ). However, the knowledge of the degree of ill-posedness of is required through the choice of the maximal frequency in (3.2). This restriction can actually be relaxed further in dimension . Indeed, setting formally in (3.2), one readily checks that becomes adaptive over and simultaneously. In dimension however, setting in (3.2) is forbidden, but an alternative adaptivity result can be obtained by taking for some , in which case is fully adaptive over the scale and the restricted family .
Extension to density estimation
We briefly sketch a line of proof for extending Theorem 3.1 to the framework of density estimation. Suppose that instead of we observe a random sample drawn from , assumed to be a probability density. By analogy with (2.4), we obtain an estimator of by replacing with
with , an inspection of the proof of Theorem 3.1 reveals that an extension to the density estimation setting carries over as soon as the vector satisfies a concentration inequality, namely
see (5.6) in Lemma 5.2. To that end, we may apply a concentration inequality by Bousquet  as developed for instance in Massart , Eq (5.51) p. 171. The precise control of this extension requires further properties on the basis and on the density via the behaviour of , see Eq. (5.52) p. 171 in . We do not pursue that here.
4 Application to blind deconvolution
4.1 Spherical deconvolution
A common challenge in astrophysics is the analysis of complex data sets consisting of a number of objects or events, such as galaxies of a particular type or ultra-high-energy cosmic rays (UHECR), that are genuinely distributed over the celestial sphere. Such objects or events are distributed according to a probability density on the sphere, itself depending on the physics that governs the production of these objects or events. For instance, UHECR are particles of unknown nature arriving at the earth from apparently random directions of the sky. They could originate from long-lived relic particles from the Big Bang. Alternatively, they could be generated by the acceleration of standard particles, such as protons, in extremely violent astrophysical phenomena. They could also originate from Active Galactic Nuclei (AGN), or from neutron stars surrounded by extremely high magnetic fields. As a consequence, under some hypotheses, the underlying probability distribution for observed UHECRs would be a finite sum of point-like sources. Under other hypotheses, the distribution could be uniform, or smooth and correlated with the local distribution of matter in the universe. The distribution could also be a superposition of the above. Distinguishing between these hypotheses is of paramount importance for understanding the origin and mechanism of production of UHECRs. The observations, denoted by , are often perturbed by an experimental noise, say , leading to the deconvolution problem described in Section 1.1. Following van Rooij and Ruymgaart , Healy et al. , Kim and Koo  and Kerkyacharian et al. , we assume the following model: we observe an -sample with
where the are distributed on the sphere , with common density with respect to the uniform probability distribution on and independent of the that have a common density with respect to the Haar probability measure on the group of rotation matrices. One proves in [16, 22] that the density of the is
and we are interested in the case where the exact form of the convolution operator is unknown, due for instance to insufficient knowledge of the device that is used to measure the observations. However, we assume approximate knowledge of through as defined in (1.2). ∎
where . Every rotation has representation for some . Define the rotational harmonics
for where are the second-type Legendre functions described in detail in . The are the eigenfunctions of the Laplace–Beltrami operator on , hence the family forms a complete orthonormal basis of on , where is the Haar probability measure. Every has a rotational Fourier transform
and for every we have a reconstruction formula
An analogous analysis is available on . Any point is determined by its spherical coordinates for some . Define
for where are the Legendre functions. We have and the constitute an orthonormal basis of on , generally referred to as the spherical harmonic basis. Any has a spherical Fourier transform
and a reconstruction formula
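In practice, the spherical Fourier coefficients entering this reconstruction formula are estimated by averaging the harmonics over the sample. The following sketch does this for the lowest degrees, written with closed-form harmonics to stay self-contained; the uniform sample, the sample size, and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative uniform sample on the sphere S^2 (stand-in for the observations)
n = 2000
X = rng.standard_normal((n, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Low-degree spherical harmonics in closed form (real-valued, L^2-normalised)
Y00 = np.full(n, 1.0 / (2.0 * np.sqrt(np.pi)))
Y10 = np.sqrt(3.0 / (4.0 * np.pi)) * X[:, 2]  # proportional to cos(polar angle)

# Empirical spherical Fourier coefficients: averages of the harmonics over the sample
c00 = Y00.mean()  # equals 1/(2*sqrt(pi)) for any sample, by normalisation
c10 = Y10.mean()  # close to 0 under the uniform density
```

Under the uniform density, all coefficients of degree at least one are close to zero, while the degree-zero coefficient is fixed by the normalisation, which gives a quick sanity check on any implementation.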
If the spherical convolution operator defined in (4.1) satisfies
and we retrieve the blockwise SVD formalism of Section 2.1 in dimension by setting , where the probability Haar measure on and
We have and by (4.3), is the finite-dimensional operator stable on with matrix having entries
Hence Assumption 2.1 is satisfied. Notice also that in this case is generally not diagonal. Assumption 2.2 is satisfied as we assume that is ordinary smooth in the terminology of Kim and Koo . Our Assumption 2.2 exactly matches the constraint (3.6) in their paper with examples given by the Laplace distribution on the sphere () or the Rosenthal distribution ( arbitrary). ∎
Following Kerkyacharian, Pham Ngoc and Picard  in their Example 2, we take with and . We have .
is the density of a Laplace distribution on , defined through . Hence, the matrices are homotheties whose ratios behave as . We have .
We plot in Figure 1 a -sample of with density on the sphere, and in Figure 2 the action of on the , where the are distributed according to . Note that for the estimation of , we have access to a noisy version of with noise level only.
We display below the (renormalised) empirical squared error of (oracle choice ) over Monte-Carlo runs for several values of . The noise level is to be compared with the noise level . The latter is chosen non-negligible, in order to show the interaction between the two types of error, and sufficiently small to emphasise the influence of on the estimation.
Finally, on a specific sample of data, we plot the target density (Figure 3) and its reconstruction for data with (Figure 4) and (Figure 5). At a visual level, we oversimplify the representation by plotting and its reconstruction with a view from above the sphere through the axis. We see that the contour in Figure 5 is not well recovered in the regions where is small (on the right side of the graph in Figure 5). The choice of remains unchanged.
4.2 Circular deconvolution
In many engineering problems, the observation of a signal or image is distorted by the action of a linear operator . We assume for simplicity that lives on the torus (or ) with periodic boundary conditions. In many instances, the restoration of from the noisy observation of is challenged by the additional uncertainty about the operator . This is the case for instance in electronic microscopy  for the restoration of fluorescence Confocal Laser Scanning Microscope (CLSM) images. In other words, the quality of the image suffers from two physical limitations: measurement errors or limited accuracy, and the fact that the exact PSF (the incoherent point spread function) that accounts for the blurring of (mathematically, the action of ) is not precisely known. This is a classical issue that goes back to [31, 15]. An idealised additive Gaussian model for the noise contamination yields the observation (1.1) with
The degradation process is characterised by the impulse response function . In most cases of interest, we do not know the exact form of . In a condensed idealised statistical setup, we have access to
We obviously have and the bases and will coincide with the -dimensional extension of the circular trigonometric basis if we set:
where we put
Any has a Fourier transform and moreover, if , we have
Therefore, is diagonal in the basis , hence stable. Moreover, with , we have ; also , and Assumption 2.1 follows. Assuming that satisfies for some and constants , we readily obtain Assumption 2.2. Note also that, since is diagonal in the basis , observing in the representation (4.4) is equivalent to observing in (1.2). ∎
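Because the convolution operator is diagonal in the Fourier basis, the inversion step reduces to coefficientwise division by the noisy eigenvalues, discarding frequencies where the observed eigenvalue falls below a threshold of the order of the operator noise level. A minimal sketch, where the threshold shape, the prefactor, and the toy eigenvalue decay are illustrative assumptions:

```python
import numpy as np

def circular_deconvolve(g_hat, lam_hat, delta, c=1.0):
    """Coefficientwise deconvolution on the torus (illustrative).

    g_hat   : Fourier coefficients of the blurred, observed signal
    lam_hat : noisy Fourier coefficients of the impulse response
    delta   : noise level in the operator
    Frequencies with |lam_hat| at or below the threshold are set to zero.
    """
    thresh = c * delta * np.sqrt(np.log(1.0 / delta))
    f_hat = np.zeros_like(g_hat)
    keep = np.abs(lam_hat) > thresh
    f_hat[keep] = g_hat[keep] / lam_hat[keep]
    return f_hat

# usage: noiseless data, polynomially decaying eigenvalues lam_k ~ 1/k
k = np.arange(1, 33)
lam = k ** -1.0
f_true = k ** -2.0
rec = circular_deconvolve(lam * f_true, lam, delta=1e-2)
```

With noiseless inputs and a small operator noise level, every frequency survives the threshold and the signal is recovered exactly; a larger `delta` prunes high frequencies, which is the cutoff effect discussed above.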
We numerically implement in dimension in the case where there is no noise in the signal (formally ), in order to illustrate the parametric effect that dominates the optimal rate of convergence in Theorems 3.1 and 3.4, which becomes in that case. We take as target a function belonging to for all , defined by its Fourier coefficients
We pick a family of blurring functions defined in the same manner by the formula
We show in Figure 6, in a log-log plot, the mean-squared error of for the oracle choice over 1000 Monte-Carlo simulations for and . For small values of , the predicted slope of the curve gives a rough estimate of the rate of convergence. We visually see that for the critical case with and below, the slope is close to , confirming the parametric rate that is obtained whenever . ∎
5.1 Preliminary estimates
Recall that , and that (resp. ) denotes the orthogonal projector onto (resp. ). For , we have
Using Assumption 2.1, we have and therefore . As a consequence
If , we denote by the (column) vector of coordinates of in the basis . If is a linear operator, we write for the matrix of the Galerkin projection of . ∎
Sequence model for error in the operator.
The observation of in (1.2) leads to the representation , or equivalently, in matrix notation
where is a matrix with entries that are independent centred Gaussian random variables, with unit variance. The following estimate is a classical concentration property of random matrices. For , denotes the operator norm for matrices (we shall skip the dependence upon in the notation).
Lemma 5.1 (, Theorem II.4).
There are positive constants such that
An immediate consequence of Lemma 5.1 is the following moment bound:
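The concentration phenomenon behind Lemma 5.1 — the operator norm of an m-by-m standard Gaussian matrix concentrating around twice the square root of m — is easy to check numerically. The matrix size, seed, and repetition count below are illustrative choices for the experiment.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 100

# Operator (spectral) norms of i.i.d. standard Gaussian m x m matrices
norms = [np.linalg.norm(rng.standard_normal((m, m)), 2) for _ in range(50)]

# The empirical mean should sit close to the theoretical scale 2*sqrt(m)
ratio = np.mean(norms) / (2.0 * np.sqrt(m))
```

The observed ratio is close to one already at moderate matrix sizes, which is consistent with the moment bound drawn from the lemma.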