Spatially inhomogeneous linear inverse problems with possible singularities

# Spatially inhomogeneous linear inverse problems with possible singularities

[ [ University of Central Florida Department of Mathematics
University of Central Florida
Orlando, Florida 32816-1354
USA
\smonth10 \syear2012\smonth8 \syear2013
\smonth10 \syear2012\smonth8 \syear2013
\smonth10 \syear2012\smonth8 \syear2013
###### Abstract

The objective of the present paper is to introduce the concept of a spatially inhomogeneous linear inverse problem where the degree of ill-posedness of operator depends not only on the scale but also on location. In this case, the rates of convergence are determined by the interaction of four parameters, the smoothness and spatial homogeneity of the unknown function  and degrees of ill-posedness and spatial inhomogeneity of operator .

Estimators obtained in the paper are based either on wavelet–vaguelette decomposition (if the norms of all vaguelettes are finite) or on a hybrid of wavelet–vaguelette decomposition and Galerkin method (if vaguelettes in the neighborhood of the singularity point have infinite norms). The hybrid estimator is a combination of a linear part in the vicinity of the singularity point and the nonlinear block thresholding wavelet estimator elsewhere. To attain adaptivity, an optimal resolution level for the linear, singularity affected, portion of the estimator is obtained using Lepski [Theory Probab. Appl. 35 (1990) 454–466 and 36 (1991) 682–697] method and is used subsequently as the lowest resolution level for the nonlinear wavelet estimator. We show that convergence rates of the hybrid estimator lie within a logarithmic factor of the optimal minimax convergence rates.

The theory presented in the paper is supplemented by examples of deconvolution with a spatially inhomogeneous kernel and deconvolution in the presence of locally extreme noise or extremely inhomogeneous design. The first two problems are examined via a limited simulation study which demonstrates advantages of the hybrid estimator when the degree of spatial inhomogeneity is high. In addition, we apply the technique to recovery of a convolution signal transmitted via amplitude modulation.

[
\kwd
\doi

10.1214/13-AOS1166 \volume41 \issue5 2013 \firstpage2668 \lastpage2697 \newproclaimremarkRemark \newproclaimexampleExample \newproclaimdefinitionDefinition

\runtitle

spatially inhomogeneous linear inverse problems

{aug}

A]\fnmsMarianna \snmPensky\correflabel=e1]Marianna.Pensky@ucf.edu\thanksreft1 \thankstextt1Supported in part by NSF Grant DMS-11-06564.

class=AMS] \kwd[Primary ]62C10 \kwd65J10 \kwd[; secondary ]62G05 \kwd62G20 Linear inverse problems \kwdinhomogeneous \kwdminimax convergence rates \kwdsingularity

## 1 Introduction

### 1.1 Formulation

Let be a known linear operator on a Hilbert space with inner product . The objective is to recover by observing

 y(x)=(Qf)(x)+√εW(x),x∈X, (1)

where is the white noise process and is noise level. Assume that observations can be taken as functionals of

 ⟨y,g⟩=⟨Qf,g⟩+√εξ(g),g∈H, (2)

where is a Gaussian random variable with zero mean and variance such that . In what follows, denotes the -norm, all other norms are explicitly marked.

Model (1) is a common representation of a linear inverse problem with the Gaussian noise and has been studied by many authors [see, Abramovich and Silverman (1998), Bissantz et al. (2007), Cavalier and Golubev (2006), Cavalier et al. (2002), Cohen, Hoffmann and Reiß (2004), Hoffmann and Reiss (2008), Donoho (1995), Golubev (2010), Kalifa and Mallat (2003) and Mair and Ruymgaart (1996), among others]. A typical assumption in the problem above is that operator acts uniformly over the spaces of functions represented at a common scale, independently of the location of a function. In particular, consider a set of “test” functions where , has a bounded support and unit -norm . Then, functions have scale , supports concentrated around and unit norms. Conditions which are commonly imposed on operator imply that it contracts the norms of all functions uniformly, that is, the value of depends considerably on the scale but hardly at all on . Moreover, if there exist , where is the adjoint of operator , then values of follow the same pattern. However, not all linear operators necessarily have those properties.

In order to illustrate the discussion above, consider linear operator with the adjoint given by

 (Qf)(x)=μ(x)∫x0f(t)dt,(Q∗v)(x)=∫1xμ(z)v(z)dz, (3)

where is a smooth function. Assume that function is continuously differentiable and integrates to zero: . Denote and observe that whenever . Then, direct calculations yield , , so that as and

If is a constant or, at least, for some relatively small , then dependence of and on can be ignored, so equation (1) with given by (3) can be treated as a spatially homogeneous problem. However, if is large, dependence on becomes essential and equation (1) is a spatially inhomogeneous inverse problem.

Dependence on becomes even more extreme if vanishes at some point , for example, . Indeed, in this case, is the singularity point and it is easy to show that and

Since wavelets provide an adequate tool for scale-location representations of functional spaces, it is convenient to introduce spatially inhomogeneous linear inverse problems using a wavelet–vaguelette decomposition proposed by Donoho (1995). In particular, in the case when , , Donoho’s assumptions appear as follows: {longlist}[(D3)]

There exist three sets of functions: , an orthonormal wavelet basis of , and nearly orthogonal sets and such that , , , , where depend on resolution index but not on spatial index .

and are such that , .

Sets and are nearly orthogonal, that is, for any sequence one has

 ∥∥∥∑j,kajkλjujk∥∥∥2≍∑j,ka2jk,∥∥∥∑j,kajkλ−1jvjk∥∥∥2≍∑j,ka2jk.

Under conditions (D1)–(D3), can be recovered using reproducing formula

 f=∑j,k⟨Qf,ujk⟩ψjk, (4)

which is analogous to the reproducing formula for the SVD. Assumptions (D1)–(D3) are quite standard. Indeed, similar assumptions were introduced in Cavalier et al. (2002), Cavalier and Golubev (2006), Golubev (2010) and Knapik, van der Vaart and van Zanten (2011). The common premise is that operator acts “uniformly” over subspaces of , so singular values or their surrogate equivalents depend on the resolution level only but not on location. If is the subspace of functions at resolution level , the above assumptions reduce to a common assumption of Galerkin method [see, e.g., Cohen, Hoffmann and Reiß (2004) or Hoffmann and Reiss (2008)] that on subspace operator has a bounded inverse with the norm dependent on only, that is, there exist such that

 supj≥0[λj∥∥Q−1∥∥Vj→Q−1(Vj)]<∞, (5)

which is very similar to combination of assumptions (D1) and (D3) above.

Note that both, assumptions (D1) and (5) imply that any function with has an inverse image, the norm of which is bounded by a constant which is independent of the support of . In this sense, operator is an ill-posed spatially homogeneous operator. In the present paper, we shall be interested in a different situation when assumptions (D1) and (D3) may not be true. In particular, we assume that the norms of the inverse images of depend on the spatial index and may be unbounded, that is, condition (D1) and possibly condition (D3) are violated. We shall refer to the such inverse linear problems as spatially inhomogeneous in comparison with spatially homogeneous problems which satisfy conditions (D1)–(D3) above.

### 1.2 Motivation

Spatially inhomogeneous ill-posed problems appear naturally in the case when either the noise level is spatially dependent or observations are irregularly spaced. Problems of this kind have been considered previously, both theoretically and in practical applications. Nevertheless, in former studies, it was always assumed that the noise level is uniformly bounded above or the design density of observations is bounded away from zero. The situations investigated in the present paper rather refer to locally extreme noise or extremely inhomogeneous design (which can be also described as a local data loss). Traditionally, in the first situation, measurements are treated as outliers and are removed from future analysis while the second one is dealt with using missing data techniques. Approach suggested in the present paper provides an alternative to those methodologies. Extreme noise or extremely inhomogeneous design occur in analysis of forensic data [see, e.g., Li and Satta (2011)] and deconvolution of LIDAR signals [see, e.g., Harsdorf and Reuter (2000) and Gurdev, Dreischuh and Stoyanov (2002)] or astronomical images [see, e.g., Starck and Pantin (2002)]. In addition, spatially inhomogeneous ill-posed problems arise whenever the kernel is spatially inhomogeneous, as, for example, in the case of the amplitude modulation. Below we consider some examples in more detail.

{example}

[(Deconvolution of LIDAR signals)] LIDAR (Light Detection And Ranging or Laser Imaging Detection And Ranging) is an optical remote sensing technology that can measure the distance to, or other properties of, targets by illuminating the target with laser light and analyzing the backscattered light. LIDAR technology has applications in archaeology, geography, geology, geomorphology, seismology, forestry, remote sensing, atmospheric physics. LIDAR data model is mathematically described by convolution equation where is the time-resolved LIDAR signal, is the impulse response function and  is the system response function to be determined [see, e.g., Harsdorf and Reuter (2000) and Gurdev, Dreischuh and Stoyanov (2002)]. However, if the system response function of the LIDAR is longer than the time resolution interval, then the measured LIDAR signal is blurred and the effective accuracy of the LIDAR decreases. This loss of precision becomes extreme when, for example, LIDAR is used to for emergency response and natural disaster management such as assessment of the extent of damage due to volcanic eruptions or forest fires. In this situation, routinely, distances are calculated through filtering of the data set (removing outliers) and applying interpolation techniques. However, keeping all existing data and accounting for extreme noise may improve precision of the analysis of LIDAR signals.

{example}

[(Amplitude modulation)] Amplitude Modulation (AM) is a way of transmitting information in the form of electro-magnetic waves. In AM, a radio wave known as the “carrier” is modulated in amplitude by the signal, that is, to be transmitted, while the frequency remains constant [see, e.g., Miller, Vandome and McBrewster (2009)]. In video or image transmission (such as TV) where the base-band signal has inherent large bandwidth, AM is usually preferred to Frequency Modulation (FM) systems since the latter ones require additional bandwidth. Since in an AM, signal information is “stored” in amplitude which is affected by noise, AM is more susceptible to noise than FM. Mathematically, the problem reduces to multiplying the transmitted signal by the function with large and . In Section 8.2, we provide an in-depth description of application of the methodology developed in the paper to recovery of a convolution signal transmitted via AM.

### 1.3 Objectives and layout of the paper

The objective of the present paper is to introduce the concept of a spatially inhomogeneous linear inverse problem which, to the best of the author’s knowledge, has never been considered previously in statistical framework. It turns out that spatially inhomogeneous problems exhibit properties which are very different from their spatially homogeneous counterparts. In particular, if the norms of vaguelettes are infinite in the vicinity of a singularity point, reproducing formula (4) cease working and the usual wavelet–vaguelette estimators cannot be applied. In this case, we propose a hybrid estimator which is based on combination of wavelet–vaguelette decomposition and Galerkin method. We study two application of the general theory, deconvolution with spatially inhomogeneous design and deconvolution with a spatially inhomogeneous kernel (the case of heterogeneous noise being a particular case of the latter).

Another interesting feature of the model is that the rates of convergence are determined by the interaction of four parameters, the smoothness and spatial homogeneity of the unknown function and degrees of ill-posedness and spatial inhomogeneity of operator . In particular, if operator is weakly inhomogeneous, then the rates of convergence are not influenced by spatial inhomogeneity of operator and coincide with the rates which are usual for homogeneous linear inverse problems.

In what follows, we assume that operator in (1) is completely known. If, in practical applications, this is not true, one has to account for the extraneous errors which stem from the uncertainty in the operator by using, for example, ideas of Hoffmann and Reiss (2008). Also, to simplify our considerations, we limit our study to the case when , and is a scalar. The theory presented below can be generalized to the case when , and is a -dimensional vector. This extension should be relatively straightforward if one is dealing with isotropic Besov spaces but becomes much more interesting and involved in the case of anisotropic Besov spaces [see, e.g., Kerkyacharian, Lepski and Picard (2001)]. However, we leave those extensions for future investigations since considering them below will prevent us from focusing on the main objective of the paper.

The rest of the paper is organized as follows. Section 2 introduces the concept of a spatially inhomogeneous ill-posed problem and formulates major definitions and assumptions which are used throughout the paper. Section 3 presents the asymptotic minimax lower bounds for the -risk of the estimators of the solution of the problem over a wide range of Besov balls. Section 4 talks about estimation strategies, in particular, about partitioning the unknown response function and its estimator into the singularity-affected and the singularity-free parts, the main idea at the core of the hybrid estimator. Section 5 elaborates on the risk of the estimator constructed in the previous section when the lowest resolution level in the zero-affected portion of the estimator is fixed. Section 6 discusses the adaptive choice of the lowest resolution level resolution level and derives the asymptotic minimax upper bounds for the -risk. In Section 7, we consider two examples of spatially inhomogeneous ill-posed problems, deconvolution with the spatially inhomogeneous operator (Section 7.1) which can be viewed as a version of a deconvolution equation with spatially inhomogeneous noise, and deconvolution based on irregularly spaced sample (Section 7.2). Section 8 presents a limited simulation study of deconvolution with heteroscedastic noise and also studies application of the hybrid estimator to recovery of a convolution signal transmitted via amplitude modulation. Section 9 concludes the paper with a discussion. Proofs of the statements are contained in the supplementary material [Pensky (2013)].

## 2 Spatially inhomogeneous ill-posed problem: Assumptions and definitions

Consider a scaling function and a corresponding wavelet with bounded supports and form an orthonormal wavelet basis of . We further impose the following set of assumptions on spatially inhomogeneous operator . {longlist}[(A3)]

There exist functions and such that , , where .

There exists a singularity point and a constant such that if and, for any and any , , one has

 ∥∥∥∑|k−k0j|≥Dajkλjkujk∥∥∥2≤Cu∑|k−k0j|≥Da2jk, (6)

where is independent of and is the parameter corresponding to location ( is not necessarily an integer).

Functions are such that, for any , inequality

 ∥∥∥2j−1∑k=0ajkλ−1jkvjk∥∥∥2≤Cv2j−1∑k=0a2jk (7)

holds for any , , where is independent of .

Note that assumptions (A1)–(A3) are weaker than assumptions (D1)–(D3) above. First, depends not only on resolution level but also on location of the wavelet coefficient. Also, if , then, in the neighborhood of the singularity point , wavelet coefficients cannot be recovered directly since , and we say that operator has a singularity at .

Since one usually start wavelet expansion at some finite resolution level , we need an extra assumption which mirrors assumption (A2) and can be derived from it:

There exist functions and positive constants and independent of such that, for any , ,

 Q∗tmk=φmk,∥∥∥∑|k−k0j|≥D0amkλmktmk∥∥∥2≤Ct∑|k−k0j|≥D0a2mk. (8)

If in assumptions (A2) and (A4), then and for any . Hence, can be expressed using reproducing formula (4) which, in this case, becomes

 f(x)=2m−1∑k=0amkφmk(x)+∞∑j=m2j−1∑k=0bjkψjk(x), (9)

where and . If , reproducing formula (9) cease working and one needs an alternative solution to recovering . Indeed, if in expressions for and is replaced by , then the variances of the wavelet coefficients in the vicinity of singularity are infinite: if and similar consideration applies to . For this reason, at each resolution level, we partition the set of all indices into the singularity-affected indices

 K0m = {k=0,…,2m−1\dvtx|k−k0m|

and the singularity-free indices

 Kc0m = {k\dvtx0≤k≤2m−1,k∉K0m}, Kc1j = {k\dvtx0≤k≤2j−1,k∉K1j}.

To be specific, in what follows, we assume that are such that, for some positive constants , and independent of and , one has

 Cλ02−j(α+β)(1+|k−k0j|)α≤λ2jk≤Cλ2−j(α+β)(1+|k−k0j|)α. (10)

We shall refer to coefficients and in (10) as degrees of ill-posedness and spatial inhomogeneity, respectively. Observe that with satisfying condition (10), the variances of the coefficients at the lower resolution levels may be significantly higher than the variances of the coefficients at higher resolution levels as long as the locations of the lower resolution level coefficients lie in a close proximity of a singularity point.

In the present paper, we consider estimation of a solution of inhomogeneous linear inverse problems in the case when the unknown function is possibly spatially inhomogeneous itself, in particular, belongs to a Besov ball of radius . Interplay between spatial inhomogeneity of operator and properties of  lead to various very interesting phenomena. In particular, if is small or and are relatively large, spatial inhomogeneity does not affect convergence rates and can be recovered as well as in the case of .

{remark}

[(Multiple singularity points)] Note that one can consider a spatially inhomogeneous problems with multiple singularity points and corresponding constants where and for some fixed positive . The theory developed below can be easily extended to this case, with the convergence rates of the estimators determined by the “worst case scenario” among singular points , .

## 3 Minimax lower bounds for the risk over Besov balls

Before constructing an estimator of the unknown function under model (1), we derive the asymptotic minimax lower bounds for the -risk over a wide range of Besov balls.

Recall that for an -regular multiresolution analysis [see, e.g., Meyer (1992), pp. 21–25], with , and for a Besov ball of radius with , and , one has

 ∥f∥Bsp,q=(2m−1∑k=0|amk|p)1/p+(∞∑j=m2js′q(2j−1∑k=0|bjk|p)q/p)1/q

with respective sum(s) replaced by maximum if and/or [see, e.g., Johnstone et al. (2004)].

In what follows, we use the symbol for a generic positive constant, which takes different values at different places and is independent of the noise level . The following statement provides the asymptotic minimax lower bounds for the -risk over Besov balls .

###### Theorem 1

Let and . Then, under assumptions (A1)–(A3), as ,

 Rε(Bsp,q(A))=inf~fsupf∈Bsp,q(A)E∥~f−f∥2≥CΔ(ε), (11)

where the infimum is taken over all possible square-integrable estimators of based on from model (1) and

 Δ(ε)=⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩A(2(α+β))/(2s′+α+β)ε2s′/(2s′+α+β),if 2s(α−1)≥(β+1)(1−2/p),A(2(β+1))/(2s+β+1)ε2s/(2s+β+1),if 2s(α−1)<(β+1)(1−2/p). (12)
{remark}

[(Convergence rates)] As we show below, the minimax global convergence rates in Theorem 1 are attainable up to a logarithmic factor. The rates are determined by the interaction of four parameters, and . Parameters and describe, respectively, smoothness and spatial homogeneity of the unknown function , while and , defined in (10), are referred to as degrees of ill-posedness and spatial inhomogeneity of operator . If the value of is large, in particular, , convergence rate is significantly affected by the degree of spatial inhomogeneity of . On the other hand, if , spatial inhomogeneity of operator does not affect convergence rate which is determined entirely by the degree of ill-posedness .

## 4 Estimation strategies in the presence of a singularity

To be more specific, consider a periodized version of the wavelet basis on the unit interval

 {φmk,ψjk\dvtxj≥m,k=0,1,…,2j−1}, (13)

where Note that the latter requires that the resolution level is high enough, in particular, , where is such that

 2m1>max(Lφ∗,Lψ∗). (14)

Here, and are the lengths of supports of the mother and father wavelets, and , that generate periodized wavelet basis. Then, for any , the set (13) forms an orthonormal wavelet basis for and, hence, any , can be expanded using formula (9). Under assumptions (A1), (A2) and (A4), one can construct unbiased estimators of coefficients and

 ^amk=⟨y,tmk⟩,^bjk=⟨y,ujk⟩. (15)

If and , respectively, then estimators and have finite variances

 Var(^amk) ≍ λ−2mk,k∈Kc0m, (16) Var(^bjk) ≍ λ−2jk,k∈Kc1j

and have infinite variances otherwise. In order to account for the latter, for any , we partition into the sum of singularity-affected and singularity-free parts

 f(x)=f0,m(x)+fc,m(x),x∈[0,1],

where

 f0,m(x) = ∑k∈K0mamkφmk(x)+∞∑j=m∑k∈K1jbjkψjk(x),x∈[0,1], (17) fc,m(x) = ∑k∈Kc0mamkφmk(x)+∞∑j=m∑k∈Kc1jbjkψjk(x),x∈[0,1]. (18)

We then construct estimators and of and , respectively, and estimate by a hybrid estimator

 ^fm(x)=^f0,m(x)+^fc,m(x),x∈[0,1]. (19)

In particular, we shall use a linear estimator with the resolution level estimated from the data as and a nonlinear block thresholding wavelet estimator as , where the lowest resolution level in is determined by the linear part . In what follows, we shall consider estimation of and separately.

First, we construct a block thresholding wavelet estimator of . For this purpose, we divide the wavelet coefficients at each resolution level into blocks of length to the left of and blocks to the right of , where

 lLjε = (k0j−D)/ln(ε−1), lRjε = (20) ljε = max(lLjε,lRjε).

Define blocks and of indices to the left of and to the right of , respectively, as

 ULjl = {k\dvtx−D−lln(ε−1)

where

 ULj = {l\dvtx1≤l≤lLjε}, URj = {l\dvtx1≤l≤lRjε}, (21) Uj = ULj∪URj.

To simplify the narrative, we shall write and without a specific reference whether a block lies to the right or to the left of . Denote

 Bjl = ∑k∈Ujlb2jk,ˆBjl=∑k∈Ujl^b2jk, (22) Rjlε = εln(ε−1)2j(α+β)|D+(l−1)ln(ε−1)|α≍ε∑k∈Ujlλ−2jk. (23)

For any , estimate by

 ^fc,m(x) = ∑k∈Kc0m^amkφmk(x) +J−1∑j=m∑l∈Uj∑k∈Ujl^bjkI(ˆBjl≥τ2Rjlε)ψjk(x),

where is the indicator function of the set , the value of will be defined later and

 2J=ε−2/(α+β+2). (25)

Now, consider estimation of the singularity-affected part. Since the estimators of , given in (15), have infinite variances when , we estimate those coefficients by solving a system of linear equations. Denote and observe that, for a given , , one has . Here

 fm = ∑k∈K0mamkφmk+∑k∈Kc0mamkφmk, (26) Rm = ∞∑j=m2j−1∑k=0bjkψjk

and, hence,

 Qf=∑k∈K0mamkwmk+∑k∈Kc0mamkwmk+QRm. (27)

Taking scalar products of both sides of (27) with , , obtain

 ⟨wml,Qf⟩ = ∑k∈K0mamk⟨wml,wmk⟩ +∑k∈Kc0mamk⟨wml,wmk⟩+⟨wml,QRm⟩,l∈K0m.

Introduce matrices and and vectors , , , , and with elements

 A(m)lk = ⟨wml,wmk⟩,B(m)lν=⟨wml,wmν⟩, (29) c(m)l = ⟨wml,Qf⟩,^c(m)l=⟨wml,y⟩, (30) r(m)l = ⟨wml,QRm⟩,z(m)k=amk, (31) h(m)ν = amν,^h(m)ν=^amν=⟨y,tmν⟩, (32)

where , and with , are defined in (15). Then, one can rewrite an exact system of linear equations (4) as and obtain its approximate version

 ^c(m)=A(m)^z(m)+B(m)^h(m). (33)

Since matrix is a nonnegative definite matrix of a finite size, in order to guarantee that it is nonsingular, it is sufficient to impose the following almost trivial assumption: {longlist}[(A5)]

Functions , are linearly independent. Under assumption (A5), one has

 z(m) = (A(m))−1(c(m)−B(m)h(m)−r(m)), (34) ^z(m) = (A(m))−1(^c(m)−B(m)^h(m)).

Finally, for a given , we set , and estimate by the following wavelet linear estimator

 ^f0,m(x)=∑k∈K0m^amkφmk(x),x∈[0,1]. (35)
{remark}

[(Relation to nonparametric regression estimation based on spatially inhomogeneous data)] We need to touch upon the relationship between the present paper and the paper by Antoniadis, Pensky and Sapatinas (2013) which considered nonparametric regression estimation based on irregularly spaced data, in particular, in the case when design density has zeros. The latter problem is the well-known formulation which has been already studied extensively [see, e.g., Gaïffas (2005, 2007a, 2007b, 2009)] and, indeed, can be considered as a trivial case of deconvolution with the spatially inhomogeneous design studied in Section 7.2 with being an identity operator. For this reason, the hybrid estimator was proposed in Antoniadis, Pensky and Sapatinas (2013). However, due to the fact that, in the regression set up, one observes function directly, construction of the hybrid estimator is much more involved in the case of an inverse problem than in the case of nonparametric regression. In addition, the present paper provides the implementation of the hybrid estimator and studies its performance via simulations which has never been done previously since Antoniadis, Pensky and Sapatinas (2013) considered only theoretical construction of the hybrid estimator.

## 5 The risks of the estimators of the singularity-free and the singularity-affected parts

In this section, we shall provide asymptotic expressions for the risks of estimators (4) and (35) when resolution level is a fixed, nonrandom quantity possibly dependent on .

Let us first construct an asymptotic upper bound for the singularity-free portion (4) of the estimator. Denote

 λ−2m=∑k∈Kc0mλ−2mk (36)

and observe that, under condition (10), there exist positive constants and independent of such that

 Cλα02m(β+max(1,α))mI(α=1)≤λ−2m≤Cλα2m(β+max(1,α))mI(α=1). (37)
###### Lemma 1

Let , , and assumptions (A1)–(A4) hold. Let be given by (4) where the nonrandom quantity is such that , with defined in (25). If

 τ2=4CuCλ(√2χ+1)2andχ2≥4(β+max(1,α))2+α+β, (38)

where and are defined in (6) and (10), respectively, then, as ,

 (39)

Here is defined in (12),

 (40)

and is the indicator function. Moreover, as ,

 supf∈Bsp,q(A)E∥^fc,m−fc,m∥4=o(ε−2). (41)

Now, we find upper bounds for the singularity-affected portion of the estimator . Recall that and let be such that

 Cw1ρm≤∥wmk∥≤Cw2ρmif k∈K0m. (42)

Note that, since the set contains at most indices, satisfying condition (42) can always be found. The advantage of using the system of equations (33) rests upon the fact that matrix is a finite-dimensional positive definite matrix with all eigenvalues of order . In particular,

 (43)

for some positive constants and independent of , as it is shown in the proof of the following lemma which states the rate of convergence of the singularity-affected portion of the estimator.

###### Lemma 2

Let and . Let assumptions (A1)–(A5) hold and there exists , , independent of , such that

 ρ2m≥Cρλλ2m, (44)

where and are defined in (36) and (42), respectively. Let and also

 ρ−4mmaxl∈K0m∑k∈Kc0mλ−2mk⟨wml,wmk⟩2 ≤ K1λ−2m, (45) maxl∈K0m∞∑j=m2j−1∑k=0∣∣⟨wml,vjk⟩∣∣ ≤ K2ρ2m (46)

for some absolute constants and independent of . Let estimator of  be given by (35). Then, for any , , and some constant independent of and , as , one has

 supf∈Bsp,q(A)E∥^f0,m−f0,m∥2 ≤ C(ελ−2m+2−2ms′), (47) supf∈Bsp,q(A)E∥^f0,m−f0,m∥4 = o(ε−2). (48)

Note that in order estimates , one needs to start the estimator in (4) at exactly the same resolution level at which the linear estimator in (35) is constructed. Thus, the choice of the lowest resolution level in (4) is driven by the choice of in (35).

Let be such that

 (49)

so that, for , one has

 ελ−2m0+2−2m0s′≤CΔ(ε)[ln(ε−1)]ρ. (50)

The following statement delivers the total squared risk of the estimator (19) of if .

Let and