
# Anisotropic Denoising in Functional Deconvolution Model with Dimension-free Convergence Rates

Department of Mathematics, University of Central Florida,
Dominique Picard,
University Paris-Diderot, CNRS, LPMA
###### Abstract

In the present paper we consider the problem of estimating a periodic (r+1)-dimensional function f based on observations from its noisy convolution. We construct a wavelet estimator of f, derive minimax lower bounds for the L²-risk when f belongs to a Besov ball of mixed smoothness and demonstrate that the wavelet estimator is adaptive and asymptotically near-optimal within a logarithmic factor, in a wide range of Besov balls. We prove in particular that choosing this type of mixed smoothness leads to rates of convergence which are free of the "curse of dimensionality" and, hence, are higher than the usual convergence rates when r is large.

The problem studied in the paper is motivated by seismic inversion, which can be reduced to the solution of noisy two-dimensional convolution equations that allow one to draw inference on underground layer structures along the chosen profiles. The common practice in seismology is to recover layer structures separately for each profile and then to combine the derived estimates into a two-dimensional function. By studying the two-dimensional version of the model, we demonstrate that this strategy usually leads to estimators which are less accurate than the ones obtained as two-dimensional functional deconvolutions. Indeed, we show that unless the function is very smooth in the direction of the profiles, very spatially inhomogeneous along the other direction and the number of profiles is very limited, the functional deconvolution solution has a much better precision compared to a combination of solutions of separate convolution equations. A limited simulation study in the two-dimensional case confirms the theoretical claims of the paper.

Keywords and phrases: functional deconvolution, minimax convergence rate, hyperbolic wavelets, seismic inversion

AMS (2000) Subject Classification: Primary: 62G05, Secondary: 62G08, 62P35

## 1 Introduction.

Consider the problem of estimating a periodic (r+1)-dimensional function f(u,t) with u ∈ [0,1]^r, t ∈ [0,1], based on observations from the following noisy convolution

 y(u,t) = \int_0^1 g(u, t-x)\, f(u,x)\, dx + \varepsilon z(u,t), \quad u \in [0,1]^r, \ t \in [0,1]. (1.1)

Here, ε is a positive small parameter such that asymptotically ε → 0. The function g(u,t) in (1.1) is assumed to be known and z(u,t) is an (r+1)-dimensional Gaussian white noise, i.e., a generalized (r+1)-dimensional Gaussian field with covariance function

 E[z(u_1,t_1)\, z(u_2,t_2)] = \delta(t_1 - t_2) \prod_{l=1}^{r} \delta(u_{1l} - u_{2l}),

where δ(·) denotes the Dirac δ-function and u_i = (u_{i1}, …, u_{ir}), i = 1, 2.

Denote

 h(u,t) = \int_0^1 g(u, t-x)\, f(u,x)\, dx.

Then, equation (1.1) can be rewritten as

 y(u,t) = h(u,t) + \varepsilon z(u,t). (1.2)

In order to simplify the narrative, we start with the two-dimensional version of equation (1.1):

 y(u,t) = \int_0^1 g(u, t-x)\, f(u,x)\, dx + \varepsilon z(u,t), \quad u, t \in [0,1]. (1.3)

The sampling version of problem (1.3) appears as

 y(u_l, t_i) = \int_0^1 g(u_l, t_i - x)\, f(u_l, x)\, dx + \sigma z_{li}, \quad l = 1,\dots,M, \ i = 1,\dots,N, (1.4)

where σ is a positive constant independent of M and N, u_l = l/M, t_i = i/N, and z_{li} are i.i.d. normal variables with E z_{li} = 0 and E z_{li}² = 1.
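As an illustration, the sampling scheme (1.4) can be simulated with a toy signal f and kernel g; both are chosen here purely for illustration (the paper only assumes that g is known), and the integral is approximated by a Riemann sum:

```python
import numpy as np

def f(u, x):                         # toy "unknown" signal, for illustration only
    return np.sin(2 * np.pi * x) * (1.0 + u)

def g(u, s):                         # toy "known" convolution kernel
    return np.exp(-5.0 * np.abs(s))

M, N, sigma = 8, 64, 0.1
u = np.arange(1, M + 1) / M          # profile locations u_l = l / M
t = np.arange(1, N + 1) / N          # time points t_i = i / N
x = t                                # Riemann grid for the integral over x

rng = np.random.default_rng(0)
noise = sigma * rng.standard_normal((M, N))
# y(u_l, t_i) = int_0^1 g(u_l, t_i - x) f(u_l, x) dx + sigma * z_{li}
y = np.array([[np.mean(g(ul, ti - x) * f(ul, x)) for ti in t] for ul in u]) + noise
```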

Equation (1.4) may seem to be equivalent to M separate convolution equations

 y_l(t_i) = \int_0^1 f_l(x)\, g_l(t_i - x)\, dx + \sigma z_{li}, \quad l = 1,\dots,M, \ i = 1,\dots,N, (1.5)

with f_l(x) = f(u_l, x), g_l(t) = g(u_l, t) and y_l(t) = y(u_l, t). This is, however, not true, since the solution of equation (1.4) is a two-dimensional function f(u,t) while the solutions of equations (1.5) are M unrelated functions f_l(t). In this sense, problem (1.3) and its sampling equivalent (1.4) are functional deconvolution problems.

Functional deconvolution problems were introduced in Pensky and Sapatinas (2009) and further developed in Pensky and Sapatinas (2010, 2011). However, Pensky and Sapatinas (2009, 2010, 2011) considered a different version of the problem where f was a function of one variable, i.e., f(u, x) ≡ f(x). Their interpretation of the functional deconvolution problem was motivated by the solution of inverse problems in mathematical physics and multichannel deconvolution in engineering practice. Functional deconvolution problems of types (1.3) and (1.4) are motivated by experiments where one needs to recover a two-dimensional function using observations of its convolutions along profiles u = u_l. This situation occurs, for example, in geophysical explorations, in particular the ones which rely on inversions of seismic signals (see, e.g., the monographs of Robinson et al. (1996) and Robinson (1999) and, e.g., the papers of Wason et al. (1984), Berkhout (1986) and Heimer and Cohen (2008)).

In seismic exploration, a short-duration seismic pulse is transmitted from the surface, reflected from boundaries between underground layers, and received by an array of sensors on the Earth's surface. The signals are transmitted along straight lines called profiles. The received signals, called seismic traces, are analyzed to extract information about the underground structure of the layers along the profile. These traces can be modeled, under simplifying assumptions, as noisy outcomes of convolutions between reflectivity sequences, which describe the configuration of the layers, and a short wave-like function (called a wavelet in geophysics) which corresponds to the convolution kernel. The objective of seismic deconvolution is to estimate the reflectivity sequences from the measured traces. In the simple case of one layer and a single profile, the boundary is described by a univariate function which is the solution of the convolution equation. The next step is usually to combine the recovered functions, which are defined on the set of parallel planes passing through the profiles, into a multivariate function which provides an exhaustive picture of the structure of the underground layers. This is usually accomplished by interpolation techniques. However, since the layers are intrinsically anisotropic (they may have different structures in various directions) and spatially inhomogeneous (they may experience, for example, sharp breaks), this approach ignores the anisotropic and spatially inhomogeneous nature of the two-dimensional function describing the layer and loses precision by analyzing each profile separately.

The paper carries out the following program:

• Construction of a feasible procedure for estimating the (r+1)-dimensional function f which achieves optimal rates of convergence (up to inessential logarithmic terms). We require the procedure to be adaptive with respect to smoothness constraints on f. In this sense, the paper is related to a multitude of papers which offered wavelet solutions to deconvolution problems (see, e.g., Donoho (1995), Abramovich and Silverman (1998), Pensky and Vidakovic (1999), Walter and Shen (1999), Fan and Koo (2002), Kalifa and Mallat (2003), Johnstone, Kerkyacharian, Picard and Raimondo (2004), Donoho and Raimondo (2004), Johnstone and Raimondo (2004), Neelamani, Choi and Baraniuk (2004) and Kerkyacharian, Picard and Raimondo (2007)).

• Identification of the best achievable accuracy under smoothness constraints on f. We focus here on obtaining fast rates of convergence. In this context, we prove that considering multivariate functions with 'mixed' smoothness and hyperbolic wavelet bases allows one to obtain rates which are free of dimension and, as a consequence, faster than the usual ones. In particular, the present paper is related to the anisotropic de-noising explored by, e.g., Kerkyacharian, Lepski and Picard (2001, 2008). We compare our functional classes as well as our rates with the results obtained there.

• Comparison of the two-dimensional version of the functional deconvolution procedure studied in the present paper to the separate solutions of convolution equations. We show, in particular, that the former approach delivers estimators with higher precision. For this purpose, in Section 5, we consider a discrete version of the functional deconvolution problem (1.4) (rather than the continuous equation (1.3)) and compare its solution with the solutions of the separate convolution equations (1.5). We show that, unless the function f is very smooth in the direction of the profiles, very spatially inhomogeneous along the other direction and the number of profiles is very limited, the functional deconvolution solution has a better precision than the combination of solutions of separate convolution equations.

The rest of the paper is organized as follows. In order to make the paper more readable, and due to the application to seismic inversion, we start, in Section 2, with the two-dimensional version of the functional deconvolution problem (1.3) and describe the construction of a two-dimensional wavelet estimator of f given by equation (1.3). In Section 3, we give a brief introduction to spaces of anisotropic smoothness. After that, we derive minimax lower bounds for the L²-risk, based on observations from (1.3), under the condition that f belongs to a Besov ball of mixed regularity and g has certain smoothness properties. In Section 4, we prove that the hyperbolic wavelet estimator derived in Section 2 is adaptive and asymptotically near-optimal within a logarithmic factor (in the minimax sense) in a wide range of Besov balls. Section 5 is devoted to the discrete version of the problem (1.4) and the comparison of the functional deconvolution solution with the collection of individual deconvolution equations. Section 6 extends the results to the (r+1)-dimensional version of the problem (1.1). Section 7 contains a limited simulation study which supports the theoretical claims of the paper. We conclude the paper with a discussion of the results in Section 8. Finally, Section 9 contains the proofs of the theoretical results obtained in the earlier sections.

## 2 Estimation Algorithm.

In what follows, ⟨·,·⟩ denotes the inner product in the Hilbert space L²([0,1]) (the space of square-integrable functions defined on the unit interval [0,1]), i.e., ⟨f, g⟩ = ∫₀¹ f(t) \overline{g(t)} dt for f, g ∈ L²([0,1]). We also denote the complex conjugate of a by ā. Let e_m(t) = e^{i2πmt}, m ∈ ℤ, be a Fourier basis on the interval [0,1]. Let h_m(u), y_m(u), z_m(u), g_m(u) and f_m(u) be the functional Fourier coefficients of the functions h, y, z, g and f, respectively. Then, applying the Fourier transform to equation (1.2), one obtains, for any u ∈ [0,1],

 y_m(u) = g_m(u) f_m(u) + \varepsilon z_m(u)

and

 h_m(u) = g_m(u) f_m(u). (2.1)
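Relation (2.1) is the convolution theorem for periodic functions: Fourier coefficients of a circular convolution multiply, so division by g_m deconvolves exactly in the noiseless case. A small numerical sketch, using an artificial kernel whose Fourier coefficients decay polynomially (in the spirit of condition (3.5) below):

```python
import numpy as np

N = 64
t = np.arange(N) / N
f = np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t)   # toy periodic signal

m = np.fft.fftfreq(N, d=1.0 / N)                # integer frequencies
g_hat = 1.0 / (1.0 + np.abs(m)) ** 1.5          # |g_m| ~ |m|^{-nu}, here nu = 1.5
g = np.real(np.fft.ifft(g_hat)) * N             # "known" kernel with these coefficients

# circular convolution h(t) = int_0^1 g(t - x) f(x) dx, discretized with step 1/N
h = np.real(np.fft.ifft(np.fft.fft(g) * np.fft.fft(f))) / N

# h_m = g_m f_m, so dividing in the Fourier domain recovers f exactly (no noise here)
f_rec = np.real(np.fft.ifft(np.fft.fft(h) / np.fft.fft(g))) * N
```

With noise present, naive division amplifies high frequencies, which is why the paper thresholds wavelet coefficients instead of inverting frequency-by-frequency.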

Consider a bounded bandwidth periodized wavelet basis (e.g., Meyer-type) ψ_{j,k}(x) and a finitely supported periodized wavelet basis of sufficient regularity (e.g., Daubechies) η_{j',k'}(u). The choice of the Meyer wavelet basis for x is motivated by the fact that it allows easy evaluation of the wavelet coefficients in the Fourier domain, while the finitely supported wavelet basis gives more flexibility in recovering a function which is spatially inhomogeneous in u. Let m₀ and m'₀ be the lowest resolution levels for the two bases and denote the scaling functions for the bounded bandwidth wavelet by ψ_{m₀−1,k}(x) and the scaling functions for the finitely supported wavelet by η_{m'₀−1,k'}(u). Then, f(u,x) can be expanded into the wavelet series

 f(u,x) = \sum_{j=m_0-1}^{\infty} \sum_{j'=m'_0-1}^{\infty} \sum_{k=0}^{2^j-1} \sum_{k'=0}^{2^{j'}-1} \beta_{j,k,j',k'}\, \psi_{j,k}(x)\, \eta_{j',k'}(u). (2.2)

Denote β_{j,k}(u) = ⟨f(u,·), ψ_{j,k}⟩; then β_{j,k,j',k'} = ⟨β_{j,k}(·), η_{j',k'}⟩. If ψ_{j,k,m} = ⟨e_m, ψ_{j,k}⟩ are the Fourier coefficients of ψ_{j,k}, then, by formula (2.1) and Plancherel's formula, one has

 \beta_{j,k}(u) = \sum_{m \in W_j} f_m(u)\, \overline{\psi_{j,k,m}} = \sum_{m \in W_j} \frac{h_m(u)}{g_m(u)}\, \overline{\psi_{j,k,m}}, (2.3)

where, for any j,

 W_j = \{ m : \psi_{j,k,m} \neq 0 \} \subseteq \frac{2\pi}{3} \left( [-2^{j+2}, -2^j] \cup [2^j, 2^{j+2}] \right), (2.4)

due to the fact that Meyer wavelets are band-limited (see, e.g., Johnstone, Kerkyacharian, Picard & Raimondo (2004), Section 3.1). Therefore, the coefficients β_{j,k,j',k'} are of the form

 \beta_{j,k,j',k'} = \sum_{m \in W_j} \overline{\psi_{j,k,m}} \int_0^1 \frac{h_m(u)}{g_m(u)}\, \eta_{j',k'}(u)\, du, (2.5)

and allow the unbiased estimator

 \tilde{\beta}_{j,k,j',k'} = \sum_{m \in W_j} \overline{\psi_{j,k,m}} \int_0^1 \frac{y_m(u)}{g_m(u)}\, \eta_{j',k'}(u)\, du. (2.6)

We now construct a hard thresholding estimator of f as

 \hat{f}(u,t) = \sum_{j=m_0-1}^{J-1} \sum_{j'=m'_0-1}^{J'-1} \sum_{k=0}^{2^j-1} \sum_{k'=0}^{2^{j'}-1} \hat{\beta}_{j,k,j',k'}\, \psi_{j,k}(t)\, \eta_{j',k'}(u), (2.7)

where

 \hat{\beta}_{j,k,j',k'} = \tilde{\beta}_{j,k,j',k'}\, \mathbb{1}\left( |\tilde{\beta}_{j,k,j',k'}| > \lambda_{j\varepsilon} \right), (2.8)

and the values of J, J' and λ_{jε} will be defined later.
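Hard thresholding (2.8) simply zeroes out every coefficient whose magnitude does not exceed the level-dependent threshold; a minimal sketch:

```python
import numpy as np

def hard_threshold(beta_tilde, lam):
    """Hard thresholding as in (2.8): keep a coefficient unchanged if its
    absolute value exceeds lam, replace it by zero otherwise."""
    return np.where(np.abs(beta_tilde) > lam, beta_tilde, 0.0)

coeffs = np.array([0.05, -0.8, 0.3, -0.02, 1.2])
kept = hard_threshold(coeffs, 0.1)   # small coefficients are treated as pure noise
```

Unlike soft thresholding, the surviving coefficients are not shrunk, which is the variant the paper analyzes.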

In what follows, we use the symbol C for a generic positive constant, independent of ε, which may take different values at different places.

## 3 Smoothness classes and minimax lower bounds

### 3.1 Smoothness classes

It is natural to consider anisotropic multivariate functions, i.e., functions whose smoothness is different in different directions. It is, however, much more difficult to construct appropriate spaces of mixed regularity which are meaningful for applications. One of the objectives of the present paper is to prove that classes of mixed regularity allow one to obtain rates of convergence which are free of dimension. This is specifically due to the application of hyperbolic wavelets, i.e., wavelets which allow different resolution levels in each direction (see, e.g., Heping (2004)).

Although a comprehensive study of functional classes of mixed regularity is not the purpose of this paper, below we provide a short introduction to the functional classes that we are going to consider. Due to the relation of this paper to the anisotropic de-noising explored by Kerkyacharian, Lepski and Picard (2001, 2008), we also compare the classes of mixed regularity used here to the Nikolski classes considered in the papers cited above.

First, let us recall the definition of the Nikolski classes (see Nikolskii (1975)). In this section we consider d-dimensional multivariate functions; in what follows, d = r + 1 or d = 2.

Let f be a measurable function defined on ℝ^d. For any y ∈ ℝ^d we define

 \Delta_y f(x) = f(x+y) - f(x).

If l ∈ ℕ, then Δ^l_y is the l-times iterated version of the operator Δ_y (of course, Δ⁰_y = I, where I is the identity operator). Then, the Nikolski classes can be defined as follows (recall that ‖f‖_{L_p} denotes (∫ |f(x)|^p dx)^{1/p} for 1 ≤ p < ∞, with the usual modification for p = ∞):

1. Let (e₁, …, e_d) be the canonical basis of ℝ^d. For s_i > 0 and l > s_i, we say that f belongs to the Nikolski class N^{s_i}_{p_i} if and only if there exists a constant C(s_i, l) such that for any h ∈ ℝ one has

 \| \Delta^l_{h e_i} f \|_{L_{p_i}(\mathbb{R}^d, dx)} \leq C(s_i, l)\, |h|^{s_i}.

The Nikolski classes defined above were investigated by Kerkyacharian, Lepski and Picard (2001, 2008); they are anisotropic but do not involve mixed smoothness. Quite differently, in the present paper we shall consider classes of mixed regularity defined as follows. Denote h = (h₁, …, h_d) and let t = (t₁, …, t_d), t_j > 0. For a subset e ⊆ {1, …, d}, we set h^e to be the vector with coordinates h_j when j belongs to e, and 0 otherwise. For a fixed integer l and t ∈ ℝ₊^d, we denote

 \Delta^{l,e}_{h^e} f(x) := \Big( \prod_{j \in e} \Delta^l_{h_j e_j} \Big) f(x), \qquad \Omega^{l,e}(f, t^e)_p := \sup_{|h_j| \leq t_j} \| \Delta^{l,e}_{h^e} f \|_p.

Now, in order to construct Besov classes of mixed regularity, we choose l > max_i s_i and define

 B^{s_1,\dots,s_d}_{p,\infty} = \Big\{ f \in L_p : \ \sum_{e \subset \{1,\dots,d\}} \sup_{t > 0} \sup_{j \in e} t_j^{-s_j}\, \Omega^{l,e}(f, t^e)_p < \infty \Big\}. (3.1)

It is proved in, e.g., Heping (2004) that, under appropriate (regularity) conditions which we omit here, the classes (3.1) can be expressed in terms of hyperbolic-wavelet coefficients, thus providing a convenient generalization of the one-dimensional Besov spaces. Furthermore, Heping (2004) considers more general Besov classes of mixed regularity that correspond to 1 ≤ q < ∞ rather than q = ∞. In this paper, we shall assume that the hyperbolic wavelet basis satisfies the required regularity conditions and follow Heping's (2004) definition of the Besov spaces of mixed regularity

 B^{s_1,\dots,s_d}_{p,q} = \Bigg\{ f \in L^2(U) : \ \Bigg( \sum_{j_1,\dots,j_d} 2^{\left( \sum_{i=1}^d j_i \left[ s_i + \frac12 - \frac1p \right] \right) q} \Big( \sum_{k_1,\dots,k_d} |\beta_{j_1,k_1,\dots,j_d,k_d}|^p \Big)^{\frac{q}{p}} \Bigg)^{1/q} < \infty \Bigg\}. (3.2)

Besov classes (3.2) compare quite easily to the Nikolski classes: it is easy to prove that the former form a subset of the latter.

### 3.2 Lower bounds for the risk: two-dimensional case

Denote

 s^*_i = s_i + \frac12 - \frac1p, \qquad s'_i = s_i + \frac12 - \frac1{p'}, \qquad i = 1, 2, \quad p' = \min\{p, 2\}. (3.3)

In what follows, we assume that the function f belongs to a two-dimensional Besov ball of mixed regularity as described above (d = 2), so that its wavelet coefficients satisfy the following condition:

 B^{s_1,s_2}_{p,q}(A) = \Bigg\{ f \in L^2(U) : \ \Bigg( \sum_{j,j'} 2^{(j s^*_1 + j' s^*_2) q} \Big( \sum_{k,k'} |\beta_{j,k,j',k'}|^p \Big)^{\frac{q}{p}} \Bigg)^{1/q} \leq A \Bigg\}. (3.4)
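The sequence-space quantity in (3.4) is straightforward to evaluate numerically from a finite set of hyperbolic wavelet coefficients. The sketch below assumes finite p and q and stores the coefficients blockwise, one array per resolution pair (j, j'):

```python
import numpy as np

def mixed_besov_norm(beta, s1, s2, p, q):
    """Sequence norm in (3.4): beta maps a level pair (j, jp) to the array of
    hyperbolic wavelet coefficients at that pair. Assumes finite p and q."""
    s1s = s1 + 0.5 - 1.0 / p     # s*_1 from (3.3)
    s2s = s2 + 0.5 - 1.0 / p     # s*_2 from (3.3)
    total = 0.0
    for (j, jp), block in beta.items():
        lp = np.sum(np.abs(block) ** p) ** (1.0 / p)
        total += (2.0 ** (j * s1s + jp * s2s) * lp) ** q
    return total ** (1.0 / q)

# a single coefficient at the coarsest level pair: the norm is its absolute value
norm = mixed_besov_norm({(0, 0): np.array([[2.0]])}, 1.0, 1.0, 2, 2)
```

Membership in the ball B^{s1,s2}_{p,q}(A) then amounts to checking that this quantity is at most A.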

Below, we construct minimax lower bounds for the L²-risk. For this purpose, we define the minimax L²-risk over the set V as

 R_\varepsilon(V) = \inf_{\tilde f} \sup_{f \in V} E \| \tilde f - f \|^2,

where ‖g‖ is the L²-norm of a function g and the infimum is taken over all possible estimators of f (measurable functions taking their values in a set containing B^{s₁,s₂}_{p,q}(A)).

Assume that the functional Fourier coefficients g_m(u) of the function g are uniformly bounded from above and below; that is, there exist positive constants ν, C₁ and C₂, independent of m and u, such that

 C_1 |m|^{-2\nu} \leq |g_m(u)|^2 \leq C_2 |m|^{-2\nu}. (3.5)

Then, the following theorem gives the minimax lower bounds for the L²-risk of any estimator of f.

###### Theorem 1

Let f ∈ B^{s₁,s₂}_{p,q}(A) with min(s₁, s₂) ≥ max(1/p, 1/2), 1 ≤ p, q ≤ ∞, and let s*ᵢ and s'ᵢ, i = 1, 2, be defined in (3.3). Then, under assumption (3.5), as ε → 0,

 R_\varepsilon(B^{s_1,s_2}_{p,q}(A)) \geq C A^2 \left( \frac{\varepsilon^2}{A^2} \right)^d, (3.6)

where

 d = \min\left( \frac{2s_2}{2s_2+1},\ \frac{2s_1}{2s_1+2\nu+1},\ \frac{2s'_1}{2s'_1+2\nu} \right). (3.7)

Note that the value of d in (3.7) can be re-written as

 d = \begin{cases} \dfrac{2s_2}{2s_2+1}, & \text{if } s_1 > s_2(2\nu+1), \\[2mm] \dfrac{2s_1}{2s_1+2\nu+1}, & \text{if } \left( \frac1p - \frac12 \right)(2\nu+1) \leq s_1 \leq s_2(2\nu+1), \\[2mm] \dfrac{2s'_1}{2s'_1+2\nu}, & \text{if } s_1 < \left( \frac1p - \frac12 \right)(2\nu+1). \end{cases} (3.8)
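The exponent d is easy to compute in practice. The sketch below evaluates the minimum in (3.7) directly, which automatically reproduces the three regimes of (3.8):

```python
def rate_exponent(s1, s2, nu, p):
    """Rate exponent d from (3.7), with s'_1 = s_1 + 1/2 - 1/p', p' = min(p, 2)."""
    s1p = s1 + 0.5 - 1.0 / min(p, 2)
    return min(2 * s2 / (2 * s2 + 1),
               2 * s1 / (2 * s1 + 2 * nu + 1),
               2 * s1p / (2 * s1p + 2 * nu))

# "dense" regime: (1/p - 1/2)(2 nu + 1) <= s1 <= s2 (2 nu + 1)
d_dense = rate_exponent(1.0, 1.0, 0.5, 2)     # = 2 s1 / (2 s1 + 2 nu + 1) = 0.5
# very smooth in t: s1 > s2 (2 nu + 1), the u-direction rate dominates
d_smooth = rate_exponent(5.0, 1.0, 1.0, 2)    # = 2 s2 / (2 s2 + 1) = 2/3
```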
###### Remark 1

Note that the rates obtained here are in fact the worst rates associated with the one-dimensional problem in each direction, which is not surprising since a function of only one variable which is constant in the other direction, e.g., f(u,x) = f₀(x), belongs to B^{s₁,s₂}_{p,q} as soon as f₀ belongs to a ball of the usual one-dimensional Besov space B^{s₁}_{p,q}, for any s₂ > 0.

It is also worthwhile to observe that the third rate (involving s'₁) corresponds, in dimension one, to a "sparse" rate. Hence we observe here the so-called "elbow phenomenon" occurring only along direction 1, because we are considering an L²-loss and the problem has a degree of ill-posedness ν precisely in this direction.

## 4 Minimax upper bounds.

Before deriving expressions for the minimax upper bounds for the risk, we formulate several useful lemmas which give some insight into the choice of the thresholds λ_{jε} and of the upper limits J and J' in the sums in (2.7).

###### Lemma 1

Let the estimators (2.6) be given. Then, under assumption (3.5), one has

 \mathrm{Var}(\tilde{\beta}_{j,k,j',k'}) \asymp \varepsilon^2\, 2^{2j\nu}. (4.1)

Lemma 1 suggests that the thresholds λ_{jε} should be chosen as

 \lambda_{j\varepsilon} = C_\beta \sqrt{\ln(1/\varepsilon)}\ 2^{j\nu} \varepsilon, (4.2)

where C_β is some positive constant independent of ε. We choose J and J' as

 2^J = (\varepsilon^2)^{-\frac{1}{2\nu+1}}, \qquad 2^{J'} = (\varepsilon^2)^{-1}. (4.3)

Note that the choices of J, J' and λ_{jε} are independent of the parameters s₁, s₂, p, q and A of the Besov ball B^{s₁,s₂}_{p,q}(A), and therefore our estimator (2.7) is adaptive with respect to those parameters.
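The resolution levels in (4.3) translate into integer cut-offs as follows; rounding down is one common convention, and the asymptotic results do not depend on it:

```python
import math

def resolution_levels(eps, nu):
    """Finest levels from (4.3): 2^J = eps^{-2/(2 nu + 1)} and 2^{J'} = eps^{-2},
    rounded down to integers in this sketch."""
    J = math.floor((2.0 / (2.0 * nu + 1.0)) * math.log2(1.0 / eps))
    Jp = math.floor(2.0 * math.log2(1.0 / eps))
    return J, Jp

levels = resolution_levels(2 ** -10, 0.5)   # eps = 2^{-10}, nu = 0.5
```

Note that the level J in the ill-posed direction grows more slowly than J' as ν increases, reflecting the harder inversion in t.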

The next two lemmas provide upper bounds for the wavelet coefficients and the large deviation inequalities for their estimators.

###### Lemma 2

Under assumption (3.4), one has

 \sum_{k=0}^{2^j-1} \sum_{k'=0}^{2^{j'}-1} |\beta_{j,k,j',k'}|^2 \leq A^2\, 2^{-2(j s'_1 + j' s'_2)}

for any j ≥ m₀ − 1 and j' ≥ m'₀ − 1.

###### Lemma 3

Let the estimators and thresholds be defined by formulae (2.6) and (4.2), respectively. For some positive constant α, define the set

 \Theta_{j,k,j',k',\alpha} = \left\{ \Theta : |\tilde{\beta}_{j,k,j',k'} - \beta_{j,k,j',k'}| > \alpha \lambda_{j\varepsilon} \right\}. (4.4)

Then, under assumption (3.5), as ε → 0, one has

 \Pr(\Theta_{j,k,j',k',\alpha}) = O\left( \varepsilon^{\frac{\alpha^2 C_\beta^2}{2\sigma_0^2}}\, [\ln(1/\varepsilon)]^{-\frac12} \right), (4.5)

where σ₀ is a positive constant such that Var(\tilde{\beta}_{j,k,j',k'}) ≤ σ₀² ε² 2^{2jν}, and C₁ is defined in (3.5).

Using the statements above, we can derive upper bounds for the minimax risk of the estimator (2.7).

###### Theorem 2

Let \hat{f} be the wavelet estimator defined in (2.7), with J and J' given by (4.3). Let condition (3.5) hold and f ∈ B^{s₁,s₂}_{p,q}(A), with min(s₁, s₂) ≥ max(1/p, 1/2), 1 ≤ p, q ≤ ∞. If C_β in (4.2) is such that

 C_\beta^2 \geq 80\, C_1^{-1} (2\pi/3)^{2\nu}, (4.6)

where C₁ is defined in (3.5), then, as ε → 0,

 \sup_{f \in B^{s_1,s_2}_{p,q}(A)} E \| \hat{f} - f \|^2 \leq C A^2 \left( \frac{\varepsilon^2 \ln(1/\varepsilon)}{A^2} \right)^d \left[ \ln\left( \frac1\varepsilon \right) \right]^{d_1}, (4.7)

where is defined in (3.7) and

 d_1 = \mathbb{1}\big( s_1 = s_2(2\nu+1) \big) + \mathbb{1}\big( s_1 = (2\nu+1)(1/p - 1/2) \big). (4.8)
###### Remark 2

Looking at the previous results, we conclude that the rates obtained by the wavelet estimator defined in (2.7) are optimal, in the minimax sense, up to logarithmic factors. These factors are standard and come from the thresholding procedure.

## 5 Sampling version of the equation and comparison with separate deconvolution recoveries

Consider now the sampling version (1.4) of the problem (1.3). In this case, the estimators of wavelet coefficients can be constructed as

 \tilde{\beta}_{j,k,j',k'} = \frac1M \sum_{m \in W_j} \overline{\psi_{j,k,m}} \sum_{l=1}^{M} \frac{y_m(u_l)}{g_m(u_l)}\, \eta_{j',k'}(u_l). (5.1)

In practice, the estimators (5.1) are obtained simply by applying the discrete wavelet transform to the vectors with components y_m(u_l)/g_m(u_l), l = 1, …, M.

For any two sequences aₙ and bₙ, one says that aₙ ≍ bₙ as n → ∞ if 0 < C₁ ≤ aₙ/bₙ ≤ C₂ < ∞ for some constants C₁ and C₂ independent of n. Recall that the continuous versions (2.6) of the estimators (5.1) have Var(\tilde{\beta}_{j,k,j',k'}) ≍ ε² 2^{2jν} (see formula (4.1)). In order to show that equation (1.4) is the sampling version of (1.3) with ε² = σ²(MN)^{−1}, one needs to show that, in the discrete case, Var(\tilde{\beta}_{j,k,j',k'}) ≍ σ²(MN)^{−1} 2^{2jν}. This is indeed accomplished by the following lemma.

###### Lemma 4

Let the estimators (5.1) be given. Then, under assumption (3.5), as M, N → ∞, one has

 \mathrm{Var}(\tilde{\beta}_{j,k,j',k'}) \asymp \sigma^2 (MN)^{-1}\, 2^{2j\nu}. (5.2)
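Lemma 4 identifies the calibration ε² = σ²(MN)^{−1} between the sampled model (1.4) and the continuous model (1.3); combined with (4.2), this gives a concrete recipe for the effective noise level and the thresholds:

```python
import math

def eps_from_sampling(sigma, M, N):
    """Effective noise level: eps^2 = sigma^2 / (M N), per Lemma 4."""
    return sigma / math.sqrt(M * N)

def threshold(j, eps, nu, C_beta):
    """lambda_{j, eps} from (4.2); C_beta must satisfy (4.6) in the theory."""
    return C_beta * math.sqrt(math.log(1.0 / eps)) * 2 ** (j * nu) * eps

eps = eps_from_sampling(1.0, 16, 64)   # sigma = 1, M = 16 profiles, N = 64 samples
lam = threshold(3, eps, 0.5, 4.0)      # threshold at level j = 3 (illustrative values)
```

Note that the threshold grows like 2^{jν} with the level, compensating the noise amplification caused by dividing by g_m in (5.1).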

Using the tools developed in Pensky and Sapatinas (2009) and Lemma 4, it is easy to formulate the lower and upper bounds for the convergence rates of the estimator (2.7), with the coefficients given by (2.8), ε² = σ²(MN)^{−1}, and the values of λ_{jε}, J and J' defined in (4.2) and (4.3), respectively. In particular, we obtain the following statement.

###### Theorem 3

Let f ∈ B^{s₁,s₂}_{p,q}(A) with min(s₁, s₂) ≥ max(1/p, 1/2), and let s*ᵢ and s'ᵢ be defined in (3.3). Then, under assumption (3.5), as MN → ∞, for some absolute constant C one has

 R_{(MN)}(B^{s_1,s_2}_{p,q}(A)) \geq C \left( \sigma^2 (MN)^{-1} \right)^d. (5.3)

Moreover, if \hat{f} is the wavelet estimator defined in (2.7), with ε² = σ²(MN)^{−1} and with J and J' given by (4.3), then, under assumption (3.5), as MN → ∞,

 \sup_{f \in B^{s_1,s_2}_{p,q}(A)} E \| \hat{f} - f \|^2 \leq C \left( \sigma^2 (MN)^{-1} \ln(MN) \right)^d \left( \ln(MN) \right)^{d_1}, (5.4)

where d and d₁ are defined in (3.7) and (4.8), respectively.

Now, let us compare the rates in Theorem 3 with the rates obtained by recovering each deconvolution f_l(t) = f(u_l, t), l = 1, …, M, separately, using equations (1.5). In order to do this, we need to determine in which space the functions f_l are contained. The following lemma provides the necessary conclusion.

###### Lemma 5

Let f ∈ B^{s₁,s₂}_{p,q}(A) with min(s₁, s₂) ≥ max(1/p, 1/2), and let Ã be a positive constant proportional to A. Then, for any l = 1, …, M, we have

 f_l(t) = f(u_l, t) \in B^{s_1}_{p,q}(\tilde{A}).

Using Lemma 5 and standard arguments (see, e.g., Johnstone, Kerkyacharian, Picard and Raimondo (2004)), we obtain, for each l,

 \sup_{f_l \in B^{s_1}_{p,q}(\tilde{A})} E \| \tilde{f}_l - f_l \|^2 \asymp \begin{cases} C N^{-\frac{2s_1}{2s_1+2\nu+1}}, & \text{if } s_1 \geq \left( \frac1p - \frac12 \right)(2\nu+1), \\[2mm] C N^{-\frac{2s'_1}{2s'_1+2\nu}}, & \text{if } s_1 < \left( \frac1p - \frac12 \right)(2\nu+1). \end{cases}

Now, consider the estimator \tilde{f}(u,t) of f(u,t) obtained from the separate recoveries, with \tilde{f}(u_l, t) = \tilde{f}_l(t). If the partial derivatives f_u = ∂f/∂u and f_uu = ∂²f/∂u² exist and are uniformly bounded for (u,t) ∈ [0,1]², then the rectangle method for numerical integration yields

 E \| \tilde{f} - f \|^2 = M^{-1} \sum_{l=1}^{M} E \| \tilde{f}_l - f_l \|^2 + R_M,

where

 R_M \leq (12 M^2)^{-1} \left[ E \| \tilde{f}_u - f_u \|^2 + \sqrt{ E \| \tilde{f} - f \|^2\; E \| \tilde{f}_{uu} - f_{uu} \|^2 } \right].

If M is large enough, then R_M is asymptotically negligible as M, N → ∞ and we derive

 E \| \tilde{f} - f \|^2 \asymp \begin{cases} C N^{-\frac{2s_1}{2s_1+2\nu+1}}, & \text{if } s_1 \geq \left( \frac1p - \frac12 \right)(2\nu+1), \\[2mm] C N^{-\frac{2s'_1}{2s'_1+2\nu}}, & \text{if } s_1 < \left( \frac1p - \frac12 \right)(2\nu+1). \end{cases} (5.5)

By straightforward calculations, one can check that the only case when the convergence rates of the separate deconvolution recoveries can possibly be better than that of the simultaneous estimator is when s₁ > s₂(2ν+1). In this case, d = 2s₂/(2s₂+1), so that, comparing the rates, by straightforward calculations we derive that the simultaneous recovery delivers better precision than the separate ones unless

 \lim_{\substack{M \to \infty \\ N \to \infty}} M\, N^{-\frac{s_1 - s_2(2\nu+1)}{s_2(2s_1+2\nu+1)}} < 1, \qquad s_1 > s_2(2\nu+1). (5.6)

It is easy to see that relation (5.6) holds only if s₁ is large, s₂ is small and M is relatively small in comparison with N.
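Condition (5.6) can be checked directly for given (s₁, s₂, ν, M, N); the sketch below returns True exactly when the separate recoveries can outperform the functional deconvolution estimator, treating the limit in (5.6) as the corresponding finite-sample quantity:

```python
def separate_beats_functional(s1, s2, nu, M, N):
    """Finite-sample check of (5.6): separate recoveries can only win when
    s1 > s2 (2 nu + 1) and
    M * N^{-(s1 - s2(2 nu + 1)) / (s2 (2 s1 + 2 nu + 1))} < 1."""
    if s1 <= s2 * (2 * nu + 1):
        return False
    expo = (s1 - s2 * (2 * nu + 1)) / (s2 * (2 * s1 + 2 * nu + 1))
    return M * N ** (-expo) < 1

# very smooth in t (s1 large), inhomogeneous in u (s2 small), very few profiles:
few_profiles = separate_beats_functional(5.0, 0.5, 0.5, M=2, N=10 ** 6)
# with many profiles, the functional deconvolution solution wins:
many_profiles = separate_beats_functional(5.0, 0.5, 0.5, M=1000, N=100)
```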

## 6 Extension to the (r+1)-dimensional case

In this section, we extend the results obtained above to the (r+1)-dimensional version of the model (1.1). In this case, expanding both sides of equation (1.1) over the Fourier basis, as before, we obtain, for any u ∈ [0,1]^r, relations of the form (2.1).

Construction of the estimator follows the path of the two-dimensional case. With ψ_{j,k} and η_{j',k'} defined earlier, we consider vectors j' = (j'₁, …, j'_r), k' = (k'₁, …, k'_r), m' = (m'₁, …, m'_r) and J' = (J'₁, …, J'_r), and subsets Υ(m', J') and K(j') of the set of r-dimensional vectors with nonnegative integer components:

 \Upsilon(m', J') = \{ j' : m'_l \leq j'_l \leq J'_l,\ l = 1,\dots,r \}, \qquad K(j') = \{ k' : 0 \leq k'_l \leq 2^{j'_l} - 1,\ l = 1,\dots,r \}.

If ∞ is the r-dimensional vector with all components being ∞, one can expand f into the wavelet series

 f(u,t) = \sum_{j=m_0-1}^{\infty} \sum_{k=0}^{2^j-1} \sum_{j' \in \Upsilon(m',\infty)}\ \sum_{k' \in K(j')} \beta_{j,k,j',k'}\, \psi_{j,k}(t) \prod_{l=1}^{r} \eta_{j'_l,k'_l}(u_l), (6.1)

where the coefficients β_{j,k,j',k'} are of the form

 \beta_{j,k,j',k'} = \sum_{m \in W_j} \overline{\psi_{j,k,m}} \int_{[0,1]^r} \frac{h_m(u)}{g_m(u)} \prod_{l=1}^{r} \eta_{j'_l,k'_l}(u_l)\, du, (6.2)

the set W_j is defined by formula (2.4), and du = du₁ ⋯ du_r. Similarly to the two-dimensional case, we estimate f by

 \hat{f}(u,t) = \sum_{j=m_0-1}^{J-1} \sum_{k=0}^{2^j-1} \sum_{j' \in \Upsilon(m',J')} \sum_{k' \in K(j')} \hat{\beta}_{j,k,j',k'}\, \psi_{j,k}(t) \prod_{l=1}^{r} \eta_{j'_l,k'_l}(u_l) (6.3)

with