Nonparametric Regression Estimation Based on Spatially Inhomogeneous Data: Minimax Global Convergence Rates and Adaptivity
We consider the nonparametric regression estimation problem of recovering an unknown response function on the basis of spatially inhomogeneous data when the design points follow a known density with a finite number of well-separated zeros. In particular, we consider two different cases: when has zeros of a polynomial order and when has zeros of an exponential order. These two cases correspond to moderate and severe data losses, respectively. We obtain asymptotic (as the sample size increases) minimax lower bounds for the -risk when is assumed to belong to a Besov ball, and construct adaptive wavelet thresholding estimators of that are asymptotically optimal (in the minimax sense) or near-optimal within a logarithmic factor (in the case of a zero of a polynomial order), over a wide range of Besov balls.
The spatially inhomogeneous ill-posed problem that we investigate is inherently more difficult than spatially homogeneous ill-posed problems like, e.g., deconvolution. In particular, due to spatial irregularity, assessment of asymptotic minimax global convergence rates is a much harder task than the derivation of asymptotic minimax local convergence rates studied recently in the literature. Furthermore, the resulting estimators exhibit very different behavior and asymptotic minimax global convergence rates in comparison with the solution of spatially homogeneous ill-posed problems. For example, unlike in the deconvolution problem, the asymptotic minimax global convergence rates are greatly influenced not only by the extent of data loss but also by the degree of spatial homogeneity of . Specifically, even if is non-integrable, one can recover as well as in the case of an equispaced design (in terms of asymptotic minimax global convergence rates) when it is homogeneous enough since the estimator is “borrowing strength” in the areas where is adequately sampled.
Keywords: Adaptivity, Besov spaces, inhomogeneous data, minimax estimation, nonparametric regression, thresholding, wavelet estimation.
AMS (2000) Subject Classification: Primary: 62G08, Secondary: 62G05, 62G20
Applicability of majority of techniques for estimation in the nonparametric regression model rests on the assumption that data is equispaced and complete. These assumptions were mainly adopted by signal processing community where the signal is assumed to be recorded at equal intervals in time. However, in reality, due to unexpected losses of data or limitations of data sampling techniques, data may fail to be equispaced and complete. To this end, we consider the problem of recovering an unknown response function on the basis of irregularly spaced observations, i.e., when one observes governed by
where , , are fixed (non-equidistant) or random points, , , are independent standard Gaussian random variables and (the noise level) is assumed to be known and finite. Model (1.1) can be viewed as a problem of recovering a signal when part of data is lost (e.g., in cell phone use) or unavailable (e.g., in military applications). Model (1.1) is also intimately connected to the problem of missing data since points , , can be viewed as the remainder of equidistant points , , after observations at points have been lost. However, there is a great advantage in treating the missing data problem as a particular case of a nonparametric regression problem: with the last two decades seeing tremendous advancement in the field of nonparametric statistics, a nonparametric regression approach to incomplete data brings along all the modern tools in this field such as asymptotic minimax convergence rates, Besov spaces, wavelets and adaptive estimators.
The problem of estimating an unknown response function in the context of wavelet thresholding in the nonparametric regression setting with irregular design has been now addressed by many authors, see, e.g., Hall and Turlach (1997), Antoniadis and Pham (1998), Cai and Brown (1998), Sardy et al. (1999), Kovac and Silverman (2000), Pensky and Vidakovic (2001), Brown et al. (2002), Zhang et al. (2002), Kohler (2003) and Amato et al. (2006). Several tools were suggested for attacking the problem; here, we shall review only few of them. For instance, the procedure of Kovac and Silverman (2000) relies upon a linear interpolation transformation to the observed data vector that maps it to a new vector of size (), corresponding to a new design with equispaced points. After the transformation, the new vector is multivariate normal with mean and covariance matrix which is assumed to have a finite bandwidth, so that the computational complexity of their algorithm is of order . Cai and Brown (1998) attacked the problem by using multiresolution analysis, projection and wavelet nonlinear thresholding while Sardy et al. (1999) applied an isometric method. Pensky and Vidakovic (2001) estimated the conditional expectation directly by constructing its wavelet expansion, while Amato et al. (2006) applied a reproducing kernel Hilbert space (RKHS) approach in the spirit of Wahba (1990). However, until very recently, all studies have been carried out under the assumption that the nonequispaced design still possesses some regularity, namely, the density function of the design points , , is uniformly bounded from below, i.e., for some constant . In this case, asymptotically, model (1.1) is equivalent to the case of the standard (equispaced) nonparametric regression model, as long as the design density function is known (see, e.g., Brown et al. (2002)).
Recently, an attempt has been made of more advanced investigations of the problem. Kerkyacharian and Picard (2004) introduced warped wavelets to construct estimators of the unknown response function under model (1.1) when the design density function has zeros of polynomial order. They, however, measured the error of their suggested estimator in the warped Besov spaces which is, practically, equivalent to measuring the error of the estimator at the design points only. For this reason, the derived estimators posses the usual asymptotic (as the sample size increases) minimax global convergence rates which do not depend on the order of the zeros of the design density function . This line of investigation was continued by Chesneau (2007) who constructed asymptotic minimax lower bounds over a wide range of Besov balls, under the assumption that the design density function is known and that is integrable, and, furthermore, suggested adaptive wavelet thresholding estimators for the unknown response function . However, in Kerkyacharian and Picard (2004) and Chesneau (2007), the assumptions on the design density function are restrictive enough so that the asymptotic minimax global convergence rates of any estimator coincide with the asymptotic minimax global convergence rates under the assumption that is bounded from below, i.e., the corresponding nonparametric estimation problem is a well-posed problem.
Gaïffas (2005, 2007) was the first author who considered nonparametric regression estimation on the basis of spatially inhomogeneous data as an ill-posed problem. In particular, he constructed pointwise adaptive estimators of on the basis of local polynomials when is non-integrable and showed that the asymptotic minimax local convergence rates of the suggested estimators are slower than in the case when is bounded from below, hence, demonstrating that the aforementioned estimation problem is an ill-posed problem. Since his techniques are intended for local reconstruction and depend on cross-validation at each point, they become too involved when one tries to adapt them to the whole domain of . Furthermore, Gaïffas (2006, 2009) studied asymptotic minimax uniform convergence rates. However, these rates are expressed in a very complex form which is very hard to obtain for belonging to standard functional classes (see Remark 6). Note also that some of his results were recently extended to the multivariate case by Guillou and Klutchnikoff (2011).
Our objective is to study how the zeros of the design density function affect the asymptotic minimax global convergence rates of in model (1.1), and to construct adaptive wavelet thresholding estimators of which attain these rates, over a wide range of Besov balls. As we show below (see Remark 2), assessing asymptotic minimax global convergence rates is a much harder task than assessing asymptotic minimax local convergence rates. Model (1.1) can be viewed as a spatially inhomogeneous ill-posed problem which is inherently more difficult than spatially homogeneous ill-posed problems like, e.g., deconvolution, especially in the case when the unknown response function is spatially homogeneous. To the best of our knowledge, so far, there are no results for asymptotic minimax global convergence rates in the case of spatially inhomogeneous ill-posed problems when its solution is spatially homogeneous since this problem is usually avoided by restricting attention to the case when the estimated function is spatially inhomogeneous, or, at most, belongs to a Sobolev ball (see, e.g., Hoffmann and Reiss (2008)).
In what follows, we address these issues. In particular, we mainly consider two different cases: when has zeros of a polynomial order and when has zeros of an exponential order. We obtain asymptotic (as the sample size increases) minimax lower bounds for the -risk when is assumed to belong to a Besov ball, and construct adaptive wavelet thresholding estimators of that are asymptotically optimal (in the minimax sense) or near-optimal within a logarithmic factor (in the case of a zero of a polynomial order), over a wide range of Besov balls. Due to spatial irregularity, the suggested estimators exhibit very different behavior and asymptotic minimax global convergence rates in comparison with the solution of spatially homogeneous ill-posed problems (see Remark 3). Specifically, even if is non-integrable, one can recover as well as in the case of an equispaced design (in terms of asymptotic minimax global convergence rates) when the function is homogeneous enough since the estimator is “borrowing strength” in the areas where is adequately sampled. These features lead to a different structure of estimators of described in Section 4. The complementary case when is integrable has been partially handled by Chesneau (2007) who showed that the problem is well-posed (i.e., data loss does not affect the asymptotic minimax global convergence rates) when is spatially homogeneous. A complete study of the case when is integrable is considered in Section 7. In depth discussion of the differences of the spatial features in the spatially inhomogeneous ill-posed problem considered in this paper is presented in Section 8.
To address spatial irregularity of the design in the case when the design density function has a zero of a polynomial order, we develop a novel, two-stage, adaptive wavelet thresholding estimator. This estimator consists of a linear part which is taken at a resolution level that is chosen adaptively by Lepski’s method and which estimates in the neighborhood of the zero of . We refer to this as the zero-affected part of the estimator. The second part is nonlinear (thresholding) and is used outside the immediate neighborhood of the zero of . We refer to this as the zero-free part of the estimator. The lowest resolution level of the nonlinear part coincides with the resolution level of the linear part of the estimator, so that the sum of the two parts represents correctly. If is integrable, then the zero-affected portion of the estimator vanishes and can be estimated by an adaptive wavelet thresholding estimator in the spirit of Chesneau (2007).
We limit our attention only to the -risk since the consideration of a wider class of risk functions will make the exposition of the present work even longer; all results, however, obtained can be extended to the case of -risks, . Moreover, we consider only the univariate case, leaving generalizations to the multivariate case for future investigation.
The rest of the paper is organized as follows. Section 2 discusses the formulation of the nonparametric regression estimation problem of the unknown response function on the basis of spatially inhomogeneous data, in particular when the design density function has either a zero of a polynomial order or a zero of an exponential order. Section 3 contains the asymptotic minimax lower bounds for the -risk when is assumed to belong to a Besov ball. Section 4 talks about estimation strategies when is non-integrable, in particular, about partitioning and its estimator into the zero-affected and zero-free parts. Section 5 elaborates on the estimation of the zero-affected and the zero-free parts, and is followed by Section 6 which discusses the choice of adaptive resolution level and derives the asymptotic minimax upper bounds for the -risk in the case when is non-integrable. Section 7 studies the complementary case when has zeros but is still integrable. Section 8 concludes the paper with a discussion. Finally, Section 9 contains the proofs of the statements in the earlier sections.
2 Formulation of the problem
Consider the nonparametric regression model (1.1). Since the noise level is assumed to be known and finite, without loss of generality, we set . Therefore, from now onwards, we work with observations governed by equation (1.1), where is the unknown response function to be recovered, , , are random design points with the underlying density function , and , , are independent standard Gaussian random variables, independent of , . Furthermore, we assume that the design density function is known and has a finite number of zeros which are well-separated, i.e., there exist a constant such that the distance between two consecutive zeros is at least . The last assumption is motivated by the following considerations. If vanishes on an interval , , then consistent estimation of , for , is impossible. Also, has an infinite number of zeros on only in the case when is highly oscillatory, which is not a very likely scenario. Finally, the assumption that has low values on a part of its domain but is still separated from zero is not an interesting case to consider, since the lower bound on will appear in the constant of the well-known expressions for the asymptotic minimax convergence rates (see, e.g., Tsybakov (2009), Chapters 1-2).
Note that the above assumptions are not restrictive. If the noise level is unknown, it can be easily estimated with parametric precision using observations in the region where is separated from zero. The assumption that the design points , , are random is not confining either. In fact, with small modifications of the theory below, one can consider fixed points , generated by an increasing and continuously differentiable function such that , and , . Then, the function plays the role of a “surrogate” distribution function with density function ; the design points , , can be then obtained as , .
Moreover, since the design density function is assumed to be known with a finite number of zeros that are also well-separated, one can partition the interval into subintervals in such a manner that each subinterval contains only one zero of . For this reason, in what follows, without loss of generality, we assume that has only one zero , and that the following condition holds.
Assumption A. Let the design density be a continuous function on the interval with , . Then, there exists constants , ( if ), and such that, for any , with ,
If , we shall say that is a zero of polynomial order. If , we shall say that is a zero of exponential order. Observe that (2.1) implies that there exist some constants such that for any , with and , one has
Note that the two cases in Assumption A correspond to the situations of moderate () and severe () data losses, respectively. Chesneau (2007) showed that in the case of a moderate loss () with (i.e., is integrable), and for a response function that is spatially homogeneous, can be estimated with the same asymptotic minimax global convergence rates as in the case of with (i.e., is uniformly bounded from below); hence, in this case, the nonparametric regression estimation problem turns out to be a well-posed problem.
Therefore, we shall be mainly interested only in the complementary situation when is non-integrable: (i) moderate losses (i.e., ) with and (ii) severe losses (i.e., ) with and . As we shall see below, usually in those cases, the asymptotically optimal (in the minimax sense) estimation procedures yield estimators with lower convergence rates than in the case of equispaced observations, so that the corresponding nonparametric regression estimation problem under model (1.1) becomes ill-posed (see Remark 1), with the degree of ill-posedeness growing as increases when or as increases when .
In what follows, we use the symbol for a generic positive constant, independent of the sample size , which may take different values at different places.
(Risk functions and design). As indicated above, we shall measure the precision of any estimator of by its -risk, i.e.,
If the design points , , in model (1.1) are treated as fixed (i.e., non-random), then, the above risk, evaluated at the equispaced design , , corresponds to
and leads to an ill-posed nonparametric regression estimation problem. However, it is instructive to note that if one measures the precision of an estimator at the design points , , only, by calculating
as it was done in, e.g., Amato et. al (2006), then the problem ceases to be ill-posed. Moreover, in this case, no special treatment is necessary to account for the irregular design. To confirm that, note that model (1.1) can be re-written as
where , , and is the “surrogate” distribution function mentioned earlier. Construct now an estimator of using, e.g., any of the standard wavelet thresholding techniques, and set , . Then,
and takes the form
Therefore, if the observed data vector is treated as if the measurements were carried out at equispaced design points, then, by using, e.g., available wavelet denoising algorithms, the resulting estimator of function will be adaptive and it will lead to the smallest possible risk . This phenomenon was noticed earlier by Cai and Brown (1998), Sardy et al. (1999) and Brown et al. (2002).
(Local versus global convergence rates). The nonparametric regression estimation problem of recovering globally, on the basis of spatially inhomogeneous data, is a much more difficult task than the corresponding problem of estimating locally, say at a given point . Indeed, if , the distribution function associated with the design density function , is known, then and, hence, one can estimate at the point instead of estimating at the point , where , , and is equispaced sampled, as in (2.3). Hence, local estimation can be reduced to a well-addressed pointwise regression estimation problem. If , then the problem is well-posed and has been extensively studied before. If, instead, is a zero of , then one can deduce asymptotic minimax pointwise convergence rates directly from considerations of Remark 1 and straightforward calculus. Let, for simplicity, and , so that and , . Let satisfy a Hölder condition of order at , i.e., . Then, since , , , satisfies a Hölder condition of order at , i.e., for ,
Since, for , , one can set and obtain asymptotic minimax pointwise convergence rates for , on noting that
which coincides with the asymptotic minimax pointwise convergence rates obtained by Gaiffas (2005). The whole argument here rests on the fact that , , so one can estimate at instead of estimating at the . This, however, cannot be accomplished when a global estimation procedure is required since, in such a case, a Taylor expansion is needed, that can be applied only locally.
3 Minimax lower bounds for the -risk over Besov balls
Before constructing an adaptive estimator of the unknown response function under model (1.1), we first derive the asymptotic minimax lower bounds for the -risk over a wide range of Besov balls.
Among the various characterizations of Besov spaces in terms of wavelet bases, we recall that for an -regular multiresolution analysis (see, e.g., Meyer, 1992, Chapter 2, pp 21–25), with , and for a Besov ball defined as
of radius with , one has, with ,
with respective sum(s) replaced by maximum if and/or , where (see, e.g., Johnstone et. al (2004)). We study below the -risk over Besov balls defined as
where is the -norm of a function defined on , and the infimum is taken over all possible square-integrable estimators (i.e., measurable functions) of based on observations from model (1.1).
The following statement provides the asymptotic minimax lower bounds for the -risk.
Let and , and let Assumption A (with if , and and if ) hold. Then, as ,
Note that the asymptotic minimax lower bound for the -risk in the first part of (3.2) is obtained by the arguments in Theorem 3.1 of Chesneau (2007).
(Global convergence rates). As we shall show below, the asymptotic minimax lower bounds for the -risk obtained in Theorem 1 are attainable for and are attainable up to a logarithmic factor for . If , the asymptotic minimax global convergence rates in the first and second parts of (3.2) coincide. Hence, whenever , the aforementioned nonparametric regression estimation problem is not ill-posed but well-posed, in the sense that the asymptotic minimax global convergence rates are the same as in the case of an equispaced design. For , this relation can take place only if , i.e., when the function is spatially homogeneous. In particular, holds true for any such that , i.e., when is very spatially homogeneous ( is large, in particular, when provided that ), so that even a relatively severe data loss does not lead to the reduction of asymptotic minimax global convergence rates. If , then the considered nonparametric regression estimation problem is always well-posed whenever is spatially homogeneous () and also when is spatially inhomogeneous () and . Therefore, even if is spatially inhomogeneous, the aforementioned nonparametric regression estimation problem is well-posed whenever data loss is very limited ().
4 Estimation strategies when is non-integrable
We consider a scaling function and a mother wavelet that generate an orthonormal wavelet basis in , as those obtained from, e.g., an -regular multiresolution analysis of , for some . We shall also assume that and are both compactly supported, with integer bounds on their supports so that, for some , with , ,
(For instance, the Daubechies or Symmlets scaling functions and mother wavelets , with filter number (number of vanishing moments) , satisfy (4.2) with , , and , see, e.g., Mallat (1999), Section 7.2.)
We then obtain a periodized version of the wavelet basis on the unit interval, i.e., for and , as
so that, for any , the set
forms an orthonormal wavelet basis for (see, e.g., Mallat (1999), Theorem 7.16). Hence, for any , any , can be expanded as
Denote by and the support bounds of the periodic scaling function and mother wavelet . Note that the supports of and coincide if and only if , and, similarly, the supports of and coincide if and only if . Choose the lowest resolution level such that , so that supports of periodic and non-periodic wavelets coincide. In this case, we obtain that
For any integer , denote . (Note that is not necessarily a rational quantity and can take any value.) At each resolution level, we partition the set of all indices into the indices which are zero–affected and zero–free. In particular, let and be the sets such that, for any integer and ,
Simple calculations yield that and imply that and , respectively, so that the sets and are zero–free while the sets and are zero–affected.
With the above notation it is easy to see that, for any and , can be partitioned as the sum of zero–affected and zero–free parts, i.e.,
We then construct estimators and of and , respectively, and estimate by
(We emphasize the unusual feature in the construction of : as we shall see below, is a linear wavelet estimator while is a nonlinear (thresholding) wavelet estimator with the lowest resolution level determined by the linear part.)
By observing that, for any function , we have
when the random variable , and setting, for any and , and , , in turn, similarly to (3.3) in Chesneau (2007), we estimate , , and , , respectively, by
Hence, we can construct an estimator of by estimating , , and , , by , , and , , respectively, given in (4.6), along with a thresholding step (see below).
Note that since is non-integrable, the estimators given in (4.6) would have infinite variances if or , so that one cannot construct an analogous estimator of by direct estimation of the appropriate scaling and wavelet coefficients. Instead, in this case, we shall use a linear estimator with the lowest resolution level estimated from the data. In what follows, we shall consider the estimation of and separately.
5 Estimation of the zero-free and the zero–affected parts.
Consider first the estimation of the zero-free part. In order to estimate , we construct a wavelet thresholding estimator as
Here, is a constant, are defined by (4.6) and is such that , where
Consider now the estimation of the zero-affected part. Since the estimators of , given in (4.6), have infinite variances when , we estimate those coefficients by solving a system of linear equations. Note that there is a finite known number of indices in , at most, indices. For any given , such that , denote
and observe that , so that
Denote , and choose such that
Introduce also a finite set of indices
Now, multiply both sides of formula (5.5) by , , where is the indicator of set , and integrate. As a result, obtain the following system of linear equations
Here, matrices and and vectors , , and have, respectively, elements
(Note that the matrices and are completely known, and also observe that only if , since, for , one has .)
Since , it follows from (5.13) that components of vector can be estimated by
using (4.6). We also estimate by
Since matrix is a positive definite matrix of non-asymptotic size, and we obtain the solution
of the system of linear equations (5.15).
Finally, for any given , such that , we set , and estimate by the following linear wavelet estimator
The following statement provides the asymptotic upper bounds for the bias and the variance of the estimator given in (5.16).