Semiparametric estimation of spectral density function for irregular spatial data

Semiparametric estimation of spectral density function for irregular spatial data

Shu Yang shuyang@hsph.harvard.edu Zhengyuan Zhu Dept. of Biostatistics, Harvard University, Boston, MA 02115 Dept. of Statistics, Iowa University, Ames, IA 50011
Abstract

Estimation of the covariance structure of spatial processes is of fundamental importance in spatial statistics. In the literature, several non-parametric and semi-parametric methods have been developed to estimate the covariance structure based on the spectral representation of covariance functions. However, they either ignore the high frequency properties of the spectral density, which are essential to determine the performance of interpolation procedures such as Kriging, or lack of theoretical justification. We propose a new semi-parametric method to estimate spectral densities of isotropic spatial processes with irregular observations. The spectral density function at low frequencies is estimated using smoothing spline, while a parametric model is used for the spectral density at high frequencies, and the parameters are estimated by a method-of-moment approach based on empirical variograms at small lags. We derive the asymptotic bounds for bias and variance of the proposed estimator. The simulation study shows that our method outperforms the existing non-parametric estimator by several performance criteria.

keywords:
Decay rate, Generalized cross validation, Integrated prediction error, Irregular observations, Spatial interpolation, Spectrum, Smoothing spline
journal: Journal of Statistical Planning and Inference

1 Introduction

In geostatistics, covariance function is the most common tool modelers use to describe the spatial dependence structure in the data, and it is a crucial ingredient in kriging prediction krige1951statistical (). The covariance function has to be positive definite in order to ensure that the variance of any linear combinations of values of the process at various locations is positive:

for any real numbers , and spatial locations , where is the dimension of the spatial domain. A common solution is to use a parametric family of covariance functions that are positive definite. Weighted least square methods cressie1985fitting () and likelihood-based methods mardia1984maximum (); stein2004approximating () can then be used to estimate parameters. However, it is not always clear what the parametric forms should be, and model misspecification can lead to bad kriging performance.

Due to the positive definite constraint, it is difficult to apply non-parametric techniques directly to estimate the covariance function in the spatial domain. Bochner’s Theorem yaglom1987correlation () shows that a function is continuous and positive definite if and only if it is the Fourier transform of a positive bounded measure on :

(1)

In the case where has a density , which is called the spectral density, (1) can be rewritten as

(2)

For example, for isotropic processes (2) is reduced to a one-dimensional integral

(3)

where is the Gamma function, and is the Bessel function of the first kind of order abramowitz1972handbook (). In the spectral domain the positive definite constraint translates to a non-negative constraint on the spectral density which is much easier to work with.

As a result, many estimation methods for the covariance function have been proposed based on its spectral representation. In the time series literature, much of the analysis of the spectral representation focus on smoothing periodograms, which can be constructed easily for observations on grids. See wahba1980automatic (); beltrato1987determining (); hurvich1985data (); hurvich1990cross (); pawitan1994nonparametric (); fan1998automatic (). Many of these approaches can be generalized to apply to spatial data on grids. Non-parametric modeling of the covariance function and its spectrum for irregularly spaced data include shapiro1991variogram (); genton2002nonparametric (); hall1994nonparametric (); garcia2004nonparametric (); huang2011spectral (); huang2011nonparametric (). However, these methods do not properly take the tail property of the spectral density function into consideration. For example, the nonparametric estimator of Huang et al. (2011b) can only take value on a bounded interval for some cutoff value , and for . Thus the estimated covariance function is a finite-range integral

which leads to exists and is finite for any . A random process with such a covariance function is infinitely smooth. Stein (1999, pg. 30) argues that such smoothness is unrealistic for physical processes under normal circumstances. The resulting nonparametric estimator of the covariance function can be problematic in kriging.

Im et al. (2007) proposed a flexible family of models for the spectral density function that is a linear combination of cubic splines up to a cutoff frequency and an algebraically decaying tail from to infinity. They used a likelihood-based method to estimate the cutoff value and the decay rate assuming the process is a Gaussian random field. Simulation studies indicate that their estimator can perform well empirically. Two limitations of their paper are the following: First, no formal theoretical justification for their method has been developed to date. Second, the estimation method is computationally demanding and can not scale to large data sets.

Following Im et al. (2007), we consider a similar semi-parametric method for estimating spectral density of an isotropic Gaussian random process which addresses both issues. In our proposed method, the spectral density function is modeled by smoothing splines for low frequencies up to a cutoff frequency, which enjoys flexible functional forms, and an algebraic tail for high frequencies. The estimator of the spectral density function at low frequencies can be solved by a regularized inverse problem huang2011spectral (). To estimate the delay rate in the algebraic tail for high frequencies, we employ a Method-of-Moment approach. Our method provides a closed-form solution which allows for theoretical analysis, and we derive asymptotic bounds for the bias and variance of the spectral density estimator. The estimation algorithm is also scalable to large spatial data sets. We would like to note that both the theoretical results and the algorithm are developed for one-dimensional spatial processes. Generalization to higher dimensions will be addressed in a separate paper.

The rest of the paper is organized as follows. Section 2 presents our methodology. In this section, we describe our estimation procedure and provide a closed-form solution. Sections 3 contains the asymptotic results. Section 4 presents a simulation study. Section 5 concludes. Proofs are provided in the Appendix.

2 Methodology

Consider an isotropic Gaussian random process at , , where are irregularly spaced locations. Without loss of generality, we assume that has mean zero and locations satisfy some weak regularity conditions to be specified later in Section 3. For example, locations following a Poisson process would satisfy those conditions. Following Im et al. (2007), we do not posit any parametric form for the spectral density function at low frequencies up to a cutoff frequency , and assume an algebraic tail for the spectral density at high frequencies:

where is the decay rate. The decay rate of the spectral density function and the smoothness parameter of the covariance function are closely related. In Matérn class, suppose that is the Matérn smoothness parameter, then where is the dimension of space. To derive explicit theoretical results, in this paper we only consider random processes in one dimension. The methodology itself is more general and can be adapted to stationary processes in higher dimensions.

We begin by outlining the estimation steps. For estimation of the spectral density at low frequencies up to the cutoff value , we follow the approach in huang2011spectral () (HHC11 from hereon). We set a grid on the range of observations with grid size , and project the irregularly observed points to their nearest grids. We refer to this preprocess step as gridization. Note that the resulting gridized data is still different from time series in that some grids may have zero observation while some grids may have multiple observations. Thus the classical spectral density estimation methods based on the periodograms in time series bartlett1950periodogram (); grenander1953statistical (); parzen1957consistent (); jenkinsdg () are not suitable. We use the smoothing spline estimation method introduced in HHC11b. The estimator is obtained by solving a regularized inverse problem.

The price we pay by projecting irregular data onto grids is that the estimand in focus, the spectral density function based on the gridized data, is different from the true spectral density function , due to aliasing. The relationship between and is given by

(4)

for . The equation (4) allows us to correct the aliasing effect if we know the tail of the spectral density.

For estimation of the spectral density at high frequencies from to , we focus on estimating the decay rate . As mentioned before, the decay rate and the smoothness parameter of the variogram function are closely related. Using Taylor expansion, we have

(5)

where , and . () is also referred to as the fractal dimension of the process. The parameter and the decay rate are linked by . Researchers have been proposed methods in estimation of the fractal dimension of the sample path of a random process based on an equally spaced sample taylor1991estimating (); constantine1994characterizing (); hall1995effect (); chan1995periodogram (); kent1997estimating (); istas1997quadratic (); zhu2002parameter (). We consider estimating based on empirical variograms constructed from the irregularly spaced data. Let be the empirical variogram at a small lag . From equation (5), we expect

(6)

and

(7)

as , where . In this regard, estimation of can be turned into a conventional regression problem. Let be a least square estimate from (6) or a regression estimate of on from (7), it is expected that , as .

We describe the proposed estimating procedure and the mathematical formulations explicitly in the rest of Section 2.

2.1 Smoothing spline estimation of spectral density at low frequencies

We first set a grid with grid size () in the range of the observations and project the irregularly observed points onto the nearest grid. A reasonable choice for the cutoff value is , where is the average sampling rate broersen2006estimating (); eyer1998variable (); press1994numerical (). From the gridized observations, we can estimate the spectral density on . Following HHC11b, we consider the spectral density function estimator belonging to a Sobolev space on ; are absolutely continuous and . Consider the following minimization problem over the functions in ,

(8)

Since the product is an unbiased estimator of

the first term in (8) is small for a function close to . The second term is a roughness penalty term with being the smoothing parameter. Without the penalty term the solution to (8) is unstable and non-unique. The roughness penalty term stabilizes the problem to a well-posed problem. The regularized inverse problem (8) gives a closed form solution as

(9)

where , is the number of location pairs in , and , where is a notation for interval . To simplify the presentation, we refer the readers to HHC11b for derivation of the solution (9).

A data-driven method of choosing the smoothing parameter was discussed in HHC11b where a generalized cross validation approach for smoothing splines villalobos1987inequality () is utilized.

Note that based on (9), we can derive a closed-form formula for the covariance function estimator as

We refer to (9) and (2.1) as HHC spectral density estimator and HHC covariance function estimator. It is easy to see that exists and is finite for any . A random field with such covariance function is infinitely smoothness and is often unrealistic for physical processes.

2.2 Estimation of the decay rate

We consider estimating in (5) based on empirical variograms with small lags constructed from the irregularly spaced data. Let be empirical variogram with lag . For irregularly located data, it is rare that the distance between any pairs of observations is the same. We use tolerance regions cressie1992statistics (). For a given spatial lag we define a tolerance region which includes all pairs with where is a prespecified tolerance size with . Let the empirical variogram estimate at lag be

(11)

where , and is the number of pairs of observations in the tolerance region . After going through prespecified small spatial lags , we obtain a sequence of triples , which stands for the spatial lag, empirical variogram estimate, and the number of pairs at this lag. The size of the tolerance region affects the bias and variance of the empirical variogram . If is small, the bias of is small, however the variance of can be large due to small sample size. If is large, the variance of is small since more samples are used to construct , however the bias can be large. To see this, for an individual term in (11), since , by Taylor expansion we have

where the second and third equality follow from (5). Since is the average of these terms, we have

(12)

Thus the bias of is . The approximated variance of the variogram estimate cressie1985fitting () is . They together explain the aforementioned tradeoff between the bias and variance for a given and determine the large sample properties of our proposed estimator of which we will see in Theorem 1.

From equation (7), we have turned estimation of to a conventional regression problem. Let be a regression estimator of by regressing on , , i.e.

(13)

where . We derive the asymptotic bound for the mean-squared error of as in Theorem 1.

2.3 Adjusting for Aliasing and the final spectral density estimator

Analysis based on the gridized data focus on estimation of , which is different from the true spectral density , due to aliasing. We have obtained for and an estimated algebraic form for , where . We can adjust for aliasing using equation (4) to get the spectral density estimator

for . The parameter is the value of spectral density evaluated at , which is chosen to guarantee that the semi-parametric estimator of spectral density is continuous at the cutoff point . After some algebra, can be estimated by

Thus, our final estimator of spectral density, referred to as YZ estimator, takes the form

By plugging in the form of and , we obtain a closed form for YZ estimator as

(15)

where

The closed-form estimator allows us to study the large sample properties of the proposed estimator, which is presented in Theorem 2.

Lastly, from (15) it is possible for to have negative values. To remove the negativity, a practical solution is to consider

From our simulation study, we found this is not a big concern. In addition, in Theorem 2 we show that is consistent to , so that when we have more samples, is guaranteed to be positive.

3 Asymptotic Results

Assume the following conditions:

(C.1)

Let be an isotropic random process on . has the following linear process representation:

where , and has stationary independent increments with mean zero, the second moment , and the forth moment for .

(C.2)

Let , for some . There exists a bounded, symmetric function with decreasing for and for all large , such that

(16)

and

(17)
(C.3)

The covariance function is differentiable and
, where .

(C.4)

Let be the sample size and be the number of pairs of gridized data with spatial lag . There exist some , such that

(18)

The assumption that an isotropic random process X has a spectral density implies that X has the linear process representation. Thus the condition (C.1) is a necessary condition. We assume additionally that has independent increments to simplify the derivation. It is easy to show from (C.1) that the covariance function . Hence, (16) implies that for all . The condition (C.2) then implies that is a short-memory process. Note that the left hand side of (17) approximates the left hand side of (16) if is large. Thus, (17) is not a strong condition given (16). The condition (C.3) requires the covariance function to be sufficiently smooth. The condition (C.4) guarantees that there are sufficiently many pairs of data associated with each small lag compared with the sample size. This condition is satisfied if we project the irregularly scattered data points into a grid with grid size less than or equal to 1/the average sampling rate.

In what following, we show the asymptotic properties of our estimators. All proof are given in the Appendix.

Theorem 1

Let , and such that , where the notation can be read as the same order as. Let be given by (13), then

(19)

The optimal rate of is , which can be achieved when .

Remark 1

We require to be smaller than so that the empirical variograms are consistent to the true variograms. Specifically, we choose to balance the bias term (12) and the variance of the empirical variograms, respectively. The mean-squared error of is the sum of two terms. The first term is the squared bias term of due to ignoring the high order term in equation (5). The second term is the variance term of , contributed from the variance of the empirical variogram .

Remark 2

Equation (19) indicates that the convergence rate for deteriorates as . Our simulation study (not included in the paper) shows that the variance of is quite stable for all , while the bias increases as for fixed . This is consistent with the theoretical result that the variance term does not depend on , and the deteriorative rate is due to the bias term . Kent et al. (1997) discussed a similar problem, and proposed new estimators based on higher order difference of observations on grids, whose bias does not depend on anymore. It is possible to generalize their results to irregular spaced data, which we did not pursue in this paper. Commonly used covariance function such as the exponential covariance function correspond to . Same is true for Matérn covariance functions with the smoothing parameter for some integer .

Theorem 2

Let be our proposed spectral density estimator (15). Under the conditions (C.1)-(C.4), for and we have

(20)

and

(21)

Corollary 3

Let be our proposed spectral density estimator (15). Under the conditions (C.1)-(C.4), for , , and , there exists a constant such that for all ,

(22)

Remark 3

In Corollary 3, we assume to simplify the discussion, which covers many commonly used covariance models, see Remark 2. HHC11b derived the asymptotic bounds for the bias and variance of HHC estimator of the spectral density on . Here we extend that to the whole real line . The first term and the second term are the same as that derived in HHC11b. The extra term is due to the estimation of the tail behavior. The implications of (22) is the following: Assume the range of the sample path of is , where we have observations with , then is bounded by , which is the same with the optimal rate of convergence of the smoothed periodogram estimator priestley1981spectral (); grenander1957statistical ().

4 Simulation study

In this section we assess the performance of the proposed estimator, denoted by , with irregular spatial data in a Monte Carlo study relative to the previously proposed estimators, first the smoothing spline estimator as proposed in HHC11b, denoted by HHC, second a parametric estimator under the Matérn covariance model with parameters estimated by the maximum likelihood approach, denoted by Matérn . We have two Model Setups, one with a Matérn covariance model, and the other one with a spherical covariance model. In both Model Setups, the parametric estimation procedure assumes a Matérn covariance model. Therefore it is correctly specified in the former while it is misspecified in the latter. We would like to assess the robustness of our proposed semi-parametric estimation procedure. For the non-parametric methods, previous simulations have found that HHC is superior to other approaches for irregular data in the literature, including a procedure introduced in broersen2006estimating () in terms of the mean-squared error of estimating spectral density (HHC11b). Here we focus on comparisons of the proposed estimator with HHC.

4.1 Simulation setup

We consider the spectral density estimation of a Gaussian process on the real line , whose values are observed at random locations.

  1. In Model Setup One, the covariance function is a Matérn covariance function

    and the corresponding spectral density is

    where , and .

  2. In Model Setup Two, the covariance function is a spherical covariance function

    and the corresponding spectral density is obtained by the inverse Fourier transformation

    where , and .

In the simulation, we consider sample sizes to be , , and . The process is observed at locations that are i.i.d. uniformly distributed on the range .

4.2 Estimation

HHC estimator is fitted on the frequency interval with the cutoff frequency . The smoothing parameter is selected by generalized cross validation procedure. In estimation, the empirical variograms are constructed with lags , which serve as the building blocks in the regression estimator . In the parametric estimation procedure, we fit the Matérn covariance function and the corresponding spectral density. We evaluate the performance of fitting the spectral density and the covariance function by the integrated squared error (ISE) yu2007kernel ():

and

4.3 Spatial kriging

To compare the kriging performance based on the estimated covariance function, we consider equally spaced locations inside the observation interval for prediction. Let be the predicted value at location using the true covariance function , and be the predicted value with an estimated covariance function . The prediction errors are , and , respectively. Let denote the expectation under the true covariance function . Then is the mean-squared prediction error (MSPE) of the best linear unbiased predictor or the kriging variance. It is easy to show that . The second term on the right hand side represents the extra mean-squared prediction error introduced by predicting with an estimated covariance function instead of the true one. We refer to this term as the increase in prediction error at location , denoted by . We conduct Monte Carlo simulations and compute the prediction performance measure as

with the superscript indicating that the quantity is obtained from the -th Monte Carlo sample. Smaller value indicates a better kriging performance for the corresponding covariance function estimator.

4.4 Simulation result

Figure 1 visualizes the performance of spectral density estimation of HHC, , and Matérn estimator with and in two Model Setups. From these figures, we can see that estimator is always lying below HHC estimator by correcting the aliasing problem. HHC tends to overestimate the spectral density at higher frequencies and reduces this bias by adjusting for the aliasing effect. When sample size increases, both HHC and become closer to the true spectral density. In Model Setup One where the data generating model uses a Matérn family, Matérn estimator does a very good job in estimating the spectral density function. This is expected since the model is correctly specified. However in Model Setup Two where the data generating model uses a spherical function, Matérn estimator tends to away from the true spectral density.

Figure 2 visualizes the performance of covariance function estimation of HHC, , and Matérn estimator with and in two Model Setups. The covariance function estimates from HHC method exhibit oscillation even when sample size is increased. By expression (2.1), the covariance function estimates from HHC method are infinitely differentiable at original and is a combination of functions, which contains oscillation. Whereas, the covariance function estimate from our proposed method is very close to the true covariance function and coverages to the true covariance function when sample size increases. Among the three methods HHC, , and Matérn, the parametric approach Matérn is the best given the model is correctly specified; however its performance deteriorates if model is misspecified.

Table 1 presents Monte Carlo median of , , and mIPE for three methods under two Model Setups. The performance of estimating spectral density forHHC and are comparable ( is similar for HHC and ). However, outperforms HHC in terms of estimating covariance function and spatial kriging ( and are smaller for than HHC). By estimating the tail behavior of the spectral density, gains improvement in Kriging prediction. Again, the parametric approach Matérn is the best given the model is correctly specified; however its performance deteriorates if model is misspecified, which suggests that the parametric approach is not robust.

(a) Model Setup One with n=250.
(b) Model Setup One with n=500.
(c) Model Setup Two with n=250.
(d) Model Setup Two with n=500.
Figure 1: Spectral Density Estimation in Model Setup One and Two with n=250, 500. (a): Model Setup One with n=250; (b): Model Setup One with n=500; (c) Model Setup Two with n=250; (d) Model Setup Two with n=500. The black solid line is the true spectral density function; the red dashed line is HHC estimator; the blue dotted line is YZ estimator; and the green dashed line is the maximum likelihood estimator with a covariance model in a Matérn family..
(a) Model Setup One with n=250.
(b) Model Setup One with n=500.
(c) Model Setup Two with n=250.
(d) Model Setup Two with n=500.
Figure 2: Covariance Function Estimation in Model Setup One and Two with n=250, 500. (a): Model Setup One with n=250; (b): Model Setup One with n=500; (c) Model Setup Two with n=250; (d) Model Setup Two with n=500. The black solid line is the true spectral density function; the red dashed line is HHC estimator; the blue dotted line is YZ estimator; and the green dashed line is the maximum likelihood estimator with a covariance model in a Matérn family..
Model Setup One Model Setup Two
with Matérn Covariance with Spherical Covariance
n
HHC 0.0090 0.1201 56.650 0.0312 0.6880 89.480
250 YZ 0.0097 0.0501 0.1633 0.0214 0.0875 1.2081
Matérn 0.0055 0.0226 0.1401 0.0388 0.1034 1.3944
HHC 0.0078 0.0836 3.0119 0.0188 0.7420 13.5128
500 YZ 0.0075 0.0558 0.1133 0.0164 0.0866 0.9217
Matérn 0.0039 0.0205 0.0550 0.0244 0.0967 0.9781
HHC 0.0035 0.0827 1.3007 0.0103 0.7333 10.046
1000 YZ 0.0027 0.0242 0.1104 0.0091 0.0590 0.8001
Matérn 0.0019 0.0118 1.5e-4 0.0245 0.0799 0.9123
Table 1: Monte Carlo Simulation Result in Model Setup One and Two. In Model Setup One, the data generating model uses Matérn Covariance. In Model Setup Two, the data generating model uses Spherical Covariance. The estimation methods include (1)HHC: the smoothing spline approach as proposed in HHC11b; (2)YZ: the proposed estimator in the paper (initial of the authors); (3) Matérn : maximum likelihood approach with a covariance model in a Matérn family. The evaluation measures include (A) ISE(f):the integrated squared error of spectral density function estimator; (B) ISE(C): the integrated squared error of covariance function estimator; (C) mIPE: the Monte Carlo median of the increase in prediction error .

5 Discussion Remarks

In this paper we proposed a semi-parametric method to estimate spectral densities of isotropic Gaussian processes observed at irregular locations on . The methodology can be adapted for spectral density estimation of spatial processes that are stationary or intrinsic random processes on with . Such extension will be addressed in a separate paper.

The proposed estimator is in a closed form, so it does not require heavy numerical computation. Therefore it is feasible for large-scale spatial data. It also allows us to derive asymptotic bounds for the bias and variance of the proposed estimator, and to prove the estimator is consistent in theory.

Our method builds on HHC estimator in HHC11b. The difference is that we additionally estimate the spectral density at high frequencies which has been ignored in HHC11b. The rationale is that the tail properties of the spectral function play a fundamental role in the prediction. Our simulation study shows that the proposed estimator outperforms HHC estimator in Kriging prediction.

This semi-parametric method allows modeling the spectral density at low frequencies, and therefore it is more flexible than the fully parametric approach.

Appendix A Proof of Theorem 1

Since in the estimator (13), we have involved log of the empirical variogram estimate instead of the empirical variogram estimate itself, we first derive the moment property for . By Taylor expansion technique,

The second equality follows from (12) and the approximated variance of . Combining individual terms,

Therefore, the square bias term can be derived:

and the variance term is

Combining the squared bias term and the variance term, we have

Appendix B Proof of Theorem 2

Write the spectral density based on the gridized data as

for . From the aliasing problem, we also have .

Firstly, we consider the bias of the spectral density estimator at the cutoff value . From (9),

(23)

By Taylor expansion technique,

where . Therefore (23) becomes

(24)