Kriging: Beyond Matérn


Abstract

The Matérn covariance function is a popular choice for prediction in the spatial statistics and uncertainty quantification literature. A key benefit of the Matérn class is that it is possible to get precise control over the degree of differentiability of the process realizations. However, the Matérn class possesses exponentially decaying tails, and thus may not be suitable for modeling long-range dependence. This problem can be remedied using polynomial covariances; however, one then loses control over the degree of differentiability of the process realizations, in that the realizations using polynomial covariances are either infinitely differentiable or not differentiable at all. We construct a new family of covariance functions using a scale mixture representation of the Matérn class in which one obtains the benefits of both Matérn and polynomial covariances. The resultant covariance contains two parameters: one controls the degree of differentiability near the origin and the other controls the tail heaviness, independently of each other. Using a spectral representation, we derive theoretical properties of this new covariance, including equivalence measures and asymptotic behavior of the maximum likelihood estimators under infill asymptotics. The improved predictive performance of this new covariance class is verified via extensive simulations. An application using NASA's Orbiting Carbon Observatory-2 satellite data confirms the advantage of this new covariance class over the Matérn class, especially in extrapolative settings.

Keywords: Equivalence measures; Gaussian scale mixture; Long-range dependence; Prediction; Spectral density; XCO2 data.

Pulong Ma

Statistical and Applied Mathematical Sciences Institute and Duke University

79 T.W. Alexander Drive, P.O. Box 110207, Durham, NC 27709, USA

pulong.ma@duke.edu

and

Anindya Bhadra

Department of Statistics, Purdue University

250 N. University St., West Lafayette, IN 47907, USA

bhadra@purdue.edu


1 Introduction

Kriging, a method for deriving the best spatial linear unbiased predictor in Gaussian process regression, is a term coined by Matheron (1963) in honor of the South African mining engineer D. G. Krige (Cressie, 1990). With origins in geostatistics, applications of kriging have permeated fields as diverse as spatial statistics (e.g., Matérn, 1960, Journel and Huijbregts, 1978, Ripley, 1981, Cressie, 1993, Stein, 1999, Banerjee et al., 2014), uncertainty quantification or UQ (e.g., Santner et al., 2018, Sacks et al., 1989, Gu et al., 2018, Berger and Smith, 2019) and machine learning (Williams and Rasmussen, 2006). Suppose $\{Z(\mathbf{s}) : \mathbf{s} \in \mathbb{R}^d\}$ is a stochastic process with a covariance function $K(\mathbf{s}_1, \mathbf{s}_2)$ that is solely a function of the increment $\mathbf{s}_1 - \mathbf{s}_2$. Then $K$ is said to be second-order stationary (or weakly stationary). Further, if $K$ is a function of $\|\mathbf{s}_1 - \mathbf{s}_2\|$, with $\|\cdot\|$ denoting the Euclidean norm, then $K$ is called isotropic. If the process possesses a constant mean function and a weakly stationary (resp. isotropic) covariance function, the process is called weakly stationary (resp. isotropic). Further, $Z(\cdot)$ is a Gaussian process (GP) if the finite-dimensional realization $\{Z(\mathbf{s}_1), \ldots, Z(\mathbf{s}_n)\}$ jointly follows a multivariate normal distribution for every $n$ and every $\mathbf{s}_1, \ldots, \mathbf{s}_n \in \mathbb{R}^d$.

The Matérn covariance function (Matérn, 1960) has been widely used in spatial statistics due to its flexible local behavior and nice theoretical properties (see, e.g., Stein, 1999) with increasing popularity in the UQ and machine learning literature (Guttorp and Gneiting, 2006). The Matérn covariance function is of the form:

$$\mathcal{M}(h) = \sigma^2\, \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{h}{\phi}\right)^{\nu} \mathcal{K}_{\nu}\!\left(\frac{h}{\phi}\right), \qquad h \ge 0, \qquad (1)$$

where $\sigma^2 > 0$ is the variance parameter, $\phi > 0$ is the range parameter, and $\nu > 0$ is the smoothness parameter that controls the differentiability of the associated random process. Here $\mathcal{K}_{\nu}(\cdot)$ is the modified Bessel function of the second kind, which satisfies $\mathcal{K}_{\nu}(x) \sim (\pi/(2x))^{1/2} e^{-x}$ as $x \to \infty$, where $g_1(x) \sim g_2(x)$ denotes $\lim_{x \to \infty} g_1(x)/g_2(x) = 1$. Further, we use the notation $g_1(x) \asymp g_2(x)$ if $g_1(x)/g_2(x)$ stays bounded away from $0$ and $\infty$ as $x \to \infty$. Thus, using this asymptotic expression of $\mathcal{K}_{\nu}(x)$ for large $x$ from Section 6 of Barndorff-Nielsen et al. (1982), the tail behavior of the Matérn covariance function is given by:

$$\mathcal{M}(h) \asymp h^{\nu - 1/2}\, \exp(-h/\phi), \qquad h \to \infty.$$
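For concreteness, the covariance in Equation (1) is straightforward to evaluate numerically. Below is a minimal sketch in Python using SciPy's modified Bessel function; the function name matern and the parameter values are our own illustrative choices, not part of the paper.

    import numpy as np
    from scipy.special import gamma, kv  # kv: modified Bessel function of the second kind

    def matern(h, nu, phi, sigma2=1.0):
        """Matern covariance, Equation (1)."""
        h = np.atleast_1d(np.asarray(h, dtype=float))
        out = np.full(h.shape, sigma2)  # value at h = 0 by continuity
        nz = h > 0
        x = h[nz] / phi
        out[nz] = sigma2 * (2.0 ** (1.0 - nu) / gamma(nu)) * x ** nu * kv(nu, x)
        return out

    # Tail decay: matern(h) is asymptotically proportional to h^(nu - 1/2) * exp(-h/phi)
    print(matern(np.array([1.0, 5.0, 10.0, 20.0]), nu=1.5, phi=2.0))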

Eventually, the exponential term $\exp(-h/\phi)$ dominates, and the covariance decays exponentially fast for large $h$. This exponential decay may make the Matérn class unsuitable for modeling some very long-range dependence. The problem can be remedied by using covariance functions that decay polynomially, such as the generalized Wendland (Gneiting, 2002) and generalized Cauchy covariance functions (Gneiting, 2000, Gneiting and Schlather, 2004), but in using these polynomial covariance functions one loses a key benefit of the Matérn family: control over the degree of differentiability of the process realizations. Process realizations with a Matérn covariance function are exactly $\lceil \nu \rceil - 1$ times differentiable, whereas realizations with a generalized Cauchy covariance function are either non-differentiable (very rough) or infinitely differentiable (very smooth), without any middle ground (Stein, 2005). The generalized Wendland covariance family also has limited flexibility near the origin compared to the Matérn class and has compact support (Gneiting, 2002).

Stochastic processes with power-law covariances for long-range dependence are ubiquitous in many scientific disciplines, including geophysics, meteorology, hydrology, astronomy, agriculture and engineering; see Beran (1992) for a survey. In UQ studies, certain inputs may have little impact on the output of a computer model; these are called inert inputs, and we refer to Chapter 7 of Santner et al. (2018) for detailed discussions. Long-range covariance functions can allow for large correlations among distant observations and hence are more suitable for modeling such inert inputs. Moreover, computer model outputs can have different smoothness properties depending on the behavior of the physical process being modeled. Thus, long-range covariances that retain control over the differentiability of the stochastic process realizations are very desirable for modeling such output. In spatial statistics, long-range dependence has been studied in many works (e.g., Haslett and Raftery, 1989, Gay and Heyde, 1990, Gneiting, 2000). In the rest of the paper, we focus on modeling long-range dependence in spatial settings. As a motivating example, Figure 1 shows a 16-day repeat cycle of NASA's Level 3 data product of the column-averaged carbon dioxide dry air mole fraction (XCO2) collected from the Orbiting Carbon Observatory-2 (OCO-2) satellite. The XCO2 data are collected over long latitude bands and have large missing gaps between these bands. Predicting the true process over these large missing gaps based on a spatial process model is challenging. If the covariance function only allows short-range dependence, the predicted process over the gaps will be dominated by the mean function of the spatial process model, with the covariance function having negligible impact. However, if the covariance function can model long-range dependence, the predicted process over these gaps will carry more information from distant locations where observations are available, presumably resulting in better prediction. Thus, it is of fundamental and practical interest to develop a covariance function with long-range dependence that does not sacrifice control over the smoothness behavior of the process realizations.

Fig. 1: XCO2 data from June 1 to June 16, 2019. The units are parts per million (ppm).

In this paper we propose a new family of covariance functions that bridges the gap between the Matérn covariance and polynomial covariances. The proposed covariance class is obtained by mixing the Matérn covariance over its range parameter $\phi$. This is done by recognizing the Bessel function in the Matérn covariance function as proportional to the normalizing constant of the generalized inverse Gaussian distribution (Barndorff-Nielsen, 1977, 1978), which then allows analytically tractable calculations with respect to a range of choices for mixing densities, resulting in valid covariance functions with varied features. Apart from this technical innovation, the key benefit is that the mixing does not affect the origin behavior and thus retains the precise control over the smoothness of process realizations enjoyed by the Matérn class. The tail, however, is inflated by the mixing, and the mixing distribution can in fact be chosen so that the tail of the resultant covariance function displays regular variation, with precise control over the tail decay parameter $\alpha$. A function $g$ is said to have a regularly decaying right tail with index $\alpha > 0$ if it satisfies $g(x) \sim x^{-\alpha} L(x)$ as $x \to \infty$, where $L(\cdot)$ is a slowly varying function at infinity with the property $\lim_{x \to \infty} L(tx)/L(x) = 1$ for all $t > 0$ (Bingham et al., 1989). Unlike the generalized Cauchy covariance functions, this new covariance class is obtained without sacrificing control over the degree of differentiability of the process, which is still controlled solely by $\nu$, and the resulting process realizations are still exactly $\lceil \nu \rceil - 1$ times differentiable, independent of $\alpha$. Moreover, regular variation is preserved under several commonly used transformations, such as sums or products. Thus, it is possible to exploit these properties of regular variation to derive new covariance functions with similar features from the covariance function obtained via a mixture of the Matérn class.

The rest of the paper is organized as follows. Section 2 begins with the construction of the proposed covariance function as a mixture of the Matérn covariance function over its range parameter, and verifies that such a construction indeed results in a valid covariance function. The behaviors of this covariance function near the origin and in the tails are characterized by two distinct parameters, which in turn control the smoothness and the degree of long-range dependence, respectively. Section 3 presents the main theoretical results for the new covariance function. We first derive the spectral representation of this new covariance and characterize its high-frequency behavior, and then show theoretical properties concerning equivalence classes under Gaussian measures and asymptotic normality of the maximum likelihood estimators. The resultant theory is extensively verified via simulations in Section 4. In Section 5, we use this new covariance function to analyze NASA's OCO-2 data, and demonstrate better prediction results over the Matérn covariance function. Section 6 concludes with some discussion of future investigations.

2 A New Covariance Class as a Mixture of the Matérn Class

Our starting point in mixing over the range parameter in the Matérn covariance function is the correspondence between the form of the Matérn covariance function and the normalizing constant of the generalized inverse Gaussian distribution of Barndorff-Nielsen (1977). The generalized inverse Gaussian distribution has density on $(0, \infty)$ given by:

$$p(x; \lambda, a, b) = \frac{(a/b)^{\lambda/2}}{2\, \mathcal{K}_{\lambda}(\sqrt{ab})}\, x^{\lambda - 1} \exp\!\left\{-\frac{1}{2}\left(a x + \frac{b}{x}\right)\right\}, \qquad a, b > 0.$$

Thus,

$$\int_0^\infty x^{\lambda - 1} \exp\!\left\{-\frac{1}{2}\left(a x + \frac{b}{x}\right)\right\} dx = 2\, (b/a)^{\lambda/2}\, \mathcal{K}_{\lambda}(\sqrt{ab}).$$

Take $\lambda = \nu$, $a = 1/\phi^2$ and $b = h^2$. Then we have the following representation of the Matérn covariance function with range parameter $\phi$ and smoothness parameter $\nu$:

$$\mathcal{M}(h; \nu, \phi) = \frac{\sigma^2}{\Gamma(\nu)\, 2^{\nu}\, \phi^{2\nu}} \int_0^\infty x^{\nu - 1} \exp\!\left(-\frac{x}{2\phi^2}\right) \exp\!\left(-\frac{h^2}{2x}\right) dx.$$

Thus, the mixture over $\phi^2$ with respect to a mixing density $\pi(\cdot)$ can be written as

$$C(h) = \int_0^\infty \mathcal{M}(h; \nu, \phi)\, \pi(\phi^2)\, d\phi^2 = \frac{\sigma^2}{\Gamma(\nu)\, 2^{\nu}} \int_0^\infty \left\{\int_0^\infty \frac{x^{\nu - 1}}{\phi^{2\nu}}\, e^{-x/(2\phi^2)}\, \pi(\phi^2)\, d\phi^2\right\} e^{-h^2/(2x)}\, dx. \qquad (2)$$

The inner integral may be recognized as a mixture of gamma integrals (by the change of variable $u = 1/\phi^2$), which is analytically tractable for many choices of $\pi(\cdot)$; see, for example, the chapter on gamma integrals in Abramowitz and Stegun (1965). More importantly, as we show below, the parameters of the mixing density can be chosen to achieve precise control over certain features of the resulting covariance function. Our first result is as follows, with proof given in Section S.1 of the Supplementary Material.

Theorem 1.

Let $\phi^2$ denote an inverse gamma random variable using the shape-scale parameterization, with density $\pi(\phi^2; \alpha, \beta^2/2) = \{(\beta^2/2)^{\alpha}/\Gamma(\alpha)\}\, (\phi^2)^{-\alpha - 1} \exp\{-\beta^2/(2\phi^2)\}$. Assume that $\alpha > 0$, $\beta > 0$, and that $\mathcal{M}(h; \nu, \phi)$ is the Matérn covariance function in Equation (1). Then $C(h) = \mathbb{E}_{\phi^2}\{\mathcal{M}(h; \nu, \phi)\}$ is a valid covariance function on $\mathbb{R}^d$ with the following form:

$$C(h) = \frac{\sigma^2\, \Gamma(\nu + \alpha)}{\Gamma(\nu)\, \Gamma(\alpha)}\, \beta^{2\alpha} \int_0^\infty x^{\nu - 1}\, (x + \beta^2)^{-(\nu + \alpha)}\, \exp\!\left(-\frac{h^2}{2x}\right) dx, \qquad h \ge 0, \qquad (3)$$

where $\sigma^2 > 0$ is the variance parameter, $\alpha > 0$ is called the tail decay parameter, $\beta > 0$ is called the scale parameter, and $\nu > 0$ is called the smoothness parameter.

Having established the resultant mixture as a valid covariance function, one may take a closer look at its properties. To begin, although the final form of $C(h)$ involves an integral, and thus may not appear to be in closed form at first glance, the situation is not too different from that of the Matérn class, where the associated Bessel function is available in algebraically closed form only in certain special cases and is otherwise available as an integral. In addition, this representation of $C(h)$ is sufficient for numerically evaluating the covariance function as a function of $h$ via either quadrature or Monte Carlo methods. Additionally, with a certain change of variable, the above integral can be identified as belonging to a class of special functions that can be computed efficiently. More precisely, we have the following elegant representation for the new covariance function.

Corollary 1.

The proposed covariance function in Equation (3) can also be represented in terms of the confluent hypergeometric function of the second kind:

$$C(h) = \frac{\sigma^2\, \Gamma(\nu + \alpha)}{\Gamma(\nu)}\; \mathcal{U}\!\left(\alpha,\, 1 - \nu,\, \frac{h^2}{2\beta^2}\right), \qquad h \ge 0, \qquad (4)$$

where $\mathcal{U}(a, b, z) := \{1/\Gamma(a)\} \int_0^\infty t^{a - 1} (1 + t)^{b - a - 1} e^{-zt}\, dt$ denotes the confluent hypergeometric function of the second kind, with $a = \alpha$ and $b = 1 - \nu$.

Proof.

By making the change of variable $t = \beta^2/x$ in Equation (3), a standard calculation yields that

$$C(h) = \frac{\sigma^2\, \Gamma(\nu + \alpha)}{\Gamma(\nu)\, \Gamma(\alpha)} \int_0^\infty t^{\alpha - 1} (1 + t)^{-(\nu + \alpha)} \exp\!\left(-\frac{h^2 t}{2\beta^2}\right) dt.$$

Thus, the conclusion follows by recognizing the form of the confluent hypergeometric function of the second kind from Chapter 13.2 of Abramowitz and Stegun (1965). ∎

Equation (4) provides a convenient way to evaluate the new covariance function, since efficient numerical routines for the confluent hypergeometric function are implemented in various libraries such as the GNU Scientific Library (Galassi et al., 2002) and software including R and MATLAB, facilitating its practical deployment. The Matérn covariance function is sometimes parameterized differently; the mixing density can be chosen accordingly to arrive at results identical to ours. For instance, with the parameterization of the Matérn class given in Stein (1999), a gamma mixing density with suitably chosen shape and rate parameters leads by an alternative route to the same representation of the new covariance class. The limiting case of the Matérn class as the smoothness parameter $\nu$ in Equation (1) goes to $\infty$ is the squared exponential (or Gaussian) covariance function; mixing it over the inverse gamma distribution as in Theorem 1 yields a special case of the generalized Cauchy covariance function.
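As a numerical illustration of Equation (4) and Theorem 1, the sketch below (continuing the Python sketch from Section 1) evaluates the proposed covariance with SciPy's hyperu, the confluent hypergeometric function of the second kind, and checks it against a Monte Carlo average of Matérn covariances over inverse gamma draws of $\phi^2$. The helper name ch_cov and the parameter values are our own illustrative choices.

    from scipy.special import hyperu  # confluent hypergeometric function of the second kind

    def ch_cov(h, nu, alpha, beta, sigma2=1.0):
        """Proposed covariance, Equation (4)."""
        h = np.atleast_1d(np.asarray(h, dtype=float))
        out = np.full(h.shape, sigma2)  # C(0) = sigma2
        nz = h > 0
        z = h[nz] ** 2 / (2.0 * beta ** 2)
        out[nz] = sigma2 * gamma(nu + alpha) / gamma(nu) * hyperu(alpha, 1.0 - nu, z)
        return out

    # Monte Carlo check of Theorem 1: phi^2 ~ inverse gamma(shape=alpha, scale=beta^2/2)
    rng = np.random.default_rng(0)
    nu, alpha, beta = 1.5, 0.75, 2.0
    phi = np.sqrt(1.0 / rng.gamma(alpha, scale=2.0 / beta ** 2, size=100_000))
    h = np.array([0.5, 1.0, 2.0, 5.0])
    scaled = h[None, :] / phi[:, None]
    mc = ((2.0 ** (1.0 - nu) / gamma(nu)) * scaled ** nu * kv(nu, scaled)).mean(axis=0)
    print(np.round(mc, 4), np.round(ch_cov(h, nu, alpha, beta), 4))  # rows should agree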

Besides the computational convenience, the new covariance function in Equation (3) also allows us to make precise statements concerning the origin and tail behaviors of the resultant mixture. This is clearly more important, since for stationary random fields, the short-range (local) behavior is determined by the differentiability of the covariance function near the origin, while the long-range behavior is determined by the tail behavior of the covariance function. The next theorem makes the origin and tail behaviors explicit, with proof given in Section S.2 of the Supplementary Material.

Theorem 2.

The covariance function $C(\cdot)$ in Equation (3) has the following two properties:

  • Origin behavior: $C(h)$ has the same behavior near the origin as the Matérn covariance function with smoothness parameter $\nu$ given in Equation (1).

  • Tail behavior: $C(h) \sim L(h)\, h^{-2\alpha}$ as $h \to \infty$, where $L(\cdot)$ is a function slowly varying at $\infty$.

A weakly stationary process on $\mathbb{R}^d$ is $m$ times differentiable if its covariance function has $2m$ derivatives at the origin (Stein, 1999, Section 2.4). A random process generated by the new covariance function with smoothness parameter $\nu$ is $\lceil \nu \rceil - 1$ times differentiable in the mean-square sense. The local behavior of this new covariance function is very flexible in the sense that the parameter $\nu$ allows for any degree of differentiability of a weakly stationary random process in the same way as the Matérn class. However, its tail behavior is quite different from that of the Matérn class, since the new covariance function has a polynomial tail that decays more slowly than the exponential tail of the Matérn covariance function. This is natural, since mixing inflates the tails in general, and in our particular case changes the exponential decay to a polynomial one. The rate of tail decay is controlled by the parameter $\alpha$. Thus, this new covariance class may be more suitable for very long-range dependence that an exponentially decaying covariance function may fail to capture. Moreover, the control over the degree of smoothness of the process realizations is not lost. Theorem 2 also establishes the very desirable property that the degree of differentiability near the origin and the rate of tail decay of the new covariance function are controlled by two different parameters, $\nu$ and $\alpha$, independently of each other, and each parameter allows any degree of flexibility.

To get a clear picture of the difference between the new covariance class and the Matérn class, we fix the effective range (ER) at 200 and 500, where ER is defined as the distance at which a correlation function has the approximate value 0.05. We then find the corresponding scale parameter $\beta$ such that the new covariance function has ER 200 and 500 under different smoothness parameters $\nu$ and different tail decay parameters $\alpha$. For the Matérn class, we find the corresponding range parameter $\phi$ such that the Matérn correlation function has ER 200 and 500 under smoothness parameters 0.5 and 2.5. These correlation functions are visualized in Figure 2. Clearly we can see different smoothness behaviors near the origin. As the new covariance class has a polynomial tail, its correlation must drop much faster than the Matérn class within the effective range in order to reach the same correlation 0.05 at the same distance. If $\alpha$ is smaller, the new correlation function has a heavier tail, and hence it drops more quickly to reach the correlation 0.05 at the same effective range. Beyond the effective range, the new covariance function with a smaller $\alpha$ decays more slowly than those with larger $\alpha$. As the Matérn class has an exponentially decaying tail, it decays much faster than the new covariance class, as indicated by the behavior of the Matérn covariance function beyond the ER.
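The calibration of the scale parameter to a target effective range reduces to one-dimensional root finding. A minimal sketch, reusing ch_cov from the sketch above; beta_for_er is our own illustrative helper:

    from scipy.optimize import brentq

    def beta_for_er(nu, alpha, er, target=0.05):
        """Solve for the scale beta so the proposed correlation equals `target` at distance `er`."""
        f = lambda b: ch_cov(er, nu, alpha, b)[0] - target
        return brentq(f, 1e-6, 1e6)

    print(beta_for_er(nu=0.5, alpha=0.5, er=200.0))  # scale giving correlation 0.05 at distance 200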

In Figure 3, we show realizations from zero-mean Gaussian processes with the new covariance class and the Matérn class under different parameter settings. When the distance is within the effective range, the Matérn correlation function takes larger values than the new correlation function. This makes the process realizations from the Matérn class appear smoother, even though both the Matérn class and the new covariance class are fixed at the same value of the smoothness parameter. For the new covariance class, a smaller value of $\alpha$ yields a correlation function with more small values within the effective range, making the process realizations look rougher. As expected, when the effective range and the tail decay parameter are fixed, the process realizations under the new covariance class look smoother for a larger value of the smoothness parameter.

Fig. 2: Correlation functions for the new covariance class and the Matérn class. Panels (a) and (b) show the correlation functions with effective range (ER) 200; panels (c) and (d) show the correlation functions with ER 500. ER is defined as the distance at which the correlation is approximately 0.05.

Fig. 3: Realizations over 2000 regular grid points from zero-mean Gaussian processes with the new covariance function model and the Matérn covariance function model under different parameter settings. The realizations from the new covariance are shown in the first three columns and those from the Matérn covariance in the last column. For the first two rows, the effective range (ER) is fixed at 200; for the last two rows, it is fixed at 500. ER is defined as the distance at which the correlation is approximately 0.05.
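Realizations such as those in Figure 3 can be drawn via a Cholesky factorization of the covariance matrix on a regular grid. A minimal sketch, reusing ch_cov and beta_for_er from the sketches above; the grid and parameter values are our own choices, not necessarily those used for the figure:

    n = 2000
    s = np.linspace(0.0, 1000.0, n)                       # regular 1-D grid
    H = np.abs(s[:, None] - s[None, :])                   # pairwise distances
    nu, alpha = 1.5, 0.75
    K = ch_cov(H, nu, alpha, beta_for_er(nu, alpha, er=200.0))
    L = np.linalg.cholesky(K + 1e-8 * np.eye(n))          # small jitter for numerical stability
    z = L @ np.random.default_rng(1).standard_normal(n)   # one zero-mean GP realization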

3 Theoretical Properties of the Proposed Covariance Class

For an isotropic random field, the properties of a covariance function can be characterized through its spectral density. By Bochner's Theorem (Bochner, 1933), an isotropic covariance function $C(\cdot)$ on $\mathbb{R}^d$ and its spectral density $f(\cdot)$ form a dual pair when $\int_{\mathbb{R}^d} |C(\|\mathbf{h}\|)|\, d\mathbf{h} < \infty$; see also Section 2.10 of Stein (1999). Given the existence of the corresponding spectral density, the origin (low-frequency) behavior of the spectral density of a covariance characterizes the large-scale variation of the random process, while its tail (high-frequency) behavior characterizes the small-scale variation.

The spectral density of the new covariance function is finite at the origin for $\alpha > d/2$ and infinite at the origin for $\alpha \le d/2$. The following theorem gives the spectral density of the new covariance function in Equation (3) and characterizes the tail behavior of the spectral density, with its proof given in Section S.3 of the Supplementary Material.

Theorem 3 (Spectral density).

If $\alpha > d/2$, the new covariance function in Equation (3) admits the following spectral density:

$$f(\omega) = \frac{\sigma^2\, \Gamma(\nu + d/2)}{\Gamma(\nu)\, \Gamma(\alpha)\, \pi^{d/2}} \left(\frac{\beta^2}{2}\right)^{\alpha} \int_0^\infty s^{d/2 - \alpha - 1}\, (1 + s\,\omega^2)^{-(\nu + d/2)}\, e^{-\beta^2/(2s)}\, ds, \qquad \omega \ge 0, \qquad (5)$$

with the tail behavior

$$f(\omega) \sim L(\omega)\, \omega^{-(2\nu + d)}, \qquad \omega \to \infty,$$

where $L(\cdot)$ is a function slowly varying at $\infty$.

The spectral density of the Matérn class is proportional to $\omega^{-(2\nu + d)}$ for large $\omega$. By mixing over the range parameter with an inverse gamma mixing density, the high-frequency behavior of the resultant covariance class is proportional to the product of the high-frequency term of the Matérn class and a slowly varying function $L(\omega)$. This slowly varying function does not change the polynomial order of the high-frequency behavior. When the spectral density is finite at every frequency, i.e., $\alpha > d/2$, this gives another argument for why the origin behavior of the new covariance class is controlled in the same way as that of the Matérn class. In what follows, the theoretical results are established under the condition $\alpha > d/2$.
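To make the constants in this comparison explicit, a heuristic calculation (interchanging the mixture expectation and the high-frequency limit without full justification; the proof of Theorem 3 makes this rigorous) proceeds as follows:

$$f(\omega) = \mathbb{E}_{\phi^2}\{f_{\mathcal{M}}(\omega; \phi)\} \sim \frac{\Gamma(\nu + d/2)}{\Gamma(\nu)\, \pi^{d/2}}\, \sigma^2\, \mathbb{E}\{\phi^{-2\nu}\}\, \omega^{-(2\nu + d)}, \qquad \omega \to \infty,$$

where, for $\phi^2 \sim \mathrm{IG}(\alpha, \beta^2/2)$,

$$\mathbb{E}\{\phi^{-2\nu}\} = \frac{2^{\nu}\, \Gamma(\nu + \alpha)}{\Gamma(\alpha)\, \beta^{2\nu}} < \infty.$$

The high-frequency constant is therefore proportional to $\sigma^2 \Gamma(\nu + \alpha)/\{\Gamma(\alpha)\, \beta^{2\nu}\}$, the quantity that appears in the equivalence condition of Theorem 4 below.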

It is well understood that the spectral density of the Matérn class is well-behaved at high frequencies, equipped with the phenomenon in geostatistics known as the “screening effect” (Journel and Huijbregts, 1978), which means that nearby observations yield a good approximation to the optimal linear predictor of a spatial process based on a large set of observations. The screening effect has been studied extensively by Stein (2002, 2011, 2015), who gives conditions on when it holds and does not hold for isotropic random fields. The spectral density of the new covariance class is well-behaved at high frequencies in a way similar to the Matérn class. Notice that the spectral density $f$ of the Matérn class satisfies, for any finite $R > 0$,

$$\lim_{\|\boldsymbol{\omega}\| \to \infty}\; \sup_{\|\mathbf{u}\| \le R}\; \left| \frac{f(\boldsymbol{\omega} + \mathbf{u})}{f(\boldsymbol{\omega})} - 1 \right| = 0.$$

This condition effectively implies that the spectral density of the Matérn class at high frequencies changes by a negligible amount under a modest change in frequency (Stein, 2011, 2015). The spectral density of the new covariance at high frequencies differs from that of the Matérn class only by a slowly varying multiplicative function and a multiplicative constant that does not depend on frequency. Thus, the spectral density of the new covariance class also displays the screening effect.

3.1 Equivalence Results

As discussed by Zhang (2004), the equivalence of probability measures has important applications to statistical inference on parameter estimation and prediction. Let $P(f_i)$, $i = 1, 2$, be two probability measures corresponding to the spectral densities $f_i$ of a stationary mean-zero Gaussian process on $\mathbb{R}^d$. If $P(f_1)$ is equivalent to $P(f_2)$, then $P(f_1)$ cannot be correctly distinguished from $P(f_2)$ with probability 1, regardless of what is observed. Consequently, no sequence of estimators can be consistent for parameters that take different values under two equivalent measures, under infill asymptotics (or fixed-domain asymptotics), where the domain is fixed (bounded) and the locations of observations get denser as the number of observations increases. The second application of equivalence of measures concerns the asymptotic efficiency of predictors, which is discussed in Section 3.3.

The tail behavior of the spectral densities can be used to check the equivalence of probability measures generated by stationary Gaussian random fields. If, for some $\theta > d$,

$$f_1(\boldsymbol{\omega})\, \|\boldsymbol{\omega}\|^{\theta} \text{ is bounded away from } 0 \text{ and } \infty \text{ as } \|\boldsymbol{\omega}\| \to \infty, \qquad (6)$$

and, for some finite $K$,

$$\int_{\|\boldsymbol{\omega}\| > K} \left\{ \frac{f_2(\boldsymbol{\omega}) - f_1(\boldsymbol{\omega})}{f_1(\boldsymbol{\omega})} \right\}^2 d\boldsymbol{\omega} < \infty, \qquad (7)$$

then the two Gaussian measures $P(f_1)$ and $P(f_2)$ are equivalent. For isotropic Gaussian random fields, the condition in Equation (7) can be expressed as

$$\int_K^\infty \omega^{d - 1} \left\{ \frac{f_2(\omega) - f_1(\omega)}{f_1(\omega)} \right\}^2 d\omega < \infty. \qquad (8)$$

The details of equivalence of Gaussian measures and the condition for equivalence can be found in a series of works (Stein and Handcock, 1989, Stein, 1988, 1993, 1999). Our first result on equivalence of two Gaussian measures under the new covariance class is given in Theorem 4 with its proof given in Section S.4 of the Supplementary Material.

Theorem 4 (Equivalence measures).

Assume that $\nu$ is fixed and $\alpha_i > d/2$ for $i = 1, 2$. Let $f_i$ be the spectral density of the covariance $C(\cdot\,; \{\sigma_i^2, \alpha_i, \beta_i, \nu\})$ for $i = 1, 2$. Then $P(f_1)$ and $P(f_2)$ are equivalent on the paths of $Z(\cdot)$ for any bounded infinite set $D \subset \mathbb{R}^d$ with $d \le 3$ if and only if

$$\frac{\sigma_1^2\, \Gamma(\nu + \alpha_1)}{\Gamma(\alpha_1)\, \beta_1^{2\nu}} = \frac{\sigma_2^2\, \Gamma(\nu + \alpha_2)}{\Gamma(\alpha_2)\, \beta_2^{2\nu}}. \qquad (9)$$

Notice that if these two covariances are further assumed to have the same tail decay parameter $\alpha$, the above condition becomes $\sigma_1^2/\beta_1^{2\nu} = \sigma_2^2/\beta_2^{2\nu}$.

An immediate consequence of Theorem 4 is that, for fixed $\nu$, the tail decay parameter $\alpha$, the scale parameter $\beta$ and the variance parameter $\sigma^2$ cannot be estimated consistently under infill asymptotics. Instead, the quantity $\sigma^2 \Gamma(\nu + \alpha)/\{\Gamma(\alpha)\, \beta^{2\nu}\}$ is consistently estimable and has been referred to as the microergodic parameter. We refer the reader to page 163 of Stein (1999) for the definition of microergodicity. Similarly, for fixed $\nu$ and $\alpha$, the scale parameter $\beta$ and the variance parameter $\sigma^2$ cannot be estimated consistently; instead, the microergodic parameter $\sigma^2/\beta^{2\nu}$ can be estimated consistently. In the next section, we establish the asymptotic properties of maximum likelihood estimation associated with the microergodic parameter.

Theorem 4 gives the result on equivalence of measures within the new covariance class. The new covariance allows the same smoothness behavior as the Matérn class, but it has a polynomially decaying tail that is quite different from that of the Matérn class. One may ask whether there is an analogous result on equivalence between the Gaussian measures under the new covariance class and the Matérn class. Theorem 5 provides an answer to this question, with its proof given in Section S.5 of the Supplementary Material.

Theorem 5 (Equivalence measures with Matérn).

Assume that $\nu$ is fixed and $\alpha > d/2$. Let $f_1$ be the spectral density of the new covariance function $C(\cdot\,; \{\sigma_1^2, \alpha, \beta, \nu\})$ and $f_2$ be the spectral density of the Matérn covariance function $\mathcal{M}(\cdot\,; \{\sigma_2^2, \phi, \nu\})$. If

$$\frac{2^{\nu}\, \sigma_1^2\, \Gamma(\nu + \alpha)}{\Gamma(\alpha)\, \beta^{2\nu}} = \frac{\sigma_2^2}{\phi^{2\nu}}, \qquad (10)$$

then $P(f_1)$ and $P(f_2)$ are equivalent on the paths of $Z(\cdot)$ for any bounded infinite set $D \subset \mathbb{R}^d$ with $d \le 3$.

Theorem 5 gives the conditions under which the Gaussian measures under the new covariance class and the Matérn class are equivalent. If the condition in Equation (10) is satisfied, the Gaussian measure under the new covariance class cannot be distinguished from the Gaussian measure under the Matérn class, regardless of what is observed.

Based on Theorem 4, one can study the asymptotic properties of the microergodic parameter $\sigma^2 \Gamma(\nu + \alpha)/\{\Gamma(\alpha)\, \beta^{2\nu}\}$. In Section 3.2, this microergodic parameter is shown to be consistently estimable under infill asymptotics for a Gaussian process under the new covariance model with known $\nu$. Moreover, one can show that its maximum likelihood estimator is asymptotically normal.

3.2 Asymptotic Normality

Let $\{Z(\mathbf{s}) : \mathbf{s} \in D\}$ be a zero-mean Gaussian process with the covariance function $C(\cdot\,; \{\sigma^2, \alpha, \beta, \nu\})$, where $D$ is a bounded subset of $\mathbb{R}^d$ with $d \le 3$. Let $\mathbf{Z}_n = \{Z(\mathbf{s}_1), \ldots, Z(\mathbf{s}_n)\}^\top$ be a partially observed realization of the process at $n$ distinct locations in $D$. Then the log-likelihood function is

$$\ell_n(\sigma^2, \alpha, \beta) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2} \log |R_n| - \frac{1}{2\sigma^2}\, \mathbf{Z}_n^\top R_n^{-1} \mathbf{Z}_n, \qquad (11)$$

where $R_n$ is the $n \times n$ correlation matrix with $(i, j)$ element given by the correlation function $C(\|\mathbf{s}_i - \mathbf{s}_j\|)/\sigma^2$, which depends on $(\alpha, \beta)$ and the fixed $\nu$.

In what follows, $\nu$ is assumed to be known and fixed. Let $\hat{\sigma}_n^2$, $\hat{\alpha}_n$ and $\hat{\beta}_n$ be the maximum likelihood estimators (MLE) for $\sigma^2$, $\alpha$ and $\beta$ obtained by maximizing the log-likelihood function in Equation (11). To show the consistency and asymptotic results for the microergodic parameter, we first obtain an estimator for $\sigma^2$ when $(\alpha, \beta)$ is fixed:

$$\hat{\sigma}_n^2(\alpha, \beta) = \frac{\mathbf{Z}_n^\top R_n^{-1} \mathbf{Z}_n}{n}.$$

Then, let $\hat{c}_n(\alpha, \beta)$ be the maximum likelihood estimator of the microergodic parameter, as a function of $(\alpha, \beta)$, given by

$$\hat{c}_n(\alpha, \beta) = \frac{\hat{\sigma}_n^2(\alpha, \beta)\, \Gamma(\nu + \alpha)}{\Gamma(\alpha)\, \beta^{2\nu}}.$$

We have the following result on the asymptotic properties of $\hat{c}_n(\alpha, \beta)$ for arbitrarily fixed values $\alpha$ and $\beta$ under infill asymptotics, with its proof given in Section S.6 of the Supplementary Material.

Theorem 6 (Asymptotics of the MLE).

Let $\{D_n\}$ be an increasing sequence of subsets of a bounded domain $D \subset \mathbb{R}^d$, $d \le 3$. Assume that $\nu$ is fixed and that the true parameter is $(\sigma_0^2, \alpha_0, \beta_0)$ with $\alpha_0 > d/2$. Then under $P(f_0)$, as $n \to \infty$, for any fixed $\alpha > d/2$ and $\beta > 0$,

  1. $\hat{c}_n(\alpha, \beta) \to c_0$, almost surely;

  2. $\sqrt{n}\, \{\hat{c}_n(\alpha, \beta) - c_0\} \xrightarrow{\;d\;} \mathcal{N}(0,\, 2 c_0^2)$,

where $c_0 = \sigma_0^2\, \Gamma(\nu + \alpha_0)/\{\Gamma(\alpha_0)\, \beta_0^{2\nu}\}$.

Theorem 6 implies that the estimator of the microergodic parameter converges to the true microergodic parameter, almost surely, as the number of observations tends to infinity in a fixed and bounded domain. This result holds for any fixed value of $(\alpha, \beta)$. As will be shown, the conclusion remains true if one replaces $(\alpha, \beta)$ with their maximum likelihood estimators in $\hat{c}_n(\alpha, \beta)$. The second statement of Theorem 6 indicates that $\hat{c}_n(\alpha, \beta)$, suitably normalized, converges to a normal distribution.
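As a sketch of how the microergodic parameter might be computed in practice, the snippet below profiles $\sigma^2$ out of the log-likelihood (11) in closed form and optimizes numerically over $(\alpha, \beta)$. It is our own illustrative implementation, reusing ch_cov and the simulated H and z from the earlier sketches; a smaller $n$ is advisable for speed.

    from scipy.linalg import cho_factor, cho_solve
    from scipy.optimize import minimize

    def neg_profile_loglik(log_params, H, z, nu):
        """Negative log-likelihood (11) with sigma^2 profiled out in closed form."""
        alpha, beta = np.exp(log_params)           # log scale keeps alpha, beta > 0
        R = ch_cov(H, nu, alpha, beta)             # correlation matrix (sigma2 = 1)
        cf = cho_factor(R + 1e-10 * np.eye(len(z)))
        sigma2 = z @ cho_solve(cf, z) / len(z)     # profile MLE of sigma^2 given (alpha, beta)
        logdet = 2.0 * np.log(np.diag(cf[0])).sum()
        return 0.5 * (len(z) * np.log(sigma2) + logdet)

    nu = 1.5
    res = minimize(neg_profile_loglik, np.log([1.0, 100.0]), args=(H, z, nu), method="Nelder-Mead")
    alpha_hat, beta_hat = np.exp(res.x)
    R = ch_cov(H, nu, alpha_hat, beta_hat)
    sigma2_hat = z @ np.linalg.solve(R + 1e-10 * np.eye(len(z)), z) / len(z)
    # MLE of the microergodic parameter, taken as sigma^2 Gamma(nu+alpha)/(Gamma(alpha) beta^(2 nu))
    c_hat = sigma2_hat * gamma(nu + alpha_hat) / (gamma(alpha_hat) * beta_hat ** (2 * nu))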

A key fact is that the above theorem holds for arbitrarily fixed $(\alpha, \beta)$. A more practical situation is to estimate $\alpha$ and $\beta$ by maximizing the log-likelihood. For notational convenience, we write $c = \sigma^2 \Gamma(\nu + \alpha)/\{\Gamma(\alpha)\, \beta^{2\nu}\}$ for the microergodic parameter when needed. We discuss three situations. In the first situation, we consider joint estimation of $\sigma^2$ and $\alpha$ for fixed $\beta$; the MLE of $\alpha$ will be denoted by $\hat{\alpha}_n$, and the MLE of the microergodic parameter is $\hat{c}_n(\hat{\alpha}_n, \beta)$. In the second situation, we consider joint estimation of $\sigma^2$ and $\beta$ for fixed $\alpha$; the MLE of $\beta$ will be denoted by $\hat{\beta}_n$, and the MLE of the microergodic parameter is $\hat{c}_n(\alpha, \hat{\beta}_n)$. In the third situation, we consider joint estimation of all parameters $(\sigma^2, \alpha, \beta)$. Note that the MLEs of $\alpha$ or $\beta$ (or both) are typically computed numerically, since there is no closed-form expression. Theorem 6 can be used to show that $\hat{c}_n(\hat{\alpha}_n, \beta)$ has the same asymptotic properties as $\hat{c}_n(\alpha, \beta)$ for any fixed $\beta$. The following lemma is needed to prove the asymptotic behavior of the MLEs under infill asymptotics, with its proof given in Section S.7 of the Supplementary Material.

Lemma 1.

Suppose that is the dimension of the domain and is a vector of observations in . For any and with and , , where and , we have the following results:

  • for any fixed .

  • for any fixed .

Theorem 7 (Asymptotics of the MLE: joint estimation).

Let $\{D_n\}$ be an increasing sequence of subsets of a bounded domain $D \subset \mathbb{R}^d$, $d \le 3$. Assume that $\nu$ is fixed and $\alpha_0, \alpha > d/2$. Let $\hat{c}_n(\hat{\alpha}_n, \beta)$ be the MLE of the microergodic parameter over $(\sigma^2, \alpha)$ for any fixed $\beta$, $\hat{c}_n(\alpha, \hat{\beta}_n)$ be the MLE of the microergodic parameter over $(\sigma^2, \beta)$ for any fixed $\alpha$, and $\hat{c}_n(\hat{\alpha}_n, \hat{\beta}_n)$ be the MLE of the microergodic parameter over $(\sigma^2, \alpha, \beta)$. Then under $P(f_0)$, as $n \to \infty$, the following results can be established:

  1. $\hat{c}_n(\hat{\alpha}_n, \beta) \to c_0$ almost surely and $\sqrt{n}\, \{\hat{c}_n(\hat{\alpha}_n, \beta) - c_0\} \xrightarrow{\;d\;} \mathcal{N}(0, 2c_0^2)$ for any fixed $\beta > 0$.

  2. $\hat{c}_n(\alpha, \hat{\beta}_n) \to c_0$ almost surely and $\sqrt{n}\, \{\hat{c}_n(\alpha, \hat{\beta}_n) - c_0\} \xrightarrow{\;d\;} \mathcal{N}(0, 2c_0^2)$ for any fixed $\alpha > d/2$.

  3. $\hat{c}_n(\hat{\alpha}_n, \hat{\beta}_n) \to c_0$ almost surely and $\sqrt{n}\, \{\hat{c}_n(\hat{\alpha}_n, \hat{\beta}_n) - c_0\} \xrightarrow{\;d\;} \mathcal{N}(0, 2c_0^2)$.

Proof.

We define sequences, , , , and . It follows from Lemma 1 that and . Applying Theorem 6 yields the desired results for and . To show the result for , it suffices to show that according to Lemma 1. In fact, we have, and . ∎

The first two results of Theorem 7 imply that the microergodic parameter can be estimated consistently by fixing either the tail decay parameter $\alpha$ or the scale parameter $\beta$. In practice, fixing either $\alpha$ or $\beta$ may be too restrictive for modeling spatial processes. For instance, the microergodic parameter in the Matérn class can be estimated consistently when its range parameter is fixed; however, jointly maximizing the likelihood over the variance parameter and the range parameter dramatically improves prediction efficiency in finite samples (Kaufman and Shaby, 2013). Similarly, we would expect the finite-sample prediction performance to improve by jointly maximizing the likelihood over the variance parameter $\sigma^2$, the tail decay parameter $\alpha$, and the scale parameter $\beta$ for the new covariance class.

The third result of Theorem 7 establishes that the microergodic parameter can be consistently estimated by jointly maximizing the log-likelihood (11) over the parameters $\sigma^2$, $\alpha$ and $\beta$. However, the current result requires $\alpha$ to be greater than half the dimension of the study domain. This means that the new covariance function cannot decay too slowly in its tail if the consistency result for the maximum likelihood estimator of the microergodic parameter is to hold. Nevertheless, this result is a significant improvement over existing asymptotic normality results for other types of long-range dependent covariance functions. For instance, Bevilacqua and Faouzi (2019) showed that the microergodic parameter in the generalized Cauchy class can be estimated consistently under infill asymptotics; however, their results assume that the parameter controlling the tail behavior is fixed, which is similar to the first result of Theorem 7. Unlike their results, a pleasing theoretical improvement in Theorem 7 is that the asymptotic results for the microergodic parameter are obtained under joint estimation of all three parameters, including the parameter that controls the decay of the tail.

3.3 Asymptotic Prediction Efficiency

This section focuses on prediction of the Gaussian process $Z(\cdot)$ at a new location $\mathbf{s}_0 \in D$. This problem has been studied extensively in the case where an incorrect covariance model is used. Our focus here is to show asymptotic efficiency and asymptotically correct estimation of the prediction variance in the context of the new covariance class. Stein (1988) shows that both of these properties hold when the Gaussian measure under a misspecified covariance model is equivalent to the Gaussian measure under the true covariance model. In the case of the new covariance class, Theorem 4 gives the conditions for equivalence of two Gaussian measures in light of the microergodic parameter $\sigma^2 \Gamma(\nu + \alpha)/\{\Gamma(\alpha)\, \beta^{2\nu}\}$.

Under the new covariance model $C(\cdot\,; \{\sigma^2, \alpha, \beta, \nu\})$, we define the best linear unbiased predictor for $Z(\mathbf{s}_0)$ to be

$$\hat{Z}(\mathbf{s}_0) = \mathbf{c}^\top R_n^{-1} \mathbf{Z}_n, \qquad (12)$$

where $\mathbf{c} = \{\rho(\|\mathbf{s}_0 - \mathbf{s}_1\|), \ldots, \rho(\|\mathbf{s}_0 - \mathbf{s}_n\|)\}^\top$ is an $n$-dimensional vector of correlations between $Z(\mathbf{s}_0)$ and $\mathbf{Z}_n$, with $\rho(\cdot) = C(\cdot)/\sigma^2$. This predictor depends only on the correlation function and not on $\sigma^2$. It would be a misunderstanding of the asymptotic results, however, to conclude that one may simply fix $(\alpha, \beta)$ and expect the prediction to improve as $n$ grows due to the way $\hat{c}_n(\alpha, \beta)$ converges. This point was also made for the Matérn class by Kaufman and Shaby (2013).

If the true covariance is $C_0$, the mean squared error of the predictor in Equation (12) is given by

$$\mathbb{E}_0\{\hat{Z}(\mathbf{s}_0) - Z(\mathbf{s}_0)\}^2 = C_0(0) - 2\,\boldsymbol{\lambda}^\top \mathbf{c}_0 + \boldsymbol{\lambda}^\top \Sigma_0\, \boldsymbol{\lambda},$$

where $\boldsymbol{\lambda}^\top = \mathbf{c}^\top R_n^{-1}$ is the vector of kriging weights, and $\Sigma_0$ and $\mathbf{c}_0$ denote, respectively, the covariance matrix of $\mathbf{Z}_n$ and the vector of covariances between $\mathbf{Z}_n$ and $Z(\mathbf{s}_0)$ under $C_0$. If $C_0 = C$, i.e., $\Sigma_0 = \sigma^2 R_n$ and $\mathbf{c}_0 = \sigma^2 \mathbf{c}$, the above expression simplifies to

$$\mathbb{E}\{\hat{Z}(\mathbf{s}_0) - Z(\mathbf{s}_0)\}^2 = \sigma^2\, (1 - \mathbf{c}^\top R_n^{-1} \mathbf{c}). \qquad (13)$$

If the true model is a different member of the covariance class, analogous expressions can be derived by exchanging the roles of the true and working covariances.
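A sketch of the plug-in workflow of Equations (12) and (13) in one dimension follows; krige is our own illustrative helper, reusing ch_cov and the quantities simulated and estimated in the earlier sketches:

    def krige(s_obs, z, s0, nu, alpha, beta, sigma2):
        """Best linear predictor (12) at s0 and its mean squared prediction error (13)."""
        R = ch_cov(np.abs(s_obs[:, None] - s_obs[None, :]), nu, alpha, beta)
        c = ch_cov(np.abs(s0 - s_obs), nu, alpha, beta)  # correlations with the new location
        w = np.linalg.solve(R, c)                        # kriging weights R^{-1} c
        return w @ z, sigma2 * (1.0 - c @ w)             # predictor and its MSPE

    pred, mspe = krige(s, z, s0=123.4, nu=nu, alpha=alpha_hat, beta=beta_hat, sigma2=sigma2_hat)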

Let $f_1, f_2$ be two spectral densities associated with two probability measures $P(f_1), P(f_2)$. Stein (1993) shows that the condition (6), together with $f_2(\boldsymbol{\omega})/f_1(\boldsymbol{\omega}) \to c$ for some constant $c \in (0, \infty)$ as $\|\boldsymbol{\omega}\| \to \infty$, implies that the best linear predictor (BLP) under the misspecified probability measure $P(f_2)$ is asymptotically equivalent to the BLP under the true measure $P(f_1)$. The following results are immediate.

Theorem 8.

Suppose that $P(f_1)$ and $P(f_2)$ are two Gaussian probability measures defined by a zero-mean Gaussian process with the new covariance class $C(\cdot\,; \{\sigma_i^2, \alpha_i, \beta_i, \nu\})$ for $i = 1, 2$ on a bounded domain $D \subset \mathbb{R}^d$. Write $\hat{Z}_i(\mathbf{s}_0)$ for the BLP of $Z(\mathbf{s}_0)$ based on $\mathbf{Z}_n$ under $f_i$, and $\mathbb{E}_i$ for expectation under $P(f_i)$. If $\alpha_1, \alpha_2 > d/2$, then it follows that

  • As $n \to \infty$, $\mathbb{E}_1\{\hat{Z}_2(\mathbf{s}_0) - Z(\mathbf{s}_0)\}^2 \,/\, \mathbb{E}_1\{\hat{Z}_1(\mathbf{s}_0) - Z(\mathbf{s}_0)\}^2 \to 1$.

  • Moreover, if the condition in Equation (9) holds, then as $n \to \infty$, $\mathbb{E}_2\{\hat{Z}_2(\mathbf{s}_0) - Z(\mathbf{s}_0)\}^2 \,/\, \mathbb{E}_1\{\hat{Z}_2(\mathbf{s}_0) - Z(\mathbf{s}_0)\}^2 \to 1$.

Proof.

Let $f_i$ be the spectral density of the new covariance function with parameters $\{\sigma_i^2, \alpha_i, \beta_i, \nu\}$, $i = 1, 2$. Note that $\lim_{\|\boldsymbol{\omega}\| \to \infty} f_2(\boldsymbol{\omega})/f_1(\boldsymbol{\omega})$ is finite and positive. If the condition in Equation (9) is satisfied, then

$$\lim_{\|\boldsymbol{\omega}\| \to \infty} \frac{f_2(\boldsymbol{\omega})}{f_1(\boldsymbol{\omega})} = 1.$$

These two statements follow by applying Theorem 1 and Theorem 2 of Stein (1993). ∎

Part (a) of Theorem 8 implies that if the smoothness parameter $\nu$ is correctly specified, any values for $\alpha_2$ and $\beta_2$ will result in asymptotically efficient prediction. The condition in Part (b) is not necessary for asymptotic efficiency, but it guarantees an asymptotically correct estimate of the mean squared prediction error. The quantity $\mathbb{E}_2\{\hat{Z}_2(\mathbf{s}_0) - Z(\mathbf{s}_0)\}^2$ is the presumed mean squared prediction error (MSPE) for $\hat{Z}_2(\mathbf{s}_0)$ under the model $f_2$, while $\mathbb{E}_1\{\hat{Z}_2(\mathbf{s}_0) - Z(\mathbf{s}_0)\}^2$ is the true MSPE for $\hat{Z}_2(\mathbf{s}_0)$ under the true model $f_1$. In practice, it is common to estimate model parameters and then make predictions by plugging these estimates into Equations (12) and (13). Next, we show that the same convergence results hold if $(\alpha, \beta)$ is fixed at $(\alpha_2, \beta_2)$ but $\sigma^2$ is estimated via maximum likelihood. This is an extension of Part (b) of Theorem 8, with its proof given in Section S.8 of the Supplementary Material.

Corollary 2.

Suppose that $P(f_1)$ and $P(f_2)$ are two Gaussian probability measures defined by a zero-mean Gaussian process with the new covariance class $C(\cdot\,; \{\sigma_i^2, \alpha_i, \beta_i, \nu\})$ for $i = 1, 2$ on a bounded domain $D \subset \mathbb{R}^d$. Let $\hat{\sigma}_n^2 = \hat{\sigma}_n^2(\alpha_2, \beta_2)$ be the maximum likelihood estimator of $\sigma^2$ with $(\alpha, \beta)$ fixed at $(\alpha_2, \beta_2)$. If $\alpha_1, \alpha_2 > d/2$, then it follows that, almost surely under $P(f_1)$, as $n \to \infty$, the conclusion of Part (b) of Theorem 8 continues to hold with $\sigma_2^2$ replaced by $\hat{\sigma}_n^2$.

One can conjecture that the result in Corollary 2 still holds if $(\alpha_2, \beta_2)$ is replaced by its maximum likelihood estimator, but a proof seems elusive. Theorem 8 and Corollary 2 demonstrate the asymptotic prediction efficiency within the new covariance class. The following result establishes the asymptotic efficiency of the best linear predictor under the new covariance class when the true Gaussian measure is generated by a zero-mean Gaussian process under the Matérn class.

Theorem 9.

Let $P(f_1)$ be the Gaussian probability measure defined by a zero-mean Gaussian process with the Matérn covariance class $\mathcal{M}(\cdot\,; \{\sigma_1^2, \phi, \nu\})$ and $P(f_2)$ be the Gaussian probability measure defined by a zero-mean Gaussian process with the new covariance class $C(\cdot\,; \{\sigma_2^2, \alpha, \beta, \nu\})$ on a bounded domain $D \subset \mathbb{R}^d$. If $\alpha > d/2$ and the condition in Equation (10) is satisfied, then it follows that under the Gaussian measure $P(f_1)$, as $n \to \infty$,

$$\frac{\mathbb{E}_1\{\hat{Z}_2(\mathbf{s}_0) - Z(\mathbf{s}_0)\}^2}{\mathbb{E}_1\{\hat{Z}_1(\mathbf{s}_0) - Z(\mathbf{s}_0)\}^2} \to 1 \quad \text{and} \quad \frac{\mathbb{E}_2\{\hat{Z}_2(\mathbf{s}_0) - Z(\mathbf{s}_0)\}^2}{\mathbb{E}_1\{\hat{Z}_2(\mathbf{s}_0) - Z(\mathbf{s}_0)\}^2} \to 1,$$

for any fixed $\alpha$ and $\beta$ satisfying Equation (10).

Proof.

Let $f_1$ be the spectral density of the Matérn covariance function and $f_2$ be the spectral density of the new covariance function $C$. Notice that the spectral density of the Matérn covariance satisfies the condition (6). It suffices to show that $f_2(\boldsymbol{\omega})/f_1(\boldsymbol{\omega}) \to 1$ as $\|\boldsymbol{\omega}\| \to \infty$. Let and . If