Visualization and Assessment of Spatio-temporal Covariance Properties
Huang Huang111 CEMSE Division, King Abdullah University of Science and Technology, Thuwal 23955-6900 Saudi Arabia. E-mail: email@example.com; firstname.lastname@example.org and Ying Sun
July 3, 2019
Spatio-temporal covariances are important for describing the spatio-temporal variability of underlying random processes in geostatistical data. For second-order stationary processes, there exist subclasses of covariance functions that assume a simpler spatio-temporal dependence structure with separability and full symmetry. However, it is challenging to visualize and assess separability and full symmetry from spatio-temporal observations. In this work, we propose a functional data analysis approach that constructs test functions using the cross-covariances from time series observed at each pair of spatial locations. These test functions of temporal lags summarize the properties of separability or symmetry for the given spatial pairs. We use functional boxplots to visualize the functional median and the variability of the test functions, where the extent of departure from zero at all temporal lags indicates the degree of non-separability or asymmetry. We also develop a rank-based nonparametric testing procedure for assessing the significance of the non-separability or asymmetry. The performances of the proposed methods are examined by simulations with various commonly used spatio-temporal covariance models. To illustrate our methods in practical applications, we apply it to real datasets, including weather station data and climate model outputs.
Some key words: full symmetry, functional boxplot, functional data ranking, rank-based test, separability, spatio-temporal covariance
Spatio-temporal covariance functions play an important role in parametrically modeling geostatistical data, particularly for Gaussian random fields. Spatio-temporal models often have complex structures and rely on various assumptions to simplify the model and reduce the computational burden (Gneiting et al., 2006), such as full symmetry and separability (Cressie and Huang, 1999; Kyriakidis and Journel, 1999; Gneiting, 2002; Stein, 2005). Specifically, let be a second-order stationary random process at spatial location , and temporal point The covariance function is positive definite and only depends on the spatial lag and temporal lag . Then, the covariance function is called fully symmetric if , and separable if , for any spatial lag and temporal lag . One can show that separability implies full symmetry. The separable covariance model is a product of purely spatial and temporal covariances, ignoring the interaction between space and time; the symmetric covariance model acknowledges the interaction but assumes that the spatio-temporal cross-covariances depend on neither the direction of temporal lags nor the direction of spatial lags. Although these models with simplified structures are easier to fit, they have limitations and may not be realistic in practical applications. In separable models, even small spatial changes can lead to large changes in the correlations, causing a lack of smoothness (Stein, 2005), and full symmetry is often violated when data are influenced by dynamic processes with a prevailing flow direction (Gneiting, 2002).
Therefore, it has been an active research area to develop more flexible spatio-temporal covariance models motivated by real applications. Jones and Zhang (1997) developed families of spectral densities that lead to non-separable covariance models without a closed form. Cressie and Huang (1999) proposed families of non-separable covariance functions based on the Fourier transform of non-negative, finite measures with explicit expressions, but these are limited to classes with known analytical solutions of Fourier integrals. Later, Gneiting (2002) extended this approach to more general classes with Fourier-free implementations. All these models are non-separable but fully symmetric. Stein (2005) generated covariance functions that are isotropic but not fully symmetric by taking derivatives of fully symmetric models. Gneiting et al. (2006) proposed anisotropic, asymmetric models by adding a compactly supported Lagrangian correlation function to a fully symmetric covariance model, where the coefficient of the Lagrangian component controls the extent of asymmetry.
In real data analysis, before choosing a covariance function from the existing models, it is necessary to assess the properties of separability and symmetry either by exploratory data analysis or through formal hypothesis tests. However, it is challenging to visualize and assess such properties from spatio-temporal observations. Brown et al. (2001) determined the separability from the closeness to zero of the maximum likelihood estimates of blurring parameters associated with separability, without considering the uncertainty. Shitan and Brockwell (1995) developed an asymptotic test for separability, but it is restricted to spatial autoregressive processes. Fuentes (2006) presented a test of separability in the spectral domain using a simple two-factor analysis of variance. Regarding likelihood-based tests, Mitchell et al. (2006) proposed a likelihood ratio test of separability for replicated multivariate data that examines whether the covariance matrix is a Kronecker product of two matrices of smaller dimensions. Mitchell et al. (2005) extended this approach to test the separability of spatio-temporal covariances by partitioning the observations into approximate replicates. After Scaccia and Martin (2005) presented tests of symmetry and separability for lattice processes using the periodogram, Li et al. (2007) developed a unified testing framework for both separability and symmetry by constructing contrasts of covariances at selected spatio-temporal lags.
Existing methods mostly build separability and symmetry tests based on given spatial and temporal lag sets, and work on visualization is sparse. In this paper, we propose formulating separability and symmetry test functions as functional data, and provide a visualization tool to illustrate these properties. These test functions are functions of temporal lags and constructed from the cross-covariances of each pair of the time series observed at spatial locations. With the obtained functional data of test functions, we develop rank-based testing procedures that are model-free with data depth-based functional data ranking techniques. Because the test statistics are rank-based, the results are more robust to outliers, which may come from erroneous measurements of the variables under study or poor estimates of the covariances due to limited sample size.
The rest of the paper is organized as follows. In §2, we introduce separability and symmetry test functions, which are used to visualize the covariance property of separability and symmetry, and provide the detailed rank-based testing procedure. In §3, we illustrate the visualization tools for simulated data and demonstrate the rank-based test for various spatio-temporal covariance models. In §4, we apply our methods to two real datasets, wind speed from monitoring stations, and surface temperatures and wind speed from numerical model outputs, to study their covariance structures.
2.1 Non-separable covariance functions
Gneiting (2002) introduced the following rich family of non-separable spatio-temporal covariance functions, which includes separable ones as a special case,
where is the variance, determine the temporal and spatial range, control the temporal and spatial smoothness, and is the spatio-temporal interaction. The covariance function is separable when , and a larger is associated with a more non-separable model.
Cressie and Huang (1999) also proposed a non-separable model,
and a possible corresponding separable model was discussed in Mitchell et al. (2005) as,
Later, Rodrigues and Diggle (2010) further defined the properties of positive and negative non-separability and showed that the two aforementioned models are both positively non-separable. To investigate models of negative non-separability, we consider the one from Cesare et al. (2001),
where , , and and are valid spatial and temporal covariance functions, respectively.
Table 1 summarizes the specific models used to illustrate our methods, after we plugged in the appropriate parameter values, chose suitable building covariance functions, and applied some trivial transformations, as necessary.
2.2 Visualization for separability
Noting that separable covariance functions require , for any spatial lag and temporal lag , or equivalently, remains a constant for any given . Therefore, we introduce the separability test function defined below.
Given a valid spatio-temporal covariance function , the separability test function is a function of temporal lag for any spatial lag , defined as
By Definition 1, is for any and if is separable, and moves away from zero for non-separable models. Suppose are a set of test functions for pairs of locations with spatial lags . We use functional boxplots (Sun and Genton, 2011) to visualize these separability test functions. The motivation is that functional boxplots can characterize the most representative functional realization, or the functional median, and remove the outliers, which are often involved in the datasets, especially when we are going to use sample covariances to estimate the test functions. Functional boxplots order the functional observations by their functional data depths, identify the observation with the largest depth value as the functional median and construct the central region that covers half of the data with largest depth values. Although depth values in the functional boxplot are calculated by the modified band depth proposed by (López-Pintado and Romo, 2009) by default, it is also possible to choose other depth notions developed for functional data, such as integrated data depth (Fraiman and Muniz, 2001), half-region depth (López-Pintado and Romo, 2011), and extremal depth (Narisetty and Nair, 2016). Outliers are detected by the times the central region empirical rule. The factor can be adjusted for outlier detection proposes in an adjusted functional boxplot, but the adjustment is not necessary for visualizing functional data (Sun and Genton (2012a)). Therefore, it is worth pointing out that we use the functional boxplot for visualizing covariance properties in this paper rather than detecting outliers. Alternatively, contour plots are the traditional way to visualize the pattern of covariances with respect to different spatial and temporal lags. We show next that the functional boxplots provide a much better visualization and interpretation of the separability than the contour plots.
Firstly, we use functional boxplots of the separability test functions and contour plots of the covariances to visualize the Gneiting model for different values of . The spatial locations are on a regular grid within the unit square . The top panels in Figure 1 show the functional boxplots applied to the separability test functions for the Gneiting model with different values of . Each curve is associated with a specific pair of spatial locations. The central region (magenta area) and the median (black curve) move away from zero as increases. The bottom panels in Figure 1 show the contours of the covariances for different spatial and temporal lags. In the separable case, the covariances at non-zero temporal lags should decay more rapidly as the spatial lag increases. We do see this trend in the contour plots; however, the separability is much more obvious in the functional boxplot.
Similarly, we draw the same plots to compare the Cressie and Huang separable model and the Cressie and Huang non-separable model , and the Cesare model for different values of . The results are shown in Figures 2 and 3. The functional boxplots clearly show the separability or non-separability of each test function, and, if non-separable, the positive or negative non-separability. While the contour plots show different patterns and the separable covariance decays much faster than non-separable covariances at non-zero temporal lags along an increasing spatial lag, the indication for separability is still unclear.
2.3 Visualization and rank-based testing procedure for real data
In §2.1–2.2, we discussed the visualization of true spatio-temporal covariance functions, and we saw that the functional boxplots produce intuitive indications of separability in the test functions. However, the true covariance model is unknown in practice, and we must use sample covariance estimators to estimate the test functions without assuming any parametric model to fit. For each pair of spatial locations, we use the two time series to estimate all the components in the separability test function by sample covariances. In other words, if we have observations at time points , at two locations and with a spatial lag , then the test function is estimated by
and other quantities are calculated similarly.
With these estimated separability test functions, we also develop a rank-based test for the separability hypothesis. López-Pintado and Romo (2009) proposed a test for functional data to decide whether two sets of samples are from the same distribution. It can be viewed as a functional data generalization of the univariate Wilcoxon rank test and has many applications. For example, Sun and Genton (2012b) proposed a robust functional analysis of variance (ANOVA) model for testing whether the differences in functional data observed from multiple groups are significant. The main idea is to see whether the positions of the samples from the two sets are similar, with respect to a reference distribution. We follow a similar procedure to test the null hypothesis, separability, by comparing the given observations to a simulated dataset under : separability. In this rank-based test, two additional datasets are required. One follows the distribution under the null hypothesis , and the other is a reference dataset following the distribution under the null hypothesis or the same distribution as the observations. We propose to generate these two datasets under by simulating data from a constructed separable covariance function , for which we estimate the covariance at observed spatial and temporal lags by sample covariances, respectively, and let . After we get all estimated separability test functions, the hypothesis test can be performed to compare the two populations. More details can be found in the paper by López-Pintado and Romo (2009), and we briefly explain the procedure as follows:
Estimate the test functions for the observations, denoted by , .
Generate two sets of spatio-temporal datasets from the constructed separable covariance function and estimate the test functions, denoted by , and , .
For each , combine and all for , and compute their functional band depths and modified band depths (López-Pintado and Romo, 2009). Order these functions by their band depth values, and use modified band depths to separate ties. Let be the proportion of functions with smaller depth values than .
Repeat Step 3 for each , and let be the proportion of functions with smaller depth values than .
Order from smallest to largest, and assume the ranks of are . Then the test statistics is .
Critical values can be determined from the asymptotic distribution of the test statistics, which is the sum of numbers that are randomly chosen from . In practice, however, we found that the distribution with a limited sample size is not close enough to the asymptotic one. Hence, we use the bootstrap to calculate the critical values. Under , we generate sets of spatio-temporal datasets from . We repeat Steps 1–5 and obtain the test statistics for each dataset, which provides an approximate distribution under the null hypothesis. We examine the performance of this technique through simulations in §3, where we use .
2.4 Symmetric covariance functions and visualization
Gneiting et al. (2006) proposed the following family of asymmetric covariance models that include symmetric ones as a special case,
where , is the first component of the spatial lag , determines the symmetry, is the velocity along the direction of , and correspond to the temporal and spatial ranges, corresponds to smoothness, and corresponds to the separability. After plugging in appropriate parameter values, we get the following covariance model to study symmetry, referred as the Gneiting model .
We know that a fully symmetric covariance function satisfies . Analogously to Definition 1, we introduce the symmetry test functions with definition below.
Given a valid spatio-temporal covariance model , the symmetry test function is a function of temporal lag for any spatial lag , defined as
It is easy to see that remains zero when the underlying covariance model is fully symmetric. An illustration is given in Figure 4 for the Gneiting model with different values of , where we see the symmetry test function moves away from zero as increases. Contour plots of the covariances are also shown in the bottom panels of Figure 4. Since the covariance relies on the temporal lag and both the first and second components of the spatial lag , we fix , and draw the contours for different values of and . It is easy to observe that the covariances are clearly asymmetric when . For real data, the estimated symmetry test functions are used in the functional boxplots, and a similar rank-based testing procedure described in §2.3 is developed to test the significance of the asymmetry.
3.1 Simulation design
Spatio-temporal data collected from monitoring sites are often sparse in space but dense in time. In this simulation, we consider data from regularly spaced locations in the unit square , and (a comparable number to many real applications) equally spaced time points.
Noting that the correlation between observations with temporal lags greater than is negligible in each model, we generate our spatio-temporal data sequentially in time. Let be a matrix of size , indicating the spatio-temporal observations from time point to at the locations, for . We first generate , and then generate , , only conditioning on . Thus the entire generated spatio-temporal data is .
3.2 Visualization and assessment for separability
The Gneiting model with five different values of , , , and is used to generate observations . Three examples of the estimated separability test functions are shown in Figure 5. We see that, in the functional boxplot, the distance between the median (black line) and zero (green dashes) indicates the degree of separability. As increases, the majority of the separability test functions move away from zero. This gives us a clear visualization of the separability.
|Type I error||Power|
To formally test the separability and examine the performance of the proposed rank-based testing procedure in §2.3, we show the Type I error and power for simulated data over simulations. The results are shown in Table 2. We see the Type I error is close to the nominal level of the hypothesis test, and the power for non-separable models increases with an increasing . The power is quite close to one when approaches .
In addition, we also take a look at the Cressie-Huang models and . Estimated separability test functions from one simulated dataset are shown in Figure 6. We see the majority of the separability test functions from the non-separable model move away from zero. Similarly, a rank-based test is used to examine the performance; the results are shown in Table 3.
3.3 Visualization and assessment for symmetry
We choose five different values of , , , , and in the Gneiting model to generate observations . The reason for choosing relatively small values for is that the power of the rank-based test is already strong enough for small values of . Three examples of the estimated symmetry test functions are shown in Figure 7. Since the chosen values of are very small, the deviation of the symmetry test functions from zero in our most extreme case when is still subtle; see Figure 4. Although the functional medians of the estimated test functions do not show obvious changes for the different values of , we can still see that the majority of the symmetry test functions move away from zero as increases. Furthermore, the results of the rank-based test in Table 4 show that the power is very strong, even when is as small as , and the Type I errors are all close to the nominal levels.
|Type I error||Power|
4.1 Wind speed
Wind speed is an important atmospheric variable in weather forecasting and many other environmental applications. One source of wind speed data is from monitoring stations. In this section, we use our proposed method to visualize and assess properties of covariances in the wind speed from ten monitoring stations in the northwestern U.S.A. The locations are shown in Figure 8, where we divide them into two regions. Coastal sites are in green, and inland sites are in orange. We analyze the hourly data observed at each station from 2012 to 2013 and visualize the spatio-temporal covariance structure from two seasons, summer (June to August) and winter (December to February).
The plots of the estimated separability test functions are shown in Figure 9, and the p-value results of the rank-based test are given in the titles. We see the separability test functions are away from zero in all cases, and all the p-values for testing separability are much smaller than , indicating rejection of separability at . When computing the p-value, we choose replicates from the bootstrap to approximate the distribution of the test statistics under the null hypothesis. Since the covariance functions for all the four cases are significantly non-separable, we next investigate their properties of symmetry. The symmetry test functions as well as the p-values of the test for different cases are shown in Figure 10. The functional boxplot for the coastal region in winter suggests the strongest symmetry, and its corresponding p-value is indeed the largest, while p-values for the other cases are much smaller than . One possible reason might be that there is no prevailing wind direction during the winter in the coastal region.
4.2 Climate model outputs
In climate studies, numerical models are used to simulate many important variables by solving a set of dynamic equations. Two types of models are available: General Circulation Model (GCM) is used to describe the dynamic system of the entire earth, with a global coverage but a comparatively low resolution; however, Regional Climate Model (RCM) with GCM outputs as their boundary conditions is used to simulate locally high-resolution data. Different combinations of GCMs and RCMs may lead to different results. In this section, we apply our visualization and assessment tools to data based on four combinations of two GCMs, GFDL and HADCM3, and two RCMs, ECP2 and HRM3, which are provided by the North American Regional Climate Change Assessment Program (NARCCAP) (Mearns et al., 2012). More details about these numerical models can be found at http://www.narccap.ucar.edu/. We focus on a small region which consists of spatial locations in the North Atlantic Ocean (as shown in Figure 11) and study the daily surface temperature and wind speed from 1976 to 1979.
We first look at the mean of the surface temperature across the four years at each location, which is shown in Figure 12. We see different GCMs as boundary conditions give different patterns of surface temperatures, while different RCMs show similar results. We then apply our visualization and assessment tools to the covariance structures of these four GCM and RCM combinations. To eliminate seasonality, we remove the monthly mean from the daily temperatures. The estimated separability test functions and the p-values of the rank-based test for separability are shown in Figure 13. With all the p-values greater than , there is no significant evidence that the covariances of these four cases are non-separable at the significance level. However, the output from GFDL and HRM3 shows stronger non-separability and the separability is rejected at the significance level with a p-value .
Next, we look at the daily wind speed in the same area. The mean wind speeds at these locations are all close to zero and do not show much difference. Again, we remove the monthly mean from the daily wind speed and then apply our visualization and assessment methods. The estimated separability test functions are shown in Figure 14, where the medians are away from zero. These test functions show large variability in the functional boxplot, and all the p-values from the rank-based tests are as small as zero, indicating strong separability. To further investigate the symmetry, we plot the estimated symmetry test functions; the illustration is given in Figure 15. All the symmetry test functions are also very far from zero, and we find that the p-values of the test for symmetry are zero. Therefore, we conclude that the covariance of wind speed is neither separable nor symmetric.
In this application, the covariance of the daily surface temperatures tends to be separable and thus also symmetric; for daily wind speed, the covariance is non-separable and asymmetric. Our analysis suggests that, for the region we have considered, the daily surface temperatures produced by the climate models do not show significant space-time interaction, whereas the daily wind speeds clearly show asymmetric space-time interaction.
We presented a functional data analysis approach to visualizing and assessing spatio-temporal covariance properties. The proposed method is suitable for visualizing the separability and the symmetry of a stationary spatio-temporal process. The p-value calculated in the rank-based test can be used to measure the degree of separability or symmetry. We illustrated our methods using various classes of spatio-temporal covariance models, and our simulations demonstrated the good performance of our proposed methods. In the practical applications, we applied our method to temperature and wind speed data from either monitoring sites or climate model outputs, and illustrated how to interpret the separability and symmetry in these different scenarios.
However, we have only considered spatio-temporal datasets collected from a small number of spatial locations. Because we need to estimate the test functions for each pair of locations, computation will become an issue as the number of locations grows. One possible solution is to divide the region of interest into smaller subregions, and estimate all the test functions separately for each of these subregions. Then, the obtained test functions can be combined for the visualization and the rank-based test. This would significantly reduce the number of test functions, and each small subregion is more likely to be stationary. However, future research is required to assess the performance of this approach.
We wish to thank the North American Regional Climate Change Assessment Program (NARCCAP) for providing the data used in this paper. NARCCAP is funded by the National Science Foundation (NSF), the U.S. Department of Energy (DoE), the National Oceanic and Atmospheric Administration (NOAA), and the U.S. Environmental Protection Agency Office of Research and Development (EPA).
- Brown et al. (2001) Brown, P. E., P. J. Diggle, M. E. Lord, and P. C. Young (2001). Space-time calibration of radar rainfall data. Journal of the Royal Statistical Society Series C Applied Statistics 50(2), 221–241.
- Cesare et al. (2001) Cesare, L. D., D. E. Myers, and D. Posa (2001). Estimating and modeling space-time correlation structures. Statistics and Probability Letters 51(1), 9–14.
- Cressie and Huang (1999) Cressie, N. and H. Huang (1999). Classes of nonseparable, spatio-temporal stationary covariance functions. Journal of the American Statistical Association 94, 1330–1340.
- Fraiman and Muniz (2001) Fraiman, R. and G. Muniz (2001). Trimmed means for functional data. Sociedad de Estadistica e Investigacion Operativa 10(2), 419–440.
- Fuentes (2006) Fuentes, M. (2006). Testing for separability of spatial-temporal covariance functions. Journal of Statistical Planning and Inference 136(2), 447–466.
- Gneiting (2002) Gneiting, T. (2002). Nonseparable, stationary covariance functions for spaceâtime data. Journal of the American Statistical Association 97(458), 590–600.
- Gneiting et al. (2006) Gneiting, T., M. G. Genton, and P. Guttorp (2006). Geostatistical space-time models, stationarity, separability, and full symmetry. Monographs On Statistics and Applied Probability, 151–175.
- Jones and Zhang (1997) Jones, R. H. and Y. Zhang (1997). Models for continuous stationary space-time processes. In T. G. Gregoire, D. R. Brillinger, P. J. Diggle, E. Russek-Cohen, W. G. Warren, and R. D. Wolfinger (Eds.), Modelling Longitudinal and Spatially Correlated Data, pp. 289–298. New York, NY: Springer New York.
- Kyriakidis and Journel (1999) Kyriakidis, P. C. and A. G. Journel (1999). Geostatistical spaceâtime models: a review. Mathematical Geology 31(6), 651–684.
- Li et al. (2007) Li, B., M. G. Genton, and M. Sherman (2007). A nonparametric assessment of properties of spaceâtime covariance functions. Journal of the American Statistical Association 102(478), 736–744.
- López-Pintado and Romo (2009) López-Pintado, S. and J. Romo (2009). On the concept of depth for functional data. Journal of the American Statistical Association 104(486), 718–734.
- López-Pintado and Romo (2011) López-Pintado, S. and J. Romo (2011). A half-region depth for functional data. Computational Statistics and Data Analysis 55(4), 1679–1695.
- Mearns et al. (2012) Mearns, L. O., R. Arritt, S. Biner, M. S. Bukovsky, S. McGinnis, S. Sain, D. Caya, J. Correia, D. Flory, W. Gutowski, E. S. Takle, R. Jones, R. Leung, W. Moufouma-Okia, L. McDaniel, A. M. B. Nunes, Y. Qian, J. Roads, L. Sloan, and M. Snyder (2012). The North American regional climate change assessment program: overview of phase I results. Bulletin of the American Meteorological Society 93(9), 1337–1362.
- Mitchell et al. (2005) Mitchell, M. W., M. G. Genton, and M. L. Gumpertz (2005). Testing for separability of space-time covariances. Environmetrics 16(8), 819–831.
- Mitchell et al. (2006) Mitchell, M. W., M. G. Genton, and M. L. Gumpertz (2006). A likelihood ratio test for separability of covariances. Journal of Multivariate Analysis 97(5), 1025–1043.
- Narisetty and Nair (2016) Narisetty, N. N. and V. N. Nair (2016). Extremal depth for functional data and applications. Journal of the American Statistical Association 111(516), 1705–1714.
- Rodrigues and Diggle (2010) Rodrigues, A. and P. J. Diggle (2010). A class of convolutionâbased models for spatioâtemporal processes with nonâseparable covariance structure. Scandinavian Journal of Statistics 37(4), 553–567.
- Scaccia and Martin (2005) Scaccia, L. and R. J. Martin (2005). Testing axial symmetry and separability of lattice processes. Journal of Statistical Planning and Inference 131(1), 19–39.
- Shitan and Brockwell (1995) Shitan, M. and P. J. Brockwell (1995). An asymptotic test for separability of a spatial autoregressive model. Communications in Statistics - Theory and Methods 24(8), 2027–2040.
- Stein (2005) Stein, M. L. (2005). Spaceâtime covariance functions. Journal of the American Statistical Association 100(469), 310–321.
- Sun and Genton (2011) Sun, Y. and M. G. Genton (2011). Functional boxplots. Journal of Computational and Graphical Statistics 20(2), 316–334.
- Sun and Genton (2012a) Sun, Y. and M. G. Genton (2012a). Adjusted functional boxplots for spatio-temporal data visualization and outlier detection. Environmetrics 23(1), 54–64.
- Sun and Genton (2012b) Sun, Y. and M. G. Genton (2012b). Functional median polish. Journal of Agricultural, Biological, and Environmental Statistics 17(3), 354–376.