In this paper we aim to propose two models for regression and dependence analysis when dealing with positive spatial or spatio-temporal continuous data. Specifically we propose two (possibly non stationary) random processes with Gamma and Weibull marginals. Both processes stem from the same idea, namely from the transformation of a sum of independent copies of a squared Gaussian random process. We provide analytic expression for the bivariate distributions and we study the associated geometrical and extremal properties. Since maximum likelihood estimation method is not feasible, even for relatively small data-set, we suggest to adopt the pairwise likelihood. The effectiveness of our proposal is illustrated through a simulation study that we supplement with a new analysis of well-known dataset, the Irish wind speed data (Haslett and Raftery, 1989). In our example we do not consider a preliminary transformation of the data differently from the previous studies.
On modelling positive continuous data with spatio-temporal dependence
Moreno Bevilacqua 111 Universidad de Valparaiso, Department of Statistics, Valparaiso, Chile.
, Christian Caamaño 222 Universidad del BioBio, Department of Statistics, Concepcion, Chile
, Carlo Gaetan 333 Università Ca’ Foscari di Venezia, DAIS, 2 Venice, Italy.
July 31, 2019
Keywords: Linear Prediction; Non-Gaussian data; Pairwise likelihood; Regression; Tail dependence, Wind data.
Climatology, Environmental Sciences and Engineering, to name some fields, show an increasing interest in statistical analyses of spatial and/or temporal data. In order to model the inherent uncertainty of the data, Gaussian random processes play a fundamental role (see Cressie, 1993; Cressie and Wikle, 2011, for instance). Indeed, they have to offer marginal and dependence modelling in terms of mean and covariance, methods of inference well studied and scalable for large dataset (Heaton et al., 2017) and optimality in the prediction (Stein, 1999).
However, in many applications, the Gaussian framework is unrealistic because the observed data are defined on the real line and exhibits specific features such as negative or positive asymmetry and/or heavy tails.
Transformations of Gaussian (trans-Gaussian) processes is a general approach to model this kind of data by applying some nonlinear transformations to the original data (De Oliveira et al., 1997; Allcroft and Glasbey, 2003; De Oliveira, 2006). However, it can be difficult to find an adequate non linear transformation and some appealing properties of the latent Gaussian process may not be inherited by the transformed process. A flexible trans-Gaussian process based on the tukey distribution has been proposed in Xua and Genton (2017). Wallin and Bolin (2015) proposed non-Gaussian processes derived from stochastic partial differential equations to model non-Gaussian spatial data. Nevertheless this method is restricted to the Matérn covariance model and its statistical properties are much less understood then the Gaussian process.
The copula framework has been adapted in the spatial context, in order to account for possible deviations from the Gaussian distribution, for instance in Kazianka and Pilz (2010), Masarotto and Varin (2012) and Gräler (2014). Alternatively, convolution and mixing of a Gaussian with a non Gaussian process are two appealing strategies for modeling non-Gaussian data. For instance, Zhang and El-Shaarawi (2010) proposed a Gaussian-Half Gaussian convolution in order to construct a process with marginal distribution of the skew-Gaussian type (Azzalini and Capitanio, 2014) and Palacios and Steel (2006) proposed a Gaussian-Log-Gaussian scale mixing approach in order to accommodate processes, with heavy-tailed marginal distributions.
In some cases the observed data has the additional feature to be defined only on the positive real line. In particular, data collected in a range of studies such as wind speeds (Pryor and Barthelmie, 2010), ocean surface currents (Galanis et al., 2012) and rainfalls (Neykov et al., 2014) take continuous positive values and exhibit skewed sampling distributions. For this kind of data only a subset of the aforementioned approaches is suitable, such as as for instance the Log-Gaussian process (Chilès and Delfiner, 1999; De Oliveira, 2006).
In this paper we introduce two examples of stationary random processes with Gamma and Weibull marginal distributions for modelling positive spatial or spatio-temporal continuous data.
The Gamma process is obtained by rescaling a sum of independent copies of a squared Gaussian random process and the Weibull process is obtained as a power transformation of a specific Gamma process. Thus our proposal is different from the trans-Gaussian random processes and the copula models since they exploit just one copy. It is also possible to account for non stationary observations by a simple modification of the stationary versions of the processes.
We provide analytical expressions of the bivariate distributions and we study the associated geometrical properties. It turns out that the geometrical properties of the Gaussian process are inherited by the Gamma and Weibull processes. Since the Weibull distribution can be used for modelling extreme values (see Opitz, 2016, for instance), we also study the extremal properties of the Weibull process i.e. when the observations overcome a ‘high’ threshold.
A limitation of our modelling proposal is the difficulty of evaluating the multivariate density that prevents an inference approach based on the likelihood and the derivation of an optimal predictor that minimizes the mean square prediction error. For this reason we investigate the use of a weighted version of the pairwise likelihood (Lindsay, 1988; Varin et al., 2011) for estimating the unknown parameters. Moreover two linear and unbiased predictors are proposed following the approach detailed in De Oliveira (2014).
We have carried out a designed simulation experiment, in the spatio-temporal setting, in order to investigate and compare the sampling properties of the estimators. Then the effectiveness of our models is illustrated through a new analysis of a well-known dataset, the Ireland wind data in Haslett and Raftery (1989). Our study of the wind speed is performed on the raw data without a preliminary transformation differently from the previous literature (Gneiting, 2002b; Stein et al., 2004; Stein, 2005; De Luna and Genton, 2005).
The remainder of the paper is organized as follows. In Section 2, we introduce the random processes and we describe their features. Section 3 starts with a short description of the estimation method and ends with tackling the prediction problem. In Section 4 we report the numerical results of the simulation study and the re-analysis of the Irish Wind speed data. Finally some concluding remarks are consigned to Section 5.
2 Random processes with Gamma and Weibull marginals.
The definition of the random processes starts by considering a ‘parent’ Gaussian random process , where represents a location in the domain . Spatial () or spatio-temporal examples () will be considered indifferently. We also assume that is stationary with zero mean, unit variance and correlation function where .
Let be independent copies of and define the random process as
It turns out that is a stationary random process with marginal distribution . Here the pairs are the shape and rate parameter of the Gamma distribution i.e. and for all .
A possible drawback for the distribution is that it is a limited model for continuous positive data due to the restrictions to the half integers for the shape parameter. A closer look to the problem reveals that overcoming this limitation is not easy.
The Laplace transform of the random vector is given by
where is the identity matrix, and is the correlation matrix of (Vere-Jones, 1997, Proposition 4.3). Therefore the finite dimensional distribution of is a particular instance of the finite dimensional distribution of a -permanental process (Eisenbaum and Kaspi, 2009) for which the Laplace transform satisfies
In the last formula is any positive real, is a matrix such that its inverse is a diagonally dominant -matrix and . It turns out that the permanental process has marginal distribution .
Thus one may be tempted to identify the matrix with a correlation matrix that comes from the Gaussian process and, possibly, define a process with more flexible marginals. However this is identification holds only for particular instances of Gaussian processes since the random process has to be infinitely divisible (Eisenbaum and Kaspi, 2006). The condition is satisfied, for instance, by the stationary Gaussian random process defined on with exponential correlation function.
The multivariate density of can be expressed as an infinite series where each element of the series is a double sum of order involving the computation of Laguerre polynomials and a determinant of a square matrix (Krishnamoorthy and Parthasarathy, 1951). However analytical forms can be derived only in some special cases (Royen, 2004) and this fact prevents their use for the maximum likelihood estimation.
On the other hand the evaluation of the bivariate densities of a pair of observations and is still feasible and could be used for estimating the parameters (see Section 3). The bivariate distribution of is known as the Kibble bivariate Gamma distribution (Kibble, 1941) with density
where and denotes the modified Bessel function of the first kind of order .
When , has marginal distribution which is a standard exponential distribution. In such case the bivariate distribution reduces to the Downton distribution (Downton, 1970) with density
Therefore a random process with marginal distribution can be derived by the transformation
Note that in (3) and (6), implies pairwise independence, as in the Gaussian case. Both processes are stationary by construction with mean equal to one and variance depending on the shape parameter. A non stationary version of that we denote as can be obtained by setting
where is a deterministic function. Then the random process has mean function and the non stationarity is induced by the mean function, i.e. . We can apply a similar argument to (5) and define with
As in the Gamma case the mean function does not depend on the shape parameter, i.e. , and .
Thus a spatial regression model can be obtained by assuming that where is a -dimensional vector of covariates and is a -dimensional vector of (unknown) parameters. Finally note that the bivariate densities of and can be derived as
2.1 Second-order and geometrical properties
It is easy to show that the correlation function of , , is equal to , the squared of the correlation function of the ‘parent’ Gaussian random process, and and have the same correlation function.
Using Proposition 1 in Appendix we can also obtain the correlation function of and , namely
where the function
is the generalized hypergeometric function (Gradshteyn and Ryzhik, 2007) and , for , is the Pochhammer symbol.
Unlike the Gamma case, the correlation of depends on the shape parameter and for . We can notice that both processes have positive correlation function with , . In fact the continuous function
is concave for with and . Figure 1 allows to better understand the ”shrinking” effect on the correlation according to different values of the shape parameter .
Moreover, the function is infinitely differentiable in a neighbourhood of . We can conclude that geometrical properties such as mean-square continuity, and degrees of mean-square differentiability can be inherited by the Gamma and Weibull processes from the ‘parent’ Gaussian process . In fact, for a stationary process, continuity of its correlation function at implies mean square continuity and the -times mean-square differentiability is equivalent to the -times differentiability of its correlation function at (Stein, 1999, Section 2.4). However, and are -times differentiable at as well as is. In summary, and are mean square continuous if is mean square continuous and they are -times mean-square differentiable if is -times mean-square differentiable. Finally, it is trivial to see that the sample path continuity and differentiability of and originate from the ‘parent’ Gaussian process.
2.2 Illustrative examples
Since the Gamma and Weibull processes inherit the geometrical properties of the ’parent’ Gaussian process, the choice of the covariance function is crucial. Two flexible models that allow to parametrize in a continuous fashion the mean square differentiability of a Gaussian process and its sample paths are:
the Matérn correlation function (Matérn, 1986)
where is a modified Bessel function of the second kind of order . Here, and guarantees the positive definiteness of the model in any dimension.
the Generalized Wendland correlation function (Gneiting, 2002a), defined for as:
and for as:
Here is the Beta function and , , guarantees the positive definiteness of the model in .
Both represent flexible parametric models for the spatial correlation and the parameter allows a continuous parametrization for the smoothness of the associated Gaussian random process (Stein, 1999; Bevilacqua et al., 2018). In particular for a positive integer , the sample paths of a Gaussian process are times differentiable if and only if in the Matérn case and if and only if in the Generalized Wendland case.
The Generalized Wendland correlation is also compactly supported, an interesting feature from computational point of view (Furrer et al., 2013), and since implies and , the sparsity of the correlation matrix associated to the parent Gaussian process is inherited by the Gamma and Weibull correlation matrices.
Figures 2 and 3 show six realizations of the stationary random process on setting . We have firstly considered a Matérn correlation function with three different parametrization for the smoothness parameter and the shape parameter . The values of the range parameters, , have been chosen in order to obtain a practical range equal to . The corresponding correlation function are plotted (from left to right) in the centre panel of Figure 2 and the bottom panel reports the histograms of the observations.
The simulation experiment has been also conducted for the Generalized Wendland correlation function (see Figure 3). Here, we have set three different values of the parameters that correspond to the same degree of the smoothness of the Matérn correlation function and common values for and . The shape of the marginal distribution does not change with respect the previous experiment, i.e. .
It is apparent that the correlation inherits the change of the differentiability at the origin from when increasing . This changes have consequences on the geometrical properties of the associated random processes. In fact the smoothness of the realizations increase with . Note also the flexibility of the Weibull process when modelling positive data since both positive and negative skewness can be achieved with different values of , see Figures 3(g-i).
2.3 Extremal properties
In climatology and environmental data analysis, the Weibull distribution is a commonly used marginal model. For instance, the distribution of wind speeds in latitudes extra to the Tropical zone are well known to comply closely the Weibull distribution. Thus methods based on fitting a Weibull distribution to all available data are often applied to extrapolate sizes of future extreme winds, using the tail of the Weibull distribution. When studying exceedances over a high threshold, standard measures of dependence such as correlation, are typically inappropriate due to the need to extrapolate beyond the observation range. In this context different notions of dependence arise, with the main distinction being between asymptotic dependence and asymptotic independence of exceedances as the threshold level increases. Under spatial asymptotic dependence the likelihood of a large event happening in one location is tightly related to high values being recorded at a location nearby; the opposite is true under asymptotic independence, in which large events might be recorded at one location only, and not in neighbouring locations (Wadsworth and Tawn, 2012).
where , denote the generalized inverse distribution functions of two observations and . We say that and are asymptotically dependent if the limit is positive. The case characterizes asymptotic independence. In the Appendix (Proposition 2) we prove that the Weibull random field (5) is asymptotically independent i.e.
for any pairs and .
3 Estimation and prediction
3.1 Pairwise likelihood inference
Suppose that we have observed at the locations and let be the vector of unknown parameters for the random processes or . The evaluation of the full likelihood for is impracticable for moderately large . A computationally more efficient alternative (Lindsay, 1988; Varin et al., 2011) combines the bivariate distributions of all possible distinct pairs of observations . The pairwise likelihood function is given by
where is one of the bivariate densities in (2) and is a non-negative weight. The choice of a cut-off weight, namely if , and otherwise, for a positive value of , can be motivated by its simplicity and by observing that that dependence between observations which are distant is weak. Therefore, the use of all pairs may skew the information confined in pairs of near observations (Davis and Yau, 2011; Bevilacqua and Gaetan, 2015).
The maximum pairwise likelihood estimator is
and, arguing as in Bevilacqua et al. (2012) and Bevilacqua and Gaetan (2015), it can be shown that, under increasing domain asymptotics, is consistent and asymptotically Gaussian with asymptotic covariance matrix given by the inverse of the Godambe information
where and .
Model selection can be performed by minimizing the pairwise likelihood information criterion, defined as
which is an analogue of the Akaike information Criterion (AIC) in a pairwise likelihood framework (Varin and Vidoni, 2005).
Since the evaluation of the pairwise likelihood can be split into a number of parallel tasks, it has been implemented in the Open Computing Language (OpenCL) for parallel computing. The implementation is part of GeoModels, an upcoming R package.
3.2 Linear prediction
The lack of workable multivariate densities forestalls the use of the conditional distributions for the prediction. Therefore we choose a suboptimal solution following Bellier et al. (2010) and De Oliveira (2014) and we suggest to use two linear predictors for the random variable at some unobserved location based on the data at locations .
If the mean function in known, we define the error or residuals process with , where the process is indifferently the non stationary Gamma or Weibull process defined in Section 2.
Now two linear predictors can be considered, namely
Both predictors are unbiased, , and linear predictors of .
The vector of weights is derived minimizing the mean square error with respect to . The solutions for the predictors are given by the equations of the simple and ordinary kriging, respectively, (Cressie, 1993, ch. 3) i.e.
Here is the unit vector of dimension , is a whose -th element is and .
In practice the predictors cannot be evaluated since and are unknown. For this reason we suggest to use a plug-in estimate for and using the pairwise likelihood estimates.
Finally, we note that the error process is stationary with marginal distributions or and this feature can be used for checking the adequacy of the marginal and dependence specification as will be seen in Section 4.
4 Numerical results
4.1 Simulation study
We report the results of a simulation study for assessing the performance of the estimation method. We have considered examples of spatio-temporal processes tailored to the models we have used in the real data analysis.
We consider two thousands observations over a space time domain where one hundred locations , , are uniformly distributed over the unit square and , with . We choose the non-separable model (Gneiting, 2002b)
with , , , and .
In the simulation experiment spatial range, temporal range and non-separability parameter are fixed at , and , respectively.
Non-stationarity for and is induced by taking where is a value from the -uniform distribution. The values of the regression parameters are and . Furthermore we will consider different values for the shape of the marginal distributions, namely for and for .
Finally two vectors of unknown parameters for the pairwise likelihood (13) are examined, namely and . In shaping the pairwise likelihood, we opt for a cut-off weight equal to one if and , zero otherwise, which bypasses an explosion of the number of pairwise terms and shifts focus to relatively short-range distances where dependence matters most.
Table 2 and 2 report the bias and the root of the mean square error (RMSE), obtained by 1 000 Monte Carlo runs. Moreover, Figure 4 and 5 contain the boxplots of the estimates for two values of the shape parameter ( and ) as illustrative examples.
As general comment the estimates are overall approximatively unbiased, see the values on the top of each cell in Table 2-2. The distribution of the estimates are quite symmetric (see Figure 4, centred around the true value and 5) numerically stable with very few outliers. For the random process the RMSE is affected by the magnitude of the shape parameter. More precisely the RMSE seems to decrease when the shape parameter increases. In the case of the random process this pattern is less evident, but the precision of estimates of the regression parameters benefits of the increment of the parameter . Finally, the fact that the temporal parameter is worst estimated is not surprising since we have less observations along the temporal dimension.
4.2 Irish wind speed data
We consider a dataset of daily average wind speeds , , , observed for a period , at twelve synoptic meteorological stations in the Republic of Ireland. An in-depth analysis of these data has been carried out for the first time by Haslett and Raftery (1989) and later the data have been considered in several papers (see, for instance, Gneiting, 2002b; Stein et al., 2004; Stein, 2005; De Luna and Genton, 2005). The original time series contain positive values, except for few zeroes and the marginal distributions are noticeably asymmetric. Moreover the standard deviations are correlated with means that vary over the seasons.
For these reasons the previous analysis have suggested to preliminary transform the data and then extract a common seasonal pattern. These transformations yield to approximately stationary time-series, with bell-shaped marginal distributions, that have been modelled by different Gaussian random processes.
Differently from the previous attempts, we adopt the and random processes as models of the original data and we replace the sixteen zeros with the mean speed of the previous and next day. The seasonal variations in the level and in the variability are modelled by the common trend
Following Gneiting (2002b) we restrict our attention to the short distance and short lag dependence using the the same correlation function (16), with different values of . However we note that the space-time separability, i.e. for two correlation functions and , is inherited only by the random process.
A natural competitor of our models is the Log-Gaussian random process namely , , where is a zero mean, unit variance Gaussian process with correlation (see, for instance, De Oliveira, 2006). In such case and
In order to asses the predictive performances of the different models we split the data set into a training period consisting of years , with days, and a test period for the remaining years, days. The sample size () prevents the use of a full likelihood approach even for the Log-Gaussian model. Therefore the estimation of parameters for the three models is based on the weighted pairwise likelihood, with a cut-off weight equal to one if jointly with and zero otherwise.
Our implementation, in the GeoModels package, based on C and OpenCL allows to speed-up approximatively five times the computation of the weigthed pairwise likelihood estimator with respect to a standard C implementation. In particular one evaluation of the function needs approximatively and seconds with or without OpenCL implementation where the time of evaluation (in seconds) has been measured in terms of elapsed time on a laptop with 2.4 GHz processor and 16 GB of memory.
In Table 3 we report the estimates with the standard errors and PLIC values computed using the sub-sampling technique in Bevilacqua et al. (2012). Actually, we have estimated several specification of for but, for simplicity, we report only our best choice, , according to the PLIC value.
The estimated trends are not dissimilar for the three models. Instead it is evident that the estimates of and entail a stronger spatial and temporal range for Gamma and Weibull models with respect to the Log-Gaussian ones. A comparison of the three models in terms of PLIC leads to a clear discrimination among them, in favour of the first ones.
We remind that is a random variable with distribution Gamma, Weibull or Log-Gaussian. Thus the residuals can be used for checking the goodness-of-fit of the models. The marginal fitting of the Weibull model is extremely good, according Figure 6-(a) in which we compare the quantiles of the residuals with the quantiles of the distribution . Moreover, the choice of (16) as a model for is supported by the comparison of the empirical semi-variograms of the residuals with the estimated semi-variograms, see Figures 6-(b-c).
Then we compare the prediction performances of the three models. The root-mean-square error (RMSE) and the mean absolute error (MAE) for each station, namely
are customary measures of the prediction performances. Here we consider one-day ahead predictions of the wind speed in the test period using the observations for the last three days.
For the Gamma and the Weibull model we have reported the results obtained by using the simple kriging predictor (14) since we have not found remarkable differences with respect to use the alternative ordinary kriging predictor (15). Instead for the Log-Gaussian model we have chosen the optimal predictor in De Oliveira (2006). As benchmark, we have also considered the naïve predictor , that uses the observation recorded the day before at the station.
Even though the simple kriging predictor is an suboptimal solution, Gamma and Weibull models outperform again the Log-Gaussian model as we can infer from the last column in Tables 4-5. Overall, the three models performs better than the naïve predictor.
5 Concluding remarks
In this paper we proposed two models for regression and dependence analysis when dealing with positive continuous data. Specifically we propose two non stationary random processes with Gamma and Weibull marginals for modeling positive skew data.
The Gamma process is obtained by rescaling a sum of independent copies of a squared ‘parent’ Gaussian random process and the Weibull process is obtained as a power transformation of a specific Gamma process. Our proposal share some similarity with the trans-Gaussian random process. However, unlike many other trans-Gaussian random process, we have showed that nice properties such as stationarity, mean-square continuity and degrees of mean-square differentiability are inherited from the ‘parent’ Gaussian random process.
It is worth noting that a hierarchical spatial model like the generalized linear model (Diggle et al., 1998) always induces discontinuity in the path. In our case discontinuity of the paths can be easily induced by choosing a discontinuous correlation function for the ’parent’ Gaussian process. Another relative merits of the parametrization (7) and (8) is that allows a clear expression and interpretation of the mean function.
We also remark that even though we have limited to ourselves to a continuous Euclidean space, our models can be extended to a spherical domain (Gneiting, 2013; Porcu et al., 2016) or to a network space. In this respect the random process should represent a generalization of the Markov model in Warren (1992).
A common drawback for the proposed Gamma and Weibull processes is the lack of an amenable expression of the density outside of the bivariate case that prevents an inference approach based on likelihood methods and the derivation of an optimal predictor that minimizes the mean square prediction error. We have showed that an inferential approach based on the pairwise likelihood is an effective solution for estimating the unknown parameter. On the other hand probabilities of multivariate events could be evaluated by Monte Carlo method since the random processes can be quickly simulated. However our solution to the conditional prediction, based on a linear predictor, is limited and deserves further consideration even if in our real data application has been performed well. Finally, a clear limitation of the Gamma process is that the shape parameter is restricted to the half-integer values. This problem could be solved by considering a scale mixture of a Gamma process with a process with marginal Beta distributions exploiting the results in Yeo and Milne (1991) and this will be studied in future work.
Partial support was provided by FONDECYT grant 1160280, Chile and Iniciativa Cientifica Milenio - Minecon Nucleo Milenio MESCD for Moreno Bevilacqua, by grant CONICYT-PCHA, Doctorado Nacional, 2015-21150156 for Christian Caamaño and by the IRIDE program of the Ca’ Foscari University of Venice for Carlo Gaetan.
The product moment of any pairs and is given by
Note that equation (6) can be rewritten in terms of the hypergeometric function using the identity
Then by definition of product moments and using series expansion of hypergeometric function , we have: