Non-Gaussian Autoregressive Processes with Tukey -and- Transformations

Yuan Yan and Marc G. Genton^{1}^{1}1
CEMSE Division,
King Abdullah University of Science and Technology,
Thuwal 23955-6900, Saudi Arabia.

E-mail: yuan.yan@kaust.edu.sa, marc.genton@kaust.edu.sa

This publication is based upon work supported by the King Abdullah University of Science and Technology
(KAUST) Office of Sponsored Research (OSR) under Award No: OSR-2015-CRG4-2640.

August 1, 2019

Abstract

When performing a time series analysis of continuous data, for example from climate or environmental problems, the assumption that the process is Gaussian is often violated. Therefore, we introduce two non-Gaussian autoregressive time series models that are able to fit skewed and heavy-tailed time series data. Our two models are based on the Tukey -and- transformation. We discuss parameter estimation, order selection, and forecasting procedures for our models and examine their performances in a simulation study. We demonstrate the usefulness of our models by applying them to two sets of wind speed data.

Some key words: Autoregressive; Heavy tails; Non-Gaussian; Skewness; Time series; Tukey -and- transformation.

Short title: Tukey -and- Autoregressive Processes

## 1 Introduction

To study climate change, it is essential to understand the temporal properties of the data of interest. Climate and environmental data, such as precipitation, temperature, thickness of glacial varves, wind speed, concentration of air pollutants, are continuous in nature. In a time series analysis of continuous random variables, the autoregressive moving-average (ARMA) model, especially the Gaussian one, is popularly adopted because of its simplicity and interpretability. However, data from climate or environmental science are often asymmetric with heavy tails. Therefore, a non-Gaussian time series analysis is needed. Since most climate or environmental data we come across are continuous, we do not consider time series analysis of categorical or binary data in this paper, which are intrinsically non-Gaussian.

In linear models, three approaches are mainly used to deal with non-Gaussian data: transformation of the dependent variable, regression with non-Gaussian error, and generalized linear models (GLM). Similarly, we can also classify many of the existing non-Gaussian time series methods and models into three categories: 1) transformations to ‘Gaussianize’ the data, 2) ARMA models with non-Gaussian noise, and 3) extension of the GLM to the time series context.

The transformation approach is widely used among practitioners to, first, Gaussianize data and then examine the latent Gaussian process. Equivalently, it assumes that the observed process is obtained after a transformation of the latent Gaussian process :

(1) |

For example, precipitation data can be approximated by the gamma distribution, then a square or cube root transformation is usually applied to normalize the data; similarly wind speed data can be assumed to follow a Weibull distribution, and taking a logarithm alleviates the departure from normality (Haslett and Raftery, 1989). The logarithm, square and cube root transformations all belong to the Box-Cox power transformation (Box and Cox, 1964), which is a family of transformations with one parameter () that can be applied only to positive values. A parametric form of transformations allows users to find the optimal transformation from among the members of the Box-Cox transformation family by estimating from the data, instead of attempting many different transformations. A large literature exists on the Box-Cox power transformation, its modifications and its application in linear models and time series analysis. Nelson and Granger (1979) extensively explored the use of Box-Cox transformations in 21 time series of economics. Non-parametric transformations are an appealing alternative to assuming a parametric form of transformations due to their flexibility. Johns et al. (2003) explored non-parametric transformations on both the dependent variable and the predictor variables in regression. Block et al. (1990) used the empirical distribution and probability integral transform (PIT) to form ARMA models with arbitrary continuous marginal distributions. However, there is a trade-off between flexibility and parsimony.

Instead of using transformations to Gaussianize the data, an ARMA model with non-Gaussian noise may be specified, whether it is the marginal distribution of the process or the distribution of the error term. Given the marginal distribution, the distribution of the error term can be found by linking their moment generating function, namely, the Laplace transformation of the probability density function. Many models were previously proposed following this direction, for example, ARMA models with exponential marginals (Lawrance and Lewis, 1980) and AR models with marginal gamma distribution (Gaver and Lewis, 1980). On the other hand, specifying the distribution of the non-Gaussian error term is easier for data simulation and maximum likelihood inference since the conditional likelihood can then be written down in a closed form. ARMA models that are driven by non-Gaussian innovations and their properties were discussed in Davies et al. (1980) and Li and McLeod (1988).

For the third class of existing non-Gaussian time series methods and models, Benjamin et al. (2003) introduced the generalized ARMA model, extending the GLM to a time series setting. Earlier exploration of GLM in time series can be found in Benjamin et al. (2003) and references therein. Cordeiro and de Andrade (2009) proposed a model combining the GLM idea with the use of transformations, and considered its use in time series.

Apart from the three categories summarized above, there exists many other specialized models for non-Gaussian, non-stationary time series. The most famous ones are the autoregressive conditional heteroskedasticity (ARCH) (Engle, 1982) and the generalized autoregressive conditional heteroscedasticity (GARCH) (Bollerslev, 1986) models. These models have been used canonically for analyzing financial time series that exhibit changes in volatility. Other models include the zeroth-order self-exciting threshold autoregressive (SETAR) model (Tong, 1990), the multimodality multipredictor autoregressive time series (MATS) models (Martin, 1992), the Gaussian mixture transition distribution (GMTD) models (Le et al., 1996), the mixture autoregressive models (Wong and Li, 2000), and their extension to the Student -mixture autoregressive model (Wong et al., 2009).

Considering the trade-offs between model flexibility and parsimony, in this paper we use an alternative parametric family of transformations, the Tukey -and- (TGH) transformation, which is more flexible and interpretable than the Box-Cox family and at the same time keeps the model fairly simple compared to the non-parametric approaches. We build two autoregressive models for non-Gaussian time series via the TGH transformation in both the data transformation and non-Gaussian error approaches. Our first model uses the TGH transformation for in (1). Our second model assumes the non-Gaussian error term in the AR process to be a TGH transformation of Gaussian white noise.

The remainder of this paper is organized as follows. In Section 2, we review the TGH transformation, discuss properties of the univariate TGH distribution and its extensions, and introduce our two AR models based on the TGH transformation. In Section 3, we describe the parameter estimation and forecasting algorithm for our models. In Section 4, we demonstrate the estimation and prediction performances of our models through a simulation study. In Section 5, we illustrate the usefulness of our models by applying them to two wind speed datasets. We summarize our findings and prospects for future research in Section 6.

## 2 Two Tukey -and- Autoregressive Models

### 2.1 TGH Transformations and Properties

The TGH transformation (Tukey, 1977) is defined as a strictly monotonic increasing function of for and :

(2) |

When applying the TGH transformation to a univariate standard normal random variable , the transformed random variable is said to follow the TGH distribution. The support of is the real line when ; for and , it has a lower (upper) bound of for (). The univariate TGH distribution is a flexible family of distributional models for non-Gaussian data. It has two interpretable parameters, and , which respectively control the skewness and tail heaviness of . The tails become heavier as increases, and the -th moment of exists only for . The level of skewness increases as increases (becoming right-skewed when ). Martinez and Iglewicz (1984) included the case of , for which the TGH transformation is no longer monotonic and the resulting TGH distribution has lighter tails than the Gaussian one. However, their method is unconventional, so we only consider the case of . Also, from a practical point of view, real data are more often heavy-tailed rather than light-tailed. Special cases of the TGH distribution include the shifted log-normal random variable for and , and a Pareto-like distribution for and , which was discussed in detail by Goerg (2015). Moments of the TGH distribution were derived by Martinez and Iglewicz (1984).

A drawback of the TGH transformation is that its inverse transformation does not have an explicit form (except when either or is equal to 0) and, thus, the likelihood inference is difficult. Xu and Genton (2015) avoided this difficulty by using an approximated likelihood and proposing an efficient parameter estimation algorithm, which greatly improved the parameter estimation performance compared to the moment or quantile-based methods without compromising the computational speed. We discuss the approximated likelihood method in Section 3.1 for the parameter estimation of our models.

Extensions of the univariate TGH distribution to the multivariate case can be found in Field and Genton (2006) and He and Raghunathan (2012). Xu and Genton (2016) used the TGH transformation to build a new max-stable process for the modeling of spatial extremes. Recently, Xu and Genton (2017) further extended the TGH distribution to the spatial case and defined TGH random fields. Those models were found to be useful in practice and have been applied to air pollution, wind speed, rainfall and economic data. In this paper, we apply the TGH transformation in the time series context and build two models that take advantage of the AR structure.

### 2.2 TGH Transformation of a Latent AR Process

Our first model transforms a latent Gaussian process similarly to Xu and Genton (2017). Let , be a stationary Gaussian AR process of order with zero mean and unit variance. Our model 1, the Tukey -and- transformed autoregressive (TGH-AR()-t) process , has the same form as (1), with being a generalized version of the TGH transformation:

(3) |

where and are the covariates and the corresponding coefficients respectively, and and are the location and scale parameters, respectively. In a time series analysis, the covariates are usually functions of that model the trend or the seasonality, which may also be incorporated through a seasonal integrated AR model. In our first model, unlike the commonly adopted Box-Cox approach, the deterministic part that consists of the location parameter, covariates and their coefficients, is outside the transformation. In this way, the trend (also seasonality or periodicity) is attributed directly to the process rather than the underlying Gaussian process . We control the skewness and tail behavior of the stochastic part of the process by applying the TGH transformation to with different values of and .

When , the TGH-AR(1)-t process is of the form (3), with , and where is a Gaussian white-noise process with zero mean and constant variance . There is a constraint that is a function of so that has unit variance. This dependency may seem strange at first, but it is effectively equivalent to standardizing to a process with unit variance by multiplication by a scaling constant. For example, the usual AR(1) process with a Gaussian error term of variance will result in a process of variance , and can be standardized to have a unit variance by multiplying the rescaling factor by . This is equivalent to an AR(1) process with an error term of variance of .

To simulate samples from the TGH-AR()-t model, we first simulate data from the underlying Gaussian AR() process for . Then we transform the process by applying the generalized version of the TGH transformation to at each time point. Figure 1 shows the functional boxplots (Sun and Genton, 2011) of 1000 realizations from the TGH-AR(1)-t model of sample size without covariates, where , and for different values of and . The moments of are essentially the moments of the TGH distribution; the formulas can be found in Martinez and Iglewicz (1984). The autocovariance of can be found in terms of the autocorrelation of the underlying Gaussian process , which was derived in Xu and Genton (2017). The autocorrelation of is always weaker than the autocorrelation of . The functional boxplots are labeled with values of the marginal mean (, green line), standard deviation (), skewness () and excess kurtosis () for the transformed process . From the functional boxplots of the sample paths of , we see clearly that the parameter controls the skewness and controls the occurrence rate of extreme values in the time series.

### 2.3 AR Process with TGH Error

Instead of applying the transformation to the time series itself, in our second model we transform the error term in an AR process to an non-Gaussian error term that follows a TGH distribution. Thus, model 2 (the TGH-AR()-e model) is defined as:

(4) |

where is a process of median 0; and are the covariates and corresponding coefficients respectively; and are the location and scale parameters, respectively; and are the AR coefficients under constraints such that the AR process is weakly stationary. Although the marginal distribution of in the TGH-AR()-e model cannot be written down in a closed form, the moments of can be derived under the assumption of a weakly stationary AR process. The relationship between the moments of and the moments of depends on the AR coefficients . For example, the mean of the process is , where . However, is not a zero-mean process when . Davies et al. (1980) provided the algebraic relations between the skewness and kurtosis of the process and error term in ARMA processes. Figure 2 shows the functional boxplots of 1000 realizations from the TGH-AR(1)-e model with a sample size of 100, and the same parameters as in Figure 1. Again, we see that skewness and tail behaviour of the process can be controlled by and .

### 2.4 Comparison of the Two Models

Comparing Figures 1 and 2, we notice that, given the same values for and , the skewness and kurtosis of the TGH-AR(1)-t process are larger than those of the TGH-AR(1)-e process, while the variance is smaller. This agrees with the findings in Davies et al. (1980), where the ARMA process was less skewed than the skewed innovation error.

We interpret the difference between the two models from the viewpoint of the data-generating mechanism. In model 1, TGH-AR-t, we assume that there is an underlying latent Gaussian process and that the observed process is a realization of a non-linear transformation of . Goerg (2011) justified this idea using financial data. Overreactions to changes in the stock market skew the underlying symmetric process of financial news and result in skewed log-returns of the stock market; see also De Luca et al. (2006). For model 2, the TGH-AR-e model, we do not assume such a latent process, but itself is an AR model with non-Gaussian noise that follows the TGH distribution. The difference between the two models can be seen by plotting the relationship between and (Figure 3). The relationship is linear for model 2 and non-linear for model 1. The exploratory plots shown in Figure 3 for a sample size of and different values of , and can be used as a reference for deciding whether model 1 or 2 is more suitable for the time series data to be analyzed.

We want to emphasize that our time series models are not simply variations of the spatial case (Xu and Genton, 2017). We utilize the unique properties of the AR structure for discrete time processes, which the random field approach does not possess. One main difference between the AR time series and the random fields is that the correlation structure of the random process is induced by the AR coefficients, whereas in the random fields, it is modeled directly. Moreover, the property of conditional independence of an AR process allows us to write down the likelihood explicitly instead of a matrix form as for the random fields. The AR structure also makes it possible for us to build the TGH-AR-e model, which is impossible in the random fields setting.

## 3 Inference

### 3.1 Parameter Estimation

The parameters involved in our two TGH-AR models are . Since the maximum likelihood estimator (MLE) is well-known to possess many good asymptotic properties, we prefer the likelihood-based estimator over the moment or quantile-based estimators. The vector of the parameters related to the TGH transformation is denoted by , the autoregressive parameters by , and set .

For model 1, the log-likelihood function given observations can be written as:

(5) |

where is the conditional log-likelihood for the underlying Gaussian AR process , which is also Gaussian.

Since the explicit form of is intractable for model 2, we consider the conditional likelihood given the first observations. The conditional log-likelihood of model 2 can be written as:

(6) |

where and .

Although we can find the MLE for in our two models by maximizing (5) or (6) with respect to in principle, it is computationally expensive to find the inverse numerically for each data point and iteration of the optimization since does not have a closed form. To bypass this computational challenge, we use the idea of approximated likelihood from Xu and Genton (2015) for the univariate independent and identically distributed TGH distribution. Xu and Genton (2017) extended this idea to the random field case and proposed an algorithm for iteratively estimating both the parameters related to the TGH transformation and the parameters in the model of the covariance function. In essence, they linearized the inverse transformation function piecewisely and maximized the approximated likelihood function with the piecewisely linearized inverse function instead of the exact one.

Equipped with this linearization procedure, now we numerically maximize the approximated log-likelihood by using the approximated inverse transformation function instead of in either function (5) or (6). Thus we obtain the maximum approximated likelihood estimator (MALE) of the parameters in our models much faster. We also found that iteratively optimizing and is better than optimizing directly; however, it is about 20 times slower on a Lenovo computer with 16 2.50GHZ Intel Xeon(R) CPUs. Hence, we optimize all of the parameters for estimation directly in the sequel.

### 3.2 Order Selection

The order selection for an AR model is usually performed via the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). In a simulation study (all simulations are done in the same computer as mentioned above), we found that the algorithm based on BIC performs much better than the one based on AIC for our models. Therefore, we use the order selection procedure based on BIC with our estimation algorithm. For model 1, BIC=; for model 2, BIC=.

We evaluate the empirical correct order selection rate by BIC with the approximated likelihood for our two models through a simulation study. We generate data from both models with , , without covariates, with sample sizes and and a true AR order of . For and , we use multiple values for the AR coefficient(s), with different behaviors of the autocorrelation function (ACF) and different intensities for the spectral density. We get correct order selection rate from 1000 simulations. Table 1 summarizes the correct order selection rate by BIC for our two models with different sample sizes and orders for different values of ().

TGH-AR()-t | TGH-AR()-e | ||||
---|---|---|---|---|---|

=0 | 0.959 | 0.988 | 0.938 | 0.990 | |

=1 | 0.951 | 0.987 | 0.934 | 0.987 | |

0.950 | 0.988 | 0.932 | 0.981 | ||

0.952 | 0.982 | 0.940 | 0.985 | ||

0.961 | 0.987 | 0.942 | 0.992 | ||

=2 | 0.825 | 0.991 | 0.858 | 0.987 | |

0.532 | 0.986 | 0.592 | 0.986 | ||

0.706 | 0.987 | 0.777 | 0.983 | ||

0.948 | 0.982 | 0.937 | 0.983 |

The correct order selection rate is increased by increasing the sample size from 100 to 500, and the overall correct order selection rate is satisfactory.

### 3.3 Forecasting

In this section, we derive one-step-ahead point and probabilistic forecasts for our two models.

For model 1, the transformed process , where with mean 0, variance 1. Then, model 1 becomes . Given observations , we derive the conditional distribution:

where and are the conditional mean and variance, respectively, for of the underlying Gaussian AR() process . Here, and are determined by and can be found efficiently by the Durbin-Levinson algorithm. With this conditional distribution, the point predictors for minimizing the absolute prediction error (conditional median) and the squared prediction error (conditional mean) are: and , respectively.

For model 2, the conditional distribution is:

The point predictors for minimizing the absolute loss (conditional median) and squared loss (conditional mean) are and , respectively. We note that the conditional median has the same form as that of an AR model with Gaussian error. In practice, we need to estimate the parameters first to make forecasts for future values of the time series based on those estimated parameters. Thus, the difference between the point predictions based on the conditional medians of our model and the Gaussian AR model comes from their respective estimations of and .

Confidence intervals (CI) can be found easily from the conditional distributions of our two models. However, unlike for the symmetric Gaussian distribution, there are different ways to form the two-sided CIs for asymmetric distributions. One popular choice for the CI is to exclude from both tails and then use the central probability interval. Another choice is to find the smallest probability interval. For example, the lower and upper bounds of the 95% CI of model 1 can be found by transforming the lower and upper bounds of the 95% CI of the underlying normal distribution of mean and variance . The minimum-length CIs need to be optimized numerically for different values of and .

## 4 Simulation Study

We run a Monte Carlo simulation study to assess the finite sample behavior of the MALE and compare the forecasting performances of our models to the performance of a Gaussian AR model.

### 4.1 Comparison of Estimation Methods

Using the TGH-AR-t model, we first test whether the MALE is improved by sequentially rather than simultaneously estimating by and by the transformed process . We generate 1000 realizations of sample sizes and from the TGH-AR(1)-t model with , and . In addition to using the MALE for our TGH-AR-t model, we also use the letter-value-based method (Dutta and Babbel, 2002) and the MALE for independent TGH distributions (Xu and Genton, 2015) to estimate first and then estimate by the transformed process .

Figure 4 presents boxplots of the parameters estimated by the three different estimators for 1000 replications. For all three methods, the bias and variance of the estimators improve as the sample size grows from 100 to 500. We also see that estimating the parameters in the TGH-AR-t model jointly is much better than estimating them sequentially, especially for and .

### 4.2 Estimation Performance of MALE

Next, we check the estimations produced by MALE from our two models. In each simulation, we generate realizations of sample size and from both the TGH-AR(1)-t and TGH-AR(1)-e models with the same values of specified in Section 4.1 and use six different values each for and . Then, we obtain the MALE of from the time series data without an order selection for both models. We run the simulation 1000 times.

Figures 6 and 6 show the estimation results from our two TGH-AR(1) models for . Plots for are not shown here as the estimation noticeably improves with the larger sample size. When , all of the parameters are very well estimated from both models by the MALE for all the values of and under consideration.

### 4.3 Forecasting Performance

Finally, we evaluate the forecasting performance. We use three methods, based on the TGH-AR-t, TGH-AR-e and Gaussian AR models, to produce point and probabilistic forecasts with the estimated parameters of data generated using the TGH-AR-t and TGH-AR-e models. Table 2 summarizes mean absolute errors (MAE) and root mean square errors (RMSE) of the point forecasts from each method. The MAE is calculated from the conditional median and the RMSE, from the conditional mean. It also shows the empirical coverage and average length of the 95% minimum-length CI from each method.

Data generated from | Model 1: TGH-AR-t | Model 2: TGH-AR-e | ||||
---|---|---|---|---|---|---|

Modeled by | TGH-AR-t | TGH-AR-e | Gau-AR | TGH-AR-t | TGH-AR-e | Gau-AR |

MAE | 0.841 | 0.845 | 0.847 | 1.350 | 1.345 | 1.350 |

RMSE | 1.075 | 1.086 | 1.086 | 1.746 | 1.744 | 1.748 |

95% CI coverage | 95.6% | 95.2% | 94.6% | 95.0% | 95.6% | 95.6% |

95% CI width | 4.22 | 4.42 | 4.41 | 7.13 | 7.15 | 7.20 |

The probabilistic forecasts of the three methods can be evaluated through a histogram of the probability integral transform (PIT) values, which is to apply the conditional distribution function to the true value of the process at the time point for forecasting. If the conditional distribution assumed by a certain model conforms with the true conditional distribution, the histogram should be flat (uniform). Another metric for evaluating probabilistic forecast is the continuous ranked probability score (CRPS); more on probabilistic forecasts and CRPS for a Gaussian distribution can be found in Gneiting and Katzfuss (2014) and Gneiting et al. (2006). Xu and Genton (2017) derived the CRPS for a TGH distribution. The lower the CRPS is, the better the probabilistic forecast is. A plot of the empirical versus nominal quantile, i.e., a reliability plot, can be used to check the quality of the quantile prediction.Figures 8 and 8 show histograms of the PIT values and reliability plots for all three forecasting methods applied to the data generated from both of our models. The mean CRPS value is also labeled.

Not surprisingly, the best forecast is achieved by using the same model that the data are generated from: the forecast has the smallest MAE and RMSE, a flatter PIT histogram, and a smaller mean CRPS. Also, the empirical quantile is close to the nominal level, and the width of the 95% CI is shorter than that of the Gaussian AR predictor while the empirical coverage is close to the nominal level. We also notice that the TGH-AR-t and TGH-AR-e models produce different forecasts, which means that the two models are not interchangeable.

## 5 Application to Wind Speed Data

The analysis of wind data is a crucial step in weather simulations for climate science. The accurate forecasting of wind speeds and quantifying the forecast uncertainty are also important for exploiting wind as clean energy. In this section, we illustrate the usefulness of our two non-Gaussian time series models on both wind simulations and wind speed forecasts.

### 5.1 Wind Speed Simulation

Climate models can produce multiple outputs of spatio-temporal wind speed data over the globe. Statistical models have been developed to reproduce the output from climate models for the sake of fast simulation instead of using the computationally extensive physical models. In order to fit a space-time statistical model for wind field over the entire world, a 4-step multi-resolution method has been proposed by Castruccio and Genton (2016), in which the first step is to model the time series of wind speed at each location individually.

We consider a publicly available Large Ensemble Project (LENS) dataset that consists of 30 ensembles of daily wind speed. We select one ensemble from this dataset and use the historical 86 years from 1920 to 2005 at 22 22 gridded locations over Saudi Arabia (bounded roughly by N and E). For each day in a year, we estimate the seasonality by taking the average of the wind speed across the 86 years. We remove the seasonal effect from each time point and analyze the residual wind speed data. We fit the flexible TGH-AR-t model to the residual wind speed time series model to accommodate the features of skewness and heavy tails shown by the residual wind speed. The TGH-AR-t model is more suitable in this wind speed simulation case because we want to find the optimal TGH transformation to Gaussianize the data. We assume that there is an underlying Gaussian AR process and can use this latent Gaussian AR process for analysis in next steps of the global spatio-temporal model. The estimated parameters at each location can also give insights into the pattern of the distributional properties for the wind speed residuals.

Figure 9 shows maps of the estimated values of the 4 parameters related to the Tukey -and- transformation. We may notice from these plots that and are distinctively different between land and ocean. We also observe that the and estimates show interesting patterns that are closely related to elevation and geographical features of a location.

### 5.2 Hourly Wind Speed Data at a Meteorological Station

We test the TGH-AR-e model on hourly wind speed data observed at a meteorological tower in Sunnyside, Oregon, from 1 December 2013 to 31 December 2014 (Kazor and Hering, 2015). With a temporal resolution of one hour, the wind speed at time is closely related to the wind speed at time . Therefore, we assume that the hourly wind speed time series follows an AR process with non-Gaussian innovations.

To demonstrate the usefulness of the TGH-AR-e model, we use the hourly wind speed data from June 2014. First, we notice that there exists a diurnal pattern, so we include harmonics with periods of 12 and 24 hours as the covariates in the model. We find the MALE with order selection for the TGH-AR-e model by fitting the wind speed time series. The estimated parameters are 5, and with .

Figure 10 shows the original hourly wind speed time series overlapped with a diurnal pattern estimated using the TGH-AR-e model. The histogram and normal Q-Q plot of the residual wind speeds after removing periodicity obviously deviate from Gaussianity. We present the skewness and kurtosis values with the histogram, which further support using our TGH-AR-e model to adapt to right-skewed and heavy-tailed data. The residual wind speed at time is plotted against time in the middle row of Figure 10, showing a strong linear relationship; hence, we choose to transform the error term using the TGH-AR-e model rather than the process itself. The ACF and the partial autocorrelation function (PACF) validate the suitability of using an AR(2) model, selected by BIC, for the wind speed residuals process. The bottom row of Figure 10 shows a histogram, a normal Q-Q plot and the PACF for the estimated back-transformed Gaussian error term , where . These plots also confirm the validity of using a TGH-AR(2)-e model in which .

We compare the forecast results from our TGH-AR-e model to those of the Gaussian AR model for this real dataset. We make one-step-ahead forecasts for the whole year of 2014, with a rolling window length of 30 days () for the parameter estimation to account for the non-stationarity caused by seasonality. A sample size of , as shown in Section 3.1, is sufficient to produce an accurate estimate from our TGH-AR-e model. To estimate the parameters for the Gaussian AR model, we remove the diurnal pattern using linear regression with the same harmonics as in the TGH-AR-e model. Then, we find the MLE of the autoregressive parameters from the residuals.

The MAE of the conditional median from the TGH-AR-e model is 4.05; the Gaussian AR forecast has an MAE of 4.28. The RMSEs for the conditional means of the TGH-AR-e model and the Gaussian AR model are 1.47 and 1.50, respectively. The empirical coverage of the 95% minimum-length CI is 94.2% for the TGH-AR-e model and 93.3% for the Gaussian model. We conclude that the point forecasts and CIs based on the TGH-AR-e model are better than those based on the Gaussian AR model for these hourly wind speed data.

Nevertheless, it is difficult to see the differences between the forecast results from only these numbers. Figure 11 shows the histograms of the PIT values from forecasts based on the TGH-AR-e and Gaussian AR models. By comparing these histograms we see the superiority of fitting the wind speed using a non-Gaussian TGH error in the AR model rather than a Gaussian error.

## 6 Discussion

In this paper, we applied the TGH transformation in a time series context and built two non-Gaussian autoregressive models for time series data. We used an efficient parameter estimation procedure that approximates the maximum likelihood estimator. We also described the order selection procedures for choosing the proper AR order and prediction aspects. We illustrated the empirical performances of estimation, order selection and forecasting of our models through a simulation study. Finally, we demonstrated the usefulness of our models by applying them to two wind speed datasets at different temporal resolutions. Our TGH-AR models produced competitive estimations and forecasts for both the simulated and real datasets.

The AR models considered in this paper cannot incorporate measurement errors. Extensions of the TGH-AR models to ARMA or state-space models need further research. Also, we feel that an exhaustive comparisons of the many existing transformations, including the Box-Cox, TGH and Sinh-arcsinh transformations (Jones and Pewsey, 2009), would be welcome. In a future study, the parameters and should be allowed to change smoothly across time, either by imposing a parametric function of for and or by a penalty of smoothness, instead of the moving window scheme we used here. Extension of the TGH framework to a spatio-temporal setting is also promising.

## References

- Benjamin et al. (2003) Benjamin, M. A., Rigby, R. A., and Stasinopoulos, D. M. (2003). Generalized autoregressive moving average models. Journal of the American Statistical Association, 98(461):214–223.
- Block et al. (1990) Block, H. W., Langberg, N. A., and Stoffer, D. S. (1990). Time series models for non-Gaussian processes. Lecture Notes-Monograph Series, 16:69–83.
- Bollerslev (1986) Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307 – 327.
- Box and Cox (1964) Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological), 26(2):211–252.
- Castruccio and Genton (2016) Castruccio, S. and Genton, M. G. (2016). Compressing an ensemble with statistical models: An algorithm for global 3D spatio-temporal temperature. Technometrics, 58(3):319–328.
- Cordeiro and de Andrade (2009) Cordeiro, G. M. and de Andrade, M. G. (2009). Transformed generalized linear models. Journal of Statistical Planning and Inference, 139(9):2970 – 2987.
- Davies et al. (1980) Davies, N., Spedding, T., and Watson, W. (1980). Autoregressive moving average processes with non-normal residuals. Journal of Time Series Analysis, 1(2):103–109.
- De Luca et al. (2006) De Luca, G., Genton, M. G., and Loperfido, N. (2006). A multivariate skew-GARCH model. In Advances in Econometrics: Econometric Analysis of Economic and Financial Time Series, Part A (Special volume in honor of Robert Engle and Clive Granger, the 2003 winners of the Nobel Prize in Economics), chapter 20, pages 33–57. Elsevier.
- Dutta and Babbel (2002) Dutta, K. and Babbel, D. (2002). On measuring skewness and kurtosis in short rate distributions: The case of the US dollar London inter bank offer rates. Technical report, The Wharton School, University of Pennsylvania.
- Engle (1982) Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of united kingdom inflation. Econometrica, 50(4):987–1007.
- Field and Genton (2006) Field, C. and Genton, M. G. (2006). The multivariate -and- distribution. Technometrics, 48(1):104–111.
- Gaver and Lewis (1980) Gaver, D. P. and Lewis, P. A. W. (1980). First-order autoregressive gamma sequences and point processes. Advances in Applied Probability, 12(3):727–745.
- Gneiting and Katzfuss (2014) Gneiting, T. and Katzfuss, M. (2014). Probabilistic forecasting. Annual Review of Statistics and Its Application, 1(1):125–151.
- Gneiting et al. (2006) Gneiting, T., Larson, K., Westrick, K., Genton, M. G., and Aldrich, E. (2006). Calibrated probabilistic forecasting at the stateline wind energy center. Journal of the American Statistical Association, 101(475):968–979.
- Goerg (2011) Goerg, G. M. (2011). Lambert W random variables - a new family of generalized skewed distributions with applications to risk estimation. The Annals of Applied Statistics, 5(3):2197–2230.
- Goerg (2015) Goerg, G. M. (2015). The Lambert way to Gaussianize heavy-tailed data with the inverse of Tukey’s transformation as a special case. The Scientific World Journal, 2015:909231.
- Haslett and Raftery (1989) Haslett, J. and Raftery, A. E. (1989). Space-time modelling with long-memory dependence: Assessing Ireland’s wind power resource. Journal of the Royal Statistical Society. Series C (Applied Statistics), 38(1):1–50.
- He and Raghunathan (2012) He, Y. and Raghunathan, T. E. (2012). Multiple imputation using multivariate transformations. Journal of Applied Statistics, 39(10):2177–2198.
- Johns et al. (2003) Johns, C. J., Nychka, D., Kittel, T. G. F., and Daly, C. (2003). Infilling sparse records of spatial fields. Journal of the American Statistical Association, 98(464):796–806.
- Jones and Pewsey (2009) Jones, M. C. and Pewsey, A. (2009). Sinh-arcsinh distributions. Biometrika, 96(4):761–780.
- Kazor and Hering (2015) Kazor, K. and Hering, A. S. (2015). Assessing the performance of model-based clustering methods in multivariate time series with application to identifying regional wind regimes. Journal of Agricultural, Biological, and Environmental Statistics, 20(2):192–217.
- Lawrance and Lewis (1980) Lawrance, A. J. and Lewis, P. A. W. (1980). The exponential autoregressive-moving average EARMA (p,q) process. Journal of the Royal Statistical Society. Series B (Methodological), 42(2):150–161.
- Le et al. (1996) Le, N. D., Martin, R. D., and Raftery, A. E. (1996). Modeling flat stretches, bursts, and outliers in time series using mixture transition distribution models. Journal of the American Statistical Association, 91(436):1504–1515.
- Li and McLeod (1988) Li, W. K. and McLeod, A. I. (1988). ARMA modelling with non-Gaussian innovations. Journal of Time Series Analysis, 9(2):155–168.
- Martin (1992) Martin, V. L. (1992). Threshold time series models as multimodal distribution jump processes. Journal of Time Series Analysis, 13(1):79–94.
- Martinez and Iglewicz (1984) Martinez, J. and Iglewicz, B. (1984). Some properties of the Tukey and family of distributions. Communications in Statistics - Theory and Methods, 13(3):353–369.
- Nelson and Granger (1979) Nelson, H. L. and Granger, C. (1979). Experience with using the Box-Cox transformation when forecasting economic time series. Journal of Econometrics, 10(1):57 – 69.
- Sun and Genton (2011) Sun, Y. and Genton, M. G. (2011). Functional boxplots. Journal of Computational and Graphical Statistics, 20(2):316–334.
- Tong (1990) Tong, H. (1990). Non-linear Time Series. Oxford University Press, Oxford.
- Tukey (1977) Tukey, J. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, MA.
- Wong et al. (2009) Wong, C. S., Chan, W. S., and Kam, P. L. (2009). A Student t-mixture autoregressive model with applications to heavy-tailed financial data. Biometrika, 96(3):751 – 760.
- Wong and Li (2000) Wong, C. S. and Li, W. K. (2000). On a mixture autoregressive model. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(1):95–115.
- Xu and Genton (2015) Xu, G. and Genton, M. G. (2015). Efficient maximum approximated likelihood inference for Tukey’s -and- distribution. Computational Statistics & Data Analysis, 91:78–91.
- Xu and Genton (2016) Xu, G. and Genton, M. G. (2016). Tukey max-stable processes for spatial extremes. Spatial Statistics, 18(Part B):431 – 443.
- Xu and Genton (2017) Xu, G. and Genton, M. G. (2017). Tukey -and- random fields. Journal of the American Statistical Association, 112:1236–1249.