Wind speed forecasting at different time scales: a nonparametric approach
Abstract
The prediction of wind speed is one of the most important aspects when dealing with renewable energy. In this paper we present a new nonparametric model, based on semi-Markov chains, to predict wind speed. In particular, we use an indexed semi-Markov model, which accurately reproduces the statistical behavior of wind speed, to forecast wind speed one step ahead at different time scales and over very long time horizons while maintaining the quality of the prediction. To assess the model, we use as an indicator of goodness the root mean square error between real and predicted data, and we compare our forecasting results with those of a persistence model.
Keywords: Wind speed; forecasting model; indexed semi-Markov chains
1 Introduction
The variations of wind speed at a given site are closely related to the economics of a wind farm, affecting for example maintenance operations (especially in offshore farms), pitch angle control in new wind turbines and the evaluation of new sites. Many researchers are proposing new models to predict wind speed minutes, hours or days ahead. Many of these models are based on neural networks 14 (); 03 (), autoregressive models 05 (); 07 (); 753 (), Markov chains sha05 (); nfa04 (); 06 (); 01 (); 13 (); 794 (), hybrid models in which the previously mentioned models are combined you03 (); 01 (); 08 (); 10 (); 11 (); 16 (); 480 (); 555 (); fore4 (); fore6 (); fore8 (); fore13 (); f1 (); f2 (), and other, less common models fore1 (); fore2 (); fore3 (); fore7 (); fore9 (); fore11 (). Often, these models focus either on forecasting at a specific time scale or on synthetic time series generation. Our model, instead, can be used both for time series generation and for forecasting at different time scales.
The approach we propose here is based on the indexed semi-Markov chain (ISMC) model, which was introduced by the same authors in wind2 () and applied to the generation of synthetic wind speed time series. In wind2 () we showed that the model correctly reproduces the statistical behavior of wind speed. The ISMC model is nonparametric because it does not require any assumption on the form of the distribution function of wind speed. In this work we use the same model, slightly modified by adding a daily deterministic component, to forecast future values of wind speed. By comparing root mean square errors, we show that this model performs better than a simple persistence model. The ISMC model can forecast wind speed at different time scales without losing forecasting accuracy, which is almost independent of the time horizon. Another important aspect addressed in this work is the amount of data needed to obtain a good forecast. To this end, we show the root mean square error as a function of the amount of data used to calibrate the model.
The paper is organized as follows. Section 2 describes the database used for the analysis. Section 3 presents the model and its validation. Section 4 presents the wind speed forecasting results, evaluated through an indicator of goodness and a comparison with the persistence model. Finally, Section 5 presents some concluding remarks.
2 Database
The database used in this work is freely available from data () and consists of more than 230,000 wind speed measurements collected every 10 minutes. The L.S.I. Lastem weather station is located in Italy at N 45° 28' 14.9" E 9° 22' 19.9", at an altitude of 107 m. The station uses a combined speed-direction anemometer placed 22 m above the ground, with a measurement range from 0 to 60 m/s, a threshold of 0.38 m/s and a resolution of 0.05 m/s. The database and its empirical probability density function are shown in Figure 1.
We discretized wind speed into 8 states chosen to cover the whole wind speed distribution. Table 1 shows the wind speed states and the corresponding wind speed ranges.
Table 1: Wind speed states.
State  Wind speed range (m/s)
1  0 to 1
2  1 to 2
3  2 to 3
4  3 to 4
5  4 to 5
6  5 to 6
7  6 to 7
8  > 7
To analyze the behavior at different time scales, we resampled the data at coarser sampling intervals, namely 30 minutes, 1 hour and 2 hours.
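The discretization of Table 1 and the resampling step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that coarser time scales are obtained by averaging non-overlapping blocks of 10-minute values (the text does not say whether averaging or subsampling was used) and that a speed lying exactly on a boundary goes to the upper state. The function names are hypothetical.

```python
# Upper bounds (m/s) of states 1..7 in Table 1; state 8 collects speeds above 7 m/s.
STATE_UPPER_BOUNDS = [1, 2, 3, 4, 5, 6, 7]


def to_state(speed):
    """Map a wind speed in m/s to one of the 8 states of Table 1.

    Assumption: a speed exactly on a boundary is assigned to the upper state.
    """
    for state, upper in enumerate(STATE_UPPER_BOUNDS, start=1):
        if speed < upper:
            return state
    return len(STATE_UPPER_BOUNDS) + 1


def block_average(speeds, factor):
    """Resample by averaging non-overlapping blocks of `factor` samples.

    Assumption: coarser time scales are obtained by averaging (e.g. factor=3
    turns 10-minute data into 30-minute data); an incomplete last block is dropped.
    """
    n_blocks = len(speeds) // factor
    return [sum(speeds[k * factor:(k + 1) * factor]) / factor for k in range(n_blocks)]


ten_min = [0.4, 1.2, 2.9, 3.1, 6.5, 7.8, 4.0]   # toy 10-minute speeds (m/s)
thirty_min = block_average(ten_min, 3)
print([to_state(v) for v in ten_min])       # [1, 2, 3, 4, 7, 8, 5]
print([to_state(v) for v in thirty_min])    # [2, 6]
```

Under the same assumption, the 1-hour and 2-hour series would use `factor=6` and `factor=12`.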
3 Model
3.1 The indexed semi-Markov chain model
The general formulation of the ISMC, as developed in dami11a (), dami11b (), dami12b () and wind2 (), is discussed informally here.
Semi-Markov processes are built on an idea similar to that of Markov processes: both are described by a finite set of states whose transitions are governed by a transition probability matrix. The two differ in how the time between transitions is treated. In a discrete-time Markov chain the sojourn time in a state is necessarily geometrically distributed, whereas in a semi-Markov process the time between transitions is a random variable that may follow any distribution function. In wind speed modeling, $J_n$ denotes the discretized wind speed at the $n$-th transition and $T_n$ the time at which the $n$-th change of wind speed occurs.
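To make the distinction concrete, the sketch below splits a discretized wind speed series (one observation per time step) into the sequence of visited states $J_n$, the times $T_n$ at which they are entered, and the sojourn times $T_{n+1}-T_n$. The empirical distribution of these sojourn times is what a semi-Markov model is free to capture. The helper name is hypothetical and the example series is a toy.

```python
from collections import Counter


def embedded_chain(states):
    """Split a discrete-time state series into visited states, entry times and sojourns.

    Returns (J, T, sojourns): J[n] is the n-th visited state, T[n] the time step at
    which it is entered, and sojourns[n] = T[n+1] - T[n]. The first and last
    sojourns are censored by the ends of the series.
    """
    J, T = [states[0]], [0]
    for t in range(1, len(states)):
        if states[t] != states[t - 1]:   # a change of wind speed state
            J.append(states[t])
            T.append(t)
    sojourns = [end - start for start, end in zip(T, T[1:] + [len(states)])]
    return J, T, sojourns


series = [2, 2, 2, 3, 3, 2, 4, 4, 4, 4]   # toy discretized wind speed states
J, T, sojourns = embedded_chain(series)
print(J)                   # [2, 3, 2, 4]
print(T)                   # [0, 3, 5, 6]
print(sojourns)            # [3, 2, 1, 4]
print(Counter(sojourns))   # empirical sojourn-time distribution
```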
In dami12 (); wind4 (), different semi-Markov models were applied to wind speed modeling, and it was shown that they outperform Markov models and are therefore preferable for modeling wind speed.
To better represent the statistical characteristics of wind speed, the idea of an ISMC was recently introduced in the field of wind speed modeling, see wind2 (). The novelty with respect to the semi-Markov case is the introduction of a third random variable, defined as follows:
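$$U^m_n = \frac{1}{m}\sum_{k=0}^{n-1}\ \sum_{a=T_{n-1-k}}^{T_{n-k}-1} J_{n-1-k}\,\mathbf{1}_{\{a \ge T_n - m\}} \qquad (1)$$
where $\mathbf{1}$ denotes the indicator function.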
This variable can be interpreted as a moving average of order $m$ of the past wind speed values, in which each value is weighted by the fraction of the averaging interval of length $m$ spent in that wind speed state. The process $U^m$ has also been discretized; Table 2 shows its states and the corresponding ranges.
Table 2: States of the index process $U^m$.
State  $U^m$ range
1  0 to 2.1
2  2.1 to 2.6
3  2.6 to 3.4
4  3.4 to 6
5  > 6
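In discrete time, the sojourn-weighted moving average described above reduces to the plain mean of the last $m$ values, because each state contributes in proportion to the number of steps spent in it. The sketch below computes this trailing mean and maps it to the five classes of Table 2. It assumes the window excludes the current step and that boundary values go to the upper class; whether the average is taken over states or raw speeds is also an assumption here (the same function applies to either). Names are hypothetical.

```python
# Upper bounds of classes 1..4 of the index U^m in Table 2; class 5 is U^m > 6.
U_UPPER_BOUNDS = [2.1, 2.6, 3.4, 6.0]


def index_u(values, t, m):
    """Index U^m at time t: mean of the m values preceding t (values[t-m:t]).

    Averaging step by step weights each visited value by the fraction of the
    window spent in it. Assumptions: the window excludes time t itself, and
    `values` may be either the states or the raw speeds.
    """
    if not 0 < m <= t:
        raise ValueError("need 0 < m <= t")
    return sum(values[t - m:t]) / m


def u_class(u):
    """Map a value of U^m to one of the 5 classes of Table 2 (boundaries go up)."""
    for cls, upper in enumerate(U_UPPER_BOUNDS, start=1):
        if u < upper:
            return cls
    return len(U_UPPER_BOUNDS) + 1


states = [1, 1, 2, 2, 2, 3, 5, 5]   # toy state series
u = index_u(states, t=8, m=4)       # mean of [2, 3, 5, 5]
print(u, u_class(u))                # 3.75 4
```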
The parameter $m$ must be optimized for the specific database. The optimization consists in finding the value of $m$ that minimizes the root mean square error (RMSE) between the autocorrelation functions (ACF) of real and simulated data, see wind2 (). In our analysis, $m$ = ….
The index is introduced because the wind speed process is characterized by strong autocorrelation. In the same work we showed that if the memory $m$ is too small, the autocorrelation is persistent but decays faster than that of real data. With a longer memory the autocorrelation remains high for a long period, and its value is very close to that of real data. If $m$ is increased further, the autocorrelation drops again to small values. This behavior suggests the existence of an optimal memory $m$. In our opinion, it can be explained as follows: short memories are not enough to identify the regime (low, medium-low, medium, medium-high or high, see Table 2) of the index $U^m$, while too long memories mix different regimes, so that much of the information is lost in the average.
The one-step transition probability matrix can be estimated by counting the observed transitions jointly with the values of the three random variables introduced above. The probability $p_{ij}(t,u)$ is the transition probability from the current wind speed state $i$ to the wind speed state $j$, given that the sojourn time spent in state $i$ is equal to $t$ and the value of the process $U^m$ is $u$. These probabilities can be computed as:
$$p_{ij}(t,u) = \frac{N_{ij}(t,u)}{\sum_{k=1}^{s} N_{ik}(t,u)} \qquad (2)$$
where $N_{ij}(t,u)$ is the total number of transitions observed in the database from state $i$ to state $j$ in the next period, with a sojourn time in wind speed state $i$ equal to $t$ and a value of the index process equal to $u$, and $s = 8$ is the number of wind speed states.
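A minimal sketch of the counting estimator in equation (2), assuming discrete time with one observation per step. At every step it records the current state i, the time t already spent in i and the class u of the index, then counts which state j follows; staying in the same state counts as a transition to j = i, as in the diagonal of Tables 3 and 4. The convention that t starts at 1 when a state is entered, and the names, are assumptions. Triples (i, t, u) never observed are simply absent from the result.

```python
from collections import defaultdict


def estimate_transitions(states, u_classes):
    """Counting estimate of p_ij(t, u) = N_ij(t, u) / sum_k N_ik(t, u).

    states[k]    : wind speed state at step k
    u_classes[k] : class of the index process at step k (same length as states)
    Returns a dict mapping (i, t, u) -> {j: probability}. Assumption: the sojourn
    time t counts the steps already spent in state i, starting at 1 when i is
    entered; the first run is counted from the start of the series.
    """
    counts = defaultdict(lambda: defaultdict(int))
    age = 0
    for k in range(len(states) - 1):
        age = age + 1 if k > 0 and states[k] == states[k - 1] else 1
        counts[(states[k], age, u_classes[k])][states[k + 1]] += 1   # N_ij(t, u)
    probs = {}
    for key, row in counts.items():
        total = sum(row.values())   # sum over all arrival states
        probs[key] = {j: n / total for j, n in row.items()}
    return probs


states = [1, 1, 2, 1, 1, 1, 2]   # toy state series
u_cls = [1] * len(states)        # toy index classes (all in class 1)
p = estimate_transitions(states, u_cls)
print(p[(1, 2, 1)])              # {2: 0.5, 1: 0.5}
```

Each row of the result sums to one, matching the rows of Tables 3 and 4 up to rounding.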
The ISMC model proved particularly effective at jointly reproducing the probability density function of wind speed and its autocorrelation function, see wind2 ().
3.2 Deterministic wind speed component
Wind speed shows a diurnal pattern due to the alternation of night and day. This periodic trend can be seen in Figure 2, which shows the ACF of real and simulated data (see Section 3.4 for a more detailed explanation of the figure). To model this seasonality, we add a deterministic component, given by a sine wave, to the indexed semi-Markov model:
(3) 
The parameter of this component has to be optimized for the database used in the analysis. In our case it was obtained by minimizing the RMSE between the ACFs of real and synthetic data with a genetic algorithm.
3.3 Transition probability matrix
We computed the transition probability matrix by applying equation (2) to the wind speed database. Two examples of the estimated matrices are given in Tables 3 and 4. As described above, in this model the transition matrix depends not only on the initial and arrival states but also on the sojourn time $t$ and on the value $u$ of the index $U^m$. The two examples correspond to two different pairs of values of $t$ and $u$ and were estimated from the original database sampled every 10 minutes.
A first comparison between Table 3 and Table 4 shows that the value of the index process strongly affects the probability of the next wind speed state. For example, the probability of remaining in state 1 is 0.7065 in Table 3 but only 0.4900 in Table 4, while the probability of moving from state 3 to state 4 rises from 0.0800 to 0.1975. These differences in the one-step transition probabilities are significant and confirm the hypothesis that the next wind speed also depends on the value of the index process. Hence the index process should be taken into account when dealing with wind speed data.
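Table 3: Estimated transition probabilities $p_{ij}(t,u)$, first example (rows: starting state $i$; columns: arrival state $j$).
j=1  2  3  4  5  6  7  8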
i=1  0.7065  0.2856  0.0074  0.0001  0.0001  0.0001  0.0001  0.0000 
2  0.1546  0.7095  0.1310  0.0042  0.0004  0.0002  0.0001  0.0001 
3  0.0064  0.2779  0.6300  0.0800  0.0045  0.0008  0.0003  0.0003 
4  0.0005  0.0170  0.3227  0.5764  0.0773  0.0044  0.0011  0.0005 
5  0.0000  0.0054  0.0349  0.3737  0.4919  0.0753  0.0134  0.0054 
6  0.0000  0.0000  0.0000  0.0238  0.4048  0.3929  0.1786  0.0000 
7  0.0000  0.0000  0.0357  0.0357  0.0714  0.3571  0.2857  0.2143 
8  0.0000  0.0000  0.0000  0.0000  0.0435  0.0000  0.2174  0.7391 
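Table 4: Estimated transition probabilities $p_{ij}(t,u)$, second example (rows: starting state $i$; columns: arrival state $j$).
j=1  2  3  4  5  6  7  8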
i=1  0.4900  0.4300  0.0700  0.0000  0.0100  0.0000  0.0000  0.0000 
2  0.1002  0.6171  0.2488  0.0242  0.0048  0.0016  0.0000  0.0032 
3  0.0048  0.1456  0.6323  0.1975  0.0185  0.0000  0.0014  0.0000 
4  0.0000  0.0154  0.2270  0.5886  0.1553  0.0113  0.0018  0.0006 
5  0.0000  0.0009  0.0268  0.2763  0.5638  0.1220  0.0092  0.0009 
6  0.0000  0.0000  0.0060  0.0301  0.3414  0.5120  0.1004  0.0100 
7  0.0000  0.0000  0.0000  0.0000  0.0467  0.3400  0.4600  0.1533 
8  0.0000  0.0000  0.0116  0.0233  0.0000  0.0465  0.2674  0.6512 
3.4 Model validation
We compute the ACF of real and synthetic data to assess the ability of the model to reproduce the statistical properties of real wind speed data. The synthetic time series is generated by Monte Carlo simulation; the specific algorithm used to generate the trajectory can be found in wind2 (). If $J(t)$ denotes the wind speed, its time-lagged autocorrelation is defined as:
$$R(\tau) = \frac{\mathrm{Cov}\big(J(t+\tau),\,J(t)\big)}{\mathrm{Var}\big(J(t)\big)} \qquad (4)$$
The time lag $\tau$ ranges from 10 minutes to 100 hours. The ACFs of real and synthetic data are plotted in Figure 2. The ACF shows a sinusoidal trend with a period of 24 hours, which our model reproduces thanks to the deterministic wind speed component of equation (3).
To assess the differences between the ACFs of real and synthetic data we used the root mean square error (RMSE), defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(x_k - \hat{x}_k\right)^2}$$
where $x_k$ and $\hat{x}_k$ represent the real and synthetic data respectively, and $N$ is the length of the two series. For the ACFs plotted in Figure 2 we obtained an RMSE of ….
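A minimal sketch of the two quantities used in this validation: the sample autocorrelation of equation (4) and the RMSE between two sequences. The estimator (normalization by the series length n) is an assumption, since the text does not specify it, and the hourly toy series with a daily cycle is illustrative only. With 10-minute data, lags from 10 minutes to 100 hours correspond to tau from 1 to 600.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelation R(tau) for tau = 1..max_lag.

    Assumption: biased estimator, covariance normalized by n (not n - tau).
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [
        sum((x[t] - mean) * (x[t + tau] - mean) for t in range(n - tau)) / (n * var)
        for tau in range(1, max_lag + 1)
    ]


def rmse(real, synthetic):
    """Root mean square error between two series of equal length N."""
    if len(real) != len(synthetic):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((r - s) ** 2 for r, s in zip(real, synthetic)) / len(real))


# Toy hourly series with a daily cycle: its ACF peaks again at lag 24.
x = [5 + math.sin(2 * math.pi * k / 24) for k in range(24 * 20)]
r = acf(x, 48)
print(max(range(12, 36), key=lambda tau: r[tau - 1]))   # 24
print(rmse(r, r))                                        # 0.0
```

Comparing a real and a synthetic series then amounts to `rmse(acf(real, L), acf(synthetic, L))` for a chosen maximum lag `L`.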
4 Results
4.1 Wind speed forecasting
In this section the ISMC model is used to forecast future wind speed states with a one-step-ahead procedure, for different time horizons and time scales. In particular, we tested the model on the data described above, sampled every 10 minutes, 30 minutes, 1 hour and 2 hours.
For each sampling frequency, the database is divided into two subsets: the first, which we call the setting period, is used to estimate the transition probability matrix (as described in the previous section); the second, called the testing period, is used to compare the model forecasts with real data. As a first test of the model performance, we used two years of data as the setting period and one year as the testing period. Section 4.2 shows how to choose the length of the setting period. Once the transition matrix has been estimated, the forecasted states are computed as follows:
$$\hat{J}(k+1) = \sum_{j=1}^{s} j\; p_{ij}(t,u) \qquad (5)$$
where $s$ is the number of states into which wind speed is discretized and $p_{ij}(t,u)$ is the transition probability matrix. The formula gives the expected value of the next state, given that the current wind speed state is $i$, the sojourn time spent in that state is equal to $t$ and the value of the index process is $u$.
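Equation (5) turns one row of the estimated matrix into a point forecast. The sketch below assumes the probabilities are stored in a dictionary keyed by (i, t, u) and, as a hypothetical choice not stated in the text, falls back to the current state when the conditioning triple never occurred in the setting period. The example uses row i = 1 of Table 3; the key values t = 5 and u = 2 are placeholders, since the (t, u) pair of Table 3 is not reproduced in the text.

```python
def forecast_next(probs, i, t, u):
    """One-step-ahead forecast of equation (5): the expected next state sum_j j * p_ij(t, u).

    probs maps (i, t, u) -> {j: p_ij(t, u)}. Assumption (hypothetical fallback): if the
    triple (i, t, u) never occurred in the setting period, return the current state i.
    """
    row = probs.get((i, t, u))
    if row is None:
        return float(i)
    return sum(j * p for j, p in row.items())


# Row i = 1 of Table 3; t = 5 and u = 2 are placeholder key values.
table3_row1 = [0.7065, 0.2856, 0.0074, 0.0001, 0.0001, 0.0001, 0.0001, 0.0000]
probs = {(1, 5, 2): {j: p for j, p in enumerate(table3_row1, start=1)}}
print(round(forecast_next(probs, 1, 5, 2), 4))   # 1.3021
print(forecast_next(probs, 3, 1, 1))             # 3.0 (unseen triple, fallback)
```

The forecast is a real-valued state rather than an integer one, which is what allows small RMSE values when it is compared with the observed series.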
Figure 3 shows the results obtained with our model for the four time scales. The black continuous line represents real data, while the dashed red line is the predicted series. Each predicted series covers 100 time steps, whose actual duration depends on the sampling frequency.
This figure already suggests that the prediction quality does not deteriorate as the length of the forecasted series increases. To verify this point quantitatively, Table 5 reports the results of our forecasting model for all the time scales considered and for different time horizons, namely the mean and standard deviation of the RMSE between real and predicted data, computed over 50 different forecasted series. Table 5 shows that the goodness of prediction remains almost constant across time scales and time horizons.
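Table 5: RMSE between real and predicted data (mean ± standard deviation over 50 forecasted series) for each time scale (rows) and four different time horizons (columns).
10 minutes  0.44 ± 0.02  0.44 ± 0.02  0.48 ± 0.02  0.52 ± 0.02
30 minutes  0.48 ± 0.01  0.50 ± 0.01  0.56 ± 0.01  0.62 ± 0.01
1 hour  0.54 ± 0.01  0.54 ± 0.01  0.61 ± 0.01  0.64 ± 0.01
2 hours  0.56 ± 0.01  0.59 ± 0.01  0.65 ± 0.01  0.69 ± 0.01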
We compare our model with a simple persistence model. This method is still often used in industry because of its simplicity and its effectiveness for very short-term predictions. It assumes that the wind speed at time $t+1$ is equal to the wind speed at time $t$, and it is commonly used as a benchmark for new forecasting models pers (). Overall, our model is more accurate for all time scales and time horizons. Since the forecasting accuracy of the persistence model does not change with the time horizon, we compare the two models at the different time scales. For sampling intervals of 10 minutes, 30 minutes, 1 hour and 2 hours, the RMSE between the true series and the series forecasted by the persistence model is …, …, … and …, respectively. The persistence model is therefore less accurate than our model; moreover, its standard deviation grows as the time scale increases, whereas the variability of our model decreases as the time scale increases.
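The persistence benchmark and the statistics reported in Table 5 (mean and standard deviation of the RMSE over several test series) can be sketched as below. The toy test series and the helper names are illustrative assumptions; scoring the ISMC model would follow the same pattern, replacing `persistence_forecast` with the model's one-step predictions.

```python
import math
import statistics


def persistence_forecast(series):
    """Persistence: the forecast for step k + 1 is the value observed at step k."""
    return series[:-1]


def rmse(real, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(real, predicted)) / len(real))


def score_persistence(test_series):
    """Mean and standard deviation of the persistence RMSE over several test series."""
    scores = [rmse(s[1:], persistence_forecast(s)) for s in test_series]
    return statistics.mean(scores), statistics.stdev(scores)


tests = [[3, 4, 4, 5], [2, 2, 3, 3], [6, 5, 5, 4]]   # toy test series (states)
mean_rmse, std_rmse = score_persistence(tests)
print(round(mean_rmse, 3), round(std_rmse, 3))      # 0.737 0.138
```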
4.2 Optimization of the amount of data
A serious problem when applying a nonparametric model is data availability. An important point is to establish the length of the setting period needed to implement the model correctly. On the one hand, reducing the setting period may degrade the prediction quality; on the other hand, collecting a large database is time consuming, and consequently not economically efficient, and is sometimes not statistically necessary. To address this point for the ISMC model, we computed the RMSE between real data and a forecasted series of 1000 time steps for setting periods of different lengths.
Figure 4 shows the results obtained for the 30-minute sampling frequency. The RMSE, plotted as a function of the logarithm of the setting period length, remains almost constant beyond about 3000 data points (roughly 2 months), suggesting that a longer setting period is not necessary.
We repeated the same analysis for all the sampling frequencies, obtaining 20000 data points (roughly 6 months) for the 10-minute sampling frequency, 2500 data points (roughly 3 months) for 1 hour, and 2000 data points (roughly 5 months) for 2 hours. The decrease in the number of data points needed for a good forecast is mainly due to the reduction of noise as the sampling interval increases, that is, at coarser time scales.
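The procedure behind Figure 4 can be sketched generically: calibrate on setting periods of logarithmically spaced lengths, forecast the same test window, and record the RMSE for each length. Here the calibration data are assumed to be the N points immediately preceding the test window, and `fit` and `forecast` are hypothetical placeholders for the ISMC estimation and forecasting steps. The toy usage plugs in a trivial model that always predicts the mean of its calibration data, purely to show the mechanics.

```python
import math


def log_spaced_lengths(n_min, n_max, num):
    """About `num` integer lengths between n_min and n_max, evenly spaced in log scale."""
    ratio = (n_max / n_min) ** (1 / (num - 1))
    return sorted({round(n_min * ratio ** k) for k in range(num)})


def rmse_vs_setting_length(setting, test, lengths, fit, forecast):
    """RMSE on `test` when the model is calibrated on the last n points of `setting`.

    fit(data) -> model and forecast(model, history) -> next value are hypothetical
    placeholders for the ISMC estimation and one-step forecasting steps.
    """
    results = {}
    for n in lengths:
        model = fit(setting[-n:])
        errors = [(test[k] - forecast(model, test[:k])) ** 2 for k in range(1, len(test))]
        results[n] = math.sqrt(sum(errors) / len(errors))
    return results


# Toy check with a trivial model that always predicts the calibration mean.
setting = [1.0] * 50 + [3.0] * 50
test = [3.0] * 10
lengths = log_spaced_lengths(10, 100, 3)   # [10, 32, 100]
res = rmse_vs_setting_length(setting, test, lengths,
                             fit=lambda data: sum(data) / len(data),
                             forecast=lambda model, history: model)
print(res)   # {10: 0.0, 32: 0.0, 100: 1.0}
```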
5 Discussion and conclusion
In previous works we presented new stochastic models, all based on a semi-Markov approach, to generate synthetic wind speed time series. We showed that all these models perform better than the corresponding Markov chain models in reproducing the statistical features of wind speed. Building on these results, here we applied the model that we found to be the best among them, namely the indexed semi-Markov chain (ISMC) model, to forecast future wind speed at a specific site. The ISMC model is nonparametric and therefore does not need any assumption on the distribution of wind speed or of wind speed variations.
In previous papers we showed that the ISMC model correctly reproduces, at the same time, both the probability distribution function of wind speed and its autocorrelation function.
The results presented in this paper show that the model can be used effectively to forecast wind speed over different time horizons. The forecasting performance is almost independent of the time horizon: the model can be used without degradation over the horizons considered and at different time scales (we showed this for time scales ranging from 10 minutes to 2 hours).
The amount of data needed to reach a good forecasting performance depends on the time scale used for forecasting. In all the cases considered, the model performs better than a simple persistence model.
All these characteristics suggest that the ISMC model may be used both for modeling wind speed data and for wind speed prediction, and its output may therefore serve as input data for wind energy systems.
Footnotes
 journal: Renewable Energy
References
Guo Z, Zhao W, Lu H, Wang J. 2012. Multi-step forecasting for wind speed using a modified EMD-based artificial neural network model. Renewable Energy 37: 241-249.
Bivona S, Bonanno G, Burlon R, Gurrera D, Leone C. 2011. Stochastic models for wind speed forecasting. Energy Conversion and Management 52: 1157-1165.
Poggi P, Muselli M, Notton G, Cristofari C, Louche A. 2003. Forecasting and simulating wind speed in Corsica by using an autoregressive model. Energy Conversion and Management 44: 3177-3196.
Kavasseri RG, Seetharaman K. 2009. Day-ahead wind speed forecasting using f-ARIMA models. Renewable Energy 34: 1388-1393.
Ailliot P, Monbet V, Prevosto M. 2006. An autoregressive model with time-varying coefficients for wind fields. Environmetrics 17: 107-117.
Shamshad A, Bawadi MA, Wan Hussin WMW, Majid TA, Sanusi SAM. 2005. First and second order Markov chain models for synthetic generation of wind speed time series. Energy 30: 693-708.
Nfaoui H, Essiarab H, Sayigh AAM. 2004. A stochastic Markov chain model for simulating wind speed time series at Tangiers, Morocco. Renewable Energy 29: 1407-1418.
Kantz H, Holstein D, Ragwitz M, Vitanov NK. 2004. Markov chain model for turbulent wind speed data. Physica A 342: 315-321.
Castino F, Festa R, Ratto CF. 1998. Stochastic modelling of wind velocities time series. Wind Engineering and Industrial Aerodynamics 74-76: 141-151.
Hocaoglu FO, Gerek ÖN, Kurban M. 2010. A novel wind speed modeling approach using atmospheric pressure observations and hidden Markov models. Wind Engineering and Industrial Aerodynamics 98: 472-481.
Fawcett L, Walshaw D. 2006. Markov chain models for extreme wind speeds. Environmetrics 17: 795-809.
Youcef Ettoumi F, Sauvageot H, Adane AEH. 2003. Statistical bivariate modeling of wind using first-order Markov chain and Weibull distribution. Renewable Energy 28: 1787-1802.
Cadenas E, Rivera W. 2010. Wind speed forecasting in three different regions of Mexico, using a hybrid ARIMA-ANN model. Renewable Energy 35: 2732-2738.
Turbelin G, Ngae P, Grignon M. 2009. Wavelet cross-correlation analysis of wind speed series generated by ANN based models. Renewable Energy 34: 1024-1032.
Pourmousavi Kani SA, Ardehali MM. 2011. Very short-term wind speed prediction: A new artificial neural network-Markov chain model. Energy Conversion and Management 52: 738-745.
Aksoy H, Toprak ZF, Aytek A, Ünal NE. 2004. Stochastic generation of hourly mean wind speed data. Renewable Energy 29: 2111-2131.
Denison DGT, Dellaportas P, Mallick BK. 2001. Wind speed prediction in a complex terrain. Environmetrics 12: 499-515.
Pandey MD, Van Gelder PHAJM, Vrijling JK. 2003. Bootstrap simulations for evaluating the uncertainty associated with peaks-over-threshold estimates of extreme wind velocity. Environmetrics 14: 27-43.
Monfared M, Rastegar H, Kojabadi HM. 2009. A new strategy for wind speed forecasting using artificial intelligent methods. Renewable Energy 34: 845-848.
Li G, Shi J, Zhou J. 2011. Bayesian adaptive combination of short-term wind speed forecasts from neural network models. Renewable Energy 36: 352-359.
Peng H, Liu F, Yang X. 2013. A hybrid strategy of short term wind power prediction. Renewable Energy 50: 590-595.
Liu H, Tian H, Chen C, Li Y. 2010. A hybrid statistical method to predict wind speed and wind power. Renewable Energy 35: 1857-1861.
Salcedo-Sanz S, Perez-Bellido AM, Ortiz-Garcia EG, Portilla-Figueras A, Prieto L, Paredes D. 2009. Hybridizing the fifth generation mesoscale model with artificial neural networks for short-term wind speed prediction. Renewable Energy 34: 1451-1457.
Segura-Heras I, Escriva-Escriva G, Alcázar-Ortega M. 2011. Wind farm electrical power production model for load-flow analysis. Renewable Energy 36: 1008-1013.
Tar K, Szegedi S. 2011. A statistical model for estimating electricity produced by wind energy. Renewable Energy 36: 823-828.
Jiang Y, Song Z, Kusiak A. 2013. Very short-term wind speed forecasting with Bayesian structural break model. Renewable Energy 50: 637-647.
Lazic L, Pejanovic G, Zivkovic M. 2010. Wind forecasts for wind power generation using the Eta model. Renewable Energy 35: 1236-1243.
Abdel-Aal RE, Elhadidy MA, Shaahid SM. 2009. Modeling and forecasting the mean hourly wind speed time series using GMDH-based abductive networks. Renewable Energy 34: 1686-1699.
D’Amico G, Petroni F, Prattico F. 2013. Wind speed modeled as an indexed semi-Markov process. Environmetrics (accepted for publication).
D’Amico G. 2011. Age-usage semi-Markov models. Applied Mathematical Modelling 35: 4354-4366.
D’Amico G, Petroni F. 2011. A semi-Markov model with memory for price changes. Journal of Statistical Mechanics: Theory and Experiment P12009.
D’Amico G, Petroni F. 2012. Weighted-indexed semi-Markov models for modeling financial returns. Journal of Statistical Mechanics: Theory and Experiment P07015.
D’Amico G, Petroni F, Prattico F. 2013. First and second order semi-Markov chains for wind speed modeling. Physica A 392: 1194-1201.
D’Amico G, Petroni F, Prattico F. 2013. Reliability measures of second order semi-Markov chain applied to wind energy production. Journal of Renewable Energy 2013.
http://www.lsilastem.it/meteo/page/dwnldata.aspx
Saurabh S, Zareipour H. 2010. A review of wind power and wind speed forecasting methods with different time horizons. IEEE North American Power Symposium (NAPS).