Wind speed forecasting at different time scales: a non parametric approach


The prediction of wind speed is one of the most important aspects when dealing with renewable energy. In this paper we present a new nonparametric model, based on semi-Markov chains, to predict wind speed. In particular, we use an indexed semi-Markov model, which accurately reproduces the statistical behavior of wind speed, to forecast wind speed one step ahead at different time scales and over very long time horizons while maintaining the quality of the prediction. To check the main features of the model we report, as an indicator of goodness, the root mean square error between real and predicted data, and we compare our forecasting results with those of a persistence model.

Wind speed; forecasting model; indexed semi-Markov chains;

1 Introduction

The variations of wind speed at a given site are strictly related to the economic aspects of a wind farm, such as maintenance operations (especially in offshore farms), pitch angle control on new wind turbines and the evaluation of a new site. Many researchers are proposing new models that allow the prediction of wind speed minutes, hours or days ahead. Many of these models are based on neural networks 14 (); 03 (), autoregressive models 05 (); 07 (); 753 (), Markov chains sha05 (); nfa04 (); 06 (); 01 (); 13 (); 794 (), hybrid models in which the previously mentioned models are combined you03 (); 01 (); 08 (); 10 (); 11 (); 16 (); 480 (); 555 (); fore4 (); fore6 (); fore8 (); fore13 (); f1 (); f2 () and other less common models fore1 (); fore2 (); fore3 (); fore7 (); fore9 (); fore11 (). Often, these models focus either on forecasting at a specific time scale or on synthetic time series generation. Our model, instead, can be used both for time series generation and for forecasting at different time scales.

The approach we propose here is based on the indexed semi-Markov chain (ISMC) model that was advanced by the same authors in wind2 () and applied to the generation of synthetic wind speed time series. In wind2 () we showed that our model is able to reproduce correctly the statistical behavior of wind speed. The ISMC model is nonparametric because it does not require any assumption on the form of the distribution function of wind speed. In this work we use the same model, slightly modified by adding a daily deterministic component, to forecast future values of wind speed. We show that this model performs better than a simple persistence model by comparing the root mean square errors. The ISMC model is able to forecast wind speed at different time scales without losing forecast quality, which is almost independent of the time horizon. Another important aspect addressed by this work is the amount of data needed to obtain a good forecast. To this end we show the root mean square error as a function of the number of data used to calibrate the model.

The paper is organized as follows. First of all, in Section 2, we describe the database used for the analysis. In Section 3, we present the model and its validation. Then, in Section 4, we present results of the wind speed forecasting through an indicator of goodness and comparison with the persistence model. Finally in Section 5 we present some concluding remarks.

2 Database

The database used for the analysis in this work is freely available from data () and is composed of more than 230000 wind speed measurements collected every 10 minutes. The weather station of L.S.I. -Lastem is situated in Italy at N 45° 28' 14.9" E 9° 22' 19.9" and at an altitude of 107 m. The station uses a combined speed-direction anemometer at 22 m above the ground. It has a measurement range from 0 to 60 m/s, a threshold of 0.38 m/s and a resolution of 0.05 m/s. The database and its empirical probability density function are shown in Figure 1.

Figure 1: Database and its probability density distribution.

We discretized wind speed into 8 states (see Table 1) chosen to cover all the wind speed distribution. Table 1 shows the wind speed states with their related wind speed values.

State   Wind speed range (m/s)
1       0 - 1
2       1 - 2
3       2 - 3
4       3 - 4
5       4 - 5
6       5 - 6
7       6 - 7
8       > 7
Table 1: Wind speed discretization
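
As an illustration of the discretization in Table 1, the following is a minimal sketch rather than the procedure actually used for the analysis. It assumes that speeds are expressed in m/s and that bins are closed on the left (so a speed of exactly 1 falls in state 2); the name `speed_to_state` is hypothetical.

```python
import math

N_STATES = 8  # number of wind speed states in Table 1


def speed_to_state(speed):
    """Map a wind speed to a state 1..8 using bins of width 1."""
    if speed < 0:
        raise ValueError("wind speed must be non-negative")
    # Assumption: bins are left-closed, e.g. 1.0 belongs to state 2.
    # Everything at or above 7 is collected in the last state.
    return min(int(math.floor(speed)) + 1, N_STATES)


states = [speed_to_state(s) for s in [0.0, 0.4, 1.0, 3.7, 6.99, 7.0, 12.3]]
```

The open-ended last state is what lets eight states cover the whole distribution, at the cost of losing resolution on the rare strong-wind events.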

In order to analyze the behavior at different time scales, we resampled the data at different sampling frequencies: namely 30 minutes, 1 hour and 2 hours.
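
The text does not state how the resampling was carried out. The sketch below shows one plausible option, block averaging, purely as an assumption; taking every k-th sample would be an equally valid reading. The function name `resample_mean` is hypothetical.

```python
def resample_mean(series, factor):
    """Average consecutive blocks of `factor` samples.

    An incomplete block at the end of the series is dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_blocks = len(series) // factor
    return [
        sum(series[k * factor:(k + 1) * factor]) / factor
        for k in range(n_blocks)
    ]


# Seven 10-minute readings; factor 3 gives 30-minute values.
ten_min = [2.0, 4.0, 6.0, 1.0, 2.0, 3.0, 5.0]
half_hour = resample_mean(ten_min, 3)
```

Starting from 10-minute data, factors of 3, 6 and 12 would give the 30-minute, 1-hour and 2-hour series respectively.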

3 Model

3.1 The indexed semi-Markov chain model

The general formulation of the ISMC as developed in references dami11a (), dami11b (), dami12b () and wind2 () is here discussed informally.

Semi-Markov processes are built on the same idea as Markov processes. Both are described by a finite set of states whose transitions are ruled by a transition probability matrix. The semi-Markov process differs from the Markov process because the transition times are themselves random variables: the time between transitions is random and may be modeled by any type of distribution function. In studies concerning wind speed modeling, the state $J_n$ indicates the discretized wind speed at the $n$th transition and $T_n$ the time at which the $n$th change of wind speed occurs.

In dami12 (); wind4 (), different semi-Markov models were applied to wind speed modeling and it was shown that semi-Markov models outperform Markov models and are therefore to be preferred in the modeling of wind speed.

In order to better represent the statistical characteristics of wind speed, the idea of an ISMC was recently advanced in the field of wind speed, see wind2 (). The novelty, with respect to the semi-Markov case, consists in the introduction of a third random variable defined as follows:

$$U_n^m = \sum_{k=0}^{m} J_{n-1-k} \, \frac{T_{n-k} - T_{n-1-k}}{T_n - T_{n-1-m}}$$

This variable can be interpreted as a moving average of order $m+1$ executed on the series of the past wind speed values $J_{n-1-k}$, with weights given by the fractions of the sojourn times in those wind speeds, $T_{n-k} - T_{n-1-k}$, with respect to the time interval on which the average is executed, $T_n - T_{n-1-m}$. The process $U_n^m$ has also been discretized; Table 2 shows the states of the process and their values.

State   Range of U
1       0 - 2.1
2       2.1 - 2.6
3       2.6 - 3.4
4       3.4 - 6
5       > 6
Table 2: Discretization of the index process U
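
The index can be read as a sojourn-time-weighted mean of the most recently visited wind speed states. The following minimal sketch illustrates that reading under several assumptions: the series is regularly sampled, state labels stand in for wind speed values, only the last `m + 1` sojourns enter the average, and the bins of Table 2 are closed on the left. All function names are hypothetical.

```python
def sojourns(samples):
    """Run-length encode a sampled state series into [state, duration] pairs."""
    runs = []
    for s in samples:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return runs


def index_value(runs, m):
    """Mean of the last m + 1 sojourns, weighted by their durations."""
    window = runs[-(m + 1):]
    total_time = sum(duration for _, duration in window)
    return sum(state * duration for state, duration in window) / total_time


def index_state(u):
    """Discretize the index with the bin edges of Table 2."""
    edges = [2.1, 2.6, 3.4, 6.0]
    return 1 + sum(u >= e for e in edges)


runs = sojourns([2, 2, 3, 3, 3, 5, 4, 4])  # [[2, 2], [3, 3], [5, 1], [4, 2]]
u = index_value(runs, m=2)                 # uses the last three sojourns
u_state = index_state(u)
```

Because the weights are the sojourn durations divided by their total, they sum to one, so the index always lies between the smallest and largest state in the window.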

The parameter $m$ must be optimized as a function of the specific database. The optimization is made by finding the value of $m$ that minimizes the root mean square error (RMSE) between the autocorrelation functions (ACF) of real and simulated data, see wind2 (). In our analysis .

The reason for introducing this memory index lies in the strong autocorrelation that characterizes the wind speed process. In the same work we showed that, if too short a memory is used, the autocorrelation is persistent but decreases faster than in real data. With a longer memory the autocorrelation remains high for a very long period and its value is very close to that of real data. If $m$ is increased further, the autocorrelation drops again to small values. This behavior suggests the existence of an optimal memory $m$. In our opinion this behavior can be explained as follows: short memories are not enough to identify the status (low, medium low, medium, medium high, high, see Table 2) of the index $U$, whereas memories that are too long mix together different statuses, so that much of the information is lost in the average.

The one-step transition probability matrix can be evaluated by counting the transitions jointly with the three random variables considered above. The probability $p_{i,j}(t,v)$ represents the transition probability from the current wind speed state $i$ to the wind speed state $j$, given that the sojourn time spent in state $i$ is equal to $t$ and the value of the index process $U$ is $v$. These probabilities can be computed as:

$$p_{i,j}(t,v) = \frac{n_{ij}(t,v)}{\sum_{j} n_{ij}(t,v)}$$

where $n_{ij}(t,v)$ is the total number of transitions observed in the database from state $i$ to state $j$ in the next period, having a sojourn time spent in wind speed state $i$ equal to $t$ and a value of the index process equal to $v$.
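
The counting estimator described above can be sketched as follows. This is an illustrative implementation, not the authors' code: the observation format `(i, t, v, j)` and the function name `estimate_transitions` are assumptions introduced here.

```python
from collections import defaultdict


def estimate_transitions(observations, n_states):
    """Estimate transition probabilities by relative frequencies.

    observations: iterable of (i, t, v, j) tuples, one per observed step,
    where i is the current state, t the sojourn time spent in i, v the
    discretized index value and j the next state (states are 1-based).
    Returns a dict mapping (i, t, v) to a list of probabilities over j.
    """
    counts = defaultdict(lambda: [0] * n_states)
    for i, t, v, j in observations:
        counts[(i, t, v)][j - 1] += 1
    probs = {}
    for key, row in counts.items():
        total = sum(row)  # number of departures observed from this context
        probs[key] = [c / total for c in row]
    return probs


obs = [(1, 1, 2, 1), (1, 1, 2, 1), (1, 1, 2, 2), (1, 1, 3, 2), (2, 1, 2, 1)]
P = estimate_transitions(obs, n_states=2)
```

Contexts `(i, t, v)` that never occur in the data have no row at all, which is one reason the length of the calibration period matters for a nonparametric model of this kind.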

The ISMC model proved particularly efficient in reproducing both the probability density function of wind speed and the autocorrelation function, see wind2 ().

3.2 Deterministic wind speed component

Wind speed shows a diurnal behavior due to the alternation between night and day. This sinusoidal trend is visible in Figure 2, which plots the ACF of real and simulated data (see Section 3.4 for a fuller explanation of the figure). To model this seasonality we add a deterministic component, given by a sine wave, to the indexed semi-Markov model:


The value of the parameter has to be optimized according to the database used for the analysis. In our case and it has been obtained by minimizing the RMSE between the ACF of real and synthetic data by using a genetic algorithm.
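
A minimal sketch of how a sine wave with a 24-hour period could be superimposed on a simulated series is given below. The exact functional form and the fitted parameter are not recoverable here, so the single-amplitude form, the optional phase and the value `0.5` are assumptions for illustration only; the function names are hypothetical.

```python
import math


def diurnal_component(t_hours, amplitude, phase=0.0):
    """Sine wave with a 24-hour period (assumed form)."""
    return amplitude * math.sin(2.0 * math.pi * t_hours / 24.0 + phase)


def add_diurnal(series, step_minutes, amplitude, phase=0.0):
    """Add the diurnal term to a regularly sampled series."""
    out = []
    for k, x in enumerate(series):
        t_hours = k * step_minutes / 60.0
        out.append(x + diurnal_component(t_hours, amplitude, phase))
    return out


# One day plus one sample at a 10-minute step; 0.5 is a placeholder value.
flat = [3.0] * 145
with_cycle = add_diurnal(flat, step_minutes=10, amplitude=0.5)
```

In practice the amplitude would be tuned, for example by minimizing the RMSE between the ACF of real and synthetic data as the text describes.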

3.3 Transition probability matrix

We computed the transition probability matrix by applying the counting estimator of Section 3.1 to the wind speed database. Two examples of the estimated matrices are given in Tables 3 and 4. As described above, in the model the transition matrix depends not only on the initial and arrival states but also on the sojourn time and on the value of the random variable U. In the example given here we show the transition matrices for and and for and respectively, evaluated from the original database with the sampling frequency of 10 minutes.

A first comparison between Table 3 and Table 4 reveals that the value of the index process seriously affects the transition probability to the next wind speed value. As an example, if , , , the probability of having a wind speed in the next period is equal to , see Table 3. On the contrary, if , , , the probability of having a wind speed in the next period becomes , see Table 4. The differences in the one-step transition probabilities are significant and confirm the hypothesis that the next wind speed depends also on the value of the index process. This shows that the index process should be used when dealing with wind speed data.

j=1 2 3 4 5 6 7 8
i=1 0.7065 0.2856 0.0074 0.0001 0.0001 0.0001 0.0001 0.0000
2 0.1546 0.7095 0.1310 0.0042 0.0004 0.0002 0.0001 0.0001
3 0.0064 0.2779 0.6300 0.0800 0.0045 0.0008 0.0003 0.0003
4 0.0005 0.0170 0.3227 0.5764 0.0773 0.0044 0.0011 0.0005
5 0.0000 0.0054 0.0349 0.3737 0.4919 0.0753 0.0134 0.0054
6 0.0000 0.0000 0.0000 0.0238 0.4048 0.3929 0.1786 0.0000
7 0.0000 0.0000 0.0357 0.0357 0.0714 0.3571 0.2857 0.2143
8 0.0000 0.0000 0.0000 0.0000 0.0435 0.0000 0.2174 0.7391
Table 3: Transition matrix for and .
j=1 2 3 4 5 6 7 8
i=1 0.4900 0.4300 0.0700 0.0000 0.0100 0.0000 0.0000 0.0000
2 0.1002 0.6171 0.2488 0.0242 0.0048 0.0016 0.0000 0.0032
3 0.0048 0.1456 0.6323 0.1975 0.0185 0.0000 0.0014 0.0000
4 0.0000 0.0154 0.2270 0.5886 0.1553 0.0113 0.0018 0.0006
5 0.0000 0.0009 0.0268 0.2763 0.5638 0.1220 0.0092 0.0009
6 0.0000 0.0000 0.0060 0.0301 0.3414 0.5120 0.1004 0.0100
7 0.0000 0.0000 0.0000 0.0000 0.0467 0.3400 0.4600 0.1533
8 0.0000 0.0000 0.0116 0.0233 0.0000 0.0465 0.2674 0.6512
Table 4: Transition matrix for and .

3.4 Model validation

We compute the ACF of real and synthetic data in order to assess the ability of the model to reproduce the statistical properties of real wind speed data. We generate a synthetic time series by means of Monte Carlo simulation. The specific algorithm used for the generation of the trajectory can be found in wind2 (). If $w$ indicates wind speed, the time-lagged ($\tau$) autocorrelation of wind speed is defined as:

$$\Sigma(\tau) = \frac{\mathrm{Cov}(w(t+\tau), w(t))}{\mathrm{Var}(w(t))}$$

The time lag $\tau$ ranges from 10 minutes up to 100 hours. The ACFs of real and synthetic data are plotted in Figure 2. As can be seen, the ACF has a sinusoidal trend with a period of 24 hours. This behavior is reproduced by our model through the introduction of the deterministic wind speed component described in Section 3.2.

Figure 2: Autocorrelation function of real and synthetic data
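
For readers who wish to reproduce a curve like the one in Figure 2, a sample autocorrelation can be computed as sketched below. The sketch uses the common biased estimator (covariance divided by the full series length), which is an assumption since the estimator is not specified here; the function name `autocorrelation` is hypothetical.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation at a given lag (biased estimator)."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    cov = sum(
        (series[k] - mean) * (series[k + lag] - mean)
        for k in range(n - lag)
    ) / n
    return cov / var


# Toy hourly series with a pure 24-hour cycle, 20 days long.
hourly = [5.0 + 2.0 * math.sin(2.0 * math.pi * k / 24.0) for k in range(480)]
acf_half_day = autocorrelation(hourly, 12)
acf_full_day = autocorrelation(hourly, 24)
```

On this toy series the autocorrelation is negative at a lag of 12 hours and close to one at 24 hours, which is the kind of oscillation a diurnal cycle leaves in the ACF.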

To assess the differences between the ACF of real and synthetic data we used the root mean square error (RMSE), which is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2}$$

where $x_i$ and $\hat{x}_i$ represent the real and the synthetic data respectively, while $N$ is the length of the two series. For the ACF plotted in Figure 2 we obtained a RMSE equal to .
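
The RMSE between two series of equal length can be computed with a few lines of code. The sketch below is a plain implementation of the standard definition; the function name `rmse` is chosen here for illustration.

```python
import math


def rmse(real, predicted):
    """Root mean square error between two equally long series."""
    if len(real) != len(predicted) or not real:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(real, predicted)]
    return math.sqrt(sum(squared) / len(squared))


error = rmse([0.0, 0.0], [3.0, 4.0])  # sqrt((9 + 16) / 2)
```

The same function applies whether the two series are autocorrelation curves, as in the validation step, or wind speeds, as in the forecasting experiments.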

4 Results

4.1 Wind speed forecasting

In this section the ISMC model is used to forecast future wind speed states with a one-step-ahead forecasting procedure, for different time horizons and for various time scales. In particular, we tested our model using the previously described databases with sampling frequencies of 10 minutes, 30 minutes, 1 hour and 2 hours.

For each sampling frequency, the database is divided into two subsets. The first part, which we call the setting period, is used to estimate the transition probability matrix (as described in the previous section); the second part, called the testing period, is used to compare the model forecasts with real data. As a first attempt to verify the model performance, we used two years of data as setting period and one year as testing period. We show in Section 4.2 how to find the best setting period. Once the transition matrix is set, the forecasted states are computed as follows:

$$\hat{J}_{n+1} = \sum_{j=1}^{E} j \, p_{i,j}(t,v)$$

where $E$ is the number of states into which wind speed is discretized and $p_{i,j}(t,v)$ is the transition probability matrix. The formula represents the expected value of the next transition given that the present wind speed state is $i$, the sojourn time spent in state $i$ is equal to $t$ and the value of the index process is $v$.
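
The expected-value forecast can be illustrated with a single row of a transition matrix. The sketch below uses the first row of Table 3 and assumes that the forecast is the probability-weighted mean of the state labels 1 to 8; the function name `expected_next_state` is hypothetical.

```python
def expected_next_state(row):
    """Probability-weighted mean of the state labels 1..len(row)."""
    return sum(j * p for j, p in enumerate(row, start=1))


# Row i = 1 of Table 3.
row_i1 = [0.7065, 0.2856, 0.0074, 0.0001, 0.0001, 0.0001, 0.0001, 0.0000]
forecast = expected_next_state(row_i1)  # about 1.30
```

The result is generally not an integer, so it lies between two states; this is what allows the forecast error to be measured on a continuous scale.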

In Figure 3 we show the results obtained using our model for the four different time scales. In the figure the solid black line represents real data while the dashed red line is the predicted series. The predicted series cover a time horizon of 100 steps (the corresponding duration depends on the sampling frequency).

Figure 3: Wind speed forecasting one step ahead for 100 time horizon. (a) 10 minutes database, (b) 30 minutes database, (c) 1 hour database, (d) 2 hours database.

Already from this figure, it can be noted that the quality of the prediction does not degrade as the length of the forecasted series increases. To verify this point, Table 5 reports quantitative results of our forecasting model for all the considered time scales and for different time horizons. In particular, we show the mean and standard deviation of the RMSE between real and predicted data, computed on 50 different forecasted series. Table 5 shows that the quality of prediction remains almost constant across time scales and time horizons.


Time scale    mean  std     mean  std     mean  std     mean  std
10 minutes    0.44  0.02    0.44  0.02    0.48  0.02    0.52  0.02
30 minutes    0.48  0.01    0.50  0.01    0.56  0.01    0.62  0.01
1 hour        0.54  0.01    0.54  0.01    0.61  0.01    0.64  0.01
2 hours       0.56  0.01    0.59  0.01    0.65  0.01    0.69  0.01
Table 5: Mean and standard deviation of the RMSE between real data and forecasted series for different time scales (rows) and time horizons (one mean/std pair of columns per horizon).

We compare our model with a simple persistence model. This method is still often used in industry because of its simplicity and its efficiency for very short-term predictions. It assumes that the wind speed at time $t + \Delta t$ is equal to the wind speed at time $t$. This method is commonly used as a benchmark for new forecasting models pers (). Overall our model has a higher forecasting efficiency for all the time scales and time horizons. The goodness of forecasting of the persistence model does not change with the time horizon, so we compare our results with the persistence model at different time scales. For the frequency of 10 minutes, 30 minutes, 1 hour and 2 hours we have respectively an RMSE between the true series and the forecasted one generated through the persistence model of , , and . As can be noted, the persistence model forecasts wind speed less precisely than our model, and its standard deviation increases with the time scale, in contrast to our model, whose variability decreases as the time scale increases.
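
The persistence benchmark is simple enough to state in a few lines. The sketch below predicts each value with the previous one and scores the result with the RMSE; the function names are hypothetical and the short series is made up for illustration.

```python
import math


def persistence_forecast(series):
    """One-step-ahead persistence: the forecast for step k+1 is the value at k."""
    return series[:-1]


def persistence_rmse(series):
    """RMSE of the persistence forecast against the realized values."""
    predicted = persistence_forecast(series)
    actual = series[1:]
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


baseline = persistence_rmse([3.0, 4.0, 4.0, 2.0])  # sqrt((1 + 0 + 4) / 3)
```

Since the forecast only uses the last observation, its error depends on how much the series changes between consecutive samples, which is why coarser sampling tends to penalize it.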

4.2 Number of data optimization

A serious problem in applying a nonparametric model is data availability. An important point is to establish the length of the setting period needed for a correct implementation of the model. On the one hand, reducing the setting period may cause the quality of prediction to drop; on the other hand, collecting a large database is time consuming, and consequently not economically efficient, and is sometimes not statistically necessary. To address this point for the ISMC model, we computed the RMSE between real data and a forecasted time series with a time horizon of 1000 steps.

Figure 4 shows the results obtained for the 30-minute sampling frequency. The RMSE, plotted as a function of the logarithm of the setting period length, remains almost constant after about 3000 data (corresponding roughly to 2 months), suggesting that a larger setting period is not necessary.

Figure 4: RMSE between real wind speed and forecasted series as a function of the logarithm of the number of data.
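
An experiment of this kind can be organized as a small harness that refits a model on setting periods of increasing length and scores each fit on the same testing period. The sketch below is generic and does not implement the ISMC model: `fit` and `predict` are hypothetical callbacks, the setting period is assumed to be the most recent `n` samples before the testing period, and the stand-in model used in the example simply predicts the mean of the setting data.

```python
import math


def rmse_by_setting_length(series, test_size, lengths, fit, predict):
    """One-step-ahead RMSE on a fixed test set for several setting lengths."""
    setting, testing = series[:-test_size], series[-test_size:]
    results = {}
    for n in lengths:
        model = fit(setting[-n:])   # calibrate on the last n setting samples
        previous = setting[-1]
        squared = []
        for actual in testing:
            squared.append((actual - predict(model, previous)) ** 2)
            previous = actual       # one step ahead: always use real data
        results[n] = math.sqrt(sum(squared) / len(squared))
    return results


# Stand-in model (not the ISMC): forecast the mean of the setting data.
def fit_mean(data):
    return sum(data) / len(data)


def predict_mean(model, previous):
    return model


toy = [1.0] * 10 + [3.0] * 10 + [3.0, 3.0]
curve = rmse_by_setting_length(toy, 2, [5, 20], fit_mean, predict_mean)
```

Plotting the returned values against the logarithm of the setting length gives a curve of the same kind as Figure 4.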

We repeated the same analysis for all the sampling frequencies used, obtaining: 20000 data (roughly 6 months) for the 10-minute sampling frequency, 2500 data (roughly 3 months) for 1 hour, and 2000 data (roughly 5 months) for 2 hours. The decrease in the number of data needed for a good forecast is mainly due to the reduction of noise as the sampling interval grows.

5 Discussion and conclusion

In previous works we presented new stochastic models, all based on a semi-Markov approach, to generate synthetic time series of wind speed. We showed that all the models perform better than the corresponding Markov chain based models in reproducing the statistical features of wind speed. Building on these results, here we apply the model that we recognized to be the best among them, namely the indexed semi-Markov chain (ISMC) model, to forecast future wind speed at a specific site. The ISMC model is nonparametric and therefore does not need any assumption on the distribution of wind speed or on wind speed variations.

In previous papers we showed that the ISMC model is able to reproduce correctly, and at the same time, both the probability distribution function of wind speed and the autocorrelation function.

The results presented in this paper show that the model can be efficiently used to forecast wind speed over different time horizons. The forecast performance is almost independent of the time horizon used; the model can be used without degradation over the considered time horizon, at different time scales (we showed this for time scales ranging from 10 minutes to 2 hours).

The number of data needed to reach a good forecast performance depends on the time scale used for forecasting; the model always works better than a simple persistence model.

All these characteristics suggest that the proposed ISMC model may be used both for modeling wind speed data and for wind speed prediction. Therefore, its output may be used as input data for any wind energy system.



  1. journal: Renewable Energy


  1. Guo Z, Zhao W, Lu H, Wang J. 2012. Multi-step forecasting for wind speed using a modified EMD-based artificial neural network model. Renewable Energy 37: 241-249.
  2. Bivona S, Bonanno G, Burlon R, Gurrera D, Leone C. 2011. Stochastic models for wind speed forecasting. Energy Conversion and Management 52: 1157-1165.
  3. Poggi P, Muselli M, Notton G, Cristofari C, Louche A. 2003. Forecasting and simulating wind speed in Corsica by using an autoregressive model. Energy Conversion and Management 44: 3177-3196.
  4. Kavasseri RG, Seetharaman K. 2009. Day-ahead wind speed forecasting using f-ARIMA models. Renewable Energy 34: 1388-1393.
  5. Ailliot P, Monbet V, Prevosto M. 2006. An autoregressive model with time-varying coefficients for wind fields. Environmetrics 17: 107-117.
  6. Shamshad A, Bawadi MA, Wan Hussin WMW, Majid TA, Sanusi SAM. 2005. First and second order Markov chain models for synthetic generation of wind speed time series. Energy 30: 693-708.
  7. Nfaoui H, Essiarab H, Sayigh AAM. 2004. A stochastic Markov chain model for simulating wind speed time series at Tangiers, Morocco. Renewable Energy 29: 1407-1418.
  8. Kantz H, Holstein D, Ragwitz M, Vitanov NK. 2004. Markov chain model for turbulent wind speed data. Physica A 342: 315-321.
  9. Castino F, Festa R, Ratto CF. 1998. Stochastic modelling of wind velocities time series. Wind Engineering and Industrial Aerodynamics 74-76: 141-151.
  10. Hocaoglu FO, Gerek ÖN, Kurban M. 2010. A novel wind speed modeling approach using atmospheric pressure observations and hidden Markov models. Wind Engineering and Industrial Aerodynamics 98: 472-481.
  11. Fawcett L, Walshaw D. 2006. Markov chain models for extreme wind speeds. Environmetrics 17: 795-809.
  12. Youcef Ettoumi F, Sauvageot H, Adane AEH. 2003. Statistical bivariate modeling of wind using first-order Markov chain and Weibull distribution. Renewable Energy 28: 1787-1802.
  13. Cadenas E, Rivera W. 2010. Wind speed forecasting in three different regions of Mexico, using a hybrid ARIMA-ANN model. Renewable Energy 35: 2732-2738.
  14. Turbelin G, Ngae P, Grignon M. 2009. Wavelet cross-correlation analysis of wind speed series generated by ANN based models. Renewable Energy 34: 1024-1032.
  15. Pourmousavi Kani SA, Ardehali MM. 2011. Very short-term wind speed prediction: A new artificial neural network-Markov chain model. Energy Conversion and Management 52: 738-745.
  16. Aksoy H, Toprak ZF, Aytek A, Ünal NE. 2004. Stochastic generation of hourly mean wind speed data. Renewable Energy 29: 2111-2131.
  17. Denison DGT, Dellaportas P, Mallick BK. 2001. Wind speed prediction in a complex terrain. Environmetrics 12: 499-515.
  18. Pandey MD, Van Gelder PHAJM, Vrijling JK. 2003. Bootstrap simulations for evaluating the uncertainty associated with peaks-over-threshold estimates of extreme wind velocity. Environmetrics 14: 27-43.
  19. Guo Z, Zhao W, Lu H, Wang J. 2012. Multi-step forecasting for wind speed using a modified EMD-based artificial neural network model. Renewable Energy 37: 241-249.
  20. Monfared M, Rastegar H, Kojabadi HM. 2009. A new strategy for wind speed forecasting using artificial intelligent methods. Renewable Energy 34: 845-848.
  21. Li G, Shi J, Zhou J. 2011. Bayesian adaptive combination of short-term wind speed forecasts from neural network models. Renewable Energy 36: 352-359.
  22. Peng H, Liu F, Yang X. 2013. A hybrid strategy of short term wind power prediction. Renewable Energy 50: 590-595.
  23. Kavasseri RG, Seetharaman K. 2009. Day-ahead wind speed forecasting using f-ARIMA models. Renewable Energy 34: 1388-1393.
  24. Liu H, Tian H, Chen C, Li Y. 2010. A hybrid statistical method to predict wind speed and wind power. Renewable Energy 35: 1857-1861.
  25. Salcedo-Sanz S, Perez-Bellido AM, Ortiz-Garcia EG, Portilla-Figueras A, Prieto L, Paredes D. 2009. Hybridizing the fifth generation mesoscale model with artificial neural networks for short-term wind speed prediction. Renewable Energy 34: 1451-1457.
  26. Segura-Heras I, Escriva-Escriva G, Alcázar-Ortega M. 2011. Wind farm electrical power production model for loadflow analysis. Renewable Energy 36: 1008-1013.
  27. Tar K, Szegedi S. 2011. A statistical model for estimating electricity produced by wind energy. Renewable Energy 36: 823-828.
  28. Jiang Y, Song Z, Kusiak A. 2013. Very short-term wind speed forecasting with Bayesian structural break model. Renewable Energy 50: 637-647.
  29. Lazic L, Pejanovic G, Zivkovic M. 2010. Wind forecasts for wind power generation using the Eta model. Renewable Energy 35: 1236-1243.
  30. Abdel-Aal RE, Elhadidy MA, Shaahid SM. 2009. Modeling and forecasting the mean hourly wind speed time series using GMDH-based abductive networks. Renewable Energy 34: 1686-1699.
  31. D’Amico G, Petroni F, Prattico F. 2013. Wind speed modeled as an indexed semi-Markov process. Accepted for publication in Environmetrics.
  32. D’Amico G. 2011. Age-usage semi-Markov models. Applied Mathematical Modelling 35: 4354-4366.
  33. D’Amico G, Petroni F. 2011. A semi-Markov model with memory for price changes. Journal of Statistical Mechanics: Theory and Experiment: P12009.
  34. D’Amico G, Petroni F. 2012. Weighted-indexed semi-Markov models for modeling financial returns. Journal of Statistical Mechanics: Theory and Experiment: P07015.
  35. D’Amico G, Petroni F, Prattico F. 2013. First and second order semi-Markov chains for wind speed modeling. Physica A 392: 1194-1201.
  36. D’Amico G, Petroni F, Prattico F. 2013. Reliability measures of second order semi-Markov chain applied to wind energy production. Journal of Renewable Energy 2013.
  37. http : //
  38. Saurabh S, Zareipour H. 2010. A Review of Wind Power and Wind Speed Forecasting Methods With Different Time Horizons. IEEE North American Power Symposium (NAPS).