# An Empirical Analysis of Constrained Support Vector Quantile Regression for Nonparametric Probabilistic Forecasting of Wind Power

###### Abstract

Uncertainty analysis in the form of probabilistic forecasting can provide significant improvements in decision-making processes in the smart power grid for better integrating renewable energies such as wind. Whereas point forecasting provides a single expected value, probabilistic forecasts provide more information in the form of quantiles, prediction intervals, or full predictive densities. This paper analyzes the effectiveness of an approach for nonparametric probabilistic forecasting of wind power that combines support vector machines and nonlinear quantile regression with non-crossing constraints. A numerical case study is conducted using publicly available wind data from the Global Energy Forecasting Competition 2014. Multiple quantiles are estimated to form 20%, 40%, 60% and 80% prediction intervals, which are evaluated using the pinball loss function and reliability measures. Three benchmark models are used for comparison; the results demonstrate that the proposed approach leads to significantly better performance while preventing the problem of overlapping quantile estimates.

Kostas Hatalis, Shalinee Kishore, Dept. of Electrical and Computer Engineering, Lehigh University, {kmh511,shk2}@lehigh.edu

Katya Scheinberg, Dept. of Industrial Engineering, Lehigh University, kas410@lehigh.edu

Alberto Lamadrid, Dept. of Economics, Lehigh University, all512@lehigh.edu

Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

## Introduction

Predicting and managing uncertainty in the production of wind power is one of the biggest challenges facing its integration into the smart grid. Forecasting uncertainty in wind is needed for many operational applications in a wind farm, from turbine and storage control to bidding and trading in energy markets. Forecasting horizons can be categorized into three main time scales: short-term, looking out several hours or days; long-term, looking out weeks or a month; and seasonal. Traditionally, wind power prediction is based on deterministic point forecasts, which provide an expected output for a given look-ahead time. These forecasts, however, lack uncertainty information. As such, a large research effort has recently been undertaken by the renewables forecasting community (?) to produce full probabilistic predictions, which provide quantitative information on the uncertainty associated with power output. Although various methods have been proposed, it is still a challenge to make accurate and robust probabilistic predictions for highly nonlinear and complex data such as wind.

Probabilistic wind models are based either on meteorological ensembles obtained from a weather model (?) or on statistical learning methods (?). Statistical learning methods can be applied to forecast full predictive distributions in the form of quantiles or prediction intervals. For instance, in (?) prediction intervals are estimated by adaptive re-sampling, which is a common probabilistic forecasting strategy. Quantile regression (QR) is another very popular approach. In (?) local QR is applied to estimate different quantiles, while in (?) spline-based QR is used to estimate quantiles of wind power. In (?) quantile loss gradient boosted machines are used to estimate 99 quantiles, and in (?) multiple quantile regression is used to predict a full distribution, with optimization done using the alternating direction method of multipliers. A thorough overview of probabilistic wind power forecasting is provided in (?).

In most of these approaches, each quantile is estimated independently. This can lead to the quantile crossing problem, where the estimate of a lower quantile exceeds that of a higher one. This is undesirable because it violates the property that the inverse of a distribution function must be monotonically increasing. One way to prevent this issue is the simple heuristic of reordering the estimated quantiles. However, this heuristic has little theoretical basis and may lead to inappropriate quantiles.

The solution then is to optimize quantiles together with non-crossing constraints. In (?) a constrained support vector quantile regression (CSVQR) method was developed with non-crossing constraints where it was used to fit quantiles on static data. This formulation is re-purposed here for probabilistic forecasting. Other machine learning frameworks have been used before for uncertainty prediction of renewables such as nearest neighbors (?), neural networks (?), and extreme learning machines (?) but support vector machines (SVMs) have yet to be examined for wind uncertainty forecasting. We propose that SVMs are not only effective in long term prediction due to their ability to handle nonlinear data via kernels but can be easily extended with constraints to ensure non-overlapping quantile estimates. Our study is the first to showcase the use of CSVQR with a sliding window of training data as well as showcase the effectiveness of constraints to ensure monotonically increasing quantiles for probabilistic prediction. We provide the derivation of CSVQR and analysis of experimental results on publicly available wind data. Several common benchmark methods are used for comparison.

## Nonparametric Probabilistic Forecasting

This section highlights the underlying theory and evaluation methods used in probabilistic forecasting. For a random variable $Y_t$ such as wind power at time $t$, its probability density function is denoted $f_t$ and its cumulative distribution function $F_t$. If $F_t$ is strictly increasing, the quantile $q_t^{(\tau)}$ with proportion $\tau \in [0,1]$ of the random variable $Y_t$ is uniquely defined as the value $x$ such that $P(Y_t < x) = \tau$, or equivalently as the inverse of the distribution function, $q_t^{(\tau)} = F_t^{-1}(\tau)$. A quantile forecast $\hat{q}_{t+k}^{(\tau)}$ with nominal proportion $\tau$ is an estimate of the true quantile $q_{t+k}^{(\tau)}$ for the lead time $t+k$, given predictor values (such as numerical wind speed forecasts). Prediction intervals then give a range of possible values within which an observed value is expected to lie with a certain probability $\beta \in [0,1]$. A prediction interval $\hat{I}_{t+k}^{(\beta)}$ produced at time $t$ for future horizon $t+k$ is defined by its lower and upper bounds, $\hat{I}_{t+k}^{(\beta)} = [\hat{q}_{t+k}^{(\tau_l)}, \hat{q}_{t+k}^{(\tau_u)}]$, which are the quantile forecasts whose nominal proportions $\tau_l$ and $\tau_u$ are such that $\tau_u - \tau_l = \beta$.
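As a small illustration of how prediction-interval bounds map to quantile proportions, the sketch below assumes central intervals, so an interval with coverage probability `coverage` uses a lower proportion of `(1 - coverage) / 2` and an upper proportion of `(1 + coverage) / 2`. The centring is an assumption made here for illustration, and the function name `central_interval_proportions` is hypothetical.

```python
def central_interval_proportions(coverage):
    """Return (tau_lower, tau_upper) for a central interval with the given coverage.

    Assumption: the interval is centred on the median, which the text does not state.
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    tau_lower = (1.0 - coverage) / 2.0
    return tau_lower, 1.0 - tau_lower


# Quantile proportions needed for the 20%, 40%, 60% and 80% intervals.
for coverage in (0.2, 0.4, 0.6, 0.8):
    lo, hi = central_interval_proportions(coverage)
    print(f"{coverage:.0%} PI -> quantiles {lo:.1f} and {hi:.1f}")
```

Under this assumption the four intervals together require eight quantile estimates, which is why the ordering of those estimates matters.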

If the future density function is assumed to take a certain form, the approach is called parametric probabilistic forecasting. For a nonlinear and bounded process such as wind generation, however, the probability distributions of future wind power may be skewed and heavy-tailed (?). If no assumption is made about the shape of the distribution, a nonparametric probabilistic forecast $\hat{f}_{t+k}$ (?) of the density function can be made by gathering a set of $M$ quantile forecasts, $\hat{f}_{t+k} = \{\hat{q}_{t+k}^{(\tau_m)},\ m = 1,\dots,M \mid 0 \le \tau_1 < \dots < \tau_M \le 1\}$, with chosen nominal proportions spread on the unit interval. In this paper we consider nonparametric forecasting of wind power at a resolution of one hour (predicting outwards to a month's worth of values). On a short time scale of an hour the wind density may fluctuate, making nonparametric forecasting preferable to fitting a parametric density (?).

For nonparametric probabilistic forecasting, quantile regression, introduced by (?), is a popular choice for estimating conditional quantiles. It is closely related to models for the conditional median (?). Minimizing the mean absolute error leads to an estimate of the conditional median of a prediction. By applying asymmetric weights to errors through a tilted form of the absolute value function, the conditional quantiles of a predictive distribution can be computed. To achieve this the pinball loss function is used, which is defined by

$$\rho_\tau(u) = \begin{cases} \tau u, & u \ge 0 \\ (\tau - 1)u, & u < 0 \end{cases}$$

where $0 < \tau < 1$. A visualization of the pinball function with several different values of $\tau$ is shown in Fig. 1. Given a vector of predictors $X_i$ where $i = 1, \dots, N$, weights $W$, and intercept coefficient $b$ in a linear regression fashion, the conditional $\tau$ quantile is given by $\hat{q}_i^{(\tau)} = W^\top X_i + b$. The weights and intercept can be estimated by solving the following minimization problem

$$\min_{W, b} \; \frac{1}{N} \sum_{i=1}^{N} \rho_\tau\left(y_i - \hat{q}_i^{(\tau)}\right) \tag{1}$$

where $y_i$ is the observed value of the predictand. The problem in Eq. (1) can be solved by linear programming.
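To make the role of the pinball loss concrete, here is a minimal sketch that evaluates the loss and then finds, by brute force, the constant estimate that minimizes its mean over a sample. This is an intercept-only illustration on made-up numbers, not the linear programming solve mentioned above; the helper names are hypothetical.

```python
def pinball_loss(u, tau):
    """Pinball loss of a residual u = observation - quantile estimate."""
    return tau * u if u >= 0 else (tau - 1.0) * u


def mean_pinball(observations, q, tau):
    """Average pinball loss of a constant quantile estimate q."""
    return sum(pinball_loss(y - q, tau) for y in observations) / len(observations)


def best_constant_quantile(observations, tau):
    """Pick the observed value that minimizes the mean pinball loss (brute force)."""
    return min(observations, key=lambda q: mean_pinball(observations, q, tau))


power = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up values, not GEFCom2014 data
for tau in (0.1, 0.5, 0.9):
    print(tau, best_constant_quantile(power, tau))
```

With `tau = 0.5` the loss is symmetric and the minimizer is the sample median; tilting `tau` towards 0 or 1 pulls the minimizer towards the lower or upper end of the sample.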

### Evaluation Methods

In probabilistic forecasting it is important to evaluate the quantile estimates and the derived prediction intervals. Prediction intervals (PIs) show where future wind power observations are expected to lie with an assigned probability termed the PI nominal confidence (PINC), written $100\beta\%$ for a nominal coverage probability $\beta$. The coverage probability of the estimated PIs is expected to eventually reach the nominal level of confidence over the test data. A good measure of reliability, which shows the target coverage of the PIs, is the PI coverage probability (PICP), defined by

$$\text{PICP} = \frac{100\%}{N} \sum_{i=1}^{N} c_i, \qquad c_i = \begin{cases} 1, & y_i \in [L_i, U_i] \\ 0, & \text{otherwise} \end{cases}$$

where $c_i$ is the indicator of PICP, $y_i$ is the observation, $L_i$ and $U_i$ are the lower and upper bounds of the PI for test sample $i$, and $N$ is the number of test samples. For reliable PIs, the examined PICP should be close to its corresponding PINC. A related assessment index is the average coverage error (ACE), defined by

$$\text{ACE} = \text{PICP} - \text{PINC}.$$

To ensure PIs with high reliability, the ACE should be as close to zero as possible (Table 1 reports its magnitude). Next, to evaluate quantile estimates and full predictive densities, the pinball function is used as an assessment score called the quantile score (Q-score). The Q-score is obtained for every estimated quantile and is averaged over all target quantiles for all future time steps. For a quantile forecast $\hat{q}_{t+k}^{(\tau)}$ the Q-score is defined as

$$Q\left(\hat{q}_{t+k}^{(\tau)}, y_{t+k}\right) = \rho_\tau\left(y_{t+k} - \hat{q}_{t+k}^{(\tau)}\right)$$

where $\rho_\tau$ is the pinball loss and $y_{t+k}$ is the observation used for forecast evaluation. A lower Q-score indicates a better forecast.
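The three evaluation measures can be sketched in a few lines. The code below is an illustrative implementation under the definitions in this section (PICP in percent, ACE as the signed difference, Q-score as the mean pinball loss); the function names and the toy numbers are assumptions, not the evaluation code used for the case study.

```python
def picp(lower, upper, observed):
    """PI coverage probability in percent."""
    hits = sum(1 for lo, hi, y in zip(lower, upper, observed) if lo <= y <= hi)
    return 100.0 * hits / len(observed)


def ace(picp_percent, pinc_percent):
    """Average coverage error: signed difference between PICP and PINC."""
    return picp_percent - pinc_percent


def pinball_loss(u, tau):
    """Pinball loss of a residual u = observation - quantile estimate."""
    return tau * u if u >= 0 else (tau - 1.0) * u


def q_score(forecasts, observed, taus):
    """Mean pinball loss over all time steps and all target quantiles.

    forecasts[t][m] is the estimate of quantile taus[m] at time step t.
    """
    total = 0.0
    for row, y in zip(forecasts, observed):
        for q, tau in zip(row, taus):
            total += pinball_loss(y - q, tau)
    return total / (len(observed) * len(taus))


# Toy example with made-up normalized power values.
observed = [0.2, 0.3, 0.6, 0.4, 0.05]
lower = [0.1] * 5
upper = [0.5] * 5
coverage = picp(lower, upper, observed)
print(coverage, ace(coverage, 80.0))
```

A negative ACE in this sketch means the intervals cover fewer observations than the nominal level, a positive one means they are wider than needed.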

## Support Vector Quantile Regression

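To fit the nonlinearity of wind data, nonlinear quantile regression (NQR) can be utilized. NQR is implemented by projecting an input vector $X_i$ into a potentially higher dimensional feature space using a nonlinear mapping function $\phi(\cdot)$ implicitly defined by a kernel $K(X_i, X_j) = \phi(X_i)^\top \phi(X_j)$. This gives the functional form $\hat{q}^{(\tau)}(X_i) = W_\tau^\top \phi(X_i) + b_\tau$, where $\hat{q}^{(\tau)}$ is the $\tau$-th quantile of the distribution of $y$ conditional on the values of $X$, and $W_\tau$ is a vector of parameters. NQR simplifies to linear quantile regression if $\phi(X) = X$. The NQR problem can be expressed by the following formulation, with an added penalty to prevent overfitting

$$\min_{W_\tau, b_\tau} \; \frac{1}{2}\|W_\tau\|^2 + C \sum_{i=1}^{N} \rho_\tau\left(y_i - W_\tau^\top \phi(X_i) - b_\tau\right)$$

where $\rho_\tau$ is the pinball loss, $y_i$ is the observed value for training sample $i$, and $C > 0$ is the regularization constant. By introducing slack variables $\xi_i$ and $\xi_i^*$ the problem can be re-written as a support vector quantile regression problem

$$\begin{aligned} \min_{W_\tau, b_\tau, \xi, \xi^*} \quad & \frac{1}{2}\|W_\tau\|^2 + C \sum_{i=1}^{N} \left(\tau \xi_i + (1-\tau)\xi_i^*\right) \\ \text{s.t.} \quad & y_i - W_\tau^\top\phi(X_i) - b_\tau \le \xi_i, \\ & -y_i + W_\tau^\top\phi(X_i) + b_\tau \le \xi_i^*, \\ & \xi_i, \xi_i^* \ge 0, \quad i = 1,\dots,N. \end{aligned} \tag{2}$$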

### Non-crossing Quantile Constraints

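In Eq. (2) a single quantile is estimated. To estimate multiple quantiles, this formulation could be solved for different values of $\tau$ independently. However, in doing so quantiles may cross each other, which is not desirable since it violates the principle of monotone increasing inverse distribution functions. To prevent this, constraints need to be introduced (?). Let $0 < \tau_1 < \tau_2 < \dots < \tau_M < 1$ be the orders of the $M$ conditional quantiles to be estimated, each with its own parameters $W_m$ and $b_m$, so that $\hat{q}^{(\tau_m)}(X) = W_m^\top \phi(X) + b_m$ for a feature map $\phi$ with kernel $K(X_i, X_j) = \phi(X_i)^\top \phi(X_j)$. To ensure these quantiles do not cross each other at the training points, the following constraint is needed: $W_m^\top \phi(X_i) + b_m \le W_{m+1}^\top \phi(X_i) + b_{m+1}$ for $i = 1,\dots,N$ and $m = 1,\dots,M-1$. With this constraint the primal problem of the non-crossing conditional quantile estimator is given by

$$\begin{aligned} \min_{W_m, b_m, \xi, \xi^*} \quad & \sum_{m=1}^{M} \left[ \frac{1}{2}\|W_m\|^2 + C \sum_{i=1}^{N} \left(\tau_m \xi_{m,i} + (1-\tau_m)\xi_{m,i}^*\right) \right] \\ \text{s.t.} \quad & y_i - W_m^\top\phi(X_i) - b_m \le \xi_{m,i}, \\ & -y_i + W_m^\top\phi(X_i) + b_m \le \xi_{m,i}^*, \\ & \xi_{m,i}, \xi_{m,i}^* \ge 0, \quad m = 1,\dots,M, \\ & W_m^\top\phi(X_i) + b_m \le W_{m+1}^\top\phi(X_i) + b_{m+1}, \quad m = 1,\dots,M-1, \end{aligned} \tag{3}$$

for $i = 1,\dots,N$, where $y_i$ is the observed value for training sample $i$ and $C > 0$ is the regularization constant.

The Lagrangian for the problem is then defined by

$$\begin{aligned} L = {} & \sum_{m=1}^{M} \left[ \frac{1}{2}\|W_m\|^2 + C \sum_{i=1}^{N} \left(\tau_m \xi_{m,i} + (1-\tau_m)\xi_{m,i}^*\right) \right] \\ & - \sum_{m=1}^{M} \sum_{i=1}^{N} \alpha_{m,i} \left(\xi_{m,i} - y_i + W_m^\top\phi(X_i) + b_m\right) \\ & - \sum_{m=1}^{M} \sum_{i=1}^{N} \alpha_{m,i}^* \left(\xi_{m,i}^* + y_i - W_m^\top\phi(X_i) - b_m\right) \\ & - \sum_{m=1}^{M} \sum_{i=1}^{N} \left(\eta_{m,i}\xi_{m,i} + \eta_{m,i}^*\xi_{m,i}^*\right) \\ & - \sum_{m=1}^{M-1} \sum_{i=1}^{N} \lambda_{m,i} \left(W_{m+1}^\top\phi(X_i) + b_{m+1} - W_m^\top\phi(X_i) - b_m\right) \end{aligned} \tag{4}$$

where nonnegative Lagrange multipliers $\alpha_{m,i}$, $\alpha_{m,i}^*$, $\eta_{m,i}$, $\eta_{m,i}^*$, and $\lambda_{m,i}$ are introduced for the two residual constraints, the two slack nonnegativity constraints, and the non-crossing constraint, respectively, with the convention $\lambda_{0,i} = \lambda_{M,i} = 0$. By letting the partial derivatives of $L$ with respect to $W_m$ be zero, the following is obtained

$$W_m = \sum_{i=1}^{N} \theta_{m,i}\, \phi(X_i), \qquad \theta_{m,i} = \alpha_{m,i} - \alpha_{m,i}^* + \lambda_{m-1,i} - \lambda_{m,i}. \tag{5}$$

Setting the partial derivatives with respect to the other primal variables $b_m$ and $\xi_{m,i}, \xi_{m,i}^*$ to zero gives

$$\sum_{i=1}^{N} \theta_{m,i} = 0, \tag{6}$$

$$C\tau_m - \alpha_{m,i} - \eta_{m,i} = 0, \qquad C(1-\tau_m) - \alpha_{m,i}^* - \eta_{m,i}^* = 0. \tag{7}$$

Plugging these equalities back into Eq. (4), the following dual minimization problem can be obtained

$$\begin{aligned} \min_{\alpha, \alpha^*, \lambda} \quad & \frac{1}{2} \sum_{m=1}^{M} \sum_{i=1}^{N} \sum_{j=1}^{N} \theta_{m,i}\,\theta_{m,j}\, K(X_i, X_j) - \sum_{m=1}^{M} \sum_{i=1}^{N} \left(\alpha_{m,i} - \alpha_{m,i}^*\right) y_i \\ \text{s.t.} \quad & \sum_{i=1}^{N} \theta_{m,i} = 0, \quad m = 1,\dots,M, \\ & 0 \le \alpha_{m,i} \le C\tau_m, \quad 0 \le \alpha_{m,i}^* \le C(1-\tau_m), \quad \lambda_{m,i} \ge 0. \end{aligned} \tag{8}$$

From this dual formulation the conditional quantile can then be given by

$$\hat{q}^{(\tau_m)}(X) = \sum_{i=1}^{N} \theta_{m,i}\, K(X_i, X) + b_m. \tag{9}$$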
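To see what the non-crossing constraints are guarding against, the following sketch counts crossings in a set of quantile estimates and applies the reordering heuristic mentioned in the introduction. It is only a diagnostic on made-up numbers; it does not implement the constrained estimator, and the function names are hypothetical.

```python
def count_crossings(quantile_rows):
    """Count adjacent pairs where a lower quantile estimate exceeds the next one.

    Each row holds the estimates for one time step, ordered by increasing tau.
    """
    return sum(
        1
        for row in quantile_rows
        for lower_q, higher_q in zip(row, row[1:])
        if lower_q > higher_q
    )


def reorder(quantile_rows):
    """Reordering heuristic: sort each row so the estimates are nondecreasing."""
    return [sorted(row) for row in quantile_rows]


# Made-up estimates for tau = 0.1, 0.5, 0.9 at two time steps.
estimates = [[0.10, 0.30, 0.20], [0.20, 0.40, 0.60]]
print(count_crossings(estimates))           # crossings before reordering
print(count_crossings(reorder(estimates)))  # crossings after reordering
```

Sorting removes crossings after the fact, whereas the constrained formulation prevents them at the training points while the quantiles are being fitted.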

Since the dual form is a quadratic programming (QP) problem, it can be solved by a number of QP methods. For testing the constrained SVQR (CSVQR) method, the radial basis function (RBF) kernel is utilized, as it is a popular kernel function choice for support vector machines. Other kernels were tested on the case data sets described in the next section but gave poor results. The RBF kernel, given two samples $X_i$ and $X_j$ represented as feature vectors, is calculated as

$$K(X_i, X_j) = \exp\left(-\frac{\|X_i - X_j\|^2}{2\sigma^2}\right)$$

where $\sigma$ is the kernel width. An advantage of an RBF kernel is that it implicitly maps vectors into an infinite dimensional feature space. In order to quickly solve for conditional quantile estimates, sequential minimal optimization (?) is applied to Eq. (8).
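A minimal sketch of the kernel computations follows. It assumes the common Gaussian form $\exp(-\|x - z\|^2 / (2\sigma^2))$ with a width parameter `sigma`, and a kernel expansion whose coefficients and intercept would come from a QP solver; the width value, the coefficients, and all function names here are illustrative assumptions rather than values from the case study.

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel between two feature vectors of equal length."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def gram_matrix(samples, sigma=1.0):
    """Matrix of pairwise kernel values, as needed by a dual QP."""
    return [[rbf_kernel(a, b, sigma) for b in samples] for a in samples]


def kernel_expansion(x, support, coefficients, intercept, sigma=1.0):
    """Evaluate sum_i coefficients[i] * K(support[i], x) + intercept.

    The coefficients and intercept are assumed to be given (for example by a
    QP solver); this function only evaluates the resulting estimate.
    """
    return intercept + sum(
        c * rbf_kernel(s, x, sigma) for c, s in zip(coefficients, support)
    )


samples = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.2]]  # made-up normalized features
for row in gram_matrix(samples, sigma=0.5):
    print([round(value, 4) for value in row])
```

Because every kernel value depends only on the distance between two samples, features on very different scales would dominate the distance, which is one reason to normalize all features to a common range.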

## Application To The GEFCom2014 Dataset

Data for this case study come from the publicly available Global Energy Forecasting Competition 2014 (?). The goal of the competition was to design parametric or nonparametric forecasting methods that describe conditional predictive densities of wind power generation as a function of input data, namely future weather forecasts and/or past wind power. Data are provided for the years 2012 and 2013 from 10 wind farms labeled Zone 1 to Zone 10. The predictors are numerical weather predictions (NWPs) in the form of wind speeds at an hourly resolution at two heights, 10m and 100m above ground level. These forecasts are for the zonal and meridional wind components (denoted U and V). It was up to users to deduce exact wind speed, direction, and other wind features if necessary. The NWPs were provided for the exact locations of the wind farms. Power measurements at the various wind farms, also at an hourly resolution, are provided as well. All power measurements are normalized by the nominal capacity of their wind farm. The forecasting task is to learn the association between the provided NWPs (or derived features) and wind power. NWPs are then provided for a forecasting horizon of one month, and a learning model uses them as input to predict quantiles at each future time step. Fig. 2 shows an example month of data: Fig. 2.a shows the four NWPs given and Fig. 2.b shows the corresponding normalized wind power output.

In our analysis of CSVQR we used Zone 1 data from the summer months of June 2013 to August 2013 and the fall months of September 2013 to November 2013 for testing. Training was done using a sliding window of the three previous months to forecast the fourth month. For instance, to predict June, training was done on observed data from March to May; to predict July, training was done on data from April to June; and so on. Thirteen features were derived from the raw data for training the CSVQR model: derived wind speeds at 10m and 100m, wind direction at 10m and 100m, wind energy at 10m and 100m, wind shear, wind energy difference (between 10m and 100m), wind direction difference (between 10m and 100m), and the four raw wind speeds at 10m and 100m for the U and V directions. All features were normalized between 0 and 1. Denoting U and V as the wind components and as the energy density (we used ), the equations used to compute wind speed (ws), wind direction (wd), wind energy (we), and wind shear (wsh) are
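The sliding-window split and part of the feature derivation can be sketched as follows. Wind speed is taken as the magnitude of the (U, V) vector; the direction convention (`atan2(u, v)` in degrees, mapped to [0, 360)) is an assumption, since conventions differ, and wind energy and wind shear are left out because their exact definitions are not assumed here. All function names are hypothetical.

```python
import math


def sliding_windows(months, train_len=3):
    """Pair each test month with the train_len months immediately before it."""
    return [
        (months[i:i + train_len], months[i + train_len])
        for i in range(len(months) - train_len)
    ]


def wind_speed(u, v):
    """Magnitude of the horizontal wind vector from its U and V components."""
    return math.hypot(u, v)


def wind_direction_deg(u, v):
    """Angle of the wind vector in degrees, mapped to [0, 360).

    Assumption: atan2(u, v) convention; other conventions differ by offset or sign.
    """
    return math.degrees(math.atan2(u, v)) % 360.0


def min_max_normalize(values):
    """Scale a feature to [0, 1]; a constant feature maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


months = ["Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov"]
for train, test in sliding_windows(months):
    print(train, "->", test)
```

In this sketch each test month is always preceded by exactly three training months, so the model is refit six times for the June to November test period.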

To empirically analyze the CSVQR model as an appropriate method for wind forecasting, it is compared with two industry models and a naive model that are used for benchmarking in probabilistic wind forecasting applications (?; ?; ?). The first is the persistence method, which is the most common benchmark and is considered difficult to outperform for short-term forecasting. This method corresponds to the persistence distribution, which is formed from the most recent observations. For this case study, the past 12 hours of wind power observations were used to form the persistence distribution. The second is the climatology approach, whose predictive distribution is unconditional and based on all available past wind power observations. It is considered harder to beat in long-term forecasting. Lastly, the uniform distribution is used as a naive benchmark, which assumes that all wind power values at each time step occur with equal probability.
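The three benchmarks can be sketched as quantile lookups on different samples. The empirical quantile below uses linear interpolation between order statistics, which is an assumed estimator since the text does not specify one; the uniform benchmark relies on power being normalized to [0, 1], so its quantile at proportion tau is simply tau. Function names and the toy history are hypothetical.

```python
def empirical_quantile(sample, tau):
    """Quantile of a sample by linear interpolation between order statistics."""
    ordered = sorted(sample)
    position = tau * (len(ordered) - 1)
    low = int(position)  # floor, since position is nonnegative
    high = min(low + 1, len(ordered) - 1)
    fraction = position - low
    return ordered[low] + fraction * (ordered[high] - ordered[low])


def persistence_quantiles(history, taus, window=12):
    """Quantiles of the most recent `window` hourly observations."""
    return [empirical_quantile(history[-window:], tau) for tau in taus]


def climatology_quantiles(history, taus):
    """Quantiles of all available past observations (unconditional)."""
    return [empirical_quantile(history, tau) for tau in taus]


def uniform_quantiles(taus):
    """Quantiles of a uniform distribution on [0, 1] (normalized power)."""
    return list(taus)


history = [i / 100 for i in range(101)]  # made-up normalized power series
taus = [0.1, 0.5, 0.9]
print(persistence_quantiles(history, taus))
print(climatology_quantiles(history, taus))
print(uniform_quantiles(taus))
```

Because the persistence sample is short, its quantiles sit close together and produce narrow intervals, while climatology draws on the whole history and produces much wider ones.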

### Results

To visualize a probabilistic forecast, Fig. 3 shows an example prediction of the 80%, 60%, 40%, and 20% prediction intervals for the month of July 2013. Observed wind power is shown in red. From such probabilistic forecasts it is then possible to derive full predictive density functions, provided that the estimated conditional quantiles are nondecreasing (?). Evaluation results for the reliability of the probabilistic forecasts, in the form of prediction intervals of wind power over the months of June 2013 to November 2013, are shown in Table 1. Results are shown for the CSVQR method and for the climatology, persistence, and uniform benchmark methods. The evaluation metrics for each PINC are the PICP and ACE. For June and October the climatology method had a lower ACE at most nominal levels, but this was due to the fact that this model can yield wide intervals to cover more data. In all other months CSVQR had the lowest ACE of all four methods at every nominal level, often by a wide margin: in August, for example, its ACE ranged from 0.11 to 1.64, compared with 10.05 to 17.39 for climatology. To fully evaluate the forecasts it is also important to look at the quantile score, which measures the accuracy of the estimated quantiles. Table 2 summarizes the Q-scores averaged across all quantiles from all look-ahead periods for every forecast month. Their standard deviation is also provided to quantify the amount of variation among the quantiles. The Q-scores of the proposed approach were the lowest in every month; averaged over all months, CSVQR scored 0.0556 compared with 0.0974 for climatology, the next best method.

## Discussion

Wind power forecasting is crucial for many decision-making problems in power systems operations, and is a vital component in integrating more wind into the power grid. Due to the chaotic nature of the wind, it is often difficult to forecast. Uncertainty analysis in the form of probabilistic wind prediction can provide a better picture of future wind coverage. This paper studies a framework for probabilistic forecasting using support vector quantile regression with non-crossing constraints to ensure multiple quantiles can be predicted without overlapping each other. The effectiveness of the CSVQR approach is validated with the real-world dataset of the Global Energy Forecasting Competition 2014. Forecasts are compared to common benchmarks and are evaluated using the quantile score and reliability metrics. Results show adequate reliability and low quantile scores across the prediction horizon, which verifies the effectiveness of the model for forecasting while preventing estimated quantiles from overlapping. Furthermore, this approach has the potential to be applied across a variety of domains. Future work will look into applying CSVQR to forecast electricity pricing and load demand for smart grid applications.

Table 1: PICP and ACE (in %) of the prediction intervals for each test month and PINC.

| Month | PINC | CSVQR PICP | CSVQR ACE | Climatology PICP | Climatology ACE | Persistence PICP | Persistence ACE | Uniform PICP | Uniform ACE |
|---|---|---|---|---|---|---|---|---|---|
| June 13 | 80% | 85.00 | 5.00 | 95.28 | 15.28 | 46.11 | 33.89 | 60.97 | 19.03 |
| | 60% | 66.25 | 6.25 | 62.50 | 2.50 | 37.64 | 22.36 | 40.97 | 19.03 |
| | 40% | 45.56 | 5.56 | 42.92 | 2.92 | 30.56 | 9.44 | 23.47 | 16.53 |
| | 20% | 25.42 | 5.42 | 22.64 | 2.64 | 26.30 | 6.31 | 10.69 | 9.31 |
| July 13 | 80% | 78.49 | 1.50 | 76.08 | 3.92 | 12.77 | 67.23 | 59.27 | 20.73 |
| | 60% | 56.04 | 3.95 | 55.38 | 4.62 | 6.72 | 53.28 | 36.96 | 23.04 |
| | 40% | 38.70 | 1.29 | 35.08 | 4.92 | 5.24 | 34.76 | 21.91 | 18.09 |
| | 20% | 20.96 | 0.96 | 16.80 | 3.20 | 2.55 | 17.45 | 10.08 | 9.92 |
| August 13 | 80% | 78.36 | 1.64 | 65.73 | 14.27 | 22.04 | 57.96 | 61.83 | 18.17 |
| | 60% | 59.27 | 0.73 | 42.61 | 17.39 | 13.44 | 46.56 | 44.49 | 15.51 |
| | 40% | 40.46 | 0.46 | 25.94 | 14.06 | 7.80 | 32.20 | 30.11 | 9.89 |
| | 20% | 19.89 | 0.11 | 9.95 | 10.05 | 4.57 | 15.43 | 15.05 | 4.95 |
| September 13 | 80% | 79.03 | 0.97 | 81.81 | 1.81 | 31.53 | 48.47 | 60.69 | 19.31 |
| | 60% | 60.69 | 0.69 | 59.30 | 0.70 | 23.75 | 36.25 | 35.56 | 24.44 |
| | 40% | 42.92 | 2.92 | 34.31 | 5.69 | 14.86 | 25.14 | 20.97 | 19.03 |
| | 20% | 22.36 | 2.36 | 15.83 | 4.17 | 5.97 | 14.03 | 9.31 | 10.69 |
| October 13 | 80% | 83.20 | 3.20 | 81.85 | 1.85 | 52.82 | 27.18 | 62.77 | 17.23 |
| | 60% | 68.15 | 8.15 | 62.77 | 2.77 | 23.92 | 36.08 | 45.70 | 14.30 |
| | 40% | 52.55 | 12.55 | 46.24 | 6.24 | 6.85 | 33.15 | 28.76 | 11.24 |
| | 20% | 24.36 | 4.36 | 25.27 | 5.27 | 1.88 | 18.12 | 16.67 | 3.33 |
| November 13 | 80% | 80.42 | 0.42 | 90.14 | 10.14 | 25.83 | 54.17 | 72.36 | 7.64 |
| | 60% | 59.31 | 0.69 | 75.00 | 15.00 | 15.14 | 44.86 | 48.75 | 11.25 |
| | 40% | 36.11 | 3.89 | 55.69 | 15.69 | 11.94 | 28.06 | 29.17 | 10.83 |
| | 20% | 16.53 | 3.47 | 29.03 | 9.03 | 10.42 | 9.58 | 13.19 | 6.81 |

Table 2: Q-scores averaged across all quantiles and look-ahead periods, with their standard deviation (SD), for each test month.

| Month | Method | Q-Score | SD |
|---|---|---|---|
| June 13 | CSVQR | 0.0404 | 0.0119 |
| | Climatology | 0.0628 | 0.0230 |
| | Persistence | 0.0880 | 0.0406 |
| | Uniform | 0.1105 | 0.0434 |
| July 13 | CSVQR | 0.0546 | 0.0169 |
| | Climatology | 0.1038 | 0.0401 |
| | Persistence | 0.1799 | 0.0681 |
| | Uniform | 0.1112 | 0.0428 |
| August 13 | CSVQR | 0.0677 | 0.0199 |
| | Climatology | 0.1374 | 0.0555 |
| | Persistence | 0.1734 | 0.0738 |
| | Uniform | 0.1033 | 0.0380 |
| September 13 | CSVQR | 0.0590 | 0.0172 |
| | Climatology | 0.0992 | 0.0401 |
| | Persistence | 0.1659 | 0.0582 |
| | Uniform | 0.1107 | 0.0429 |
| October 13 | CSVQR | 0.0561 | 0.0159 |
| | Climatology | 0.0971 | 0.0366 |
| | Persistence | 0.1807 | 0.0977 |
| | Uniform | 0.1033 | 0.0382 |
| November 13 | CSVQR | 0.0557 | 0.0186 |
| | Climatology | 0.0844 | 0.0396 |
| | Persistence | 0.1089 | 0.0533 |
| | Uniform | 0.0978 | 0.0406 |
| All | CSVQR | 0.0556 | 0.0167 |
| | Climatology | 0.0974 | 0.0391 |
| | Persistence | 0.1494 | 0.1261 |
| | Uniform | 0.1061 | 0.0409 |

## References

- [Bremnes 2004] Bremnes, J. B. 2004. Probabilistic wind power forecasts using local quantile regression. Wind Energy 7(1):47–54.
- [Dorvlo 2002] Dorvlo, A. S. 2002. Estimating wind speed distribution. Energy Conversion and Management 43(17):2311–2318.
- [Foley et al. 2012] Foley, A. M.; Leahy, P. G.; Marvuglia, A.; and McKeogh, E. J. 2012. Current methods and advances in forecasting of wind power generation. Renewable Energy 37(1):1–8.
- [Giebel et al. 2003] Giebel, G.; Landberg, L.; Badger, J.; Sattler, K.; Feddersen, H.; Nielsen, T. S.; Nielsen, H. A.; and Madsen, H. 2003. Using ensemble forecasting for wind power. Proceedings Cd-rom. Cd 2.
- [Hong et al. 2016] Hong, T.; Pinson, P.; Fan, S.; Zareipour, H.; Troccoli, A.; and Hyndman, R. J. 2016. Probabilistic energy forecasting: Global energy forecasting competition 2014 and beyond. International Journal of Forecasting 32(3):896–913.
- [Juban et al. 2016] Juban, R.; Ohlsson, H.; Maasoumy, M.; Poirier, L.; and Kolter, J. Z. 2016. A multiple quantile regression approach to the wind, solar, and price tracks of GEFCom2014. International Journal of Forecasting 32(3):1094–1102.
- [Koenker and Bassett Jr 1978] Koenker, R., and Bassett Jr, G. 1978. Regression quantiles. Econometrica: Journal of the Econometric Society 33–50.
- [Koenker 2005] Koenker, R. 2005. Quantile regression. Number 38. Cambridge University Press.
- [Landry et al. 2016] Landry, M.; Erlinger, T. P.; Patschke, D.; and Varrichio, C. 2016. Probabilistic gradient boosting machines for GEFCom2014 wind forecasting. International Journal of Forecasting 32(3):1061–1066.
- [Mangalova and Shesterneva 2016] Mangalova, E., and Shesterneva, O. 2016. K-nearest neighbors for GEFCom2014 probabilistic wind power forecasting. International Journal of Forecasting 32(3):1067–1073.
- [Nielsen, Madsen, and Nielsen 2006] Nielsen, H. A.; Madsen, H.; and Nielsen, T. S. 2006. Using quantile regression to extend an existing wind power forecasting system with probabilistic forecasts. Wind Energy 9(1-2):95–108.
- [Pinson and Kariniotakis 2004] Pinson, P., and Kariniotakis, G. 2004. On-line assessment of prediction risk for wind power production forecasts. Wind Energy 7(2):119–132.
- [Pinson and Kariniotakis 2010] Pinson, P., and Kariniotakis, G. 2010. Conditional prediction intervals of wind power generation. IEEE Transactions on Power Systems 25(4):1845–1856.
- [Pinson et al. 2007] Pinson, P.; Nielsen, H. A.; Møller, J. K.; Madsen, H.; and Kariniotakis, G. N. 2007. Non-parametric probabilistic forecasts of wind power: required properties and evaluation. Wind Energy 10(6):497–516.
- [Platt and others 1998] Platt, J., et al. 1998. Sequential minimal optimization: A fast algorithm for training support vector machines.
- [Quinonero-Candela et al. 2006] Quinonero-Candela, J.; Rasmussen, C. E.; Sinz, F.; Bousquet, O.; and Schölkopf, B. 2006. Evaluating predictive uncertainty challenge. 1–27.
- [Sideratos and Hatziargyriou 2012] Sideratos, G., and Hatziargyriou, N. D. 2012. Probabilistic wind power forecasting using radial basis function neural networks. IEEE Transactions on Power Systems 27(4):1788–1796.
- [Takeuchi et al. 2006] Takeuchi, I.; Le, Q. V.; Sears, T. D.; and Smola, A. J. 2006. Nonparametric quantile estimation. Journal of Machine Learning Research 7(Jul):1231–1264.
- [Wan et al. 2014] Wan, C.; Xu, Z.; Pinson, P.; Dong, Z. Y.; and Wong, K. P. 2014. Probabilistic forecasting of wind power generation using extreme learning machine. IEEE Transactions on Power Systems 29(3):1033–1044.
- [Zhang, Wang, and Wang 2014] Zhang, Y.; Wang, J.; and Wang, X. 2014. Review on probabilistic forecasting of wind power generation. Renewable and Sustainable Energy Reviews 32:255–270.