A Novel Smoothed Loss and Penalty Function for Noncrossing Composite Quantile Estimation via Deep Neural Networks
Abstract
Uncertainty analysis in the form of probabilistic forecasting can significantly improve decision-making processes in the smart power grid when integrating renewable energy sources such as wind. Whereas point forecasting provides a single expected value, probabilistic forecasts provide more information in the form of quantiles, prediction intervals, or full predictive densities. Traditionally, quantile regression is applied for such forecasting, and recently quantile regression neural networks have become popular for weather and renewable energy forecasting. However, one major shortcoming of composite quantile estimation in neural networks is the quantile crossover problem. This paper analyzes the effectiveness of a novel smoothed loss and penalty function for neural network architectures to prevent the quantile crossover problem. Its efficacy is examined on the wind power forecasting problem. A numerical case study is conducted using publicly available wind data from the Global Energy Forecasting Competition 2014. Multiple quantiles are estimated to form 10% to 90% prediction intervals, which are evaluated using a quantile score and reliability measures. Benchmark models such as the persistence and climatology distributions, multiple quantile regression, and support vector quantile regression are used for comparison; the results demonstrate that the proposed approach leads to improved performance while preventing the problem of overlapping quantile estimates.
I Introduction
In the last thirty years wind power has experienced rapid global growth, and in some countries, it is the most used form of renewable energy. However, due to the chaotic nature of the weather, variable and uncertain wind power production poses planning and operational challenges unseen in conventional generation. From the grid operator’s perspective, uncertainty in wind production could cause inefficiencies in the power flow, operating reserve requirements, stochastic unit commitment, and electricity market settlements [makarov2009operational, bukhsh2016integrated, botterud2012wind]. From the wind generator’s perspective, reliable wind forecasts are needed for several operations at a wind farm, ranging from energy storage control to bidding and trading in energy markets. Thus, to ensure both stable grid operations and continued growth and increased penetration of wind power, highly reliable forecasting of wind power production is needed.
Traditionally, wind power prediction has focused on developing point forecasts, which provide a single expected output for a given look-ahead time. Point forecasting horizons fall into several scales: very short-term (seconds or minutes ahead), short-term (hours to days ahead), long-term (weeks or months ahead), and seasonal. A thorough review of wind forecasting can be found in [monteiro2009wind]. However, point forecasts can carry significant errors and provide no information on uncertainty. Therefore, a significant research effort has begun recently by the renewables forecasting community [hong2016probabilistic] to produce fully probabilistic predictions which derive quantitative information on the associated uncertainty of power output. For example, to capture the uncertainty of wind power, forecasting errors can be statistically analyzed and modeled by the Beta distribution. However, such an assumption may not be applicable for short-term forecasting, and thus researchers are looking at different approaches for probabilistic wind power forecasts by quantifying prediction uncertainty. Although various methods have been proposed, it is still a challenge to make accurate and reliable probabilistic predictions for volatile renewables, such as wind.
Probabilistic forecasts can play a key role in integrating and managing wind farms. For instance, in [doherty2005new] the optimal level of generation reserves is estimated using the uncertainty of wind power predictions, and in [usaola2004benefits, castronuovo2004optimization] the optimization of wind energy production is investigated taking into account the forecasts of a probabilistic prediction method. Additionally, increased revenues can be obtained using bidding strategies built on predictive densities, as shown in [bathurst2002trading, pinson2007trading]. Wind power density forecasting can also be used for analysis of probabilistic load flow, as in [karakatsanis1994probabilistic].
Our work is motivated by exploring a direct and nonparametric probabilistic forecasting approach for wind power. To address the problem of dealing with nonlinearity in wind data [lange2005uncertainty], we propose a novel neural network model which we call the smooth pinball neural network (SPNN). This network is able to provide probabilistic forecasts in the form of multiple monotonically increasing quantiles estimated simultaneously. The main contributions of our approach can be summarized as follows:

We propose and investigate a new objective function which is a logistic based smooth approximation of the pinball loss function for multiple quantile regression.

We introduce a smooth penalty scheme to prevent the quantile crossover problem.

We showcase how a multiple quantile based neural network can be used for probabilistic forecasting of wind.

We design experiments to validate our model using publicly available data from 10 wind farms from the Global Energy Forecasting Competition 2014 and benchmark performance with common and advanced methods.

We show our method improves the skill, reliability, and sharpness of forecasts over various benchmarks.
In Section II, we provide a literature review of nonparametric probabilistic forecasting approaches for wind power and we dive deeper into related work on QR, quantile-based neural networks, and approaches for preventing what is known as the quantile crossover problem. In Section III, we provide the mathematical background on probabilistic forecasting, QR, and evaluation methods. Section IV overviews our model, its architecture, and training. Results and discussion of our case study are presented in Section V. We conclude the paper and review future research directions in Section VI.
II Related Work
II-A Probabilistic Forecasting of Wind Power
Over the last several years there has been a large body of work conducted in nonparametric probabilistic forecasting of wind power as well as other renewables such as solar power. The Global Energy Forecasting Competitions held in 2014 [hong2016probabilistic] and 2017 are further proof of the rising interest in probabilistic forecasting. Probabilistic wind models are either meteorological ensembles that are obtained by a weather model [giebel2003using] or statistical methods [foley2012current]. Under the statistical approach, we can estimate full predictive distributions in the form of quantiles or prediction intervals (PIs). One PI estimation scheme is shown in [sideratos2012probabilistic], which uses a radial basis function neural network.
Some of the most recent forecasting methods include extreme learning machines [wan2017direct], where a direct quantile regression approach combining an extreme learning machine with quantile regression was presented to efficiently generate nonparametric probabilistic forecasts of wind power generation. Hybrid intelligent methods have also been explored in [haque2014hybrid] by feeding deterministic wind power forecasts, made by a combination of a wavelet transform and a fuzzy ARTMAP network optimized using the firefly optimization algorithm, into quantile regression. Another approach to forecast the density of wind power is to take an ensemble of point forecasts and calculate the mean and variance of the combined forecasts. This has been studied in [wang2017deep], where a wavelet transform and a convolutional neural network are used for ensemble point forecasting. Another ensemble approach can be seen in [taylor2009wind], where time series models such as ARMA and GARCH are combined to form density forecasts. One of the most prevalent approaches to probabilistic forecasting of wind power is to apply quantile regression (QR), which can be used to estimate different wind power quantiles [zhang2014review].
Another option for nonparametric probabilistic wind forecasting is the Lower Upper Bound Estimation (LUBE) method [khosravi2011lower]. The LUBE method constructs a neural network with two outputs for estimating the prediction interval bounds. The coverage width-based criterion is used as the loss function for estimating PIs, and simulated annealing or particle swarm optimization [quan2014short] can be used to minimize that loss function. A complete review on probabilistic forecasting of wind power can be found in [zhang2014review]. Other reviews on probabilistic forecasting methods can be found in [van2018review] for solar power, [hong2016probabilisticload] for load forecasting, and [nowotarski2017recent] for electricity price forecasting.
II-B Nonlinear Quantile Regression
Here we provide a thorough review of QR methods, particularly nonlinear versions, as background to our proposed method. There are many variations of QR which are traditionally solved using linear programming algorithms. In [bremnes2004probabilistic] local QR is applied to estimate different quantiles, while in [nielsen2006using] a spline-based QR is used to estimate quantiles of wind power. In [landry2016probabilistic] quantile loss gradient boosted machines are used to estimate many quantiles, and in [juban2016multiple] multiple quantile regression is used to predict a full distribution with optimization achieved by using the alternating direction method of multipliers. Quantile regression forests [juban2008uncertainty] are another approach in forecasting; they are an extension of regression forests based on classification and regression trees.
With QR being a comprehensive strategy for providing the conditional distribution of a response $y$ given $x$, we highlight several of its variants. As a generalization of QR, Powell [powell1986censored] introduces the censored QR model, which consistently estimates conditional quantiles when observations on the dependent variable are censored. Yu and Jones [yu1998local] propose a nonparametric version of QR estimation by using a kernel-weighted local linear fitting. Chen et al. [chen2009copula] propose a copula-based nonlinear quantile autoregression, addressing the possibility of deriving nonlinear parametric models for different conditional quantile functions. QR can also be hybridized with machine learning methods to form powerful nonlinear models. For instance, support vector regression is introduced for QR in [hwang2005simple], yielding support vector quantile regression (SVQR). SVQR extends the QR model to nonlinear and high dimensional spaces, but it requires solving a quadratic programming problem.
Due to their flexibility in modeling elaborate nonlinear data sets, artificial neural networks are another dominant class of machine learning algorithms which are also very popular for renewable forecasting [hatalis2014multi, azad2014long]. Taylor [taylor2000quantile] is the first to propose a quantile regression neural network (QRNN) method, combining the advantages of both QR and a neural network. This method can reveal the conditional distribution of the response variable and can also model the nonlinearity of different systems. The author applies this method to estimate the conditional distribution of multiperiod returns in financial systems, which avoids the need to specify the explanatory variables explicitly. However, the paper does not address how the network was optimized. The same QRNN was later used by [feng2010robust] for credit portfolio data analysis where results showed that QRNN is more robust in fitting outliers compared to both local linear regression and spline regression. In [xu2016quantile] an autoregressive version of QRNN is used for applications to evaluating value at risk, and [cannon2011quantile] implements the QRNN model in R as a statistical package.
In all the QR approaches mentioned, only a single quantile is estimated at a time. In the case of estimating multiple quantiles, this could lead to what is known as the quantile crossover problem, where a lower quantile overlaps a higher quantile. Equivalently, a prediction interval for a lower probability (e.g., the range in which 10% of future values are predicted to lie) exceeds that of a higher probability (e.g., the range in which 20% of the future values are predicted to lie). Crossing quantiles are undesirable as they violate the principle that the inverse of a cumulative distribution function should be monotonically increasing. A possible way to prevent this issue is to utilize simple heuristics of reordering estimated quantiles. However, this approach does not have a strong theoretical foundation and may lead to inappropriate quantiles. The solution then is to optimize quantiles together with noncrossing constraints. In [takeuchi2006nonparametric, hatalis2017empirical] a constrained support vector quantile regression (CSVQR) method is developed with noncrossing constraints, where it was used to fit quantiles on static data. However, CSVQR is computationally very expensive and slow to train. In Section IV we review approaches for preventing the quantile crossover problem in neural networks and we also propose a novel way to prevent this problem using a smooth penalty function.
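The crossover problem and the reordering heuristic mentioned above can be illustrated with a minimal sketch. The helper names `count_crossings` and `reorder` and the numeric estimates are hypothetical and chosen only for illustration; they are not taken from the cited works.

```python
def count_crossings(quantiles):
    """Count adjacent pairs where a lower-level quantile exceeds the next one."""
    return sum(1 for lo, hi in zip(quantiles, quantiles[1:]) if lo > hi)


def reorder(quantiles):
    """Post-hoc heuristic: sort the estimates so they are nondecreasing."""
    return sorted(quantiles)


# Hypothetical estimates for quantile levels 0.1, 0.2, 0.3, 0.4 at one time step.
# The 0.2-quantile (0.35) lies above the 0.3-quantile (0.31): one crossing.
estimates = [0.21, 0.35, 0.31, 0.48]
crossings_before = count_crossings(estimates)
fixed = reorder(estimates)
crossings_after = count_crossings(fixed)
```

Sorting removes the crossing but is applied after training, so the model itself never learns the ordering; this is the limitation that motivates optimizing all quantiles jointly with noncrossing constraints.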
III Probabilistic Forecasting
This section highlights the underlying mathematics in probabilistic forecasting, overviews linear quantile regression, and summarizes the main evaluation metrics for density forecasts. Given a random variable $Y_t$ such as wind power at time $t$, its probability density function is defined as $f_t$ and its cumulative distribution function as $F_t$. If $F_t$ is strictly increasing, the quantile $q_t^{(\tau)}$ of the random variable $Y_t$ with nominal proportion $\tau$ is uniquely defined as the value $x$ such that $P(Y_t < x) = \tau$. It can also be defined through the inverse of the distribution function, $q_t^{(\tau)} = F_t^{-1}(\tau)$. A quantile forecast $\hat{q}_{t+k}^{(\tau)}$ is an estimate of the true quantile $q_{t+k}^{(\tau)}$ for the lead time $t+k$, given predictor values (such as numerical wind speed forecasts). Prediction intervals are another type of probabilistic forecast and give a range of possible values within which an observed value is expected to lie with a certain probability $1-\beta$. A prediction interval $\hat{I}_{t+k}^{(\beta)} = [\hat{q}_{t+k}^{(\tau_l)}, \hat{q}_{t+k}^{(\tau_u)}]$ is defined by its lower and upper bounds, which are the quantile forecasts whose nominal proportions $\tau_l$ and $\tau_u$ are such that $\tau_u - \tau_l = 1-\beta$.
In probabilistic forecasting, we are trying to predict one of two classes of density functions, either parametric or nonparametric. When the future density function is assumed to take a certain distribution, such as the Normal distribution, this is called parametric probabilistic forecasting. For a nonlinear and bounded process such as wind generation, however, probability distributions of future wind power may be skewed and heavy-tailed [dorvlo2002estimating]. If instead no assumption is made about the shape of the distribution, a nonparametric probabilistic forecast [pinson2007non] can be made of the density function by gathering a set of $M$ quantile forecasts $\{\hat{q}_{t+k}^{(\tau_m)}\}_{m=1}^{M}$ such that $0 \le \tau_1 < \dots < \tau_M \le 1$, with chosen nominal proportions spread on the unit interval. In this paper, we consider forecasting wind power at a resolution of one hour (predicting up to a month's worth of values ahead). At this hourly resolution the wind density may fluctuate, which makes nonparametric forecasting more suitable than fitting a parametric density [zhang2014review].
III-A Quantile Regression
Quantile regression is a popular approach for nonparametric probabilistic forecasting. Koenker and Bassett [koenker1978regression] introduced it for estimating conditional quantiles, and it is closely related to models for the conditional median [koenker2005quantile]. Minimizing the mean absolute error leads to an estimate of the conditional median of a prediction. By applying asymmetric weights to errors through a tilted transformation of the absolute value function, we can compute the conditional quantiles of a predictive distribution. The selected transformation function is the pinball loss function as defined by
$$\rho_\tau(u) = \begin{cases} \tau u & \text{if } u \ge 0 \\ (\tau - 1) u & \text{if } u < 0 \end{cases} \tag{1}$$
where $0 < \tau < 1$ is the tilting parameter and $u$ is the difference between the observation and the quantile estimate. To better understand the pinball loss, we look at an example for estimating a single quantile. If an observation falls above a reported quantile, such as the 0.05-quantile, the loss is its distance from the quantile multiplied by the probability 0.05. Otherwise, the loss is its distance from the quantile multiplied by one minus that probability (0.95 in the case of the 0.05-quantile). The pinball loss function penalizes low-probability quantiles more for overestimation than for underestimation, and vice versa in the case of high-probability quantiles. Given a vector of predictors $X_t$ where $t = 1, \dots, N$, a vector of weights $W$ and intercept coefficient $b$ in a linear regression fashion, the conditional quantile is given by $\hat{q}_t^{(\tau)} = W^\top X_t + b$. To determine estimates for the weights and intercept we solve the following minimization problem
$$\min_{W, b} \frac{1}{N} \sum_{t=1}^{N} \rho_\tau\left(y_t - (W^\top X_t + b)\right) \tag{2}$$
where $y_t$ is the observed value of the predictand. The formulation above in Eq. (2) can be minimized by a linear program.
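As a minimal sketch of the pinball loss described above, the code below evaluates the loss for a residual `u` (observation minus quantile estimate) and checks numerically that the constant minimizing the average loss over a sample is an empirical quantile. The names `pinball`, `mean_pinball`, and the toy sample are illustrative assumptions, and a brute-force search stands in for the linear program mentioned in the text.

```python
def pinball(u, tau):
    """Pinball loss for residual u = observation - quantile estimate."""
    return tau * u if u >= 0 else (tau - 1.0) * u


def mean_pinball(q, ys, tau):
    """Average pinball loss of a constant quantile estimate q over a sample."""
    return sum(pinball(y - q, tau) for y in ys) / len(ys)


# Asymmetry for the 0.05-quantile: an observation one unit above the estimate
# costs 0.05, while an observation one unit below the estimate costs 0.95.
loss_obs_above = pinball(1.0, 0.05)
loss_obs_below = pinball(-1.0, 0.05)

# Toy sample 0, 1, ..., 100. Searching over the sample values, the minimizer
# for tau = 0.25 is the empirical 0.25-quantile of the sample.
ys = list(range(101))
best_q = min(ys, key=lambda q: mean_pinball(q, ys, 0.25))
```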
III-B Evaluation Metrics
In probabilistic forecasting it is essential to evaluate the quantile estimates and, if desired, the derived prediction intervals. Therefore, we use as evaluation measures the quantile score, interval reliability, and interval sharpness. To evaluate quantile estimates, one can use the pinball function directly as an assessment called the quantile score (QS). We choose QS as our main evaluation measure for the following reasons. When averaged across many quantiles it can evaluate full predictive densities; it is found to be a proper scoring rule [grushka2017quantile]; it is related to the continuous ranked probability score; and it is also the main evaluation criterion in the 2014 Global Energy Forecasting Competition (GEFCom2014), the source of our testing data. QS calculated over all test observations and quantiles is defined as
$$QS = \frac{1}{NM} \sum_{t=1}^{N} \sum_{m=1}^{M} \rho_{\tau_m}\left(y_t - \hat{q}_t^{(\tau_m)}\right)$$
where $y_t$ is an observation used for forecast evaluation, such as a future wind power observation, $N$ is the number of test observations, and $M$ is the number of quantile levels. To evaluate full predictive densities, QS is averaged across all target quantiles for all look-ahead time steps using equal weights. A lower QS indicates a better forecast.
With the QS calculated, we can also examine the relative performance of SPNN with respect to some benchmark method. We can assess relative performance between methods using the quantile verification skill score (QVSS) [friederichs2007statistical]
$$QVSS = 1 - \frac{QS_{model}}{QS_{ref}}$$
where $QS_{model}$ is the QS for the forecast method of interest (SPNN in our case), and $QS_{ref}$ is the QS value for the reference forecast of a benchmark method, which we will assume to be linear quantile regression. If QVSS is positive then the forecast of interest performs better than the reference forecast, and QVSS = 1 means a perfect forecast. Negative QVSS values indicate that the forecast of interest performs worse than the reference forecast.
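The two scores can be sketched as follows. This is an illustration under assumptions: QS is taken as the equal-weight average of the pinball loss over all observations and quantile levels, QVSS as one minus the ratio of the model QS to the reference QS, and the function names and toy numbers are hypothetical.

```python
def pinball(u, tau):
    """Pinball loss for residual u = observation - quantile estimate."""
    return tau * u if u >= 0 else (tau - 1.0) * u


def quantile_score(y_true, q_pred, taus):
    """Average pinball loss; q_pred[t][m] is the tau_m-quantile at time t."""
    total = 0.0
    for y, row in zip(y_true, q_pred):
        for q, tau in zip(row, taus):
            total += pinball(y - q, tau)
    return total / (len(y_true) * len(taus))


def qvss(qs_model, qs_ref):
    """Skill relative to a reference: 1 is perfect, 0 matches the reference."""
    return 1.0 - qs_model / qs_ref


taus = [0.1, 0.5, 0.9]
y_true = [0.5]                       # one normalized power observation
model_pred = [[0.4, 0.5, 0.6]]       # hypothetical model quantiles
reference_pred = [[0.1, 0.3, 0.9]]   # hypothetical reference quantiles

qs_model = quantile_score(y_true, model_pred, taus)
qs_ref = quantile_score(y_true, reference_pred, taus)
skill = qvss(qs_model, qs_ref)
```

Here the model quantiles sit closer to the observation than the reference quantiles, so the skill score is positive.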
In some applications, wind forecasts may be needed in the form of prediction intervals (PIs), and as such we look at two secondary evaluation measures: reliability and sharpness. Reliability is a measure which states that over an evaluation set the observed and nominal probabilities should be as close as possible, and the empirical coverage should ideally equal the preassigned probability. Sharpness is a measure of the width of prediction intervals, defined as the difference between the upper and lower interval values. For interval reliability we use the average coverage error (ACE) metric [zhang2014review] and for measuring interval sharpness we use the interval score (IS), which can also be used to evaluate the overall skill of PIs [gneiting2007strictly]. For measuring reliability, PIs show where future wind power observations are expected to lie, with an assigned probability termed the PI nominal confidence (PINC), $100(1-\beta)\%$. Here $\beta$ indicates a specific coverage level. The coverage probability of estimated PIs is expected to eventually reach the nominal level of confidence over the test data. A measure of reliability which shows target coverage of the PIs is the PI coverage probability (PICP), which is defined by
$$PICP = \frac{1}{N} \sum_{t=1}^{N} \mathbb{1}\left\{ y_t \in \hat{I}_t^{(\beta)} \right\}$$
For reliable PIs, the examined PICP should be close to its corresponding PINC. A related and easier to visualize assessment index is the average coverage error (ACE), which is defined by
$$ACE = PICP - PINC$$
This assumes calculation across all test data and coverage levels. To ensure PIs have high reliability, the ACE should be as close to zero as possible. High reliability can be easily achieved by increasing or decreasing the distance between lower and upper interval bounds. Thus, the width of a PI can also influence its quality. For measuring the effective width of PIs we use the sharpness score proposed by [pinson2007non], which measures how wide PIs are by focusing on the mean size of the intervals only. We define $\delta_t^{(\beta)} = \hat{q}_t^{(1-\beta/2)} - \hat{q}_t^{(\beta/2)}$ as the size of the central interval forecast with nominal coverage rate $1-\beta$. For lead times $t = 1, \dots, N$, a measure of sharpness for PIs is then given by the mean size of the intervals
$$\bar{\delta}^{(\beta)} = \frac{1}{N} \sum_{t=1}^{N} \delta_t^{(\beta)}$$
A lower sharpness score is considered more ideal, but if it is too small the PIs will not cover enough of the observed data. Thus sharpness is typically a measure to be considered along with reliability and a skill score. QS is a score that measures the skill of individual quantiles; to measure the skill of individual PIs we apply the interval score (IS) [gneiting2007strictly]. The IS, when evaluated with all test data and coverage levels, is defined by
$$IS = \frac{1}{N} \sum_{t=1}^{N} \left[ \left(U_t - L_t\right) + \frac{2}{\beta}\left(L_t - y_t\right)\mathbb{1}\{y_t < L_t\} + \frac{2}{\beta}\left(y_t - U_t\right)\mathbb{1}\{y_t > U_t\} \right]$$
where $L_t = \hat{q}_t^{(\beta/2)}$ and $U_t = \hat{q}_t^{(1-\beta/2)}$ are the lower and upper interval bounds. The prediction model is rewarded for narrow PIs and is penalized if the observation misses the interval. The size of the penalty depends on $\beta$. Including all aspects of PI evaluation, the IS can be used to compare the overall skill and sharpness of interval forecasts. However, IS cannot identify the contributions of reliability and sharpness to the overall skill. Thus, ACE and sharpness are both used for evaluation of PIs along with QS for evaluation of quantile estimation.
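The interval measures discussed in this subsection can be sketched for a single coverage level as below. The sketch assumes ACE is computed as PICP minus PINC and uses the positively oriented interval score of [gneiting2007strictly] (lower is better); the sign convention used in the experiments may differ, and all names and numbers are illustrative.

```python
def picp(y, lower, upper):
    """Fraction of observations that fall inside their prediction interval."""
    hits = sum(1 for yt, lo, hi in zip(y, lower, upper) if lo <= yt <= hi)
    return hits / len(y)


def ace(y, lower, upper, pinc):
    """Average coverage error: empirical coverage minus nominal coverage."""
    return picp(y, lower, upper) - pinc


def sharpness(lower, upper):
    """Mean interval width."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


def interval_score(y, lower, upper, beta):
    """Width plus a 2/beta penalty per unit distance when the interval misses."""
    total = 0.0
    for yt, lo, hi in zip(y, lower, upper):
        total += hi - lo
        if yt < lo:
            total += 2.0 / beta * (lo - yt)
        elif yt > hi:
            total += 2.0 / beta * (yt - hi)
    return total / len(y)


# Hypothetical 80% intervals (beta = 0.2) for four normalized observations.
y = [0.3, 0.5, 0.9, 0.1]
lower = [0.2, 0.2, 0.2, 0.2]
upper = [0.6, 0.6, 0.6, 0.6]

coverage = picp(y, lower, upper)
coverage_error = ace(y, lower, upper, pinc=0.8)
mean_width = sharpness(lower, upper)
score = interval_score(y, lower, upper, beta=0.2)
```

In this toy case only two of four observations are covered, so the coverage error is negative and the interval score is dominated by the miss penalties rather than by the width.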
IV Smooth Pinball Network Model
We propose to use a feedforward neural network for probabilistic forecasting due to its flexibility and strength in dealing with nonlinear and nonstationary data. We can use the pinball loss in the objective function of such a neural network to estimate conditional quantiles. However, the pinball function employed by the original linear quantile regression model in Eq. (1) is not differentiable at the origin, $u = 0$. The nondifferentiability of $\rho_\tau$ makes it difficult to apply gradient-based optimization methods in fitting the quantile regression model. Gradient-based methods are preferable for training neural networks since they are time efficient, easy to implement, and yield a local optimum. Therefore, we need a smooth approximation of the pinball function that allows for the direct application of gradient-based optimization. We call our new model the smooth pinball neural network (SPNN).
We are not the first to apply a smooth approximation to the pinball function for a quantile regression based neural network. Cannon [cannon2011quantile] used the Huber norm to construct smooth approximations of the pinball loss function, following the work in [chen2007finite], to form a QRNN. Using the same Huber norm approximation, a composite QRNN is proposed in [xu2017composite] to estimate multiple quantiles. The Huber norm requires multiple optimization runs with a fixed schedule of a decreasing smoothing constant to form the final weights and biases. Chen et al. [chen1996class] introduced another class of smooth functions for nonlinear optimization problems and applied this idea to support vector machines [lee2001ssvm]. Emulating the work of Chen, a study by Zheng [zheng2011gradient] presents an approximation to the pinball loss function by a smooth logistic function; this then allows the application of gradient descent for optimization. Zheng called the resulting algorithm the gradient descent smooth quantile regression model. We extend that model here for the case of a neural network. To our knowledge, we are the first to investigate the usage of a smooth logistic loss function to estimate multiple quantiles using a neural network.
IV-A Smooth Quantile Regression
The smooth approximation [zheng2011gradient] of the pinball function in Eq. (1) is given by
$$S_{\tau,\alpha}(u) = \tau u + \alpha \log\left(1 + \exp\left(-\frac{u}{\alpha}\right)\right) \tag{3}$$
where $\alpha > 0$ is a smoothing parameter and $\tau \in (0, 1)$ is the quantile level we are trying to estimate. In Fig. 2 we see the pinball function as the red line and a smooth approximation as the blue line. Zheng proves [zheng2011gradient] that $S_{\tau,\alpha}(u) \to \rho_\tau(u)$ in the limit as $\alpha \to 0^+$. He also derives and discusses several other properties of the smooth pinball function. The smooth quantile regression optimization problem then becomes
$$\min_{W, b} \frac{1}{N} \sum_{t=1}^{N} S_{\tau,\alpha}\left(y_t - (W^\top X_t + b)\right) \tag{4}$$
where $N$ is the number of training examples, $W$ and $b$ are the model parameters, and $X_t$ is a vector of features at time $t$. This form conveniently allows gradient-based algorithms to be used for optimization.
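A minimal sketch of the logistic smooth approximation of [zheng2011gradient], written as tau * u + alpha * log(1 + exp(-u / alpha)) with `alpha` as the smoothing parameter, is given below. The numerically stable evaluation of the log term is an implementation choice of this sketch, not something prescribed by the text. The gap to the exact pinball loss is largest at u = 0, where it equals alpha * log(2), so it shrinks as alpha decreases.

```python
import math


def pinball(u, tau):
    """Exact pinball loss for residual u."""
    return tau * u if u >= 0 else (tau - 1.0) * u


def smooth_pinball(u, tau, alpha):
    """Logistic smooth approximation of the pinball loss."""
    z = -u / alpha
    # log(1 + exp(z)) rewritten so that exp never overflows for large |z|.
    softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return tau * u + alpha * softplus


# Approximation gap at the kink u = 0 for decreasing smoothing parameters.
tau = 0.5
gaps = [smooth_pinball(0.0, tau, a) - pinball(0.0, tau) for a in (1.0, 0.1, 0.01)]

# Far from the kink the two functions agree closely, even for extreme residuals.
far_left = smooth_pinball(-1000.0, 0.3, 0.01)   # exact pinball value is 700
far_right = smooth_pinball(1000.0, 0.3, 0.01)   # exact pinball value is 300
```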
IV-B Smooth Pinball Neural Network
For simplicity we describe here the construction of a single hidden layered SPNN for nonlinear multiple quantile regression, but SPNN can easily be extended to multiple hidden layers. In a single hidden layered SPNN the input layer consists of $n$ input nodes and takes the vector of input features $X_t$ at time $t$. The hidden layer consists of $h$ hidden neurons and the output layer consists of $M$ output nodes corresponding to the estimated quantiles $\hat{q}_t^{(\tau_m)}$, where $\tau_m$ is the quantile level we want to estimate at time $t$. Every element in the first layer is connected to the hidden neurons with the weight matrix $W^{[1]}$ of size $h \times n$ and bias vector $b^{[1]}$ of size $h \times 1$. A similar connection structure is present in the second layer of the network between the hidden and output layers, with the output weight matrix $W^{[2]}$ of size $M \times h$ and bias vector $b^{[2]}$ of size $M \times 1$.
The input to the hidden neurons is calculated, in vectorized notation, by $Z_t^{[1]} = W^{[1]} X_t + b^{[1]}$; the output of the hidden layer then uses the logistic activation function $H_t = 1 / (1 + \exp(-Z_t^{[1]}))$. The input to the output neurons is then calculated by $Z_t^{[2]} = W^{[2]} H_t + b^{[2]}$, and the output layer uses the identity activation function $\hat{Q}_t = Z_t^{[2]}$, where $\hat{Q}_t$ collects the $M$ estimated quantiles.
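The forward pass just described (an affine map, a logistic hidden layer, a second affine map, and an identity output with one node per quantile level) can be sketched in plain Python. The layer sizes, weight values, and the names `affine` and `spnn_forward` are hypothetical and chosen only to keep the example small.

```python
import math


def logistic(z):
    """Logistic activation used in the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, b):
    """Return W x + b for a matrix W (list of rows), vector x, and bias b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def spnn_forward(x, W1, b1, W2, b2):
    """Single-hidden-layer forward pass returning one value per quantile level."""
    hidden = [logistic(z) for z in affine(W1, x, b1)]
    return affine(W2, hidden, b2)  # identity activation on the output layer


# Toy sizes: 2 input features, 3 hidden neurons, 2 quantile outputs.
W1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
b1 = [0.0, 0.1, -0.1]
W2 = [[0.3, 0.3, 0.3], [0.6, 0.6, 0.6]]
b2 = [0.0, 0.1]
q_hat = spnn_forward([1.0, 2.0], W1, b1, W2, b2)
```

Nothing in the architecture itself forces the outputs in `q_hat` to be ordered, which is why a separate noncrossing mechanism is needed.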
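The objective function for our SPNN model is then the smooth pinball approximation summed over the $M$ values of $\tau$ we are trying to estimate in the output layer. We also use L2 regularization on the network weights in the objective function to prevent overfitting during training. The objective function for SPNN is then given by
$$E = \frac{1}{NM} \sum_{t=1}^{N} \sum_{m=1}^{M} S_{\tau_m,\alpha}\left(y_t - \hat{q}_t^{(\tau_m)}\right) + \lambda_1 \left\| W^{[1]} \right\|_F^2 + \lambda_2 \left\| W^{[2]} \right\|_F^2 \tag{5}$$
where $\|\cdot\|_F$ is the Frobenius norm and $\lambda_1$, $\lambda_2$ are the weight regularization parameters. Fig. 1 shows a schematic diagram of our SPNN model with $n$ input features and $M$ quantile outputs.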
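One way to read the objective described above is sketched below: the smooth pinball loss averaged over time steps and quantile levels, plus squared Frobenius norms of the two weight matrices. The averaging constant, the separate regularization weights `lam1` and `lam2`, and all function names are assumptions of this sketch rather than details confirmed by the text.

```python
import math


def smooth_pinball(u, tau, alpha):
    """Logistic smooth approximation of the pinball loss (stable form)."""
    z = -u / alpha
    return tau * u + alpha * (max(z, 0.0) + math.log1p(math.exp(-abs(z))))


def frobenius_sq(W):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(w * w for row in W for w in row)


def spnn_objective(y, q_hat, taus, alpha, W1, W2, lam1, lam2):
    """Averaged smooth pinball loss plus L2 penalties on both weight matrices."""
    n, m = len(y), len(taus)
    data_term = sum(
        smooth_pinball(yt - q, tau, alpha)
        for yt, row in zip(y, q_hat)
        for q, tau in zip(row, taus)
    ) / (n * m)
    return data_term + lam1 * frobenius_sq(W1) + lam2 * frobenius_sq(W2)


# Tiny hypothetical example: one observation, one quantile level.
value = spnn_objective(
    y=[1.0], q_hat=[[0.0]], taus=[0.5], alpha=0.01,
    W1=[[1.0, 2.0]], W2=[[3.0]], lam1=0.01, lam2=0.01,
)
```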
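Standard gradient descent with backpropagation can be used to train SPNN. Through this process we compute the gradient of the objective function at each data point at time $t$ with respect to the weights $W^{[1]}, W^{[2]}$ and biases $b^{[1]}, b^{[2]}$. We start with the gradient with respect to the hidden-to-output weights $W^{[2]}$. In order to compute the gradient at time $t$, we apply the chain rule in vector notation as follows
$$\delta_t^{[2]} = \frac{\partial E_t}{\partial \hat{Q}_t} = -\frac{1}{M}\left(\boldsymbol{\tau} - \frac{1}{1 + \exp\left((y_t - \hat{Q}_t)/\alpha\right)}\right), \qquad \frac{\partial E_t}{\partial W^{[2]}} = \delta_t^{[2]} H_t^\top$$
where $E_t = \frac{1}{M}\sum_{m=1}^{M} S_{\tau_m,\alpha}(y_t - \hat{q}_t^{(\tau_m)})$ is the smooth pinball term at time $t$, $\hat{Q}_t$ is the vector of estimated quantiles, $H_t$ is the hidden-layer output, the exponential is applied elementwise, and $\boldsymbol{\tau}$ is a vector of all our $\tau$’s. The gradient of $b^{[2]}$ can be calculated similarly. Next we calculate the gradient of the objective function with respect to the weights of the first layer as follows
$$\frac{\partial E_t}{\partial W^{[1]}} = \left[\left(W^{[2]\top} \delta_t^{[2]}\right) \odot H_t \odot \left(1 - H_t\right)\right] X_t^\top$$
where $\odot$ denotes the elementwise product and $H_t \odot (1 - H_t)$ is the derivative of the logistic activation. The gradient of $b^{[1]}$ can be calculated similarly, and the L2 regularization terms contribute additional gradient terms proportional to the weights. These gradients can then be directly used in many other gradient descent based optimization schemes. As such, we apply the Adam optimizer [kingma2014adam], an algorithm for first-order gradient-based optimization, to learn the parameters of SPNN. Adam has been shown [kingma2014adam] to yield superior results compared to other gradient-based optimizers.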
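The building block of these gradients is the derivative of the smooth pinball function with respect to its residual. Assuming the logistic form tau * u + alpha * log(1 + exp(-u / alpha)), that derivative is tau - 1 / (1 + exp(u / alpha)). The sketch below checks this closed form against a central finite difference; the function names and test values are illustrative.

```python
import math


def smooth_pinball(u, tau, alpha):
    """Logistic smooth approximation of the pinball loss (stable form)."""
    z = -u / alpha
    return tau * u + alpha * (max(z, 0.0) + math.log1p(math.exp(-abs(z))))


def smooth_pinball_grad(u, tau, alpha):
    """Derivative with respect to u: tau - 1 / (1 + exp(u / alpha))."""
    z = u / alpha
    if z >= 0:
        e = math.exp(-z)
        inv = e / (1.0 + e)            # equals 1 / (1 + exp(z)) without overflow
    else:
        inv = 1.0 / (1.0 + math.exp(z))
    return tau - inv


# Compare the analytic derivative with a central finite difference.
u, tau, alpha, h = 0.02, 0.3, 0.05, 1e-6
analytic = smooth_pinball_grad(u, tau, alpha)
numeric = (smooth_pinball(u + h, tau, alpha) - smooth_pinball(u - h, tau, alpha)) / (2 * h)
```

Because the residual is the observation minus the quantile estimate, the derivative with respect to the estimate is the negative of this value. For large positive and negative residuals the derivative approaches tau and tau - 1, the two slopes of the exact pinball loss.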
IV-C Noncrossing Quantiles
In quantile regression normally a single quantile is estimated. To estimate multiple quantiles, one could run QR to solve for different $\tau$’s independently. However, in doing so, quantiles may cross each other, which is not desirable since it violates the principle of monotonically increasing inverse cumulative distribution functions. To prevent this, we need to introduce constraints as per [takeuchi2006nonparametric]. Let $0 < \tau_1 < \dots < \tau_M < 1$ be the orders of the conditional quantiles to be estimated. To ensure these quantiles do not cross each other the following constraint is needed: $\hat{q}_t^{(\tau_1)} \le \hat{q}_t^{(\tau_2)} \le \dots \le \hat{q}_t^{(\tau_M)}$ for all $t$.
However, it is not easy to solve the neural network optimization problem with such constraints using gradient descent methods. One possible solution is proposed in [cannon2018non], where a monotonic composite QRNN is presented that applies partial monotonicity constraints to the weights of the network and uses a stacked input matrix of covariates with an added covariate. This can add complexity to the network by adding more parameters, so we propose a simpler alternative of applying a penalty term [freund2004penalty] directly in the cost function. We define the noncrossing quantile penalty term as follows
$$p(\hat{Q}; \kappa, \epsilon) = \frac{\kappa}{N} \sum_{t=1}^{N} \sum_{m=2}^{M} \left[\max\left(0,\; \epsilon - \left(\hat{q}_t^{(\tau_m)} - \hat{q}_t^{(\tau_{m-1})}\right)\right)\right]^2 \tag{6}$$
where $\epsilon$ is the least amount by which two adjacent quantiles should differ, and $\kappa$ is the penalty parameter with a high value. This penalty is added to the cost function in Eq. (5). If the constraints are not violated no penalty is added to the cost function. If a lower quantile exceeds the value of a higher one, the squared difference of these two quantiles is added to the cost function as a penalty. A full model implementation flowchart is shown in Fig. 3. First the data is preprocessed, which includes deriving different input features, feature standardization, and partitioning the data into training and testing sets. Training of the model is conducted using a gradient descent optimization method. After the maximum number of training epochs is reached the model is ready to be used on testing data for multiple quantile estimation.
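The penalty described above can be sketched as a squared hinge on the gap between adjacent quantile estimates. The parameter names `kappa` (penalty weight) and `eps` (minimum margin) and the averaging over time steps are assumptions of this sketch.

```python
def crossing_penalty(q_hat, kappa, eps):
    """Squared-hinge penalty on adjacent quantiles that are closer than eps.

    q_hat[t] holds the estimates at time t ordered by increasing quantile level.
    """
    total = 0.0
    for row in q_hat:
        for lower_q, higher_q in zip(row, row[1:]):
            violation = max(0.0, eps - (higher_q - lower_q))
            total += violation ** 2
    return kappa * total / len(q_hat)


# Ordered estimates incur no penalty when the margin is zero.
no_penalty = crossing_penalty([[0.1, 0.2, 0.3]], kappa=1000.0, eps=0.0)

# A lower-level quantile 0.1 above the next one adds 1000 * 0.1^2 = 10.
penalty = crossing_penalty([[0.2, 0.1, 0.3]], kappa=1000.0, eps=0.0)
```

The squared hinge has a continuous first derivative, so the penalized objective remains suitable for gradient-based training, and a large `kappa` makes even small violations expensive.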
V Results and Discussions
To validate our model for probabilistic forecasting of wind power we utilize wind data from the publicly available Global Energy Forecasting Competition 2014 (GEFCom2014) [hong2016probabilistic]. The goal of the wind component of GEFCom2014 was to design parametric or nonparametric forecasting methods that would allow conditional predictive densities of the wind power generation to be a function of input data which are numerical weather predictions (NWPs). Evaluation of predicted densities was done using the quantile score. Data is from the years of 2012 and 2013 from 10 wind farms titled Zone 1 to Zone 10. The predictors are NWPs in the form of wind speeds at an hourly resolution at two heights, 10m and 100m above ground level. These forecasts are for the zonal and meridional wind components (denoted U and V). It was up to the contestants to deduce exact wind speed, direction, and other wind features if necessary. These NWPs are from the exact locations of the wind farms. Additionally, power measurements at the various wind farms, with an hourly resolution, are also provided. All power measurements are normalized by the nominal capacity of their wind farm. The goal in forecasting is to learn to associate the provided NWPs (or derived features) with wind power. NWPs are provided for the forecasting horizon of one month, and it is up to a forecasting model to use those NWPs as input to predict quantiles at each future time step.
V-A Benchmark Methods
We use three standard [sideratos2012probabilistic] and two advanced benchmark methods for density forecasting of wind power. The standard methods are the persistence model, which corresponds to a normal distribution formed from the last 24 hours of observations; the climatology model, which is based on all past wind power; and the uniform distribution, which assumes all observations occur with equal probability. For our advanced benchmarks, we use a linear and a nonlinear version of QR. The linear version is multiple quantile regression (QR) with L2 regularization, and the nonlinear version is support vector quantile regression (SVQR) [hwang2005simple] with a radial basis function kernel.
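The three standard benchmarks can be sketched as below. This is one plausible reading of the descriptions above rather than the exact benchmark code: persistence fits a normal distribution to the last 24 observations, climatology takes empirical quantiles of all past observations, and the uniform benchmark uses the quantile level itself on normalized power in [0, 1]. The clipping to [0, 1], the nearest-rank rule, and the synthetic history are assumptions.

```python
import math
from statistics import NormalDist, mean, stdev


def persistence_quantiles(history, taus, window=24):
    """Quantiles of a normal distribution fitted to the most recent window."""
    recent = history[-window:]
    # NormalDist.inv_cdf needs a positive standard deviation.
    dist = NormalDist(mean(recent), stdev(recent))
    return [min(1.0, max(0.0, dist.inv_cdf(t))) for t in taus]


def climatology_quantiles(history, taus):
    """Empirical quantiles of all past observations (nearest-rank rule)."""
    ordered = sorted(history)
    n = len(ordered)
    return [ordered[min(n - 1, max(0, math.ceil(t * n) - 1))] for t in taus]


def uniform_quantiles(taus):
    """Quantiles of the uniform distribution on [0, 1]."""
    return list(taus)


# Synthetic normalized power history covering 48 hours.
history = [0.2 + 0.01 * (i % 24) for i in range(48)]
taus = [0.1, 0.5, 0.9]
pers = persistence_quantiles(history, taus)
clim = climatology_quantiles(history, taus)
unif = uniform_quantiles(taus)
```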
VB Case Study Descriptions
In the analysis of SPNN for forecasting wind power quantiles, we conduct studies with SPNN having one and two hidden layers denoted as SPNN1 and SPNN2. We study if the addition of a second hidden layer improves performance. Our SPNN model is a fully connected feedforward neural network, with rectified linear units for hidden activation functions, and it uses Adam for weight optimization [kingma2014adam]. Default Adam parameters follow those provided in the original paper. The quality of the quantile estimates is sensitive to the hyperparameters of the network. SPNN has several hyperparameters that need to be chosen before training. Through empirical testing on training data, we found the following values as adequate for our model hyperparameters: 2000 training iterations, 200 batch size, 40 hidden nodes for SPNN1, 20 and 40 hidden nodes for SPNN2, 0.01 for the smoothing rate, 0.01 for each of the weight regularization terms, 1000 for the crossover penalty term, and 0 for the crossover margin.
For testing we conduct two case studies using the GEFCom2014 wind datasets. To ensure that our study is unbiased, we use the whole year of 2013 for assessment. This dataset gives a total of 8,760 hourly test samples for wind power forecasting per wind farm. The first case study uses wind data from Zones 1 and 2. We estimate 18 quantiles to produce prediction intervals with nominal coverage from 10% to 90% in increments of 10%. The goal of this study is to evaluate the quantile and prediction interval estimates from SPNN in detail for reliability and sharpness. We also look at QVSS to see improvements between SPNN1 and SPNN2, using QR as the reference model. We also compare results to SVQR, as it is the only other nonlinear quantile regression benchmark model.
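The quantile levels behind these intervals can be enumerated with a short sketch. It assumes central (equal-tailed) intervals, so that an interval with nominal coverage c is bounded by the (1 − c)/2 and (1 + c)/2 quantiles; this assumption yields 18 distinct levels for the nine coverage rates.

```python
import numpy as np

# Nominal coverage rates 10%, 20%, ..., 90%
coverages = np.arange(1, 10) / 10.0

# Assumption: central intervals, so each coverage maps to a symmetric quantile pair
lower = (1.0 - coverages) / 2.0   # 0.45, 0.40, ..., 0.05
upper = (1.0 + coverages) / 2.0   # 0.55, 0.60, ..., 0.95

# All quantile levels the model has to estimate for this case study
taus = np.sort(np.concatenate([lower, upper]))
```

Under this assumption the median (0.5) is not among the estimated levels, because no interval uses it as a bound.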
In the second case study, we estimate 99 quantiles, on par with GEFCom2014. Results are derived for all ten wind farms, giving 87,600 total test observations. For each test month, we estimate 99 quantiles for 720 look-ahead hours across the ten farms. Results are derived across all Zones for QS, IS, ACE, and Sharpness. Given so much data, we need a way to summarize results. Thus, for every farm, we take the mean of all the evaluation scores across all Zones/months. In both case studies, training is done using a sliding window of the previous twelve months to forecast the whole next month. Data from 2013 are used for hold-out test sets. For instance, we start by predicting January 2013 using the 12 months of 2012. After a month is predicted, the training window moves forward to incorporate the new data and the prediction model is retrained to produce the next prediction.
We run our case studies on a computer with an Intel i7 6700 2.6 GHz processor and 16 GB of RAM. For both studies, we use as input features the raw wind speed data at 10m and 100m for the U and V directions. The only engineered features are four time features based on the hour of the day and the day of the year.
This is in contrast with the winning teams from GEFCom2014, who each used dozens of engineered features including lagged data, data from neighboring wind farms, and more complex features such as derived wind speeds, wind direction, wind energy, wind shear, and direction differences between 10m and 100m. Most of the winning teams in GEFCom2014 conducted heavy manual feature engineering to reduce the quantile score throughout the competition. The goal of our study is not custom feature engineering, which might result in better scores, but to highlight the effectiveness of SPNN in creating its own latent features via its hidden layers, and to showcase the feasibility of our method as a robust probabilistic forecasting model.
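The sliding-window scheme can be sketched as a generator of date boundaries. The function name and the convention of exclusive end bounds are assumptions for illustration; the retraining step itself is left out.

```python
from datetime import datetime


def monthly_windows(test_year=2013, train_months=12):
    """Yield (train_start, train_end, test_start, test_end) for each test month.

    End bounds are exclusive. The training window always covers the
    `train_months` calendar months immediately preceding the test month.
    """
    for month in range(1, 13):
        test_start = datetime(test_year, month, 1)
        # First day of the following month (rolls over into the next year)
        test_end = datetime(test_year + (month == 12), month % 12 + 1, 1)
        # Step back `train_months` calendar months from the test month
        idx = test_year * 12 + (month - 1) - train_months
        train_start = datetime(idx // 12, idx % 12 + 1, 1)
        yield train_start, test_start, test_start, test_end


for train_start, train_end, test_start, test_end in monthly_windows():
    # A model would be retrained on [train_start, train_end) here and then
    # used to forecast every hour in [test_start, test_end).
    pass
```

The first window trains on January through December 2012 and tests on January 2013; each later window drops the oldest month and adds the month that was just forecast.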
V-C Case Study 1
For this first case study, quantiles are computed to form prediction intervals. Each prediction interval is estimated to contain a future observation of wind power between a lower and an upper bound with a given probability, the nominal coverage rate. As previously mentioned, we estimate 18 quantiles to produce prediction intervals with nominal coverage from 10% to 90% in increments of 10%. We estimate intervals for SPNN1, SPNN2, QR, and SVQR. The difference between the nominal coverage rates and the observed coverage for Zone 1 is shown in Fig. 4. This reliability diagram showcases results similar to the ACE score. It can be seen that SPNN2 has the lowest deviation from the nominal coverage, with SPNN1 and QR coming second and third, with deviation magnitudes ranging from 0.3% to 3%. SVQR has very poor coverage, with deviations as high as 40%. This can be attributed to overly tight intervals and overfitting. In Fig. 7 we showcase reliability results for Zone 2. Similarly to Zone 1, SPNN2 yields intervals with a deviation close to 0, while SVQR continues to have poor coverage.
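The quantity plotted in such a reliability diagram can be computed as sketched below. The sign convention (empirical minus nominal coverage) and the function name are assumptions for illustration; the metric definitions given earlier in the paper take precedence.

```python
import numpy as np


def reliability_deviations(y, lowers, uppers, nominal):
    """Empirical minus nominal coverage for each prediction interval.

    y: (n,) observations; lowers, uppers: (n, k) interval bounds;
    nominal: (k,) nominal coverage rates, e.g. 0.1, ..., 0.9.
    """
    y = np.asarray(y, dtype=float)[:, None]
    inside = (y >= np.asarray(lowers)) & (y <= np.asarray(uppers))  # (n, k)
    return inside.mean(axis=0) - np.asarray(nominal, dtype=float)
```

Under this convention a negative deviation means the intervals capture fewer observations than promised, which is the behavior expected from intervals that are too tight.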
Sharpness is the other important statistic that we look at for individual prediction intervals, and it is calculated independently of the observations. Measured as the mean interval size as described in Eq. IIIB, it demonstrates the usefulness of predictions. Ideally, we would like intervals to be as small as possible, but if they are too small, observations may fall outside them. Thus, intervals that are either too wide or too narrow provide poor forecasts. Sharpness needs to be analyzed together with reliability to ensure robust predictions. In Fig. 5, we see the mean interval sizes for each coverage level for Zone 1. QR has the widest intervals and SVQR the narrowest. With such narrow intervals, SVQR was not able to capture the observations, as indicated in its reliability diagram. In Fig. 8 we see similar results for Zone 2. Our proposed models, SPNN1 and SPNN2, were able to estimate effectively sized intervals that resulted in high reliability with good sharpness.
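A minimal sketch of the mean interval size follows, assuming sharpness is reported separately for each nominal coverage level; the function name is hypothetical.

```python
import numpy as np


def sharpness(lowers, uppers):
    """Mean interval width per coverage level; observations are not used.

    lowers, uppers: (n, k) interval bounds for n time steps and k coverage levels.
    """
    widths = np.asarray(uppers, dtype=float) - np.asarray(lowers, dtype=float)
    return widths.mean(axis=0)
```

Because the observations never enter this calculation, a model can score well on sharpness while missing most observations, which is why the value should be read next to the reliability results.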
As a last evaluation, we look at the performance of the individual quantiles that formed the prediction intervals of this case study. We do this using QVSS to analyze the performance gain relative to a reference benchmark model. Here we use quantile regression as the reference model, and we study whether the nonlinear quantile regression models, SPNN and SVQR, provide any improvement over QR. In Fig. 6, we report the QVSS across the 18 quantiles for the three nonlinear methods. SPNN1 and SPNN2 provide a clear performance increase with respect to QR. For quantiles with a nominal probability less than 0.7, we see SPNN1 having a small lead over SPNN2, while SPNN2 shows a small lead for the higher quantiles. Not surprisingly, SVQR shows negative skill, that is, decreased performance relative to QR, indicating its inability to extract meaningful features from the raw data for Zones 1 and 2. In Fig. 9, we see similar QVSS results for Zone 2, but with SPNN2 showing a small lead over SPNN1 for quantiles with .
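For intuition, a per-quantile skill score can be sketched as follows. This assumes QVSS takes the common skill-score form 1 − QS_model / QS_ref, with QS the mean pinball loss at a single quantile level; the paper's own definition takes precedence, and the function names are hypothetical.

```python
import numpy as np


def pinball(y, q, tau):
    """Mean pinball loss of the estimates q for quantile level tau."""
    u = np.asarray(y, dtype=float) - np.asarray(q, dtype=float)
    return float(np.mean(np.maximum(tau * u, (tau - 1.0) * u)))


def qvss(y, q_model, q_ref, tau):
    """Assumed skill score: positive if the model beats the reference, negative if worse."""
    return 1.0 - pinball(y, q_model, tau) / pinball(y, q_ref, tau)
```

Under this form a value of 0 means no gain over the reference, 1 is a perfect forecast, and negative values correspond to forecasts worse than the reference.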
V-D Case Study 2
In our second case study we analyze a higher number of estimated quantiles (99) across all wind farms for all 12 test months to ensure an unbiased assessment of SPNN relative to the benchmark models. Due to the large number of quantiles and wind farms, instead of forming reliability or sharpness diagrams for individual PIs and QVSS diagrams for individual quantiles, we look at box plots and report the distribution of evaluation results including QS, IS, ACE, and Sharpness.
In Fig. 10 we report the QS metric for SPNN and the five benchmark methods. We see that SPNN2 had the lowest QS, ranging from 0.036 to 0.047, with SPNN1 being a close second. The other benchmarks had a QS in the range of 0.075 to 0.011. Inspecting the coverage analysis of our prediction intervals with the ACE score in Fig. 11, we see that SPNN overall has the lowest ACE, with SPNN2 having a median value lower than SPNN1. The uniform benchmark produced a wide range for the ACE score due to having fixed-size intervals across all zones and months, while SPNN2 had the narrowest range of ACE scores. Looking at the sharpness of PIs with the interval score in Fig. 12 and the general sharpness score in Fig. 13, we see that SPNN has the sharpest intervals across all farms. The persistence and climatology methods yielded a wide distribution for the interval score but a narrow one for sharpness. SVQR, in contrast to the first case study, did not produce narrow intervals when estimating 99 quantiles.
Since both QS and IS also measure skill, we can say that SPNN was able to produce the highest quality estimates of all methods. An interesting observation is that SPNN is designed to produce optimal quantile estimates, yet it indirectly also produces adequate interval forecasts. If the primary goal is to reduce ACE and IS as much as possible, alternative loss functions that incorporate prediction interval coverage and width functions can be used. However, while not directly optimizing for coverage or sharpness, SPNN does produce superior results to the advanced benchmarks, multiple quantile regression and support vector quantile regression.
Lastly, we compare the mean QS of our proposed method to the final quantile scores of the top teams in GEFCom2014 as originally reported in [hong2016probabilistic]. We note again that the top teams used a wide range of engineered features, while we used raw wind speed data along with time as input to our model. The winning team in GEFCom2014 was kPower with a mean QS of 0.038. Our method SPNN2 has a close mean QS of 0.042, which would place SPNN among the top teams. Comparing the results from the four box plots, we see the robust prediction ability of the proposed SPNN prediction method. Additionally, for all the runs across months and farms, the preassigned PI coverage levels are satisfied, which implies that the constructed PIs cover the target values with a high probability and with the lowest QS and IS.
VI Conclusion
Wind power forecasting is crucial for many decision-making problems in power systems operations and is a vital component in integrating more wind into the power grid. Due to the chaotic nature of the wind, it is often difficult to forecast. Uncertainty analysis in the form of probabilistic wind prediction can provide a better picture of future wind coverage. This paper proposes a novel approach, which we call SPNN, for probabilistic wind forecasting using a neural network with a smooth approximation to the pinball loss function for estimating multiple quantiles.
We also introduce noncrossing constraints in the form of a smooth penalty in the loss function. This is done to ensure multiple quantiles can be estimated simultaneously without overlapping each other. We verify the effectiveness of our SPNN model with the dataset of the Global Energy Forecasting Competition 2014. We compare forecasts to standard and advanced benchmarks and employ standard quantile score, reliability, and sharpness metrics. Our results show superior performance across the prediction horizons, which verifies the effectiveness of the model for forecasting while preventing estimated quantiles from overlapping.
Our SPNN method has the potential to be applied to a variety of domains for probabilistic forecasting or multiple quantile estimation. Future work will look into applying SPNN to forecast solar and ocean wave power, to test its effectiveness across different renewable energies, and to electricity pricing and load demand for smart grid applications. In this study, we trained our model using NWP data. Another problem to study is very short-term probabilistic forecasting using only past wind power data. Future work can then also look into extending the SPNN model to provide full predictive densities given only lagged past power data.