Influences in Forecast Errors for Wind and Photovoltaic Power: A Study on Machine Learning Models
Abstract
Despite the increasing importance of forecasts of renewable energy, current planning studies only address a general estimate of the expected forecast quality for selected forecast horizons. However, such estimates allow only a limited and highly uncertain use in the planning of electric power distribution. More reliable planning processes require considerably more information about future forecast quality. In this article, we present an in-depth analysis and comparison of factors influencing the uncertainty of wind and photovoltaic power forecasts, based on four different machine learning (ML) models. In our analysis, we found substantial differences in uncertainty depending on the ML model, the data coverage, and seasonal patterns, which have to be considered in future planning studies.
Keywords:
uncertainty analysis, machine learning models, seasonal effects, data coverage
1 Introduction
With the further expansion of wind and photovoltaic (PV) energy, the power supply system will change significantly in the coming decades. The overall power supply will become more weather-dependent, and solutions must be found to ensure a robust and inexpensive power supply that maintains grid stability. Since this requires high investments over a long period, the expansion must be planned as precisely as possible in the long term. Simulations of different possible scenarios for the future power supply system are essential to compare options and to optimize the expansion.
The major challenges of the energy system transformation can mainly be traced back to two aspects: Firstly, the actual power supply of wind and solar energy plants depends directly on the weather and therefore does not necessarily match consumption. Secondly, because of this strong dependence on the weather, the power expected in the next hours and days is uncertain and must be predicted by power forecasts based on numerical weather predictions (NWPs).
Despite the increasing importance of forecasts for renewable power supply, current planning studies only address the forecast quality to be expected in the future for Germany as a whole, based on representative forecasts (see, e.g., the dena grid study [Kohler2010]). Further, these studies often consider only a limited number of forecast horizons. However, such estimates allow only a limited and highly uncertain use in the planning of the electricity supply system. More reliable planning processes require considerably more information about future forecast quality.
2 Main Contribution
This article provides a comprehensive study of factors influencing forecast uncertainty that have to be taken into consideration in future planning studies. It investigates the uncertainty of four common types of ML models for wind and PV: least absolute shrinkage and selection operator (LASSO), gradient boosting regression tree (GBRT), support vector regression (SVR), and multilayer perceptron (MLP). The models are trained to forecast the day-ahead power generation based on NWP features as input. By repeated training with different test datasets, we create forecasts over the entire datasets for later analysis.
In the next step, we compare the error distributions of each model with respect to known influencing factors: the amount of training samples, the forecast horizon, and the terrain of wind farms; in addition, we compare the uncertainties of the different ML models. By comparing binned forecast errors, e.g., for different forecast horizons, with the Kullback-Leibler divergence (KLD), we measure similarities and differences between these distributions. This comparison allows estimating when a bin differs substantially from a baseline and therefore gives insights into influential factors. Further, bins are compared with the Kruskal-Wallis hypothesis test [Hedderich2018] to verify whether a difference is significant. The main contributions are:

We utilize common ML models, a grid search to find optimal model parameters, and common feature engineering techniques to provide forecasts for wind and PV farms. By repeatedly training the models on different training sets, we create forecasts for the complete dataset.

For wind power forecasts, we show that the amount of training samples influences the forecast error up to a certain threshold of data coverage (where data coverage is the ratio of the actual number of data samples in the historical data to the maximum possible number of data samples in the recorded period).

Analyzing seasonal patterns reveals different influences for wind and PV that are related to the different weather conditions in the seasons of a year. Interestingly, the forecast errors of adjacent seasons are not necessarily similar to each other in PV forecasts.

The comparative study of the ML forecast models shows that wind forecast errors within a similar terrain are more alike than those for a similar amount of training samples. This relation suggests that the forecast models behave more alike when external influences are excluded.
The remainder of this article is structured as follows. Section 3 discusses related work. Section 4 outlines the evaluation measures and the applied ML models. Section 5 describes the experimental design and the evaluation results regarding data coverage, seasonal patterns, terrain, and model differences. Finally, we conclude our work and propose future work in Section 6.
3 Related Work
In current planning studies on future energy systems, the current and future uncertainty of power forecasts is taken into account only marginally. The German Dena II study [Kohler2010], e.g., only considers forecast errors up to a horizon of two hours, neglecting that an increasing share of renewable generation requires longer forecast horizons such as day-ahead forecasts. Further, the study lacks an analysis of seasonal effects and of forecast-model-specific uncertainties.
[Yan2015] categorizes these (mostly) neglected influencing factors into the NWP input data, the power curve, and the prediction algorithm. The thesis of M. Lange [Lange2003] relates forecast errors to the NWP data. In particular, the forecast uncertainty is assessed with respect to (w.r.t.) certain meteorological situations. However, the study only employs a physical model of the power curve with error correction and spatial refinement.
In [Pinson2006], time series analysis techniques (e.g., ARIMA, ARX, Box-Jenkins) and a physical model are used to evaluate the forecasting skill. The author includes an analysis of different forecast horizons based on, among others, the root mean squared error (RMSE). Further, the work contains a short subsection on the evaluation of the error distribution.
The results of [Mohrlen2004] include an analysis for time horizons between zero and nine hours and up to five days ahead. However, the study again focuses on a physical model and does not consider machine learning models. Likewise, [Holttinen2013, Ko2015] focus on time series analysis techniques (NARX) and (adapted) physical models for uncertainty analysis.
More recently, [Gensler2018] analyzes the uncertainty of ML models such as MLP, SVR, and an ensemble technique, but the thesis lacks an evaluation of error distributions for different forecast horizons of wind power. Also, [Reindl2017] analyzes ML models such as extreme gradient boosting, random forest, and adaptive boosting, as well as the persistence method, to assess the economic value of PV power generation.
[Brecl2018] uses physical and semi-physical models to develop a forecast methodology for households that do not have access to solar irradiance information and are therefore limited to discrete weather information. The results are analyzed w.r.t. the discrete weather features.
[Nam2018] presents an interesting approach in which the kriging method interpolates data with geographical properties to locations with no available data. A naïve Bayes classifier, along with a Gaussian probability distribution based on the overall data, performs day-ahead forecasts of solar power based on probabilities in one-hour intervals. The method is evaluated against the persistence model using the mean absolute error (MAE) for different months of a year.
The simulation in [NREL2013] generates uncertainties for PV at different time scales to evaluate their economic and reliability effects on the grid. As it is a simulation tool, it differs from ML models. The method proposed in [Murata2018] models the PV uncertainty based on past observations using multivariate normal distributions.
The literature review shows that most works focus on time series analysis techniques and physical models. Further, the reviewed articles lack a quantified comparison between uncertainty distributions or even an in-depth analysis of the error distribution.
4 Method
To evaluate influences on forecast uncertainty, this section summarizes common ML algorithms and their differences. Using different ML algorithms ensures that a broad spectrum of forecast errors is covered. In the final subsection, we summarize error measures to estimate the deviation between actual and forecasted power generation.
4.1 Lasso
LASSO, also known as basis pursuit, is a linear model. Linear models typically provide robust estimates when NWPs are uncertain. Further, linear models allow measuring the contribution of individual features through their coefficients, making them highly relevant for analyzing the origin of errors [Hastie2001]. In contrast to other linear regression models, LASSO allows for automatic selection of essential features. This selection is achieved by an L1 penalty on the coefficients, which drives the coefficients of less relevant features to exactly zero and hence excludes these features from the model.
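To illustrate how the L1 penalty sets coefficients to exactly zero, the following minimal sketch fits LASSO by cyclic coordinate descent with soft-thresholding. Coordinate descent is one common solver and is assumed here only for illustration; it is not necessarily the implementation used in this study. The function names, the toy data, and the penalty `lam` are hypothetical, and features are assumed to be centered (no intercept).

```python
def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho towards zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    X is a list of rows with centered features (no intercept), y a list of targets.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z
    return w


# Hypothetical toy data: feature 0 is strongly relevant, feature 1 only weakly.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0 * a + 0.2 * b for a, b in X]
w = lasso_cd(X, y, lam=0.5)
```

Ordinary least squares would assign the weak feature a coefficient of 0.2 here; with `lam=0.5`, LASSO sets it to exactly zero and shrinks the relevant coefficient from 2.0 to 1.5.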
4.2 Support Vector Regression
SVR is based on the concept of support vector machines (SVMs) for classification, with changes in the definition of the optimization problem. One appealing property of SVMs is that any locally optimal set of parameters is also globally optimal, because training is a convex optimization problem [Bishop2006]. Further, by means of the kernel trick, the original NWP input features are implicitly transformed into a higher-dimensional, even infinite-dimensional, space. In such a space, relations that are nonlinear in the original features can become linear [Vapnik2000]. This allows SVR to achieve good results in many applications [Bishop2006], making it highly relevant for the evaluation of forecast uncertainty.
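The kernel trick can be made concrete with a small sketch: a kernel function returns the inner product of two inputs in a transformed feature space without constructing that space explicitly. The example below assumes a degree-2 polynomial kernel, whose feature map is small enough to write out, and also shows the Gaussian (RBF) kernel, whose implicit feature space is infinite-dimensional. The function names and inputs are hypothetical; the kernel actually used by the SVR in this study is not specified here.

```python
import math


def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def poly2_features(x):
    """Explicit feature map phi for 2-D inputs with phi(x) . phi(z) = (x . z)^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; its implicit feature space is infinite-dimensional."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Two hypothetical (standardized) NWP feature vectors.
x, z = [1.0, 2.0], [3.0, -1.0]
implicit = poly2_kernel(x, z)  # computed in the 2-D input space
explicit = sum(a * b for a, b in zip(poly2_features(x), poly2_features(z)))  # in 3-D
```

Both values agree up to floating-point rounding, which is why an SVR only ever needs kernel evaluations rather than the transformed features themselves.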
4.3 Multilayer Perceptron
MLPs, and more recently deep neural networks, are a common technique for regression and classification tasks. In an MLP, input features are transformed using a matrix multiplication and a subsequent (mostly) nonlinear transformation. Together, these two operations form a layer; the successive application of layers, where the output of one layer is the input to the next, allows the network to find a good representation of the data. In the final layer, the output layer, a simple linear combination can be used for the renewable energy forecast. Primarily through their capability to find good representations of the NWP data, MLPs achieve state-of-the-art performance in renewable power forecasting [Gensler2016]. This performance makes them highly relevant for the evaluation of forecast uncertainty.
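The layer structure described above can be sketched as a forward pass: each hidden layer applies an affine map (matrix multiplication plus bias) followed by a nonlinearity, and the output layer is a plain linear combination. The sketch assumes ReLU as the nonlinearity and uses hypothetical, hand-picked weights; training (e.g., by backpropagation) is omitted.

```python
def relu(v):
    """Element-wise nonlinearity max(0, a)."""
    return [max(0.0, a) for a in v]


def dense(W, b, x):
    """Affine map W @ x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def mlp_forward(x, hidden_layers, output_layer):
    """Apply (affine -> ReLU) for each hidden layer, then a linear output layer."""
    h = x
    for W, b in hidden_layers:
        h = relu(dense(W, b, h))  # output of one layer is the input to the next
    W_out, b_out = output_layer
    return dense(W_out, b_out, h)[0]  # scalar (normalized) power forecast


# Hypothetical weights: two input features, one hidden layer with two units.
hidden = [([[1.0, 0.5], [-1.0, 1.0]], [0.0, 0.5])]
output = ([[0.25, 0.5]], [-0.5])
forecast = mlp_forward([1.0, 2.0], hidden, output)
```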
4.4 Gradient Boosting Regression Tree
GBRT originates from the idea that a combination of weak learners improves the overall performance. The gradient boosting algorithm therefore fits each new tree to the remaining errors of the current ensemble (for the squared loss, its residuals), so that the new tree concentrates on the regions with the largest forecast errors. The ensemble combines the individual trees, improving the overall performance. A single tree partitions the feature space into a set of rectangles and estimates a constant forecast value for each rectangle [Hastie2001]. This partitioning provides an interpretable structure to explain forecast decisions, which is not feasible with SVR and MLPs. Further, unlike these approaches, the algorithm does not transform the input data into a new representation.
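A minimal sketch of the boosting idea, under simplifying assumptions: a single input feature, the squared loss, and depth-1 trees (stumps) as weak learners. Each stump partitions the feature axis into two intervals with a constant value each and is fitted to the residuals of the current ensemble. The function names, learning rate, and toy power-curve data are hypothetical; a library such as scikit-learn would be used in practice.

```python
def fit_stump(x, r):
    """Fit a depth-1 regression tree on a 1-D feature: one split, constant per side."""
    best = None
    for t in sorted(set(x))[:-1]:  # candidate thresholds; needs >= 2 distinct values
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - ml) ** 2 for ri in left) + sum((ri - mr) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda v: ml if v <= t else mr


def gradient_boosting(x, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fitted to current residuals."""
    base = sum(y) / len(y)  # initial constant prediction
    trees = []

    def predict(v):
        return base + learning_rate * sum(tree(v) for tree in trees)

    for _ in range(n_trees):
        # For squared loss, the negative gradient is simply the residual y - F(x).
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        trees.append(fit_stump(x, residuals))
    return predict


# Hypothetical wind speeds (m/s) and normalized power resembling a power curve.
x = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
y = [0.0, 0.1, 0.35, 0.7, 0.95, 1.0]
model = gradient_boosting(x, y)
```

Because each stump is a least-squares fit to the residuals, every boosting step with a learning rate of 0.1 reduces the training error of the ensemble.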
4.5 Error Measures
To assess influences on forecast uncertainty through the forecast error, it is essential to evaluate the error $e_t = p_t - \hat{p}_t$ of each individual forecast, i.e., the difference between the actual power generation $p_t$ and the forecasted power $\hat{p}_t$ at time $t$. In contrast to mean-based measures, $e_t$ provides the most detailed view of the error; combined with a visualization of the error distribution through a histogram or a boxplot, it allows assessing the skewness and other statistical properties of the error distribution. A comprehensive analysis of deterministic error measures in the field of renewable energy forecasting is given in [Gensler2018]. The results can be summarized as follows:

The coefficient of determination assesses how much of the variance in the historical power data is explained by the model. As it only evaluates the degree of linear correlation, it is mainly used as a measure to compare different forecast techniques.

To account for extreme values of the individual forecast errors, quadratic errors such as the mean squared error (MSE) are recommended.

Absolute measures such as the MAE are suited for monetary evaluation criteria (linear evaluation criteria).
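The three measures can be compared on a small example. The sketch below uses the common definition $R^2 = 1 - SS_{res}/SS_{tot}$ and two hypothetical forecasts with the same MAE: one with small errors everywhere and one with a single large error. The MSE of the second forecast is four times larger, which illustrates why quadratic errors are recommended to account for extreme errors. The function name and data are hypothetical.

```python
def error_measures(p, p_hat):
    """Return (R^2, MSE, MAE) for actual power p and forecast p_hat of equal length."""
    n = len(p)
    errors = [pi - fi for pi, fi in zip(p, p_hat)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_p = sum(p) / n
    ss_tot = sum((pi - mean_p) ** 2 for pi in p)
    r2 = 1.0 - n * mse / ss_tot  # share of variance explained (SS_res = n * MSE)
    return r2, mse, mae


p = [0.2, 0.4, 0.6, 0.8]                                       # normalized actual power
r2_a, mse_a, mae_a = error_measures(p, [0.3, 0.3, 0.7, 0.7])   # four errors of 0.1
r2_b, mse_b, mae_b = error_measures(p, [0.2, 0.4, 0.6, 0.4])   # one error of 0.4
```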
In the following, we will stick to the squared error of each individual forecast, as it allows comparing the overall forecast quality of the models in terms of mean (MSE) and median, especially when visualized via boxplots.
To compare error distributions with one another, we use the KLD. The KLD is a non-symmetric statistical measure of the difference between two distributions, which allows quantifying their similarity, e.g., between the error distributions of the GBRT and the SVR.
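The KLD of a distribution $P$ from $Q$ is $D_{KL}(P \,\|\, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$. The study compares fitted distributions of the squared errors; the minimal sketch below instead swaps in a simpler histogram-based estimate over shared bins, chosen only to illustrate the measure and its asymmetry. The function names, the bin edges, and the small constant `eps` that guards against empty bins are hypothetical choices.

```python
import math


def histogram(samples, edges):
    """Relative frequencies in bins [edges[k], edges[k+1]); samples outside are ignored."""
    counts = [0] * (len(edges) - 1)
    for s in samples:
        for k in range(len(counts)):
            if edges[k] <= s < edges[k + 1]:
                counts[k] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-10):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i); eps avoids division by empty bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical binned error distributions of two models.
p = [0.5, 0.5]
q = [0.9, 0.1]
kl_pq = kl_divergence(p, q)  # approx. 0.511
kl_qp = kl_divergence(q, p)  # approx. 0.368: the KLD is not symmetric
```

In practice, `scipy.stats.entropy(p, q)` computes the same sum for discrete distributions.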
5 Experimental Evaluation
In the following, we analyze the error distributions of the different ML models and measure their similarity to one another for wind and PV. By estimating the KLD between distributions for different (external) factors, we gain insights into how they relate to one another. To this end, we first give details on the model training and the two datasets. The first study estimates the influence of a limited amount of training samples on wind power forecasts. Results are evaluated w.r.t. the data coverage, i.e., the ratio of the actual number of data samples in the dataset to the maximum possible number of data samples. Subsequently, we analyze seasonal influences, such as the hour of the day or the season of the year, for wind and PV, as well as terrain-specific influences in the wind dataset. As the results for the PV models suggest strong seasonal patterns that are less present for the wind models, we limit the final analysis to the WindFarm dataset. Limiting these and other external influences allows us to compare the error distributions of the different power forecasting models.
5.1 Design of Experiment
We train LASSO, SVR, MLP, and GBRT to forecast the power generation for the following two datasets.
Solar Farm Dataset: The SolarFarm dataset consists of PV facilities in Germany. Their installed nominal power ranges between 7.2 and 12573. The dataset has a three-hour resolution and was recorded from the beginning of 2016 to the end of April 2017, which determines the maximum number of data points. In total, the dataset has 51 input features. Features correlated with the power generation (e.g., sun position, solar height, clear sky, and radiation) are shifted in time by three hours to take future and past effects of the weather into account for the prediction.
Wind Farm Dataset: The WindFarm dataset contains the power generation of wind farms distributed throughout Germany. The values were recorded hourly over two years (2016 and 2017), which determines the maximum number of data points. The dataset contains information about the terrain of each farm (flatland, forest, and offshore). NWP features serve as input. Features of wind speed and wind direction, which influence the power generation [Jens2018], are time-shifted by two hours to take future and past effects of the weather into account for the prediction.
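One way to realize the time shifts is to append, for every time step $t$, the values of a feature from $t-k$ to $t+k$ to the input vector; future values are available because they stem from the NWP forecast. The sketch below assumes this symmetric interpretation of the shift and a hypothetical helper name. With hourly wind data, a shift of two hours corresponds to `max_shift=2`; with the three-hour PV resolution, a shift of three hours corresponds to `max_shift=1`.

```python
def add_time_shifts(series, max_shift):
    """Return rows [x(t-max_shift), ..., x(t), ..., x(t+max_shift)] for each time t.

    Time steps near the start or end of the series, where a shifted value is
    missing, are skipped.
    """
    rows = []
    for t in range(max_shift, len(series) - max_shift):
        rows.append([series[t + s] for s in range(-max_shift, max_shift + 1)])
    return rows


# Hypothetical hourly NWP wind speed forecasts (m/s).
wind_speed = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
shifted = add_time_shifts(wind_speed, max_shift=2)
```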
Both datasets were manually filtered to remove outliers, e.g., caused by maintenance. Depending on the number of outliers and maintenance periods, the data coverage of the wind data varies considerably w.r.t. the recorded period, where data coverage refers to the ratio of the actual number of data samples in the dataset to the maximum possible number of data samples. The data coverage for PV is mostly high.
To compare forecast errors, we normalize the generated power by the maximum power generation. Input features are standardized to zero mean and unit variance based on the training dataset of each run. We optimize the hyperparameters of each model through a grid search on the validation dataset. To make the best use of the full data range, we shift the test data throughout the recorded period in different runs of the experiment: six months for the wind and four months for the PV dataset, resulting in four runs for each dataset. In each run, the remaining data is split into training and validation data. After completing all training runs, combining the predictions of all test datasets provides an evaluation dataset for estimating influences over the complete period of the original data. To account for extreme errors and to measure the quality of a single forecast w.r.t. the historical power, we use the squared error. We fit a distribution to the squared errors and compare the fitted distributions with the KLD.
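The shifting test window and the per-run standardization can be sketched as follows, assuming samples ordered in time and equally sized contiguous test blocks (e.g., 24 months split into four runs of six months). The helper names are hypothetical, and the further split of the remaining data into training and validation data is omitted because its proportions are not reproduced here.

```python
import statistics


def rolling_test_splits(n_samples, n_runs):
    """Yield (train_val_idx, test_idx); each run uses the next contiguous block as test data."""
    block = n_samples // n_runs
    for run in range(n_runs):
        start = run * block
        end = n_samples if run == n_runs - 1 else start + block
        test_idx = list(range(start, end))
        train_val_idx = [i for i in range(n_samples) if i < start or i >= end]
        yield train_val_idx, test_idx


def standardize(train_values, values):
    """Zero mean, unit variance using training statistics only (assumes non-constant data)."""
    mu = statistics.mean(train_values)
    sigma = statistics.pstdev(train_values)
    return [(v - mu) / sigma for v in values]


# Hypothetical monthly indices of a two-year wind dataset, four runs of six months.
all_test = [i for _, test in rolling_test_splits(24, 4) for i in test]
```

Concatenating the test blocks of all runs covers every sample exactly once, which is what allows evaluating the forecasts over the complete recorded period.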
5.2 Influence of the Amount of Training Data
The digitalization of the current and future energy market will provide an increasing amount of training data. In this section, we analyze to what extent the amount of training data influences the forecast error.
To this end, we estimate the data coverage of each farm as the percentage of the maximum number of data points. It turns out that the data coverage of the PV farms is consistently high, except for one farm; we therefore do not consider PV in this analysis. In contrast, the data coverage of the wind farms varies widely, which allows clustering them in ten-percent steps, see Figure 1. As the length of the test period is the same in each run, the amount of training data of a farm is determined by its data coverage; hence, data coverage and amount of training data are treated as equivalent in the following.
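The computation of the data coverage and its clustering into ten-percent bins can be sketched as below. The lower bound from which the bins are counted is not stated in this section, so `lowest_percent` is a hypothetical parameter, as are the function names and the example counts.

```python
def data_coverage(n_available, n_max):
    """Data coverage in percent: available samples relative to the maximum possible number."""
    return 100.0 * n_available / n_max


def coverage_bin(coverage_percent, lowest_percent, step=10.0):
    """Index of the ten-percent bin, counted upwards from a (hypothetical) lower bound."""
    return int((coverage_percent - lowest_percent) // step)


# Hypothetical farm with 850 of at most 1000 samples, bins counted from 50 percent.
coverage = data_coverage(850, 1000)
bin_index = coverage_bin(coverage, lowest_percent=50.0)
```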
Figure 1 shows the relation between the number of training samples and the error: With an increasing amount of data, both the median and the mean decrease. The spread of the error is similar for bins two, three, and four, whereas bins zero and one have a broader spread. The decreasing mean, median, and spread show that there is a relation between the amount of training data and the forecast error.
To verify whether these bins differ significantly, we compare them with the KLD and the Kruskal-Wallis hypothesis test. The Kruskal-Wallis test shows that, for all ML models, the forecast errors of all data coverage bins are significantly different at the chosen significance level. The exemplary results in Table 1 highlight the previous observation: A decreasing data coverage causes an increased spread, median, and mean, resulting in larger values of the KLD, e.g., when comparing bin zero with bin four. Nonetheless, bins two, three, and four are quite similar to each other.
Table 1: Pairwise KLD between the error distributions of the data coverage bins.
As expected, there is a relation between the amount of available data and the forecast error. With an increasing amount of available data, the ML models tend towards a minimum error, which is probably caused by the uncertainty of the NWP input data.
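The Kruskal-Wallis test used throughout this section compares the ranks of the pooled samples: with $N$ samples in total and rank sum $R_i$ of group $i$ with $n_i$ samples, $H = \frac{12}{N(N+1)} \sum_i \frac{R_i^2}{n_i} - 3(N+1)$, which is compared against a chi-squared distribution with (number of groups minus one) degrees of freedom. The sketch below computes $H$ with average ranks for ties but omits the tie correction; the function names are hypothetical. In practice, a library routine such as `scipy.stats.kruskal` would be used, which also applies the tie correction and returns the p-value.

```python
def average_ranks(values):
    """1-based ranks of values; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, converted to a 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (without tie correction) for two or more samples."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * h - 3.0 * (n_total + 1)


# Hypothetical squared errors of three data coverage bins that do not overlap at all.
h_separated = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
```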
5.3 Influences by Seasonal Patterns and Terrain
Seasonal patterns over the seasons of a year or the hours of a day are well known. Nonetheless, there is limited research on how these patterns affect the error distributions of ML-based wind and PV forecasts. More common is the analysis of forecast errors w.r.t. the terrain, which this section also covers.
In the following, we address the season of the year and the hour of the day. In terms of PV, the hour of the day has two meanings. First, due to the daily course of the sun, we can observe patterns in the power generation. Second, the forecast horizon of the NWP model increases with the time of day (as the so-called NWP model run typically originates from 12 UTC). As the horizon increases, the error of the weather forecast increases and, consequently, so does the error of the power forecast. The latter also holds for wind power forecasts.
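The link between hour of day and forecast horizon can be sketched under a hypothetical day-ahead setup in which the forecasts for the next day are derived from the 12 UTC run of the current day; the lead time then grows by one hour per hour of the target day. The second helper groups squared errors by hour of day and returns their medians, which corresponds to the per-hour boxplots. Both function names and the example values are hypothetical.

```python
import statistics
from collections import defaultdict


def lead_time_hours(target_hour_utc, run_hour_utc=12, days_ahead=1):
    """Hours between the NWP run and the forecast target time (hypothetical setup)."""
    return days_ahead * 24 + target_hour_utc - run_hour_utc


def median_error_by_hour(hours, squared_errors):
    """Median squared error per hour of the day, sorted by hour."""
    groups = defaultdict(list)
    for h, e in zip(hours, squared_errors):
        groups[h].append(e)
    return {h: statistics.median(v) for h, v in sorted(groups.items())}


# Hypothetical squared errors at 0 and 3 UTC.
medians = median_error_by_hour([0, 0, 3, 3, 3], [0.1, 0.3, 0.2, 0.4, 0.9])
```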
The sample boxplot of wind errors in Figure 2 shows this pattern. The median and mean errors for different hours of the day do not increase drastically, due to the absence of seasonal weather patterns in the wind; more detailed observations are possible when measuring similarity through the KLD in Table 2. In terms of the KLD, the errors at the end of the day are more similar to one another than those close to the origin of the weather forecast, owing to the increased forecast error of the NWPs. Nonetheless, all errors for different hours of the day are significantly different according to the Kruskal-Wallis hypothesis test, except for four pairs for the linear model, one for the MLP, and two for the GBRT model.
Table 2: Pairwise KLD between the error distributions for different hours of the day (three-hour steps).
For PV, however, we observe a strong seasonal pattern in the error distributions for different hours of the day, see Figure 3. This observation is even more pronounced for the KLD, which yields substantially larger values when comparing with a night-time hour, see Table 2. This pattern is to be expected, as there is no power generation during the night, so the error distribution at night differs notably from that during the day. Compared to wind, there are also larger differences between the error distributions during the daytime, caused by the daily course of the sun. Again, all errors for different hours of the day are significantly different according to the Kruskal-Wallis hypothesis test, except for four pairs for the SVR model and one for the MLP model.
In the analysis of the seasons of the year for wind, we observe that all models and datasets have the lowest median, mean, and spread of the error in the third season. In the other seasons, extreme weather conditions are more common, causing larger errors. The KLD, see Table 4, confirms our intuition that the error distributions of adjacent seasons are more similar to each other than those of distant seasons.
Contrary to wind forecasts, PV models have larger errors in the third season. In the other seasons, the different position of the sun causes different amounts of direct and diffuse radiation, making it easier for the forecast model to predict the power generation. For instance, the solar radiation (direct and diffuse) in the dataset is smallest in season one. The analysis of the KLD in Table 4 suggests that the uncertainty differs substantially even between adjacent seasons. These differences are caused by the larger magnitude of the forecast error, especially in the third season. Only for the comparison of seasons one and three with the errors of the GBRT model on the WindFarm dataset does the Kruskal-Wallis test indicate no significant difference.
The analysis of the terrain in Figure 4 shows that the smallest errors occur for farms located in farmland terrain. All errors for different terrains are significantly different according to the Kruskal-Wallis hypothesis test. Note that the terrains contain different numbers of farms; the offshore terrain, for instance, includes four farms. Interestingly, in terms of the KLD, the error distribution of offshore farms is closer to that of farmland farms than the distribution of farmland farms is to that of forest farms. This smaller KLD might be due to more complex weather conditions in the forest and offshore terrain. For instance, turbulence at sea might be similarly present in forests (which are often also elevated), causing a similar uncertainty distribution.
In conclusion, we showed similarities and dissimilarities in seasonal and terrain-specific patterns. Interestingly, the differences between error distributions are among the largest for different seasons of the year, for both wind and PV. Wind errors are larger in winter and autumn, while PV models have larger errors in spring and summer. Finally, the uncertainty distribution in offshore terrain is similar to that of forest terrain.
5.4 Influences by Models
After analyzing external influences on the error distributions, in this section we compare the similarity between the ML models. As the results for the PV models suggest strong seasonal patterns that are less present for the wind models, we limit the further analysis to the WindFarm dataset.
Previous results for the wind models show that the error is smallest for the farmland terrain and when training the ML models with the highest data coverage.
Limiting the analysis to the cases with the smallest errors approximates the absence of external influences, since these influences should have a smaller effect on distributions with a smaller mean, median, and spread. Ultimately, this allows us to assess the differences between the ML models rather than those caused by external influences.
In both analyses, we observe that GBRT achieves the smallest error, SVR the second smallest, MLP the third smallest, and LASSO the largest error. Tables 5 and 6 summarize the differences between their error distributions for the experiment with maximum data coverage and for the farmland terrain, respectively.
Table 5: Pairwise KLD between the error distributions of GBRT, LASSO, SVR, and MLP for maximum data coverage.
Table 6: Pairwise KLD between the error distributions of GBRT, LASSO, SVR, and MLP for the farmland terrain.
The results suggest that the error distributions of the models are more similar to each other within a terrain than within the maximum data coverage bin. A likely cause is that specific weather conditions are characteristic of each terrain, resulting in terrain-specific forecast errors. Nonetheless, the Kruskal-Wallis hypothesis test shows that the distributions are still significantly different.
6 Conclusion and Future Work
In this article, we presented an in-depth analysis and comparison of factors influencing the uncertainty of wind and PV power forecasts based on four different ML models. In our analysis, we found substantial influences and differences between the compared bins of uncertainty, revealing the need to consider them in future planning studies.
For instance, the study reveals strong seasonal patterns in the uncertainty of wind and PV forecasts. For wind power forecasts, neighboring seasons and hours are similar to each other. Regarding seasonal patterns within a year, these forecasts will benefit from NWP forecasts optimized for extreme weather situations, which cause substantial errors in wintertime. Due to significantly larger errors in the third season, adjacent seasonal bins in PV forecasts are not necessarily similar to each other. Similar results are obtained for daily patterns, for which we recommend using NWP forecasts that are issued closer to the time of the largest errors (noon).
By analyzing the relation between the amount of training data and the uncertainty, we showed that the models improve with additional data up to a certain threshold of data coverage. Reducing the error further is possible, e.g., with deep learning models, which have a higher capacity to learn the relation between the NWP features and the power generation. However, even with an increasing amount of data, the minimum forecast error will be bounded by the error of the NWP.
The study reveals that, after minimizing external influences, differences between the uncertainty distributions of the four ML models are still present, motivating the need to consider the underlying forecast model in future planning studies.
In the future, we aim to investigate how transfer learning can be utilized to reduce forecast uncertainty when limited data is available.
Acknowledgement: This work was supported within the project Prophesy (0324104A) funded by BMWi (Deutsches Bundesministerium für Wirtschaft und Energie / German Federal Ministry for Economic Affairs and Energy).