Comparison of statistical post-processing methods for probabilistic NWP forecasts of solar radiation

Kilian Bakker, Kirien Whan, Wouter Knap, Maurice Schmeits (maurice.schmeits@knmi.nl)
Royal Netherlands Meteorological Institute (KNMI), P.O. Box 201, 3730 AE De Bilt, the Netherlands
Abstract

The increased usage of solar energy places additional importance on forecasts of solar radiation. Solar panel power production is primarily driven by the amount of solar radiation and it is therefore important to have accurate forecasts of solar radiation. Accurate forecasts that also provide information on the forecast uncertainty can help users of solar energy make better decisions, for example concerning the stability of the electrical grid. To achieve this, we apply statistical post-processing techniques that determine relationships between observations of global radiation (made within the KNMI network of automatic weather stations in the Netherlands) and forecasts of various meteorological variables from the numerical weather prediction (NWP) model HARMONIE-AROME (HA) and the atmospheric composition model CAMS. Those relationships are used to produce probabilistic forecasts of global radiation. We compare 7 different statistical post-processing methods, consisting of two parametric and five non-parametric methods. We find that all methods are able to generate probabilistic forecasts that improve the raw global radiation forecast from HA according to the root mean squared error (on the median) and the potential economic value. Additionally, we examine the importance of the predictors in the different regression methods. We also compare the regression methods using various probabilistic scoring metrics, namely the continuous ranked probability skill score, the Brier skill score and reliability diagrams. We find that quantile regression and generalized random forests generally perform best. In (near) clear sky conditions the non-parametric methods have more skill than the parametric ones.

keywords:
Model Output Statistics (MOS), probabilistic forecasting, solar radiation, Numerical Weather Prediction (NWP)
journal: Solar Energy

1 Introduction

Forecasts of meteorological variables are produced by projecting an initial state of the atmosphere into the future using a numerical weather prediction (NWP) model. Both in the initial state and the projection there are unavoidable errors related to observational scarcity and parameterization respectively. This leads to uncertainty in the forecasts that can be quantified by generating a probabilistic forecast. We focus on global radiation and apply a statistical post-processing technique called model output statistics (MOS, Glahn and Lowry (1972)) to fit relationships between various NWP variables and radiation observations. The relationships are used for producing skillful and reliable probabilistic forecasts that can be very beneficial for users of solar energy by providing uncertainty information.
Considerable work has already been done on generating probabilistic forecasts of global radiation. For example, Lorenz et al. (2009) produced 95% confidence intervals on errors in the clear sky index (CSI, defined as global radiation divided by clear sky radiation) by fitting a polynomial function to the standard deviation of the errors as a function of the forecast CSI and the solar zenith angle. Verzijlbergh et al. (2015) fitted a normal distribution to the CSI errors using NWP output from the GFS model of NCEP. Other distributions have also been used for forecasting solar radiation, such as the beta and two-sided power distributions in Fatemi et al. (2018) and the gamma distribution in Bracale et al. (2013); both studies use observations of various meteorological variables to forecast radiation.
Massidda and Marrocu (2018) produced probabilistic forecasts by fitting quantiles of the underlying distribution of radiation, called quantile regression, using deterministic and probabilistic NWP output of ECMWF. They also extended this to a method called gradient boosting decision trees. Almeida et al. (2015) use quantile regression forests to forecast quantiles of the distribution of solar power using NWP output from the WRF model run at Meteogalicia. Neural networks were used by Galván et al. (2017) to forecast confidence intervals of solar radiation using NWP output from the GEFS model of NCEP.
In recent literature, there are some reviews that give an overview of papers using post-processing methods to produce probabilistic forecasts of solar radiation or solar power production. For example Antonanzas et al. (2016) listed papers using neural networks, random forests, support vector machines and nearest neighbours for forecasting solar power. Van der Meer et al. (2018) listed papers using quantile regression, quantile regression forests, Gaussian processes, bootstrapping, lower upper bound estimate, gradient boosting, kernel density estimation, nearest neighbours and an analog ensemble for forecasting solar power. Voyant et al. (2017) listed papers using linear regression, generalized linear models, neural networks, support vector machines, decision tree learning, nearest neighbours and Markov chains for forecasting solar radiation. This last review is oriented towards deterministic forecasts, whereas the first two focus on probabilistic forecasts.
There are relatively few papers in which a comparison among multiple regression methods on the same data set for solar radiation or solar power forecasting is a primary goal. David et al. (2018) made a comparison between probabilistic forecasts using random forests, neural networks, (weighted) quantile regression, gradient boosting decision trees, recursive GARCH and the sieve bootstrap. They used only past observations of global radiation and the CSI as predictors. Mohammed and Aung (2016) made a comparison between deterministic forecasts using decision trees, nearest neighbours, gradient boosting decision trees, random forests and lasso and ridge regression. They used ECMWF output variables as predictors in the regression methods. Zamo et al. (2014) made a comparison between probabilistic forecasts using quantile regression and quantile regression forests with NWP output from the PEARP model at Météo France as predictors. Lastly, Voyant et al. (2018) made a comparison between probabilistic forecasts using quantile regression forests and gradient boosting decision trees with CSI observations as predictor.

In this paper we compare probabilistic forecasts from an extensive list of regression methods, consisting of parametric methods (gamma and truncated normal distribution) and non-parametric methods. The non-parametric methods consist of quantile regression, quantile regression forests, gradient boosting decision trees and the relatively new generalized random forests and quantile regression neural networks. To our knowledge we are the first to make such an extensive comparison of probabilistic forecasting methods in a MOS setting. We use a large number of potential predictors, consisting of output from the high-resolution non-hydrostatic NWP model HARMONIE-AROME (HA, running at KNMI) and the atmospheric composition model of the Copernicus Atmosphere Monitoring Service (CAMS, running at ECMWF). This is a second novel aspect of this work, as to our knowledge previous work has not explored the benefit of including both NWP (here HA) and CAMS predictors. The regression methods fit relationships between the predictors and CSI observations for producing probabilistic forecasts. We investigate the importance of the predictors in the fitted relationships. Additionally, the probabilistic forecasts are verified and compared with multiple scoring metrics to see which method performs best in which situation. These include the continuous ranked probability skill score, the Brier skill score and reliability diagrams (e.g. Wilks (2011)). We also show the improvement with respect to the raw forecast (the global radiation forecast from HA) using the root mean squared error skill score and the potential economic value (e.g. Richardson (2000)).

The paper is structured as follows: in Section 2 we describe the research setting, consisting of the data, the regression methods, the fitting procedure and the scoring metrics. In Section 3 we present our results and discuss them in Section 4. The paper is finished with conclusions in Section 5.

2 Data and Methods

In this section we first describe the model data and observations we used, including transformations applied to the model data. Next, we give an overview of the regression methods used to model the relationships between forecasts of weather variables (the predictors) and CSI observations (the predictand). The procedure we used for fitting is described and after that we describe verification measures to verify the methods.

2.1 Model data and observations

The meteorological variable we study in this paper is the global radiation, defined as the total amount of solar radiation reaching a horizontal plane on the Earth’s surface. We used hourly average observations of global radiation measured at 30 weather stations in the Netherlands by the KNMI network of automatic weather stations for the period April 1, 2016 to March 16, 2018 (KNMI (2018)). We transformed the global radiation into the CSI by dividing it by the hourly average clear sky radiation. This removes the large seasonal and diurnal cycles of radiation. The clear sky radiation is calculated using the European Solar Radiation Atlas (ESRA) model (Rigollier et al. (2000)), with as inputs the solar zenith angle (Michalsky (1988)) and the Linke turbidity factor (monthly average values for De Bilt are used from Remund et al. (2003)). The locations of the stations and the annual averaged observed CSI are shown in Figure 1. For coastal stations the annual averaged observed CSI is slightly higher than for inland stations.
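As an illustration, the following minimal R sketch shows the CSI transformation; the function and vector names are our own, and the 20 W/m² daylight cutoff anticipates the one used in Section 2.3:

# Minimal sketch of the CSI transformation, assuming vectors of hourly
# average global radiation and ESRA clear sky radiation (both in W/m^2).
# Cases below the daylight cutoff are set to NA because the ratio
# becomes unstable for very small clear sky values.
to_csi <- function(global_rad, clearsky_rad, daylight_min = 20) {
  ifelse(clearsky_rad > daylight_min, global_rad / clearsky_rad, NA_real_)
}

# Example: 600 W/m^2 observed under 800 W/m^2 clear sky gives CSI = 0.75
to_csi(c(600, 5), c(800, 10))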

Figure 1: The KNMI station locations and annual averaged observed CSI in the Netherlands. Station Cabauw is marked in blue.

The model data that we used as the potential predictors for the regression methods consist of forecasts of atmospheric temperature, humidity and different cloud, radiation and aerosol variables (Table 2). Most of the predictors (marked with *) are NWP output of HA (Bengtsson et al. (2017)). The aerosol predictors (marked with **) are forecasts produced by CAMS (Flemming et al. (2017)). Because HA uses climatological averages for the aerosols, we expect the aerosol forecasts as predictors to have added value in fitting the relationships (see also Alfadda et al. (2018)).
We used hourly output from HA for lead times 0-48 hours at the grid points closest to the stations. The CAMS output is 3-hourly and therefore we take the lead time closest to the corresponding time of the observations. The CAMS forecasts become available later than those of HA and therefore we use them at lead times of 24-72 hours. More information about HA and CAMS can be found in Table 1.

                     HA                                     CAMS
Version(s)           38                                     43r3, 43r1, 41r1
Type of forecasts    Deterministic                          Deterministic
Initial time         0 UTC                                  0 UTC
Time domain          0-48 hours                             24-72 hours
Time resolution      Hourly                                 3-hourly
Spatial domain       Western Europe (800x800 grid points)   Global
Spatial resolution   2.5 km x 2.5 km
Vertical levels      65                                     60

Table 1: Information about the models HA and CAMS.
Potential predictors (HA = *, CAMS = **):

Temperature*          temperature (per layer)
Relative humidity*    relative humidity (per layer)
Radiation*            global radiation, direct radiation and net clear sky (global) radiation (at 'surf' and 'toa')
Clouds and rain*      cloud cover, cloud water (per layer), precipitable water and rain
Aerosols**            vertically integrated aerosol optical depth at 500 nm, vertically integrated Ångström exponent and vertically integrated ozone
Time/place            latitude, longitude, day of the year, distances (to coast, to water and to inland) and the cosine of the solar zenith angle

'surf' stands for the Earth’s surface and 'toa' for the top of the atmosphere (defined as the height with air pressure equal to 0.01 hPa). The layers indicate the lower (0-2000 m), middle (2000-6000 m), higher (6000 m-TOA) and total (0 m-TOA) part of the atmosphere. Net clear sky stands for the incoming minus the outgoing radiation under a clear sky.

Table 2: The potential predictors.

For each of the potential predictors with ’layers’ as a subscript, we computed them for the lower (0-2000 m), middle (2000-6000 m) and upper atmosphere (6000 m and higher), and also an aggregated version of the total atmosphere (except for temperature and relative humidity, where the total atmosphere component is missing). The HA output consists of 65 levels in the atmosphere and for the cloud water we applied a weighted average (with weights equal to the distances between the levels) to reduce these to the four layers (lower, middle, upper and total atmosphere). For temperature and humidity we used HA output from 11 pressure levels and reduced these to the three layers (lower, middle and higher part of the atmosphere) by applying a weighted average. The precipitable water is computed using temperature and humidity from the 11 pressure levels together with the method outlined by McRae (1980) and applying a weighted average.
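As an illustration, the R sketch below shows one possible implementation of such a weighted layer average; the thickness-based weights and the vector names are our assumptions, not the exact HA implementation:

# Sketch of aggregating model-level output into the lower (0-2000 m),
# middle (2000-6000 m) and higher (>6000 m) layers. `values` and
# `heights` are assumed vectors over the vertical levels of one profile,
# with heights sorted in ascending order; each level is weighted by the
# vertical distance it spans within the layer.
layer_mean <- function(values, heights, lo, hi) {
  sel <- heights >= lo & heights < hi
  if (!any(sel)) return(NA_real_)
  w <- diff(c(lo, pmin(heights[sel], hi)))  # distances between levels
  weighted.mean(values[sel], w)
}

lower_layer  <- function(v, h) layer_mean(v, h, 0, 2000)
middle_layer <- function(v, h) layer_mean(v, h, 2000, 6000)
higher_layer <- function(v, h) layer_mean(v, h, 6000, Inf)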
We transformed all radiation predictors into hourly average CSI values in the same way as the observations. The rain predictor values are precipitation sums over each hour. For all other potential predictors the valid time is used.
We also calculated time/place predictors, which are used as additional potential predictors. These consist of the latitude and longitude of the station locations, the day of the year, the distance to the coast (minimum distance to the sea), distance to water (minimum distance to the sea or one of the lakes), distance to inland (the distance to the intersection point of the borders of the Netherlands, Belgium and Germany) and the cosine of the solar zenith angle.

2.2 Regression methods

We used 7 different regression methods to find relationships between the predictors and predictand and these are described in the following subsections.

2.2.1 Parametric regression

Parametric regression methods assume a specific predictive distribution with parameters (location, scale, skewness and kurtosis). The parameters are first chosen as constant values and then fitted sequentially (one by one, each new one using the already fitted ones) with linear relationships:

\[ \theta = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p, \qquad (1) \]

where \(x_1, \dots, x_p\) are the predictors and \(\beta_0, \dots, \beta_p\) are the regression parameters. The optimal linear relationship is found by maximizing the log-likelihood function.
We avoid overfitting by using only a selection of the predictors in the model. The selection that gives the best fit is found by forward (adding one predictor) and backward (removing one predictor) stepwise selection that aims to minimize the Akaike Information Criterion (AIC). The stepwise procedure is applied separately for each distribution parameter.
We apply this regression method in the programming language R using the gamlss package (Rigby and Stasinopoulos (2018)) for two distributions: the gamma distribution (GA) and the truncated normal distribution (NOTR). The truncated normal distribution is formed by left truncation at zero of the normal distribution (Rigby and Stasinopoulos (2005)) and therefore has, just like the gamma distribution, only probability mass for positive values, which is suitable for forecasting radiation.
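A minimal sketch of this procedure with the gamlss and gamlss.tr packages follows; the data frame and the predictor names (rad, rh, cc, aod) are hypothetical stand-ins for the predictors of Table 2, and `csi` holds the CSI observations:

# Sketch of the parametric fit; GA is the gamma family, and the
# left-truncated (at zero) normal family for NOTR is generated first.
library(gamlss)
library(gamlss.tr)

gen.trun(par = 0, family = "NO", type = "left")  # creates the NOtr family

# Start from constant parameters, then select predictors stepwise per
# distribution parameter by minimizing the AIC (k = 2)
fit0  <- gamlss(csi ~ 1, sigma.formula = ~ 1, family = GA, data = train)
scope <- list(lower = ~ 1, upper = ~ rad + rh + cc + aod)
fit   <- stepGAIC(fit0, scope = scope, what = "mu", k = 2)
fit   <- stepGAIC(fit,  scope = scope, what = "sigma", k = 2)

# Quantile forecasts follow from the fitted distribution parameters
p   <- predictAll(fit, newdata = test, data = train)
q50 <- qGA(0.5, mu = p$mu, sigma = p$sigma)  # the forecast median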

2.2.2 Quantile regression

Quantile regression fits a number of quantiles from the underlying distribution separately by assuming, for quantile level \(\tau\), the linear relationship:

\[ q_\tau = \beta_0^{(\tau)} + \beta_1^{(\tau)} x_1 + \dots + \beta_p^{(\tau)} x_p. \qquad (2) \]

The optimal linear relationship is found by minimizing the weighted sum of residuals (WSR, defined as observations - fitted quantiles), with weights equal to \(\tau\) and \(1-\tau\) for positive and negative residuals respectively (Koenker (2005)). Regularization to prevent overfitting was not successful in our study and therefore we apply the stepwise procedure, this time with the aim of minimizing the average over the AIC values for the different quantiles. Consequently, all quantiles are fit with the same predictors. It occasionally happens that negative values are forecast and in that case we set them to zero. Also, we solve crossings between quantiles by sorting them in ascending order. Lastly, some small random noise, generated from a normal distribution with mean zero and a variance of 0.001, is added to the predictors to prevent the matrix with the predictors from becoming singular. Quantile regression (QR) is applied in R using the quantreg package (Koenker (2018)).
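A minimal R sketch with the quantreg package, assuming data frames `train` and `test` that contain the stepwise-selected predictors and the CSI observations in a column `csi`:

library(quantreg)

taus <- seq(0.02, 0.98, by = 0.02)            # the 49 forecast quantiles
fit  <- rq(csi ~ ., tau = taus, data = train)
q    <- predict(fit, newdata = test)          # matrix: cases x 49 quantiles

q <- pmax(q, 0)               # set occasional negative forecasts to zero
q <- t(apply(q, 1, sort))     # resolve quantile crossings by sorting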

2.2.3 Quantile regression neural network

Quantile regression neural networks are a non-linear extension of quantile regression. They can be visualized by an input layer (the predictors), one or more hidden layers and an output layer (the forecast for one quantile). In the hidden layers the input is transformed towards the output by multiplication with weights, adding a bias and applying a non-linear activation function, taken here as the default of the qrnn package. The optimal weights and biases are found by minimizing the WSR in a boosting fashion, where the negative gradient is followed for a number of iterations. More details can be found in Cannon (2011).
Regularization to prevent overfitting was not successful and therefore we apply the stepwise procedure. Further, negative forecasts are set to zero. Crossings between quantiles are solved by adding the quantiles as a predictor in the input layer with a monotonically increasing constraint (Cannon (2018a)). This is called the monotone composite extension of quantile regression neural network (MCQRNN) and we apply it in R using the qrnn package (Cannon (2018b)).
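A sketch of the MCQRNN fit with the qrnn package is given below; the predictor matrices and the number of hidden nodes are assumptions, while one hidden layer, 10 iterations and zero penalty follow Table 3 (mapping them to the iter.max and penalty arguments is our reading):

library(qrnn)

taus <- seq(0.02, 0.98, by = 0.02)
fit <- mcqrnn.fit(x = x_train, y = as.matrix(y), tau = taus,
                  n.hidden = 4,   # nodes in the single hidden layer (assumed)
                  iter.max = 10,  # number of iterations (Table 3)
                  penalty  = 0)   # no weight penalty (Table 3)

# Quantiles are monotone in tau by construction, so no crossings occur
q <- mcqrnn.predict(x = x_test, parms = fit)
q <- pmax(q, 0)  # negative forecasts set to zero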

2.2.4 Random forests

Random forests have binary regression trees (Breiman et al. (1984)) as building blocks. A binary regression tree is formed by splitting the data set into subsets by applying thresholds on the predictors. Each split is chosen to minimize the variance in the predictand. The splitting is repeated until some stopping criterion is reached, such as a minimum number of predictand values left in each subset. A random forest is a collection of trees (Breiman (2001)) and reduces the chance of overfitting by constructing each tree on a different (bootstrap sampled) subset of the data. To make the trees even more independent, at each split only a (bootstrap sampled) subset of the predictor set is used. Probabilistic forecasts are made by looking at the underlying distribution of the predictand values in the terminal nodes of all trees and drawing quantiles from it (Meinshausen (2006)). This method is called quantile regression forests (QRF) and is applied in R using the quantregForest package (Meinshausen (2017)).
Generalized random forests (GRF) is very similar to QRF. Each split is now chosen to maximize the difference between the underlying distributions of the predictand values (Athey et al. (2016)). Also, whereas QRF uses bootstrap sampling with replacement, GRF uses bootstrap sampling without replacement. GRF is applied in R using the grf package (Tibshirani et al. (2018)).
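The following R sketch shows both forest variants, assuming predictor matrices `X` and `X_test` and observation vector `y`; the hyper-parameters follow Table 3:

library(quantregForest)
library(grf)

taus <- seq(0.02, 0.98, by = 0.02)

# QRF: quantiles drawn from the terminal-node distributions
qrf   <- quantregForest(x = X, y = y, ntree = 500, nodesize = 5,
                        mtry = max(1, ncol(X) %/% 3))
q_qrf <- predict(qrf, newdata = X_test, what = taus)

# GRF: subsampling without replacement; splits maximize the difference
# between the underlying distributions of the predictand
grf_fit <- quantile_forest(X, y, quantiles = taus, num.trees = 500,
                           min.node.size = 5, sample.fraction = 0.5)
q_grf <- predict(grf_fit, X_test, quantiles = taus)$predictions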

2.2.5 Gradient boosting decision trees

Gradient boosting decision trees boosts the fitted quantiles (separately) towards the actual quantiles of the underlying distribution by following the negative gradient of the WSR for a number of iterations. The negative gradients are estimated using regression trees. Then in each iteration the fits are updated by adding the estimated negative gradients multiplied with the learning rate (Friedman (2001)). The learning rate is generally chosen to be small (0.1 in our case; see Table 3) to prevent overshooting the minimum in the WSR. To reduce the chance of overfitting, each tree is constructed on a (bootstrap sampled) subset of the data (Friedman (2002)). Consequently, in each iteration we only improve the fitted quantile for the predictand values in the chosen subset. Further, negative values are set to zero and crossings between quantiles are solved by sorting them in ascending order. Gradient boosting decision trees (GBDT) is applied in R using the gbm package (Ridgeway (2018)).
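A sketch with the gbm package, fitting one model per quantile; `train` and `test` are assumed data frames with the CSI observations in `csi`, and the hyper-parameters follow Table 3:

library(gbm)

taus <- seq(0.02, 0.98, by = 0.02)
q <- sapply(taus, function(tau) {
  fit <- gbm(csi ~ ., data = train,
             distribution = list(name = "quantile", alpha = tau),
             n.trees = 100,           # number of trees (Table 3)
             interaction.depth = 1,   # depth of trees
             shrinkage = 0.1,         # learning rate
             bag.fraction = 0.5)      # case subsampling per iteration
  predict(fit, newdata = test, n.trees = 100)
})
q <- pmax(q, 0)             # negative forecasts to zero
q <- t(apply(q, 1, sort))   # resolve quantile crossings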

2.3 Fitting procedure

We fitted relationships between the potential predictors and the predictand using the 7 methods. The fitting was done per lead time, because systematic errors increase with lead time, and for all stations at once. We split the data set into a training (for model fitting) and a testing (for verification) set, based on 2 different groups of consecutive dates. We applied a 3-fold cross-validation, leading to around 3200 training and 1600 testing cases for each fit. For each lead time we only take combinations of dates and stations during daylight into account, defined as a clear sky radiation higher than 20 W/m². Only for lead times +5h to +19h and +29h to +43h are there enough (defined as at least 50) combinations of dates and stations during daylight to make the fit. For each method, we generated probabilistic forecasts consisting of 49 quantiles, distributed evenly between 0.02 and 0.98 with a gap of 0.02 between them.
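One possible reading of this splitting procedure in R is sketched below; the data frame `df` with a `date` column is an assumption:

# 3-fold cross-validation on blocks of consecutive dates, so that
# training and testing periods do not interleave
dates <- sort(unique(df$date))
fold_of_date <- cut(seq_along(dates), breaks = 3, labels = FALSE)
for (k in 1:3) {
  test_dates <- dates[fold_of_date == k]
  train <- df[!(df$date %in% test_dates), ]
  test  <- df[df$date %in% test_dates, ]
  # ... fit each method per lead time and season on `train`,
  # generate the 49 quantile forecasts on `test` ...
}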
We can either fit the methods on the whole year or on the four meteorological seasons separately, defined by the Winter {December, January, February}, the Spring {March, April, May}, the Summer {June, July, August} and the Autumn {September, October, November}. As shown in Bakker (2019), fitting for each season separately produces better fits and is therefore used in producing the results. Also averaging the potential predictors from HA over 3 lead times and over a 9x9 block of grid points gives better fits. Next, the optimal hyper-parameter settings for each method can be found in Bakker (2019) (only the optimal settings for MCQRNN have changed) and are summarized in Table 3.

                              GA     NOTR   QR     MCQRNN
#steps (μ or quantiles)       5      5      5      5
#steps (σ)                    1      1      -      -
#hidden layers                -      -      -      1
#deep hidden layers           -      -      -      0
Iterations                    -      -      -      10
Penalty                       -      -      -      0

                              QRF    GRF    GBDT
#trees                        500    500    100
Minimal nodesize              5      5      5
Case sampling fraction        1/2    1/2    1/2
Predictor sampling fraction   1/3    1/3    -
Depth of trees                -      -      1
Learning rate                 -      -      0.1

Table 3: The optimal hyper-parameter settings for each method. The descriptions of the hyper-parameters are found in Bakker (2019).

2.4 Scoring metrics

To quantify the performance of the regression methods, we calculated scoring metrics over the forecasts (e.g. Wilks (2011)). First, the medians of the forecasts are verified using the root mean squared error (RMSE):

\[ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - f_i^{\mathrm{med}}\right)^2}, \qquad (3) \]

where \(y_i\) are the observations and \(f_i^{\mathrm{med}}\) the forecast medians, with 'med' denoting the 0.5 quantile. The RMSE is transformed to a skill score using the RMSE of the sample climatology:

\[ \mathrm{RMSE\_SS} = 1 - \frac{\mathrm{RMSE}}{\mathrm{RMSE}_{\mathrm{clim}}}. \qquad (4) \]

For verifying the probabilistic forecasts we used the continuous ranked probability score (CRPS):

\[ \mathrm{CRPS} = \frac{1}{N}\sum_{i=1}^{N}\int_{-\infty}^{\infty}\left(F_i(x) - \mathbf{1}\{x \geq y_i\}\right)^2\,dx, \qquad (5) \]

where \(F_i\) is the forecast cumulative distribution function and \(\mathbf{1}\) the indicator function. We calculated the continuous ranked probability skill score (CRPSS) using Equation (4) with the RMSE replaced by the CRPS.
For verification it is also informative to look at probabilities of not exceeding a certain threshold of CSI. Therefore we have binary observations (\(o_i\)) and forecast probabilities (\(p_i\)), which are verified using the Brier score (BS):

\[ \mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N}\left(p_i - o_i\right)^2. \qquad (6) \]

The skill is measured by the Brier skill score (BSS), calculated using Equation (4) with the RMSE replaced by the BS.
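For concreteness, the R sketch below computes these metrics from a matrix of quantile forecasts; the quantile-averaged pinball approximation of the CRPS and the derivation of forecast probabilities from the 49 quantiles are our own assumptions:

# Assumed inputs: observations `y`, quantile forecast matrix `qmat`
# (cases x 49 quantiles at levels `taus`)
taus <- seq(0.02, 0.98, by = 0.02)

rmse  <- function(obs, med) sqrt(mean((obs - med)^2))
skill <- function(score, score_clim) 1 - score / score_clim  # Eq. (4) pattern

# CRPS approximated by averaging the pinball (quantile) loss over the
# 49 quantiles; the factor 2 makes it converge to the CRPS integral
pinball <- function(obs, q, tau) {
  mean(ifelse(obs >= q, tau * (obs - q), (1 - tau) * (q - obs)))
}
crps_approx <- function(obs, qmat, taus) {
  2 * mean(sapply(seq_along(taus),
                  function(j) pinball(obs, qmat[, j], taus[j])))
}

# Brier score for the event CSI <= threshold, with forecast probability
# estimated as the fraction of quantiles below the threshold
brier <- function(obs, qmat, thr) {
  p <- rowMeans(qmat <= thr)
  o <- as.numeric(obs <= thr)
  mean((p - o)^2)
}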
We also produced reliability diagrams that compare forecast probabilities with observed relative frequencies. A perfect reliable forecast has a 1:1 correspondence between them.
The potential economic value measures the potential economic impact of the forecasts and is defined by a simple cost-loss model described in Richardson (2000). It considers the event of not exceeding a certain threshold of CSI with the associated contingency table (Table 4). A hit and a false alarm lead to some costs C, a miss to a loss L, and with a correct rejection there are no costs or losses.

                 Observed     Not observed
Forecast         Hit (C)      False alarm (C)
Not forecast     Miss (L)     Correct rejection (0)

Table 4: Contingency table corresponding to the event that the CSI does not exceed a certain threshold value. C indicates costs and L losses.

The potential economic value (PEV) is calculated as:

\[ \mathrm{PEV} = \frac{\min(\alpha, \mathrm{ORF}) - (h + f)\,\alpha - m}{\min(\alpha, \mathrm{ORF}) - \mathrm{ORF}\,\alpha}, \qquad (7) \]

where \(\alpha = C/L\) is the cost-loss ratio, \(h\), \(f\) and \(m\) are the hit, false alarm and miss frequencies and ORF the observed relative frequency.
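A small R sketch of Equation (7), assuming h, f and m are fractions of all cases (so that ORF = h + m):

pev <- function(h, f, m, a) {
  orf    <- h + m
  e_clim <- min(a, orf)       # expense of always or never protecting
  e_fcst <- (h + f) * a + m   # expense when acting on the forecast
  e_perf <- orf * a           # expense with a perfect forecast
  (e_clim - e_fcst) / (e_clim - e_perf)
}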

3 Results

In this section we first investigate how important the predictors are for the different methods. Subsequently, we show the performance of the methods according to all scoring metrics defined in Section 2.4.

3.1 Importance of predictors

We have 34 potential predictors and some of them are more important than others. In Table 5 we list the 10 most important predictors for each method. For the stepwise-based methods (GA, NOTR, QR and MCQRNN) the ranking is based on the number of times the predictors are chosen in the procedure. For the tree-based methods (QRF, GRF and GBDT) the ranking is based on the improvement the predictors have on the fitted trees, considered over all splits where the corresponding predictor is chosen. The improvement is defined differently in QRF, GRF and GBDT and more details can be found in Breiman et al. (2018), Tibshirani et al. (2018) and Ridgeway (2018). The final ranking is then the average improvement over all trees in all fits.
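As an aside, these improvements can be extracted in R roughly as follows; the object names refer to the hypothetical fits sketched in Section 2.2 (with fit_gbdt a single-quantile gbm fit):

library(quantregForest)  # importance() inherited from randomForest
library(grf)
library(gbm)

imp_qrf <- importance(qrf)                    # node-impurity based
imp_grf <- variable_importance(grf_fit)       # weighted split counts
imp_gbm <- summary(fit_gbdt, plotit = FALSE)  # relative influence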

Table 5: The top 10 ranking of the importance of the predictors for each of the methods GA, NOTR, QR, MCQRNN, QRF, GRF and GBDT.

We see that the raw HA global radiation forecast is the most important predictor for almost all methods. This is what one would expect, because it already accounts for all kinds of effects from other HA variables. It does, however, not account optimally for these effects and consequently other predictors are also important, such as direct radiation, relative humidity, cloud cover and cloud water. The relative humidity is more important than the direct radiation in the stepwise-based methods, whereas for the tree-based methods it is the other way around. After the global radiation, direct radiation and relative humidity, cloud predictors such as cloud cover or cloud water follow in most methods. Sometimes the cloud predictors are even higher in the ranking. The aerosol predictors from CAMS are more important for the stepwise-based methods than for the tree-based methods, with the aerosol optical depth as the most important of the three. The aerosol predictors are in general of lower importance than the cloud-related predictors. This can be explained by the fact that clouds block on average more incoming radiation than aerosols. For the predictors with multiple layers, the lower component is in general more important than the middle or higher component, indicating that the lower atmosphere has more influence on global radiation.
Furthermore, it is good to note that although some predictors are more important than others, they are all used (seen over all fits) in forecasting global radiation. Also, the tree-based methods have the advantage over the stepwise-based methods that they make use of all predictors in all fits, whereas for the stepwise-based methods we set a limit on the number of predictors to prevent overfitting.

3.2 Verification of deterministic forecasts

We computed the RMSE_SS for all lead times, all stations, the four seasons and the different methods, and in Figure 2 we plot the RMSE_SS against the lead time for the four seasons. All methods are more skillful than the raw forecast and therefore have increased accuracy. The RMSE_SS with respect to the raw forecast ranges between 2-3% in winter, 5-7.5% in spring and 7.5-10% in summer and autumn. For lead times close to the night, the skill drops because the sample climatology gets closer to zero and becomes harder to beat. As expected, the second forecast day shows a lower RMSE_SS than the first day.
In winter and spring the RMSE_SS is relatively stable during the day. Only QRF and MCQRNN perform slightly worse than the other methods, in winter and spring respectively. In summer and autumn we see that the skill scores drop during the day, which might be caused by convection, which is difficult for NWP models to forecast. The highest skill scores are reached in the morning. In summer, MCQRNN performs slightly worse than the other methods. In autumn, all methods perform similarly according to the RMSE_SS.

Figure 2: RMSE_SS versus lead time for the four seasons and averaged over all stations. The abbreviations of the methods are explained in Section 2.2 and RAW stands for the raw forecast.

3.3 Probabilistic forecasts for selected cases

Two interesting cases are shown for respectively a clear sky and a fully overcast day at station Cabauw (see Figure 1). First, we look at the case of April 9, 2017, which was a day without clouds. We compare the raw forecast and post-processed forecasts for lead times +6h till +18h with the observations for 6-18 UTC in Figure 3. Two methods are chosen, one parametric (GA) and one non-parametric (QRF), for which the median and the forecast range between the lowest and highest quantile (the gray area) are plotted. We see that the raw forecast is already very accurate and this can be seen on most clear sky days. The raw forecast has a small negative bias around 12 UTC and both GA and QRF correct this in the median, but with a small positive bias, especially for GA. Looking at the uncertainty, we see that QRF is much more certain than GA, with a bandwidth of around 100 W/m² versus 200 W/m².

(a) GA method
(b) QRF method
Figure 3: Clear sky case of April 9, 2017 at station Cabauw (see Figure 1). OBS stands for the observations, RAW for the raw forecast, GA for the gamma distribution and QRF for the quantile regression forest. The gray area denotes the probability distribution from the 0.02 until the 0.98 quantile.

For a fully overcast day we take December 31, 2016. In Figure 4 we compare GA and QRF with the raw forecast for lead times +9h till +15h. We see that the raw forecast clearly underestimates the amount of radiation. Both GA and QRF correct for this bias almost completely in the median. QRF generally keeps a small negative bias, while GA has a small positive bias after 13 UTC. Both methods have uncertainty bandwidths ranging approximately between 50 and 100 W/m². This indicates that the methods are very certain that it will be a fully overcast day. GA is slightly more uncertain about the amount of radiation than QRF. The distributions of both GA and QRF are asymmetrical and skewed towards higher (than the median) amounts of radiation, which is (physically) realistic in this case, given that the lower bound of radiation should be strictly greater than zero in the middle of the day.

(a) GA method
(b) QRF method
Figure 4: As Figure 3 but for December 31, 2016, which was a fully overcast day.

3.4 Verification of probabilistic forecasts

For verifying the probabilistic forecasts, we show the results from multiple scoring metrics in the following subsections to compare the performance of the different methods.

3.4.1 Continuous ranked probability skill score

Starting with the CRPSS, in Figure 5 we plot the CRPSS over the lead times for the different methods in the four seasons. All methods are more skillful during the day than closer to the night, because the climatology is harder to beat closer to the night. As expected, the second day shows less skill than the first day. All methods have similar performance, with skill scores around 0.4 on day 1 and 0.3-0.35 on day 2. In spring and summer the methods have more skill than in winter and autumn. In winter and spring the skill is relatively stable during the day, while in summer and autumn the skill is best in the morning and decreases during the day. The difficulty in forecasting convection might be the cause of this. In spring and summer, GA performs worse than the other methods, while in winter and autumn the difference in performance between the methods is smaller. Overall, QR and GRF perform best in terms of the average CRPSS.

Figure 5: CRPSS versus lead time for the four seasons and averaged over all stations. The abbreviations of the methods are explained in Section 2.2.

It is also informative to look at the spatial patterns of the CRPSS and therefore we plot the maximum value for each station for lead time +12h in Figure 6. The maxima range between 0.3 and 0.47. We see that the method reaching the maximum varies between the seasons and the stations; only NOTR never performs best. In winter QR is most skillful at most stations, while in spring it is GRF. In summer and autumn it is less clear which method is most skillful overall. Although the skill varies between stations, there is not a clear spatial pattern. There are some seasonal variations, for example in winter and autumn the stations near the coast mostly have less skill than stations further inland. This might occur because convection over sea causes more clouds over stations near the coast.

Figure 6: CRPSS for all stations and the four seasons at lead time +12h. Only the maximum value over the methods is shown. The abbreviations of the methods are explained in Section 2.2.

3.4.2 Brier skill score

To investigate the performance of the methods for different CSI values, we plot in Figure 7 the BSS versus the CSI threshold for the four seasons at lead time +12h. The BSS increases as a function of CSI in spring and summer, while in winter it decreases again for higher CSI. In autumn the BSS is stable around 0.4 for CSI values higher than 0.3. The BSS increase in spring and summer is probably caused by the higher predictability of clear sky versus convective situations. While the differences between the methods are small for lower CSI values, it is interesting to note that for thresholds higher than 0.7 (corresponding to (near) clear sky conditions), there is a bifurcation in the skill, especially in spring and summer. The tree-based methods and QR show higher BSS than the other methods. Consequently, the global radiation under (near) clear sky conditions is better forecast by the tree-based methods and QR.

Figure 7: BSS versus CSI threshold at lead time +12h for the four seasons, averaged over all stations. The abbreviations of the methods are explained in Section 2.2.

3.4.3 Reliability diagrams

To assess the reliability of the methods, Figure 8 shows the reliability diagrams for CSI thresholds 0.2, 0.5 and 0.9 for lead time +12h in summer. For the threshold 0.2 all methods are unreliable for high forecast probabilities, due to the low number of cases. For the low forecast probabilities the tree-based methods are very reliable and almost on top of the diagonal line. The stepwise-based methods are slightly less reliable. For the threshold 0.5 all methods are very reliable and there is no method that clearly performs better or worse than the rest of the methods. For the threshold 0.9, all methods are reliable for the high forecast probabilities, but for lower forecast probabilities the tree-based methods are again most reliable. The parametric methods are least reliable and QR and MCQRNN are in between. In general the tree-based methods are most reliable, especially under clear sky conditions.

Figure 8: Reliability diagrams for different CSI thresholds at lead time +12h in summer, for all stations together. The abbreviations of the methods are explained in Section 2.2.

3.4.4 Potential economic value

To further study the behaviour for different CSI values, in Figure 9 we plot the potential economic value (PEV) versus the cost-loss ratio for CSI thresholds 0.2, 0.5 and 0.9 at lead time +12h in summer. For all thresholds, the methods show higher PEV values than the raw forecast and show positive PEV values over a wider cost-loss ratio range (the full range between 0 and 1 for thresholds 0.5 and 0.9). For threshold 0.2, the tree-based methods perform worse than the stepwise-based methods for cost-loss ratios above 0.4. For the threshold 0.5, all methods have equal performance. For the threshold 0.9, the tree-based methods and QR perform slightly better than the other methods for cost-loss ratios below 0.2.

Figure 9: Potential economic value versus cost-loss ratio for different CSI thresholds at lead time +12h in summer, for all stations together. RAW stands for the raw forecast and the rest of the abbreviations are explained in Section 2.2.

4 Discussion

The performance of the methods presented here depends on the lead time, the station location and the season. Nevertheless, in general all methods had similar performance. The ranking also depends on the scoring metric, as different metrics focus on different aspects, like accuracy and reliability, and on deterministic or probabilistic forecasts. We often saw that quantile regression and generalized random forests performed best and that the tree-based methods outperformed the stepwise-based methods in (near) clear sky conditions. In the discussion presented here we compare our results with results from other papers, where comparing methods is the primary goal, and look for differences and similarities.
David et al. (2018) made a comparison between probabilistic forecasts using random forests, neural networks, (weighted) quantile regression, gradient boosting decision trees, recursive GARCH and the sieve bootstrap. They used endogenous data consisting of past observations of global radiation and CSI as predictors for forecasting CSI. They concluded that all methods performed very similarly, in agreement with our results. They based this on verification measures including the CRPS, reliability diagrams and rank histograms. They found (weighted) quantile regression and gradient boosting decision trees to be the most efficient methods for generating probabilistic forecasts in terms of complexity and computational time. They concluded that neural networks are the least efficient, with a large computational time and a requirement of sufficient scientific background for good performance.
Mohammed and Aung (2016) made a comparison between deterministic forecasts using decision trees, nearest neighbours, gradient boosting decision trees, random forests and both lasso and ridge regression. They used NWP output from the ECMWF as predictors to forecast solar power production. They found gradient boosting decision trees to be the best performing in terms of the RMSE and mean absolute error (MAE). In our study, gradient boosting decision trees performed well, but not best, according to the RMSE.
Zamo et al. (2014) made a comparison between probabilistic forecasts using quantile regression and quantile regression forests. They used NWP output from the PEARP model at Météo France. They forecast solar power production and verify the forecasts using the CRPS and rank histograms. They found that the best performing method depends on the specific settings and concluded that quantile regression and quantile regression forests performed similarly in general, which is in agreement with our results.
Lastly, Voyant et al. (2018) made a comparison between probabilistic forecasts using quantile regression forests and gradient boosting decision trees with past CSI observations as predictor set. They generate prediction intervals for the CSI and verify them using the gamma test, which is based on the mean interval length and prediction interval coverage probability. The authors found that gradient boosting decision trees performed best. In our study, quantile regression forests and gradient boosting decision trees performed very similarly.

5 Conclusions

In this paper we compared 7 regression methods. Using these methods, we produced probabilistic forecasts of global radiation with the raw forecast (the deterministic global radiation forecast from HA) together with other relevant meteorological variables as potential predictors. We saw that all potential predictors were used and that global radiation, direct radiation, relative humidity, cloud cover and cloud water were most important. The aerosol predictors from CAMS were less important than the radiation and cloud-based predictors from HA, but they were still often selected, particularly the aerosol optical depth in the stepwise-based methods. Aside from that, the tree-based methods have the advantage of using all predictors in all fits, whereas for the stepwise-based methods the number of predictors used in a fit is limited.
We verified the forecasts in a deterministic way using the root mean squared error skill score. All methods improved between 2% and 10% upon the raw forecast, depending on the season and lead time. To visualize the probabilistic information in the forecasts we looked at two interesting cases with different weather conditions: a clear sky and a fully overcast day. In both cases the medians of the forecasts were close to the observations, but the uncertainty in the forecast radiation differed between the methods on the clear sky day.
Next, we verified the methods in a probabilistic way using the continuous ranked probability skill score, the Brier skill score, reliability diagrams and the potential economic value. We saw that the performance of the methods depended on the time of the day, the forecast lead time, the geographical location, the season and the weather situation (clear sky or cloudy). In general all methods performed similarly, but depending on the verification measure some methods performed better than others. According to the continuous ranked probability skill score, quantile regression and generalized random forests performed slightly better and the gamma and truncated normal distributions slightly worse than the other methods. For the Brier skill score there was no clear best performing method, but we saw that clear sky conditions are better forecast by the tree-based methods than by the stepwise-based methods. The tree-based methods are also more reliable, according to the reliability diagrams. Finally, the probabilistic forecasts were compared with the deterministic raw forecast using the potential economic value to show the added value of uncertainty information in the probabilistic forecasts. All methods achieved both higher potential economic values and positive potential economic values over a wider cost-loss ratio range than the raw forecast. Therefore the probabilistic forecasts are very useful for users of solar energy: they can make better decisions based on the improved accuracy and the information about the uncertainty.
Future research could focus on more complex economic models for making a comparison between the methods. Another research direction is to investigate other distributions than the ones tried in this paper or other non-parametric regression methods. Furthermore, the effect of more potential predictors (for example diffuse radiation) could be investigated. Also predictors taken from the (relatively new) HA ensemble prediction system could be used, such as the ensemble mean or ensemble variance of the global radiation forecast, and would likely be better predictors than those from the deterministic HA forecasts.
The methods presented here are all based on statistical post-processing techniques. A different, physical approach is to improve the NWP models themselves so that they produce better predictors, for example by improving the representation of clouds and aerosols in the model. The forecasts of global radiation would then also improve. However, statistical post-processing would still be expected to improve the skill of those forecasts.

Acknowledgements

We thank Jan Fokke Meirink (KNMI) for providing the aerosol forecasts from CAMS, consisting of the aerosol optical depth, Ångström exponent and ozone. We thank Jason Frank (Utrecht University), Wim de Rooy, Jan Barkmeijer, Sander Tijm and Sibbo van der Veen (all from KNMI) for their help in the study described in this paper.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declarations of interest: none

References
