Comparison of statistical postprocessing methods for probabilistic NWP forecasts of solar radiation
Abstract
The increased usage of solar energy places additional importance on forecasts of solar radiation. Solar panel power production is primarily driven by the amount of solar radiation and it is therefore important to have accurate forecasts of solar radiation. Accurate forecasts that also give information on the forecast uncertainties can help users of solar energy to make better solar radiation based decisions related to the stability of the electrical grid. To achieve this, we apply statistical postprocessing techniques that determine relationships between observations of global radiation (made within the KNMI network of automatic weather stations in the Netherlands) and forecasts of various meteorological variables from the numerical weather prediction (NWP) model HARMONIEAROME (HA) and the atmospheric composition model CAMS. Those relationships are used to produce probabilistic forecasts of global radiation. We compare 7 different statistical postprocessing methods, consisting of two parametric and five nonparametric methods. We find that all methods are able to generate probabilistic forecasts that improve the raw global radiation forecast from HA according to the root mean squared error (on the median) and the potential economic value. Additionally, we show how important the predictors are in the different regression methods. We also compare the regression methods using various probabilistic scoring metrics, namely the continuous ranked probability skill score, the Brier skill score and reliability diagrams. We find that quantile regression and generalized random forests generally perform best. In (near) clear sky conditions the nonparametric methods have more skill than the parametric ones.
keywords:
Model Output Statistics (MOS), probabilistic forecasting, solar radiation, Numerical Weather Prediction (NWP)1 Introduction
Forecasts of meteorological variables are produced by projecting an initial state of the atmosphere into the future using a numerical weather prediction (NWP) model. In the projection there are unavoidable errors related to uncertainty in the initial condition and the physical parameterizations. This leads to uncertainty in the forecasts that can be quantified by generating a probabilistic forecast. In the case of solar radiation, this gives useful information about the uncertainty in power production of solar panels, as it is almost directly related to the uncertainty in forecasts of solar radiation. This helps users of solar energy make better solar radiation based decisions related to the stability of the electrical grid.
A considerable amount of work has already been done on generating probabilistic forecasts of global radiation (i.e. the total shortwave radiation on a horizontal surface received from a solid angle of 2 steradians (unit: W/m2)). For example Lorenz et al. (2009) produced 95% confidence intervals on errors in the clear sky index (CSI, defined as global radiation divided by clear sky radiation) by fitting a polynomial function to the standard deviation of the errors as a function of the forecast CSI and the solar zenith angle. Verzijlbergh et al. (2015) fitted a normal distribution to the CSI errors using NWP output from the GFS model of NCEP. Other distributions were also used for forecasting solar radiation, such as the beta and two sided power distribution in Fatemi et al. (2018) and the gamma distribution in Bracale et al. (2013). They both use observations of various meteorological variables to forecast radiation.
Massidda and Marrocu (2018) produced probabilistic forecasts by fitting quantiles of the underlying distribution of radiation, called quantile regression, using deterministic and probabilistic NWP output of ECMWF. They also extended this to a method called gradient boosting decision trees. Almeida et al. (2015) use quantile regression forests to forecast quantiles of the distribution of solar power using NWP output from the WRF model run at Meteogalicia. Neural networks were used by Galván et al. (2017) to forecast confidence intervals of solar radiation using NWP output from the GEFS model of NCEP.
In recent literature, there are some reviews that give an overview of papers using postprocessing methods to produce probabilistic forecasts of solar radiation or solar power production. For example Antonanzas et al. (2016) discussed papers using neural networks, random forests, support vector machines and nearest neighbours for forecasting solar power. Van der Meer et al. (2018) reviewed papers using quantile regression, quantile regression forests, Gaussian processes, bootstrapping, lower upper bound estimate, gradient boosting, kernel density estimation, nearest neighbours and an analog ensemble for forecasting solar power. Voyant et al. (2017) discussed papers using linear regression, generalized linear models, neural networks, support vector machines, decision tree learning, nearest neighbours and Markov chains for forecasting solar radiation. This last review is oriented towards deterministic forecasts, whereas the first two focus on probabilistic forecasts.
There are relatively few papers in which a comparison among multiple regression methods on the same data set for solar radiation or solar power forecasting is a primary goal. David et al. (2018) made a comparison between probabilistic forecasts using random forests, neural networks, (weighted) quantile regression, gradient boosting decision trees, recursive GARCH and the sieve bootstrap. They used only past observations of global radiation and the CSI as predictors. Mohammed and Aung (2016) made a comparison between deterministic forecasts using decision trees, nearest neighbours, gradient boosting decision trees, random forests and lasso and ridge regression. They used ECMWF output variables as predictors in the regression methods. Zamo et al. (2014) made a comparison between probabilistic forecasts using quantile regression and quantile regression forests with NWP output from the PEARP model at Météo France as predictors. Lastly, Voyant et al. (2018) made a comparison between probabilistic forecasts using quantile regression forests and gradient boosting decision trees with observed CSI as the only predictor.
In this paper we generate probabilistic forecasts of global radiation by applying a statistical postprocessing technique called model output statistics (MOS, Glahn and Lowry (1972)). This consists of fitting relationships between various NWP variables and CSI observations and using the relationships for producing skillful and reliable probabilistic forecasts. We make a comparison of probabilistic forecasts using an extensive list of regression methods, consisting of both parametric methods (gamma and truncated normal distribution) and nonparametric methods. The nonparametric methods consist of quantile regression, quantile regression forests, gradient boosting decision trees, and the relatively new generalized random forests and quantile regression neural networks. The methods are chosen to include the most commonly used methods and some (relatively) new methods, that take both linear and nonlinear relationships into account. To our knowledge we are the first to make such an extensive comparison of probabilistic forecasting methods in a MOS setting. We use a large number of potential predictors, consisting of output from the highresolution nonhydrostatic NWP model HARMONIEAROME (HA, running at KNMI) and the atmospheric composition model CAMS (Copernicus Atmosphere Monitoring Service, running at ECMWF). This is a second novel aspect of this work, as previous work has not explored the benefit of including both NWP (here HA) and CAMS predictors, to our knowledge. We investigate the importance of the predictors in the fitted relationships. Additionally, the probabilistic forecasts are verified and compared with multiple scoring metrics to see which method performs best in which situation. These include the continuous ranked probability skill score, Brier skill score and reliability diagrams (e.g. Wilks (2011)). We also show the improvement with respect to the raw forecast (the global radiation forecast from HA) with the root mean squared error skill score and the potential economic value (e.g. Richardson (2000)).
2 Data and Methods
In this section we first describe the observations and model data that we used, including transformations applied to the model data. Next, we give an overview of the regression methods used to model the relationships between forecasts of weather variables (the predictors) and CSI observations (the predictand). The procedure we used for fitting is described and after that we describe verification measures to verify the methods.
2.1 Observations
The meteorological variable we study in this paper is the global radiation. We used hourly average observations of global radiation measured at 30 weather stations in the Netherlands by the KNMI network of automatic weather stations for the period of April 1, 2016 to March 16, 2018 (KNMI (2018)). The KNMI network of solar radiation measurements consists of unventilated Kipp & Zonen CM11 pyranometers. The calibration of the instruments is traceable to the World Radiometric Reference (WRR). The instruments are sampled every 12s and 1 hr averages are used for the analysis presented here. Routine quality control procedures are applied based on intercomparison of stations and clearsky comparisons. The achieved uncertainty is within the limits of “network operations” as defined by the WMO CIMO guide (8% for hourly totals, 95% confidence level; WMO (2017)). We transformed the global radiation into CSI by dividing them by the hourly average clear sky radiation. This removes the large seasonal and diurnal cycle of radiation. The clear sky radiation is calculated using the European Solar Radiation Atlas (ESRA) model (Rigollier et al. (2000)), with as inputs the solar zenith angle (Michalsky (1988)) and the Linke turbidity factor (monthly average values for De Bilt are used from Remund et al. (2003)). The locations of the stations and the seasonally averaged observed CSI are shown in Figure 1. For inland stations the seasonally averaged observed CSI is slightly lower than for coastal stations in spring and summer due to convection. In winter and autumn the pattern is quite homogeneous.
Figure 2 shows daily courses of the fraction of CSI observations strictly greater than 0.8 (corresponding to low cloud cover conditions, including clear skies) for the four seasons and for coastal (less than 25 km from the coast) and inland (more than 25 km from the coast) stations separately. At the inland stations during spring, summer and fall the fraction drops during the day a few hours after sunrise. This is likely due to convection with a (secondary) maximum at the end of the day caused by the disappearance of shallow convection. Inland stations during winter and coastal stations in all seasons show a different diurnal cycle with a similar evolution between a few hours after sunrise and a few hours before sunset and a (secondary) maximum around 12 UTC. In spring and summer the fraction of low cloud cover events is higher at the coastal stations than at the inland stations due to less convective events.
2.2 Model data
The model data that we used as the potential predictors for the regression methods consist of forecasts of atmospheric temperature, humidity and different cloud, radiation and aerosol variables (Table 2). Most of the predictors (marked with *) are NWP output of HA (Bengtsson et al. (2017)). The aerosol predictors (marked with ** ) are forecasts produced by CAMS (Flemming et al. (2017)). Because HA uses climatological averages for the aerosols, we expect the aerosol forecasts as predictors to have added value in fitting the relationships (see also Alfadda et al. (2018)).
We used hourly output from HA for lead times 048 hours at the grid points closest to the stations (in the same date range as the observations, April 1, 2016 to March 16, 2018). The CAMS lead times are 3hourly and therefore we take the lead time closest to the corresponding time of the observations. The CAMS forecasts are later available than HA and therefore we use them for lead times of 2472 hours from the previous day. We used CAMS data for the same period as the HA and observation data. More information about HA and CAMS can be found in Table 1.
HA  CAMS  
Version(s)  38  43r3, 43r1, 41r1 
Type of forecasts  Deterministic  
initial time  0 UTC  
Time domain  048 hours  2472 hours 
Time resolution  hourly  3hourly 
Spatial domain  Western Europe (800x800 grid points)  Global 
Spatial resolution  2.5km x 2.5km  x 
Vertical levels  65  60 


For each of the potential predictors with ’layers’ as a subscript, we computed them for the lower (02000 m), middle (20006000 m) and upper atmosphere (6000 m and higher), and also an aggregated version of the total atmosphere (except for temperature and relative humidity, where the total atmosphere component is missing). The HA output consists of 65 levels in the atmosphere and for the cloud water we applied a weighted average (with weights equal to the distances between the levels) to reduce these to the four layers (lower, middle, upper and total atmosphere). For temperature and humidity we used HA output from 11 pressure levels and reduced these to the three layers (lower, middle and higher part of the atmosphere) by applying a weighted average. The precipitable water is computed using temperature and humidity from the 11 pressure levels together with the method outlined by McRae (1980) and applying a weighted average.
We transformed all radiation predictors into hourly average CSI values in the same way as the observations. The rain predictor values are precipitation sums over each hour. For all other potential predictors the valid time is used.
We also calculated time/place predictors, which are used as additional potential predictors. These consist of the latitude and longitude of the station locations, the day of the year, the distance to the coast (minimum Euclidean distance to the sea), distance to water (minimum Euclidean distance to the sea or one of the lakes), distance to inland (the Euclidean distance to the intersection point of the borders of the Netherlands, Belgium and Germany) and the cosine of the solar zenith angle.
2.3 Regression methods
We generated probabilistic forecasts using regression methods that find relationships between the predictors and the predictand. The relationships help in understanding how the value of the predictand changes when any one of the predictor values change. Then, one can quantify the uncertainty in the predictand in terms of the predictors and form the probabilistic forecast. We fit for 7 different regression methods and they are described in more detail in the following subsections.
2.3.1 Parametric regression
Parametric regression methods assume a specific predictive distribution with parameters (location, scale, skewness and kurtosis). The parameters are first chosen as constant values and then fitted sequentially (one by one, each new one using the already fitted ones) with linear relationships:
(1) 
where , are the predictors and are the regression parameters. The optimal linear relationship is found by minimizing the loglikelihood function (i.e. the negative logarithm of the forecast probability density function).
We avoid overfitting by using only a selection of the predictors in the model. The selection that gives the best fit is found by forward (adding one predictor) and backward (removing one predictor) stepwise selection that aims to minimize the Akaike Information Criterion (AIC). The stepwise procedure is applied separately for each distribution parameter.
We apply this regression method in the programming language R using the gamlss package (Rigby and Stasinopoulos (2018)) for two distributions: the gamma distribution (GA) and the truncated normal distribution (NOTR). The truncated normal distribution is formed by left truncation at zero of the normal distribution (Rigby and Stasinopoulos (2005)) and has therefore, just like the gamma distribution, only probability mass for positive values, which suits for forecasting radiation.
2.3.2 Quantile regression
Quantile regression fits a number of quantiles from the underlying distribution separately by assuming the linear relationship:
(2) 
The optimal linear relationship is found by minimizing the weighted sum of residuals (WSR, defined as observations  fitted quantiles), with weights equal to and for respectively positive and negative residuals (Koenker (2005)). Regularization to prevent overfitting was not successful in our study and therefore we apply the stepwise procedure, this time with the aim of minimizing the average over the AIC values for the different quantiles. Consequently, all quantiles are fit with the same predictors. It occasionally happens that negative values are forecast and in that case we set them to zero. Also, we solve crossings between quantiles by sorting them in ascending order. Lastly, some small random noise, generated from a normal distribution with mean zero and a variance of 0.001, is added to the predictors to prevent the matrix with the predictors from becoming singular. Quantile regression (QR) is applied in R using the quantreg package (Koenker (2018)).
2.3.3 Quantile regression neural network
Quantile regression neural networks are a nonlinear extension of quantile regression. It can be visualized by an input layer (the predictors), one or more hidden layers and an output layer (the forecast for one quantile). In the hidden layers we adjust the input towards the output by multiplication with weights, adding some bias and applying a nonlinear activation function, taken here as the default . The optimal weights and bias are found by minimizing the WSR in a boosting fashion where the negative gradient is followed for a number of iterations. More details can be found in Cannon (2011).
Regularization to prevent overfitting was not successful and therefore we apply the stepwise procedure. Further, negative forecasts are set to zero. Crossings between quantiles are solved by adding the quantiles as a predictor in the input layer with a monotonically increasing constraint (Cannon (2018a)). This is called the monotone composite extension of quantile regression neural network (MCQRNN) and we apply it in R using the qrnn package (Cannon (2018b)).
2.3.4 Random forests
Random forests have binary regression trees (Breiman et al. (1984)) as building blocks. A binary regression tree is formed by splitting the data set into subsets by applying thresholds on the predictors. Each split is chosen to minimize the variance in the predictand. The splitting is repeated until some stopping criterion is reached, such as a minimum number of predictand values left in each subset. A random forest is a collection of trees (Breiman (2001)) and reduces the chance for overfitting by constructing each tree on a different (bootstrap sampled) subset of the data. To make the trees even more independent, at each split only a (bootstrap sampled) subset of the predictor set is used. Probabilistic forecasts are made by looking at the underlying distribution of the predictand values in the terminal nodes of all trees and drawing quantiles from it (Meinshausen (2006)). This method is called quantile regression forests (QRF) and is applied in R using the quantregForest package (Meinshausen (2017)).
Generalized random forests (GRF) is very similar to QRF. Each split is now chosen to maximize the difference between the underlying distributions of the predictand values (Athey et al. (2016)). Also, whereas QRF uses bootstrap sampling with replacement, GRF uses bootstrap sampling without replacement. GRF is applied in R using the grf package (Tibshirani et al. (2018)).
2.3.5 Gradient boosting decision trees
Gradient boosting decision trees boosts the fitted quantiles (separately) towards the actual quantiles of the underlying distribution by following the negative gradient of the WSR for a number of iterations. The negative gradients are estimated using regression trees. Then in each iteration the fits are updated by adding the estimated negative gradients multiplied with the learning rate (Friedman (2001)). The learning rate is generally chosen to be small () to prevent overshooting the minimum in the WSR. To reduce the chance for overfitting, each tree is constructed on a (bootstrap sampled) subset of the data (Friedman (2002)). Consequently, in each iteration we only improve the fitted quantile for the predictand values in the chosen subset. Further, negative values are set to zero and crossings between quantiles are solved by sorting them in ascending order. Gradient boosting decision trees (GBDT) is applied in R using the gbm package (Ridgeway (2018)).
2.4 Fitting procedure
We fitted relationships between the potential predictors and the predictand using the 7 methods. The fitting was done per lead time, because systematic errors increase with lead time, and for all stations at once. We split the data set into a training (for model fitting) and testing (for verification) set. The testing set consists of 1/3 of the data set containing consecutive dates and the rest of the data are in the training set. This leads to around 3200 training and 1600 testing cases for each fit when we fit to each season separately. We applied a 3fold crossvalidation. For each lead time we only take combinations of dates and stations during daylight into account, defined as a clear sky radiation higher than 20 W/m. Only for lead times +5h till +19h and +29h till +43h there are enough (defined by at least 50) combinations of dates and stations during daylight to make the fit. For each method, we generated probabilistic forecasts consisting of 49 quantiles, distributed evenly between 0.02 and 0.98 with a gap of 0.02 between them.
We can either fit the methods on the whole year or on the four meteorological seasons separately, defined by the Winter (December, January, February), the Spring (March, April, May), the Summer (June, July, August) and the Autumn (September, October, November). As shown in Bakker (2019), fitting for each season separately produces better fits and is therefore used in producing the results. Also averaging the potential predictors from HA over 3 lead times and over a 9x9 block of grid points gives better fits. Therefore, we apply both the temporal and spatial smoothing to our potential predictors. Next, the optimal hyperparameter settings for each method can be found in Bakker (2019) (only the optimal settings for MCQRNN have changed) and are summarized in Table 3.
GA  NOTR  QR  MCQRNN  

#steps ( or quantiles)  5  5  5  5 
#steps ()  1  1     
#hidden layers        1 
#deep hidden layers        0 
Iterations        10 
Penalty        0 
QRF  GRF  GBDT  

#trees  500  500  100 
Minimal nodesize  5  5  5 
Case sampling fraction  1/2  1/2  1/2 
Predictor sampling fraction  1/3  1/3   
Depth of trees      1 
Learning rate      0.1 
2.5 Scoring metrics
To quantify the performance of the regression methods, we calculated scoring metrics over the forecasts (e.g. Wilks (2011)). First, the medians of the forecasts are verified using the root mean squared error (RMSE):
(3) 
where are the CSI observations (for the same season and (valid) time as the forecasts) and the forecasts, with ’med’ denoting the 0.5 quantile. The RMSE is transformed to a skill score using the RMSE of the sample climatology (the mean of the observations):
(4) 
For verifying the probabilistic forecasts we used the continuous ranked probability score (CRPS):
(5) 
We calculated the continuous ranked probability skill score (CRPSS) using Equation (4) with the RMSE replaced by the CRPS. The sample climatology consists in this case of quantiles drawn from the observations to generate a sample climatological distribution.
For verification it is also informative to look at probabilities of not exceeding a certain threshold of CSI. Therefore we have binary observations () and forecast probabilities (), which are verified using the Brier score (BS):
(6) 
The skill is measured by the Brier skill score (BSS), calculated using Equation (4) with the RMSE replaced by the BS, where the sample climatology is defined as the mean of the binary observations.
We also produced reliability diagrams that compare forecast probabilities with observed relative frequencies. A perfect reliable forecast has a 1:1 correspondence between them.
The potential economic value measures the potential economic impact of the forecasts and is defined by a simple costloss model described in Richardson (2000). It looks at the event of not exceeding a certain threshold of CSI with the associated contingency table (Table 4). A hit and false alarm lead to some costs , a miss to a loss and with a correct rejection there are no costs or losses.
Observed  Not observed  

Forecast  Hit  False alarm 
Not forecast  Miss  Correct rejection 
The potential economic value (PEV) is calculated as:
(7) 
where and are the hit, false alarm and miss frequencies and ORF the observed relative frequency.
3 Results
In this section we first investigate how important the predictors are for the different methods. Subsequently, we show the performance of the methods according to all scoring metrics defined in Section 2.5.
3.1 Importance of predictors
We have 34 potential predictors and some of them are more important than others. In Tables 5 and 6 we list the 10 most important predictors for the stepwisebased and treebased methods, respectively. For the stepwisebased methods (GA, NOTR, QR and MCQRNN) the ranking is based on the amount of times the predictors are chosen in the procedure. For the treebased methods (QRF, GRF and GBDT) the ranking is based on the improvement the predictors have on the fitted trees, considered over all splits where the corresponding predictor is chosen. This improvement is defined slightly differently in QRF, GRF and GBDT and more details can be found in Breiman et al. (2018), Tibshirani et al. (2018) and Ridgeway (2018). The final ranking is then the average improvement over all trees in all fits. Due to the different definitions of the importance measure between the stepwisebased and treebased methods, we cannot directly compare the resulting rankings between these methods.
Rank  GA  NOTR  QR  MCQRNN 

1  
2  
3  
4  
5  
6  
7  
8  
9  
10 
Rank  QRF  GRF  GBDT 

1  
2  
3  
4  
5  
6  
7  
8  
9  
10 
We see that the raw HA global radiation forecast () is the most important predictor for almost all methods. This is what one would expect, because already accounts for all kinds of effects from other HA variables. It does however not account optimally for these effects and consequently other predictors are also important, such as direct radiation, relative humidity, cloud cover and cloud water. The relative humidity is more important than the direct radiation in the stepwisebased methods, whereas for the treebased methods it is the opposite way. This can be explained by the fact that if a stepwisebased method selects global radiation as a predictor, then the direct radiation becomes less useful as a predictor, due to the high correlation between both. In that case the relative humidity becomes more important. For the treebased methods this effect is less apparent due to the independence between different branches of the trees. Therefore, global radiation can be important in one branch and direct radiation in another branch, and consequently they are both important considering the entire tree.
After , the direct radiation and relative humidity, cloud predictors such as or follow in most methods. Sometimes the cloud predictors are even higher in the ranking. The aerosol predictors from CAMS are quite important in the stepwisebased methods, with as most important of the three. For the treebased methods the aerosol predictors are of medium importance seen over all trees, but they can still be useful in certain branches of the trees. The aerosol predictors are in general of lower importance than cloud related predictors. This can be explained by the fact that clouds block on average more incoming radiation than aerosols and are consequently more responsible for the uncertainty in forecast global radiation. For the predictors with multiple layers, the lower component is in general more important than the middle or higher component. Therefore the lower atmosphere has more influence on global radiation.
Furthermore, it is good to note that although some predictors are more important than others, they are all used (seen over all fits) in forecasting global radiation. Also, the treebased methods have the advantage over the stepwisebased methods that they make use of all predictors in all fits, whereas for the stepwisebased methods we set a limit on the amount of predictors to prevent overfitting.
3.2 Verification of deterministic forecasts
We computed the RMSE_SS for all lead times, all stations, the four seasons and the different methods and in Figure 3 we plot the RMSE_SS against the lead time for the four seasons. All methods are more skillful than the raw forecast and have therefore an increased accuracy. The RMSE_SS with respect to the raw forecast ranges between 23% in winter, 57.5% in spring and 7.510% in summer and fall. For lead times close to the night, the skill drops due to the fact that the forecasts get closer to the sample climatology (because both approach zero) and the latter becomes therefore harder to beat. The second forecast day shows lower RMSE_SS than the first day as expected.
In winter and spring the RMSE_SS is relatively stable during the day. Only QRF and MCQRNN perform slightly worse than the other methods in respectively winter and spring. In summer and autumn we see that the skill scores drop during the day, which might be caused by convection (see Figure 2) as it is difficult to forecast by NWP models. The highest skill scores are reached in the morning. In summer, MCQRNN performs slightly worse than the other methods. In autumn, all methods perform similarly according to the RMSE_SS.
3.3 Probabilistic forecasts for selected cases
Three interesting cases are shown for respectively a clear sky, fully overcast and partly overcast day at station Cabauw (see Figure 1). First, we look at the case of April 9, 2017, which was a day without clouds. We compare the raw forecast and postprocessed forecasts for lead times +6h till +18h with the observations for 618 UTC in Figure 4. Two methods are chosen, one parametric (GA) and one nonparametric (QRF), for which the median and the forecast range between the lowest and highest quantile (the gray area) are plotted. We see that the raw forecast is already very accurate and this can be seen on most clear sky days. The raw forecast has a small negative bias around 12 UTC and both GA and QRF correct this in the median, but with a small positive bias, especially for GA. Looking at the uncertainty, we see that QRF is much more certain than GA, with a bandwidth of around 100 W/m versus 200 W/m.
For a fully overcast day we take December 31, 2016. In Figure 5 we compare GA and QRF with the raw forecast for lead times +9h till +15h. We see that the raw forecast clearly underestimates the amount of radiation. Both GA and QRF correct for this bias almost completely in the median. QRF generally keeps a small negative bias, while GA has a small positive bias after 13 UTC. Both methods have bandwidths for the uncertainty ranging approximately between 50 and 100 W/m. This indicates that the methods are very certain that it will be a fully overcast day. GA is slightly more uncertain about the amount of radiation than QRF. The distributions of both GA and QRF are asymmetrical and skewed towards higher (than the median) amounts of radiation, which is (physically) realistic in this case, given that the lower bound of radiation should be strictly greater than zero in the middle of the day.
Finally, as partly overcast day we take April 23, 2017. In Figure 6 we compare GA and QRF with the raw forecast for lead times +6h till + 19h. In this case the observations are varying more frequently during the day and none of the forecasts are able to forecast these variations. However, the medians of both GA and QRF are closer to the observations than the raw forecast and the median of QRF is better than the median of GA. The uncertainty bandwidths are now indicating that every weather condition between fully overcast and completely sunny is possible. For GA, the uncertainty bandwidth reaches above the clear sky radiation (CSR), due to the assumptions made for that method.
3.4 Verification of probabilistic forecasts
For verifying the probabilistic forecasts, we show the results from multiple scoring metrics in the following subsections to compare the performance of the different methods.
3.4.1 Continuous ranked probability skill score
Starting with the CRPSS, in Figure 7 we plot the CRPSS over the lead times for the different methods in the four seasons. All methods are less skillful closer to the night than during the day due to the fact that the forecasts get closer to the sample climatology (because both approach zero) and the latter becomes therefore harder to beat. The second day shows as expected less skill than the first day. All methods have similar performance with skill scores around 0.4 on day 1 and 0.30.35 on day 2. In spring and summer the methods have more skill than in winter and autumn. In winter and spring the skill is relatively stable during the day, while in summer and autumn the skill is best in the morning and decreases during the day. The difficulty in forecasting convection might be the cause of this. In spring and summer, GA performs worse than the other methods, while in winter and autumn the difference in performance between the methods is smaller. Overall, QR and GRF perform best in terms of the average CRPSS.
It is also informative to look at the spatial patterns of the CRPSS and therefore we plot the maximum value for each station for lead time +12h in Figure 8. The maxima range between 0.3 and 0.47. We see that the method reaching the maximum varies between the seasons and the stations, only NOTR never performs best. In winter QR is most skillful at most stations, while in spring it is GRF. In summer and autumn it is less clear which method is most skillful overall. Although the skill varies between stations, there is not a clear spatial pattern. There are some seasonal variations, for example in winter and autumn the stations near the coast have mostly less skill than stations more inland. This might occur because convection over sea causes more clouds over stations near the coast.
3.4.2 Brier skill score
To investigate the performance of the methods for different CSI values, we plot in Figure 9 the BSS versus the CSI threshold for the four seasons at lead time +12h. The BSS increases as a function of CSI in spring and summer, while in winter it decreases again for higher CSI. In autumn the BSS is stable around 0.4 for CSI values higher than 0.3. The BSS increase in spring and summer is probably caused by the higher predictability of clear sky versus convective situations. While the differences between the methods are small for lower CSI values, it is interesting to note that for thresholds higher than 0.7 (corresponding to (near) clear sky conditions), there is a bifurcation in the skill, especially in spring and summer. The treebased methods and QR show higher BSS than the other methods. Consequently, the global radiation under (near) clear sky conditions is better forecast by the treebased methods and QR. For treebased methods, this can be explained by the fact that trees can create separate branches for clear sky cases. QR fits each quantile separately and can therefore better narrow the distribution in clear sky situations to predict those situations more accurately.
3.4.3 Reliability diagrams
To access the reliability of the methods, in Figure 10 the reliability diagrams for CSI thresholds 0.2, 0.5 and 0.9 for lead time +12h in summer are shown. For the threshold 0.2 all methods are unreliable for high forecast probabilities, due to a low amount of cases. For the low forecast probabilities the treebased methods are very reliable and almost on top of the diagonal line. The stepwisebased methods are slightly less reliable. For the threshold 0.5 all methods are very reliable and there is no method that clearly performs better or worse than the rest of the methods. For the threshold 0.9, all methods are reliable for the high forecast probabilities, but for lower forecast probabilities the treebased methods are again most reliable. The parametric methods are least reliable and QR and MCQRNN are in between. In general the treebased methods are most reliable, especially under clear sky conditions.
3.4.4 Potential economic value
To further study the behaviour for different CSI values, in Figure 11 we plot the potential economic value (PEV) versus the costloss ratio for CSI thresholds 0.2, 0.5 and 0.9 at lead time +12h in summer. For all thresholds, the methods show higher PEV values than the raw forecast and show positive PEV values over a wider costloss ratio range (the full range between 0 and 1 for thresholds 0.5 and 0.9). For threshold 0.2, the treebased methods perform worse than the stepwisebased methods for costloss ratios above 0.4. For the threshold 0.5, all methods have equal performance. For the threshold 0.9, the treebased methods and QR perform slightly better than the other methods for costloss ratios below 0.2.
4 Discussion
The performance of the methods presented here depends on the lead time, the station location and the season. Nevertheless, in general all methods had similar performance. It also depends on the scoring metrics, that focus on different aspects, like accuracy and reliability, and on deterministic or probabilistic forecasts. We often saw that quantile regression and generalized random forests performed best and that the treebased methods outperformed the stepwisebased methods in (near) clear sky conditions. In the discussion presented here we compare our results with results from other papers, where comparing methods is the primary goal, and look for differences and similarities.
David et al. (2018) made a comparison between probabilistic forecasts using random forests, neural networks, (weighted) quantile regression, gradient boosting decision trees, recursive GARCH and the sieve bootstrap. They used endogenous data consisting of past observations of global radiation and CSI as predictors for forecasting CSI. They came to the conclusion that all methods performed very similarly, in agreement with our results. They based this on verification measures including the CRPS, reliability diagrams and rank histograms. They found (weighted) quantile regression and gradient boosting decision trees as most efficient methods for generating probabilistic forecasts in terms of complexity and computational time. They concluded that neural networks are the least efficient with a large computational time and requiring sufficient scientific background for good performance.
Mohammed and Aung (2016) made a comparison between deterministic forecasts using decision trees, nearest neighbours, gradient boosting decision trees, random forests and both lasso and ridge regression. They used NWP output from the ECMWF as predictors to forecast solar power production. They found gradient boosting decision trees as best performing in terms of the RMSE and mean absolute error (MAE). In our study, gradient boosting decision trees performed well, but not the best, according to the RMSE.
Zamo et al. (2014) made a comparison between probabilistic forecasts using quantile regression and quantile regression forests. They used NWP output from the PEARP model at Météo France. They forecast solar power production and verify the forecasts using the CRPS and rank histograms. They found that the best performing method depends on the specific settings and concluded that quantile regression and quantile regression forests performed similar in general, which is in agreement with our results.
Lastly, Voyant et al. (2018) made a comparison between probabilistic forecasts using quantile regression forests and gradient boosting decision trees with past CSI observations as predictor set. They generate prediction intervals for the CSI and verify them using the gamma test, which is based on the mean interval length and prediction interval coverage probability. The authors found that gradient boosting decision trees performed best. In our study, quantile regression forests and gradient boosting decision trees performed almost similar.
5 Conclusions
In this paper we compared 7 regression methods. Using these methods, we produced probabilistic forecasts of global radiation with the raw forecast (the deterministic global radiation forecast from HA) together with other relevant meteorological variables as potential predictors. We saw that all potential predictors were used, and that global radiation, direct radiation, relative humidity, cloud cover and cloud water were most important. The aerosol predictors from CAMS were less important than the radiation and cloudbased predictors from HA, but they were still often selected, particularly the aerosol optical depth in the stepwisebased methods. Aside from that, the treebased methods have the advantage of using all predictors in all fits, whereas for the stepwisebased methods the amount of predictors used in a fit is limited. We verified the forecasts in a deterministic way using the root mean squared error skill score. All methods had an improvement between 2% and 10% upon the raw forecast, depending on the season and lead time. In visualizing the probabilistic information in the forecasts we looked at three interesting cases with different weather conditions: a clear sky, fully overcast and partly overcast day. In all cases the medians of the forecasts were close to the observations, but the uncertainty in forecast radiation differed between the methods (mostly on the clear sky and partly overcast day). Next, we verified the methods in a probabilistic way using the continuous ranked probability skill score, Brier skill score, reliability diagrams and the potential economic value. We saw that the performance of the methods depended on the time of the day, the forecast lead time, the geographical location, the season and the weather situation (clear sky or cloudy). In general all methods performed similarly, but depending on the verification measure there are some methods that performed better than others. According to the continuous ranked probability skill score, quantile regression and generalized random forests performed slightly better and the gamma and truncated normal distributions slightly worse than the other methods. For the Brier skill score there was not a clear best performing method, but we saw that clear sky conditions are better forecast by treebased methods than stepwisebased methods. Also the treebased methods are more reliable, according to the reliability diagrams. Next, the probabilistic forecasts were compared with the deterministic raw forecast using the potential economic value to show the added value of uncertainty information in the probabilistic forecasts. All methods achieved both higher potential economic values and positive potential economic values over a wider costloss ratio range than the raw forecast. Therefore the probabilistic forecasts are very useful for users of solar energy. They can make better decisions based on the improved accuracy and information about the uncertainty.
Future research could focus on more complex economic models for making a comparison between the methods. Another research direction is to investigate other distributions than the ones tried in this paper or other nonparametric regression methods. Furthermore, the effect of more potential predictors (for example diffuse radiation) could be investigated. Also predictors taken from the (relatively new) HA ensemble prediction system could be used, such as the ensemble mean or ensemble variance of the global radiation forecast, and would likely be better predictors than those from the deterministic HA forecasts.
The methods presented here are all based on techniques of statistical postprocessing. A different, physical approach is to improve the NWP models to lead to better predictors. This could be for example improving the representation of clouds and aerosols in the model. This would also improve the forecasts of global radiation. However, statistical postprocessing is still expected to improve skill of those forecasts.
Acknowledgements
We thank Jan Fokke Meirink (KNMI) for providing the aerosol forecasts from CAMS, consisting of the aerosol optical depth, Ångström exponent and ozone. We thank Jason Frank (Utrecht University), Wim de Rooy, Jan Barkmeijer, Sander Tijm and Sibbo van der Veen (all from KNMI) for their help in the study described in this paper.
This research did not receive any specific grant from funding agencies in the public, commercial, or notforprofit sectors.
Declarations of interest: none
References
 Alfadda et al. (2018) Alfadda, A., Rahman, S., Pipattanasomporn, M., 2018. Solar irradiance forecast using aerosols measurements: A data driven approach. Solar Energy 170, 924–939. https://doi.org/10.1016/j.solener.2018.05.089.
 Almeida et al. (2015) Almeida, M., Perpiñán, O., Narvarte, L., 2015. Pv power forecast using a nonparametric pv model. Solar Energy 115, 354–368. https://doi.org/10.1016/j.solener.2015.03.006.
 Antonanzas et al. (2016) Antonanzas, J., Osorio, N., Escobar, R., Urraca, R., MartinezdePison, F., AntonanzasTorres, F., 2016. Review of photovoltaic power forecasting. Solar Energy 136, 78–111. http://dx.doi.org/10.1016/j.solener.2016.06.069.
 Athey et al. (2016) Athey, S., Tibshirani, J., Wager, S., 2016. Generalized random forests. The Annals of Statistics https://arxiv.org/pdf/1610.01271.pdf.
 Bakker (2019) Bakker, K., 2019. Improving solar radiation forecasts using advanced statistical postprocessing methods. https://dspace.library.uu.nl/handle/1874/374529.
 Bengtsson et al. (2017) Bengtsson, L., Andrae, U., Aspelien, T., Batrak, Y., Calvo, J., De Rooy, W., Gleeson, E., HansenSass, B., Homleid, M., Hortal, M., Ivarsson, K., Lenderink, G., Niemelä, S., Pagh Nielsen, K., Onvlee, J., Rontu, L., Samuelsson, P., Santos Muñoz, D., Subias, A., Tijm, S., Toll, V., Yang, X., Ødegaard Køltzow, M., 2017. The harmoniearome model configuration in the aladinhirlam nwp system. Monthly weather review 145, 1919–1935. https://doi.org/10.1175/MWRD160417.1.
 Bracale et al. (2013) Bracale, A., Caramia, P., Carpinelli, G., Rita Di Fazio, A., Ferruzzi, G., 2013. A bayesian method for shortterm probabilistic forecasting of photovoltaic generation in smart grid operation and control. Energies 6, 733–747. https://doi.org/10.3390/en6020733.
 Breiman (2001) Breiman, L., 2001. Random forests. Machine Learning 45, 5–32. https://doi.org/10.1023/A:1010933404324.
 Breiman et al. (2018) Breiman, L., Cutler, A., Liaw, A., Wiener, M., 2018. randomForest: Breiman and Cutler’s Random Forests for Classification and Regression. R package version 4.614, https://CRAN.Rproject.org/package=randomForest.
 Breiman et al. (1984) Breiman, L., Friedman, J., Stone, C., Olshen, R., 1984. Classification and Regression Trees. Taylor & Francis.
 Cannon (2011) Cannon, A., 2011. Quantile regression neural networks: Implementation in r and application to precipitation downscaling. Computers & Geosciences 37, 1277–1284. https://doi.org/10.1016/j.cageo.2010.07.005.
 Cannon (2018a) Cannon, A., 2018a. Noncrossing nonlinear regression quantiles by monotone composite quantile regression neural network, with application to rainfall extremes. Stochastic Environmental Research and Risk Assessment https://www.researchgate.net/publication/326007117_Noncrossing_nonlinear_regression_quantiles_by_monotone_composite_quantile_regression_neural_network_with_application_to_rainfall_extremes.
 Cannon (2018b) Cannon, A., 2018b. qrnn: Quantile regression neural networks. R package version 2.0.3, https://cran.rproject.org/package=qrnn.
 David et al. (2018) David, M., Aguiar Luis, M., Lauret, P., 2018. Comparison of intraday probabilistic forecasting of solar irradiance using only endogenous data. International Journal of Forecasting 34, 527–547. https://doi.org/10.1016/j.ijforecast.2018.02.003.
 Fatemi et al. (2018) Fatemi, S., Kuh, A., Fripp, M., 2018. Parametric methods for probabilistic forecasting of solar irradiance. Renewable Energy 129, 666–676. https://doi.org/10.1016/j.renene.2018.06.022.
 Flemming et al. (2017) Flemming, J., Benedetti, A., Inness, A., Engelen, R., Jones, L., Huijnen, V., Remy, S., Parrington, M., Suttie, M., Bozzo, A., Peuch, V., Akritidis, D., Katragkou, E., 2017. The cams interim reanalysis of carbon monoxide, ozone and aerosol for 2003–2015. Atmospheric chemistry and physics 17, 1945–1983. https://doi.org/10.5194/acp1719452017.
 Friedman (2001) Friedman, J., 2001. Greedy function approximation: A gradient boosted machine. The Annals of Statistics 29, 1189–1232. https://statweb.stanford.edu/~jhf/ftp/trebst.pdf.
 Friedman (2002) Friedman, J., 2002. Stochastic gradient boosting. Computational Statistics & Data Analysis 38, 367–378. https://doi.org/10.1016/S01679473(01)000652.
 Galván et al. (2017) Galván, I., Valls, J., Cervantes, A., Aler, R., 2017. Multiobjective evolutionary optimization of prediction intervals for solar energy forecasting with neural networks. Information Sciences 418419, 363–382. https://doi.org/10.1016/j.ins.2017.08.039.
 Glahn and Lowry (1972) Glahn, H., Lowry, D., 1972. The use of model output statistics (mos) in objective weather forecasting. Journal of applied meteorology 11, 1203–1211. https://doi.org/10.1175/15200450(1972)011%3C1203:TUOMOS%3E2.0.CO;2.
 KNMI (2018) KNMI, 2018. Hourly meteorological observations. https://projects.knmi.nl/klimatologie/uurgegevens/selectie.cgi.
 Koenker (2005) Koenker, R., 2005. Quantile Regression. Cambridge U. Press.
 Koenker (2018) Koenker, R., 2018. quantreg: Quantile Regression. R package version 5.36, https://CRAN.Rproject.org/package=quantreg.
 Lorenz et al. (2009) Lorenz, E., Hurka, J., Heinemann, D., Georg Beyer, H., 2009. Irradiance forecasting for the power prediction of gridconnected photovoltaic systems. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2, 2–10. https://ieeexplore.ieee.org/document/4897348.
 Massidda and Marrocu (2018) Massidda, L., Marrocu, M., 2018. Quantile regression postprocessing of weather forecast for shortterm solar power probabilistic forecasting. Energies 11, 1–20. https://doi.org/10.3390/en11071763.
 McRae (1980) McRae, G., 1980. A simple procedure for calculating atmospheric water vapor concentration. Journal of the Air Pollution Control Association 30, 394–394. https://doi.org/10.1080/00022470.1980.10464362.
 Meinshausen (2006) Meinshausen, N., 2006. Quantile regression forests. Journal of Machine Learning Reseach 7, 983–999. http://www.jmlr.org/papers/volume7/meinshausen06a/meinshausen06a.pdf.
 Meinshausen (2017) Meinshausen, N., 2017. quantregForest: Quantile Regression Forests. R package version 1.37, https://CRAN.Rproject.org/package=quantregForest.
 Michalsky (1988) Michalsky, J., 1988. The astronomical almanac’s algorithm for approximate solar position (1950–2050). Solar Energy 40, 227–235. https://doi.org/10.1016/0038092X(88)90045X.
 Mohammed and Aung (2016) Mohammed, A., Aung, Z., 2016. Ensemble learning approach for probabilistic forecasting of solar power generation. Energies 9. https://doi.org/10.3390/en9121017.
 Remund et al. (2003) Remund, J., Wald, L., Lefèvre, M., Ranchin, T., Page, J., 2003. Worldwide linke turbidity information. https://hal.archivesouvertes.fr/hal00465791/document.
 Richardson (2000) Richardson, D., 2000. Predictability and economic value. https://www.ecmwf.int/sites/default/files/elibrary/2003/11922predictabilityandeconomicvalue.pdf.
 Ridgeway (2018) Ridgeway, G., 2018. gbm: Generalized Boosted Regression Models. R package version 2.1.4, https://CRAN.Rproject.org/package=gbm.
 Rigby and Stasinopoulos (2005) Rigby, R., Stasinopoulos, D., 2005. Generalized additive models for location, scale and shape. Applied Statistics 54, 507–554. https://doi.org/10.1111/j.14679876.2005.00510.x.
 Rigby and Stasinopoulos (2018) Rigby, R., Stasinopoulos, D., 2018. Generalized additive models for location, scale and shape,(with discussion). R package version 5.12, https://cran.rproject.org/package=gamlss.
 Rigollier et al. (2000) Rigollier, C., Bauer, O., Wald, L., 2000. On the clear sky model of the esra – european solar radiation atlas – with respect to the heliosat method. Solar Energy 68, 33–48. https://doi.org/10.1016/S0038092X(99)000559.
 Tibshirani et al. (2018) Tibshirani, J., Athey, S., Wager, S., Friedberg, R., Miner, L., Wright, M., 2018. grf: Generalized Random Forests. R package version 0.10.1, https://CRAN.Rproject.org/package=grf.
 Van der Meer et al. (2018) Van der Meer, D., Widén, J., Munkhammar, J., 2018. Review on probabilistic forecasting of photovoltaic power production and electricity consumption. Renewable and Sustainable Energy Reviews 81, 1484–1512. https://doi.org/10.1016/j.rser.2017.05.212.
 Verzijlbergh et al. (2015) Verzijlbergh, R., Heijnen, P., de Roode, S., Los, A., Jonker, H., 2015. Improved model output statistics of numerical weather prediction based irradiance forecasts for solar power applications. Solar Energy 118, 634–645. https://doi.org/10.1016/j.solener.2015.06.005.
 Voyant et al. (2018) Voyant, C., Motte, F., Notton, G., Fouilloy, A., Nivet, M.L., Duchaud, J.L., 2018. Prediction intervals for global solar irradiation forecasting using regression trees methods. Renewable Energy 126, 332–340. https://doi.org/10.1016/j.renene.2018.03.055.
 Voyant et al. (2017) Voyant, C., Notton, G., Kalogirou, S., Nivet, M., Paoli, C., Motte, F., Fouilloy, A., 2017. Machine learning methods for solar radiation forecasting: A review. Renewable Energy 105, 569–582. https://doi.org/10.1016/j.renene.2016.12.095.
 Wilks (2011) Wilks, D., 2011. Statistical Methods in the Atmospheric Sciences. volume 100. 3rd edition ed., Academic Press.
 WMO (2017) WMO, 2017. Guide to Meteorological Instruments and Methods of Observation. World Meteorological Organization. http://dx.doi.org/10.25607/OBP432.
 Zamo et al. (2014) Zamo, M., Mestre, O., Arbogast, P., Pannekoucke, O., 2014. A benchmark of statistical regression methods for shortterm forecasting of photovoltaic electricity production. part ii: Probabilistic forecast of daily production. Solar Energy 105, 804–816. https://doi.org/10.1016/j.solener.2014.03.026.