Comparison of statistical post-processing methods for probabilistic NWP forecasts of solar radiation

Comparison of statistical post-processing methods for probabilistic NWP forecasts of solar radiation

Kilian Bakker Kirien Whan Wouter Knap Maurice Schmeits maurice.schmeits@knmi.nl Royal Netherlands Meteorological Institute (KNMI), P.O. Box 201, 3730AE De Bilt, the Netherlands
Abstract

The increased usage of solar energy places additional importance on forecasts of solar radiation. Solar panel power production is primarily driven by the amount of solar radiation and it is therefore important to have accurate forecasts of solar radiation. Accurate forecasts that also give information on the forecast uncertainties can help users of solar energy to make better solar radiation based decisions related to the stability of the electrical grid. To achieve this, we apply statistical post-processing techniques that determine relationships between observations of global radiation (made within the KNMI network of automatic weather stations in the Netherlands) and forecasts of various meteorological variables from the numerical weather prediction (NWP) model HARMONIE-AROME (HA) and the atmospheric composition model CAMS. Those relationships are used to produce probabilistic forecasts of global radiation. We compare 7 different statistical post-processing methods, consisting of two parametric and five non-parametric methods. We find that all methods are able to generate probabilistic forecasts that improve the raw global radiation forecast from HA according to the root mean squared error (on the median) and the potential economic value. Additionally, we show how important the predictors are in the different regression methods. We also compare the regression methods using various probabilistic scoring metrics, namely the continuous ranked probability skill score, the Brier skill score and reliability diagrams. We find that quantile regression and generalized random forests generally perform best. In (near) clear sky conditions the non-parametric methods have more skill than the parametric ones.

keywords:
Model Output Statistics (MOS), probabilistic forecasting, solar radiation, Numerical Weather Prediction (NWP)
journal: Solar Energy

1 Introduction

Forecasts of meteorological variables are produced by projecting an initial state of the atmosphere into the future using a numerical weather prediction (NWP) model. In the projection there are unavoidable errors related to uncertainty in the initial condition and the physical parameterizations. This leads to uncertainty in the forecasts that can be quantified by generating a probabilistic forecast. In the case of solar radiation, this gives useful information about the uncertainty in power production of solar panels, as it is almost directly related to the uncertainty in forecasts of solar radiation. This helps users of solar energy make better solar radiation based decisions related to the stability of the electrical grid.
A considerable amount of work has already been done on generating probabilistic forecasts of global radiation (i.e. the total short-wave radiation on a horizontal surface received from a solid angle of 2 steradians (unit: W/m2)). For example Lorenz et al. (2009) produced 95% confidence intervals on errors in the clear sky index (CSI, defined as global radiation divided by clear sky radiation) by fitting a polynomial function to the standard deviation of the errors as a function of the forecast CSI and the solar zenith angle. Verzijlbergh et al. (2015) fitted a normal distribution to the CSI errors using NWP output from the GFS model of NCEP. Other distributions were also used for forecasting solar radiation, such as the beta and two sided power distribution in Fatemi et al. (2018) and the gamma distribution in Bracale et al. (2013). They both use observations of various meteorological variables to forecast radiation.
Massidda and Marrocu (2018) produced probabilistic forecasts by fitting quantiles of the underlying distribution of radiation, called quantile regression, using deterministic and probabilistic NWP output of ECMWF. They also extended this to a method called gradient boosting decision trees. Almeida et al. (2015) use quantile regression forests to forecast quantiles of the distribution of solar power using NWP output from the WRF model run at Meteogalicia. Neural networks were used by Galván et al. (2017) to forecast confidence intervals of solar radiation using NWP output from the GEFS model of NCEP.
In recent literature, there are some reviews that give an overview of papers using post-processing methods to produce probabilistic forecasts of solar radiation or solar power production. For example Antonanzas et al. (2016) discussed papers using neural networks, random forests, support vector machines and nearest neighbours for forecasting solar power. Van der Meer et al. (2018) reviewed papers using quantile regression, quantile regression forests, Gaussian processes, bootstrapping, lower upper bound estimate, gradient boosting, kernel density estimation, nearest neighbours and an analog ensemble for forecasting solar power. Voyant et al. (2017) discussed papers using linear regression, generalized linear models, neural networks, support vector machines, decision tree learning, nearest neighbours and Markov chains for forecasting solar radiation. This last review is oriented towards deterministic forecasts, whereas the first two focus on probabilistic forecasts.
There are relatively few papers in which a comparison among multiple regression methods on the same data set for solar radiation or solar power forecasting is a primary goal. David et al. (2018) made a comparison between probabilistic forecasts using random forests, neural networks, (weighted) quantile regression, gradient boosting decision trees, recursive GARCH and the sieve bootstrap. They used only past observations of global radiation and the CSI as predictors. Mohammed and Aung (2016) made a comparison between deterministic forecasts using decision trees, nearest neighbours, gradient boosting decision trees, random forests and lasso and ridge regression. They used ECMWF output variables as predictors in the regression methods. Zamo et al. (2014) made a comparison between probabilistic forecasts using quantile regression and quantile regression forests with NWP output from the PEARP model at Météo France as predictors. Lastly, Voyant et al. (2018) made a comparison between probabilistic forecasts using quantile regression forests and gradient boosting decision trees with observed CSI as the only predictor.

In this paper we generate probabilistic forecasts of global radiation by applying a statistical post-processing technique called model output statistics (MOS, Glahn and Lowry (1972)). This consists of fitting relationships between various NWP variables and CSI observations and using the relationships for producing skillful and reliable probabilistic forecasts. We make a comparison of probabilistic forecasts using an extensive list of regression methods, consisting of both parametric methods (gamma and truncated normal distribution) and non-parametric methods. The non-parametric methods consist of quantile regression, quantile regression forests, gradient boosting decision trees, and the relatively new generalized random forests and quantile regression neural networks. The methods are chosen to include the most commonly used methods and some (relatively) new methods, that take both linear and non-linear relationships into account. To our knowledge we are the first to make such an extensive comparison of probabilistic forecasting methods in a MOS setting. We use a large number of potential predictors, consisting of output from the high-resolution non-hydrostatic NWP model HARMONIE-AROME (HA, running at KNMI) and the atmospheric composition model CAMS (Copernicus Atmosphere Monitoring Service, running at ECMWF). This is a second novel aspect of this work, as previous work has not explored the benefit of including both NWP (here HA) and CAMS predictors, to our knowledge. We investigate the importance of the predictors in the fitted relationships. Additionally, the probabilistic forecasts are verified and compared with multiple scoring metrics to see which method performs best in which situation. These include the continuous ranked probability skill score, Brier skill score and reliability diagrams (e.g. Wilks (2011)). We also show the improvement with respect to the raw forecast (the global radiation forecast from HA) with the root mean squared error skill score and the potential economic value (e.g. Richardson (2000)).

The paper is structured as follows: in Section 2 we describe the research setting, consisting of the data, the regression methods, the fitting procedure and the scoring metrics. In Section 3 we present our results and discuss them in Section 4. The paper is finished with conclusions in Section 5.

2 Data and Methods

In this section we first describe the observations and model data that we used, including transformations applied to the model data. Next, we give an overview of the regression methods used to model the relationships between forecasts of weather variables (the predictors) and CSI observations (the predictand). The procedure we used for fitting is described and after that we describe verification measures to verify the methods.

2.1 Observations

The meteorological variable we study in this paper is the global radiation. We used hourly average observations of global radiation measured at 30 weather stations in the Netherlands by the KNMI network of automatic weather stations for the period of April 1, 2016 to March 16, 2018 (KNMI (2018)). The KNMI network of solar radiation measurements consists of unventilated Kipp & Zonen CM11 pyranometers. The calibration of the instruments is traceable to the World Radiometric Reference (WRR). The instruments are sampled every 12s and 1 hr averages are used for the analysis presented here. Routine quality control procedures are applied based on intercomparison of stations and clear-sky comparisons. The achieved uncertainty is within the limits of “network operations” as defined by the WMO CIMO guide (8% for hourly totals, 95% confidence level; WMO (2017)). We transformed the global radiation into CSI by dividing them by the hourly average clear sky radiation. This removes the large seasonal and diurnal cycle of radiation. The clear sky radiation is calculated using the European Solar Radiation Atlas (ESRA) model (Rigollier et al. (2000)), with as inputs the solar zenith angle (Michalsky (1988)) and the Linke turbidity factor (monthly average values for De Bilt are used from Remund et al. (2003)). The locations of the stations and the seasonally averaged observed CSI are shown in Figure 1. For inland stations the seasonally averaged observed CSI is slightly lower than for coastal stations in spring and summer due to convection. In winter and autumn the pattern is quite homogeneous.

Figure 1: The KNMI station locations and seasonally averaged observed CSI in the Netherlands. Station Cabauw is marked in blue.

Figure 2 shows daily courses of the fraction of CSI observations strictly greater than 0.8 (corresponding to low cloud cover conditions, including clear skies) for the four seasons and for coastal (less than 25 km from the coast) and inland (more than 25 km from the coast) stations separately. At the inland stations during spring, summer and fall the fraction drops during the day a few hours after sunrise. This is likely due to convection with a (secondary) maximum at the end of the day caused by the disappearance of shallow convection. Inland stations during winter and coastal stations in all seasons show a different diurnal cycle with a similar evolution between a few hours after sunrise and a few hours before sunset and a (secondary) maximum around 12 UTC. In spring and summer the fraction of low cloud cover events is higher at the coastal stations than at the inland stations due to less convective events.

Figure 2: The fraction of CSI observations versus the time of the day for the four seasons and for (a) coastal (less than 25 km from the coast) and (b) inland (more than 25 km from the coast) stations.

2.2 Model data

The model data that we used as the potential predictors for the regression methods consist of forecasts of atmospheric temperature, humidity and different cloud, radiation and aerosol variables (Table 2). Most of the predictors (marked with *) are NWP output of HA (Bengtsson et al. (2017)). The aerosol predictors (marked with ** ) are forecasts produced by CAMS (Flemming et al. (2017)). Because HA uses climatological averages for the aerosols, we expect the aerosol forecasts as predictors to have added value in fitting the relationships (see also Alfadda et al. (2018)).
We used hourly output from HA for lead times 0-48 hours at the grid points closest to the stations (in the same date range as the observations, April 1, 2016 to March 16, 2018). The CAMS lead times are 3-hourly and therefore we take the lead time closest to the corresponding time of the observations. The CAMS forecasts are later available than HA and therefore we use them for lead times of 24-72 hours from the previous day. We used CAMS data for the same period as the HA and observation data. More information about HA and CAMS can be found in Table 1.

HA CAMS
Version(s) 38 43r3, 43r1, 41r1
Type of forecasts Deterministic
initial time 0 UTC
Time domain 0-48 hours 24-72 hours
Time resolution hourly 3-hourly
Spatial domain Western Europe (800x800 grid points) Global
Spatial resolution 2.5km x 2.5km x
Vertical levels 65 60
Table 1: Information about the models HA and CAMS.
Potential predictors
Temperature*
Relative humidity*
Radiation* , , , ,
Clouds and rain* , , ,
Aerosols** , ,
Time/place , , , , , ,
(a) HA = *, CAMS = **. ’surf’ stands for the Earth’s surface and ’toa’ for the top of the atmosphere (defined as the height with air pressure equal to 0.01 hPa). The layers indicate the lower (0-2000 m), middle (2000-6000 m), higher (6000 m-TOA) and total (0 m-TOA) part of the atmosphere.
Temperature
Relative humidity
Global radiation
Direct radiation
Net clear sky (global) radiation
Cloud cover
Cloud water
Precipitable water
Vertically integrated aerosol optical depth at 500 nm
Vertically integrated Ångström exponent
Vertically integrated ozone
Latitude
Longitude
Day of the year
Distance
(b) The abbreviations of the potential predictors explained. Net clear sky stands for the incoming minus the outgoing radiation under a clear sky.
Table 2: The potential predictors

For each of the potential predictors with ’layers’ as a subscript, we computed them for the lower (0-2000 m), middle (2000-6000 m) and upper atmosphere (6000 m and higher), and also an aggregated version of the total atmosphere (except for temperature and relative humidity, where the total atmosphere component is missing). The HA output consists of 65 levels in the atmosphere and for the cloud water we applied a weighted average (with weights equal to the distances between the levels) to reduce these to the four layers (lower, middle, upper and total atmosphere). For temperature and humidity we used HA output from 11 pressure levels and reduced these to the three layers (lower, middle and higher part of the atmosphere) by applying a weighted average. The precipitable water is computed using temperature and humidity from the 11 pressure levels together with the method outlined by McRae (1980) and applying a weighted average.
We transformed all radiation predictors into hourly average CSI values in the same way as the observations. The rain predictor values are precipitation sums over each hour. For all other potential predictors the valid time is used.
We also calculated time/place predictors, which are used as additional potential predictors. These consist of the latitude and longitude of the station locations, the day of the year, the distance to the coast (minimum Euclidean distance to the sea), distance to water (minimum Euclidean distance to the sea or one of the lakes), distance to inland (the Euclidean distance to the intersection point of the borders of the Netherlands, Belgium and Germany) and the cosine of the solar zenith angle.

2.3 Regression methods

We generated probabilistic forecasts using regression methods that find relationships between the predictors and the predictand. The relationships help in understanding how the value of the predictand changes when any one of the predictor values change. Then, one can quantify the uncertainty in the predictand in terms of the predictors and form the probabilistic forecast. We fit for 7 different regression methods and they are described in more detail in the following subsections.

2.3.1 Parametric regression

Parametric regression methods assume a specific predictive distribution with parameters (location, scale, skewness and kurtosis). The parameters are first chosen as constant values and then fitted sequentially (one by one, each new one using the already fitted ones) with linear relationships:

(1)

where , are the predictors and are the regression parameters. The optimal linear relationship is found by minimizing the log-likelihood function (i.e. the negative logarithm of the forecast probability density function).
We avoid overfitting by using only a selection of the predictors in the model. The selection that gives the best fit is found by forward (adding one predictor) and backward (removing one predictor) stepwise selection that aims to minimize the Akaike Information Criterion (AIC). The stepwise procedure is applied separately for each distribution parameter.
We apply this regression method in the programming language R using the gamlss package (Rigby and Stasinopoulos (2018)) for two distributions: the gamma distribution (GA) and the truncated normal distribution (NOTR). The truncated normal distribution is formed by left truncation at zero of the normal distribution (Rigby and Stasinopoulos (2005)) and has therefore, just like the gamma distribution, only probability mass for positive values, which suits for forecasting radiation.

2.3.2 Quantile regression

Quantile regression fits a number of quantiles from the underlying distribution separately by assuming the linear relationship:

(2)

The optimal linear relationship is found by minimizing the weighted sum of residuals (WSR, defined as observations - fitted quantiles), with weights equal to and for respectively positive and negative residuals (Koenker (2005)). Regularization to prevent overfitting was not successful in our study and therefore we apply the stepwise procedure, this time with the aim of minimizing the average over the AIC values for the different quantiles. Consequently, all quantiles are fit with the same predictors. It occasionally happens that negative values are forecast and in that case we set them to zero. Also, we solve crossings between quantiles by sorting them in ascending order. Lastly, some small random noise, generated from a normal distribution with mean zero and a variance of 0.001, is added to the predictors to prevent the matrix with the predictors from becoming singular. Quantile regression (QR) is applied in R using the quantreg package (Koenker (2018)).

2.3.3 Quantile regression neural network

Quantile regression neural networks are a non-linear extension of quantile regression. It can be visualized by an input layer (the predictors), one or more hidden layers and an output layer (the forecast for one quantile). In the hidden layers we adjust the input towards the output by multiplication with weights, adding some bias and applying a non-linear activation function, taken here as the default . The optimal weights and bias are found by minimizing the WSR in a boosting fashion where the negative gradient is followed for a number of iterations. More details can be found in Cannon (2011).
Regularization to prevent overfitting was not successful and therefore we apply the stepwise procedure. Further, negative forecasts are set to zero. Crossings between quantiles are solved by adding the quantiles as a predictor in the input layer with a monotonically increasing constraint (Cannon (2018a)). This is called the monotone composite extension of quantile regression neural network (MCQRNN) and we apply it in R using the qrnn package (Cannon (2018b)).

2.3.4 Random forests

Random forests have binary regression trees (Breiman et al. (1984)) as building blocks. A binary regression tree is formed by splitting the data set into subsets by applying thresholds on the predictors. Each split is chosen to minimize the variance in the predictand. The splitting is repeated until some stopping criterion is reached, such as a minimum number of predictand values left in each subset. A random forest is a collection of trees (Breiman (2001)) and reduces the chance for overfitting by constructing each tree on a different (bootstrap sampled) subset of the data. To make the trees even more independent, at each split only a (bootstrap sampled) subset of the predictor set is used. Probabilistic forecasts are made by looking at the underlying distribution of the predictand values in the terminal nodes of all trees and drawing quantiles from it (Meinshausen (2006)). This method is called quantile regression forests (QRF) and is applied in R using the quantregForest package (Meinshausen (2017)).
Generalized random forests (GRF) is very similar to QRF. Each split is now chosen to maximize the difference between the underlying distributions of the predictand values (Athey et al. (2016)). Also, whereas QRF uses bootstrap sampling with replacement, GRF uses bootstrap sampling without replacement. GRF is applied in R using the grf package (Tibshirani et al. (2018)).

2.3.5 Gradient boosting decision trees

Gradient boosting decision trees boosts the fitted quantiles (separately) towards the actual quantiles of the underlying distribution by following the negative gradient of the WSR for a number of iterations. The negative gradients are estimated using regression trees. Then in each iteration the fits are updated by adding the estimated negative gradients multiplied with the learning rate (Friedman (2001)). The learning rate is generally chosen to be small () to prevent overshooting the minimum in the WSR. To reduce the chance for overfitting, each tree is constructed on a (bootstrap sampled) subset of the data (Friedman (2002)). Consequently, in each iteration we only improve the fitted quantile for the predictand values in the chosen subset. Further, negative values are set to zero and crossings between quantiles are solved by sorting them in ascending order. Gradient boosting decision trees (GBDT) is applied in R using the gbm package (Ridgeway (2018)).

2.4 Fitting procedure

We fitted relationships between the potential predictors and the predictand using the 7 methods. The fitting was done per lead time, because systematic errors increase with lead time, and for all stations at once. We split the data set into a training (for model fitting) and testing (for verification) set. The testing set consists of 1/3 of the data set containing consecutive dates and the rest of the data are in the training set. This leads to around 3200 training and 1600 testing cases for each fit when we fit to each season separately. We applied a 3-fold cross-validation. For each lead time we only take combinations of dates and stations during daylight into account, defined as a clear sky radiation higher than 20 W/m. Only for lead times +5h till +19h and +29h till +43h there are enough (defined by at least 50) combinations of dates and stations during daylight to make the fit. For each method, we generated probabilistic forecasts consisting of 49 quantiles, distributed evenly between 0.02 and 0.98 with a gap of 0.02 between them.
We can either fit the methods on the whole year or on the four meteorological seasons separately, defined by the Winter (December, January, February), the Spring (March, April, May), the Summer (June, July, August) and the Autumn (September, October, November). As shown in Bakker (2019), fitting for each season separately produces better fits and is therefore used in producing the results. Also averaging the potential predictors from HA over 3 lead times and over a 9x9 block of grid points gives better fits. Therefore, we apply both the temporal and spatial smoothing to our potential predictors. Next, the optimal hyper-parameter settings for each method can be found in Bakker (2019) (only the optimal settings for MCQRNN have changed) and are summarized in Table 3.

GA NOTR QR MCQRNN
#steps ( or quantiles) 5 5 5 5
#steps () 1 1 - -
#hidden layers - - - 1
#deep hidden layers - - - 0
Iterations - - - 10
Penalty - - - 0
QRF GRF GBDT
#trees 500 500 100
Minimal nodesize 5 5 5
Case sampling fraction 1/2 1/2 1/2
Predictor sampling fraction 1/3 1/3 -
Depth of trees - - 1
Learning rate - - 0.1
Table 3: The optimal hyper-parameter settings for each method. The descriptions of the hyper-parameters are found in Bakker (2019).

2.5 Scoring metrics

To quantify the performance of the regression methods, we calculated scoring metrics over the forecasts (e.g. Wilks (2011)). First, the medians of the forecasts are verified using the root mean squared error (RMSE):

(3)

where are the CSI observations (for the same season and (valid) time as the forecasts) and the forecasts, with ’med’ denoting the 0.5 quantile. The RMSE is transformed to a skill score using the RMSE of the sample climatology (the mean of the observations):

(4)

For verifying the probabilistic forecasts we used the continuous ranked probability score (CRPS):

(5)

We calculated the continuous ranked probability skill score (CRPSS) using Equation (4) with the RMSE replaced by the CRPS. The sample climatology consists in this case of quantiles drawn from the observations to generate a sample climatological distribution.
For verification it is also informative to look at probabilities of not exceeding a certain threshold of CSI. Therefore we have binary observations () and forecast probabilities (), which are verified using the Brier score (BS):

(6)

The skill is measured by the Brier skill score (BSS), calculated using Equation (4) with the RMSE replaced by the BS, where the sample climatology is defined as the mean of the binary observations.
We also produced reliability diagrams that compare forecast probabilities with observed relative frequencies. A perfect reliable forecast has a 1:1 correspondence between them.
The potential economic value measures the potential economic impact of the forecasts and is defined by a simple cost-loss model described in Richardson (2000). It looks at the event of not exceeding a certain threshold of CSI with the associated contingency table (Table 4). A hit and false alarm lead to some costs , a miss to a loss and with a correct rejection there are no costs or losses.

Observed Not observed
Forecast Hit False alarm
Not forecast Miss Correct rejection
Table 4: Contingency table corresponding to the event that the threshold of CSI does not exceed a certain value . indicates costs and losses.

The potential economic value (PEV) is calculated as:

(7)

where and are the hit, false alarm and miss frequencies and ORF the observed relative frequency.

3 Results

In this section we first investigate how important the predictors are for the different methods. Subsequently, we show the performance of the methods according to all scoring metrics defined in Section 2.5.

3.1 Importance of predictors

We have 34 potential predictors and some of them are more important than others. In Tables 5 and 6 we list the 10 most important predictors for the stepwise-based and tree-based methods, respectively. For the stepwise-based methods (GA, NOTR, QR and MCQRNN) the ranking is based on the amount of times the predictors are chosen in the procedure. For the tree-based methods (QRF, GRF and GBDT) the ranking is based on the improvement the predictors have on the fitted trees, considered over all splits where the corresponding predictor is chosen. This improvement is defined slightly differently in QRF, GRF and GBDT and more details can be found in Breiman et al. (2018), Tibshirani et al. (2018) and Ridgeway (2018). The final ranking is then the average improvement over all trees in all fits. Due to the different definitions of the importance measure between the stepwise-based and tree-based methods, we cannot directly compare the resulting rankings between these methods.

Rank GA NOTR QR MCQRNN
1
2
3
4
5
6
7
8
9
10
Table 5: The top 10 ranking of the importance of the predictors in the stepwise-based methods.
Rank QRF GRF GBDT
1
2
3
4
5
6
7
8
9
10
Table 6: The top 10 ranking of the importance of the predictors in the tree-based methods.

We see that the raw HA global radiation forecast () is the most important predictor for almost all methods. This is what one would expect, because already accounts for all kinds of effects from other HA variables. It does however not account optimally for these effects and consequently other predictors are also important, such as direct radiation, relative humidity, cloud cover and cloud water. The relative humidity is more important than the direct radiation in the stepwise-based methods, whereas for the tree-based methods it is the opposite way. This can be explained by the fact that if a stepwise-based method selects global radiation as a predictor, then the direct radiation becomes less useful as a predictor, due to the high correlation between both. In that case the relative humidity becomes more important. For the tree-based methods this effect is less apparent due to the independence between different branches of the trees. Therefore, global radiation can be important in one branch and direct radiation in another branch, and consequently they are both important considering the entire tree.
After , the direct radiation and relative humidity, cloud predictors such as or follow in most methods. Sometimes the cloud predictors are even higher in the ranking. The aerosol predictors from CAMS are quite important in the stepwise-based methods, with as most important of the three. For the tree-based methods the aerosol predictors are of medium importance seen over all trees, but they can still be useful in certain branches of the trees. The aerosol predictors are in general of lower importance than cloud related predictors. This can be explained by the fact that clouds block on average more incoming radiation than aerosols and are consequently more responsible for the uncertainty in forecast global radiation. For the predictors with multiple layers, the lower component is in general more important than the middle or higher component. Therefore the lower atmosphere has more influence on global radiation.
Furthermore, it is good to note that although some predictors are more important than others, they are all used (seen over all fits) in forecasting global radiation. Also, the tree-based methods have the advantage over the stepwise-based methods that they make use of all predictors in all fits, whereas for the stepwise-based methods we set a limit on the amount of predictors to prevent overfitting.

3.2 Verification of deterministic forecasts

We computed the RMSE_SS for all lead times, all stations, the four seasons and the different methods and in Figure 3 we plot the RMSE_SS against the lead time for the four seasons. All methods are more skillful than the raw forecast and have therefore an increased accuracy. The RMSE_SS with respect to the raw forecast ranges between 2-3% in winter, 5-7.5% in spring and 7.5-10% in summer and fall. For lead times close to the night, the skill drops due to the fact that the forecasts get closer to the sample climatology (because both approach zero) and the latter becomes therefore harder to beat. The second forecast day shows lower RMSE_SS than the first day as expected.
In winter and spring the RMSE_SS is relatively stable during the day. Only QRF and MCQRNN perform slightly worse than the other methods in respectively winter and spring. In summer and autumn we see that the skill scores drop during the day, which might be caused by convection (see Figure 2) as it is difficult to forecast by NWP models. The highest skill scores are reached in the morning. In summer, MCQRNN performs slightly worse than the other methods. In autumn, all methods perform similarly according to the RMSE_SS.

Figure 3: RMSE_SS versus lead time for the four seasons and averaged over all stations. The abbreviations of the methods are explained in Section 2.3 and RAW stands for the raw forecast.

3.3 Probabilistic forecasts for selected cases

Three interesting cases are shown for respectively a clear sky, fully overcast and partly overcast day at station Cabauw (see Figure 1). First, we look at the case of April 9, 2017, which was a day without clouds. We compare the raw forecast and post-processed forecasts for lead times +6h till +18h with the observations for 6-18 UTC in Figure 4. Two methods are chosen, one parametric (GA) and one non-parametric (QRF), for which the median and the forecast range between the lowest and highest quantile (the gray area) are plotted. We see that the raw forecast is already very accurate and this can be seen on most clear sky days. The raw forecast has a small negative bias around 12 UTC and both GA and QRF correct this in the median, but with a small positive bias, especially for GA. Looking at the uncertainty, we see that QRF is much more certain than GA, with a bandwidth of around 100 W/m versus 200 W/m.

(a) GA method
(b) QRF method
Figure 4: Clear sky case of April 9, 2017 at station Cabauw (see Figure 1). OBS stands for the observations, RAW for the raw forecast, GA for the gamma distribution and QRF for the quantile regression forest. The gray area denotes the probability distribution from the 0.02 until the 0.98 quantile.

For a fully overcast day we take December 31, 2016. In Figure 5 we compare GA and QRF with the raw forecast for lead times +9h till +15h. We see that the raw forecast clearly underestimates the amount of radiation. Both GA and QRF correct for this bias almost completely in the median. QRF generally keeps a small negative bias, while GA has a small positive bias after 13 UTC. Both methods have bandwidths for the uncertainty ranging approximately between 50 and 100 W/m. This indicates that the methods are very certain that it will be a fully overcast day. GA is slightly more uncertain about the amount of radiation than QRF. The distributions of both GA and QRF are asymmetrical and skewed towards higher (than the median) amounts of radiation, which is (physically) realistic in this case, given that the lower bound of radiation should be strictly greater than zero in the middle of the day.

(a) GA method
(b) QRF method
Figure 5: As Figure 4 but for December 31, 2016, which was a fully overcast day. CSR stands for the clear sky radiation.

Finally, as partly overcast day we take April 23, 2017. In Figure 6 we compare GA and QRF with the raw forecast for lead times +6h till + 19h. In this case the observations are varying more frequently during the day and none of the forecasts are able to forecast these variations. However, the medians of both GA and QRF are closer to the observations than the raw forecast and the median of QRF is better than the median of GA. The uncertainty bandwidths are now indicating that every weather condition between fully overcast and completely sunny is possible. For GA, the uncertainty bandwidth reaches above the clear sky radiation (CSR), due to the assumptions made for that method.

(a) GA method
(b) QRF method
Figure 6: As Figure 4 but for April 23, 2017, which was a partly overcast day. CSR stands for the clear sky radiation.

3.4 Verification of probabilistic forecasts

For verifying the probabilistic forecasts, we show the results from multiple scoring metrics in the following subsections to compare the performance of the different methods.

3.4.1 Continuous ranked probability skill score

Starting with the CRPSS, in Figure 7 we plot the CRPSS over the lead times for the different methods in the four seasons. All methods are less skillful closer to the night than during the day due to the fact that the forecasts get closer to the sample climatology (because both approach zero) and the latter becomes therefore harder to beat. The second day shows as expected less skill than the first day. All methods have similar performance with skill scores around 0.4 on day 1 and 0.3-0.35 on day 2. In spring and summer the methods have more skill than in winter and autumn. In winter and spring the skill is relatively stable during the day, while in summer and autumn the skill is best in the morning and decreases during the day. The difficulty in forecasting convection might be the cause of this. In spring and summer, GA performs worse than the other methods, while in winter and autumn the difference in performance between the methods is smaller. Overall, QR and GRF perform best in terms of the average CRPSS.

Figure 7: CRPSS versus lead time for the four seasons and averaged over all stations. The abbreviations of the methods are explained in Section 2.3.

It is also informative to look at the spatial patterns of the CRPSS and therefore we plot the maximum value for each station for lead time +12h in Figure 8. The maxima range between 0.3 and 0.47. We see that the method reaching the maximum varies between the seasons and the stations, only NOTR never performs best. In winter QR is most skillful at most stations, while in spring it is GRF. In summer and autumn it is less clear which method is most skillful overall. Although the skill varies between stations, there is not a clear spatial pattern. There are some seasonal variations, for example in winter and autumn the stations near the coast have mostly less skill than stations more inland. This might occur because convection over sea causes more clouds over stations near the coast.

Figure 8: CRPSS for all stations and the four seasons at lead time +12h. Only the maximum value over the methods is shown. The abbreviations of the methods are explained in Section 2.3.

3.4.2 Brier skill score

To investigate the performance of the methods for different CSI values, we plot in Figure 9 the BSS versus the CSI threshold for the four seasons at lead time +12h. The BSS increases as a function of CSI in spring and summer, while in winter it decreases again for higher CSI. In autumn the BSS is stable around 0.4 for CSI values higher than 0.3. The BSS increase in spring and summer is probably caused by the higher predictability of clear sky versus convective situations. While the differences between the methods are small for lower CSI values, it is interesting to note that for thresholds higher than 0.7 (corresponding to (near) clear sky conditions), there is a bifurcation in the skill, especially in spring and summer. The tree-based methods and QR show higher BSS than the other methods. Consequently, the global radiation under (near) clear sky conditions is better forecast by the tree-based methods and QR. For tree-based methods, this can be explained by the fact that trees can create separate branches for clear sky cases. QR fits each quantile separately and can therefore better narrow the distribution in clear sky situations to predict those situations more accurately.

Figure 9: BSS versus CSI threshold at lead time +12h for the four seasons, averaged over all stations. The abbreviations of the methods are explained in Section 2.3.

3.4.3 Reliability diagrams

To access the reliability of the methods, in Figure 10 the reliability diagrams for CSI thresholds 0.2, 0.5 and 0.9 for lead time +12h in summer are shown. For the threshold 0.2 all methods are unreliable for high forecast probabilities, due to a low amount of cases. For the low forecast probabilities the tree-based methods are very reliable and almost on top of the diagonal line. The stepwise-based methods are slightly less reliable. For the threshold 0.5 all methods are very reliable and there is no method that clearly performs better or worse than the rest of the methods. For the threshold 0.9, all methods are reliable for the high forecast probabilities, but for lower forecast probabilities the tree-based methods are again most reliable. The parametric methods are least reliable and QR and MCQRNN are in between. In general the tree-based methods are most reliable, especially under clear sky conditions.

Figure 10: Reliability diagrams for different CSI thresholds at lead time +12h in summer, for all stations together. The abbreviations of the methods are explained in Section 2.3.

3.4.4 Potential economic value

To further study the behaviour for different CSI values, in Figure 11 we plot the potential economic value (PEV) versus the cost-loss ratio for CSI thresholds 0.2, 0.5 and 0.9 at lead time +12h in summer. For all thresholds, the methods show higher PEV values than the raw forecast and show positive PEV values over a wider cost-loss ratio range (the full range between 0 and 1 for thresholds 0.5 and 0.9). For threshold 0.2, the tree-based methods perform worse than the stepwise-based methods for cost-loss ratios above 0.4. For the threshold 0.5, all methods have equal performance. For the threshold 0.9, the tree-based methods and QR perform slightly better than the other methods for cost-loss ratios below 0.2.

Figure 11: Potential economic value versus cost-loss ratio for different CSI thresholds at lead time +12h in summer, for all stations together. RAW stands for the raw forecast and the rest of the abbreviations are explained in Section 2.3.

4 Discussion

The performance of the methods presented here depends on the lead time, the station location and the season. Nevertheless, in general all methods had similar performance. It also depends on the scoring metrics, that focus on different aspects, like accuracy and reliability, and on deterministic or probabilistic forecasts. We often saw that quantile regression and generalized random forests performed best and that the tree-based methods outperformed the stepwise-based methods in (near) clear sky conditions. In the discussion presented here we compare our results with results from other papers, where comparing methods is the primary goal, and look for differences and similarities.
David et al. (2018) made a comparison between probabilistic forecasts using random forests, neural networks, (weighted) quantile regression, gradient boosting decision trees, recursive GARCH and the sieve bootstrap. They used endogenous data consisting of past observations of global radiation and CSI as predictors for forecasting CSI. They came to the conclusion that all methods performed very similarly, in agreement with our results. They based this on verification measures including the CRPS, reliability diagrams and rank histograms. They found (weighted) quantile regression and gradient boosting decision trees as most efficient methods for generating probabilistic forecasts in terms of complexity and computational time. They concluded that neural networks are the least efficient with a large computational time and requiring sufficient scientific background for good performance.
Mohammed and Aung (2016) made a comparison between deterministic forecasts using decision trees, nearest neighbours, gradient boosting decision trees, random forests and both lasso and ridge regression. They used NWP output from the ECMWF as predictors to forecast solar power production. They found gradient boosting decision trees as best performing in terms of the RMSE and mean absolute error (MAE). In our study, gradient boosting decision trees performed well, but not the best, according to the RMSE.
Zamo et al. (2014) made a comparison between probabilistic forecasts using quantile regression and quantile regression forests. They used NWP output from the PEARP model at Météo France. They forecast solar power production and verify the forecasts using the CRPS and rank histograms. They found that the best performing method depends on the specific settings and concluded that quantile regression and quantile regression forests performed similar in general, which is in agreement with our results.
Lastly, Voyant et al. (2018) made a comparison between probabilistic forecasts using quantile regression forests and gradient boosting decision trees with past CSI observations as predictor set. They generate prediction intervals for the CSI and verify them using the gamma test, which is based on the mean interval length and prediction interval coverage probability. The authors found that gradient boosting decision trees performed best. In our study, quantile regression forests and gradient boosting decision trees performed almost similar.

5 Conclusions

In this paper we compared 7 regression methods. Using these methods, we produced probabilistic forecasts of global radiation with the raw forecast (the deterministic global radiation forecast from HA) together with other relevant meteorological variables as potential predictors. We saw that all potential predictors were used, and that global radiation, direct radiation, relative humidity, cloud cover and cloud water were most important. The aerosol predictors from CAMS were less important than the radiation and cloud-based predictors from HA, but they were still often selected, particularly the aerosol optical depth in the stepwise-based methods. Aside from that, the tree-based methods have the advantage of using all predictors in all fits, whereas for the stepwise-based methods the amount of predictors used in a fit is limited. We verified the forecasts in a deterministic way using the root mean squared error skill score. All methods had an improvement between 2% and 10% upon the raw forecast, depending on the season and lead time. In visualizing the probabilistic information in the forecasts we looked at three interesting cases with different weather conditions: a clear sky, fully overcast and partly overcast day. In all cases the medians of the forecasts were close to the observations, but the uncertainty in forecast radiation differed between the methods (mostly on the clear sky and partly overcast day). Next, we verified the methods in a probabilistic way using the continuous ranked probability skill score, Brier skill score, reliability diagrams and the potential economic value. We saw that the performance of the methods depended on the time of the day, the forecast lead time, the geographical location, the season and the weather situation (clear sky or cloudy). In general all methods performed similarly, but depending on the verification measure there are some methods that performed better than others. According to the continuous ranked probability skill score, quantile regression and generalized random forests performed slightly better and the gamma and truncated normal distributions slightly worse than the other methods. For the Brier skill score there was not a clear best performing method, but we saw that clear sky conditions are better forecast by tree-based methods than stepwise-based methods. Also the tree-based methods are more reliable, according to the reliability diagrams. Next, the probabilistic forecasts were compared with the deterministic raw forecast using the potential economic value to show the added value of uncertainty information in the probabilistic forecasts. All methods achieved both higher potential economic values and positive potential economic values over a wider cost-loss ratio range than the raw forecast. Therefore the probabilistic forecasts are very useful for users of solar energy. They can make better decisions based on the improved accuracy and information about the uncertainty.
Future research could focus on more complex economic models for making a comparison between the methods. Another research direction is to investigate other distributions than the ones tried in this paper or other non-parametric regression methods. Furthermore, the effect of more potential predictors (for example diffuse radiation) could be investigated. Also predictors taken from the (relatively new) HA ensemble prediction system could be used, such as the ensemble mean or ensemble variance of the global radiation forecast, and would likely be better predictors than those from the deterministic HA forecasts.
The methods presented here are all based on techniques of statistical post-processing. A different, physical approach is to improve the NWP models to lead to better predictors. This could be for example improving the representation of clouds and aerosols in the model. This would also improve the forecasts of global radiation. However, statistical post-processing is still expected to improve skill of those forecasts.

Acknowledgements

We thank Jan Fokke Meirink (KNMI) for providing the aerosol forecasts from CAMS, consisting of the aerosol optical depth, Ångström exponent and ozone. We thank Jason Frank (Utrecht University), Wim de Rooy, Jan Barkmeijer, Sander Tijm and Sibbo van der Veen (all from KNMI) for their help in the study described in this paper.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declarations of interest: none

References

Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
""
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
   
Add comment
Cancel
Loading ...
388243
This is a comment super asjknd jkasnjk adsnkj
Upvote
Downvote
""
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters
Submit
Cancel

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test
Test description