Short-Term Predictability of Photovoltaic Production over Italy
Abstract
Photovoltaic (PV) power production has increased drastically in Europe in recent years. About 6% of electricity in Italy comes from PV, and efficient management of the power grid requires accurate and reliable forecasting of production. Starting from a dataset of electricity production of 65 Italian solar plants for the years 2011–2012, we investigate the possibility of forecasting daily production from one to ten days of lead time without using on-site measurements. Our study is divided into two parts: an assessment of the predictability of meteorological variables using weather forecasts, and an analysis of the application of data-driven modelling to predicting solar power production. We calibrate an SVM model using available observations and then force the same model with the predicted variables from weather forecasts with a lead time from one to ten days. As expected, solar power production is strongly influenced by cloudiness and clear sky: during summer we obtain a general error under 10% (slightly lower in southern Italy), while during winter the error is well above 20%.
keywords:
Photovoltaic system; solar power forecasting; renewable energy modelling; solar irradiance
1 Introduction
Europe is experiencing a growing penetration of photovoltaic (PV) production, in particular Italy, which in 2012 had almost PV plants (16.4 GW of total installed power) [4], 44% more than in 2011. Modelling the daily electricity generation of a PV power system can be useful for effective management and balancing of a power grid, supporting real-time operations, especially in countries with a large solar energy potential. Forecasting the expected PV power production could in fact help to deal with its intermittency, which is mainly due to weather conditions. Moreover, short-term forecasting information can also be valuable for electricity market operators.
The production of a PV plant can be modelled in two ways: with a mathematical model or with a data-driven approach, the latter often called black-box modelling. Both approaches have their pros and cons. The former can be more accurate, but in addition to weather variables (incoming solar radiation, air temperature, wind speed, etc.) it needs the solar panel characteristics (technology, area, orientation, etc.). The black-box approach does not require information about the type of PV panel, but it needs long time series of input and output variables to calibrate a reliable model. In our work, we use a Support Vector Machine (SVM, briefly introduced later in Sec. 4) to predict daily production using both solar radiation and temperature information. The choice of a black-box approach is due to the absence of both detailed information about solar panel characteristics and on-site measurements of solar irradiance and air temperature.
SVMs have already been used for similar applications: Zeng & Qiao [13] tested an SVM-based approach using data from three different sites, outperforming both autoregressive and neural network-based models; Bouzerdoum et al. [1] proposed a hybrid SARIMA-SVM approach which performed better than either single model in predicting the hourly power output of a small PV plant. More generally, black-box methods are common in forecasting applications related to solar power and solar radiation (e.g., see Pedro & Coimbra [8]).
Our work is based on daily power production data of 65 grid-connected PV systems in Italy during the period 2011–2012. For each plant an SVM model has been built and tested with the best available weather observations of solar radiation and air temperature, provided respectively by the CM-SAF satellite data and by weather stations. Then, the same SVM models are used to forecast power production using weather forecasts of solar radiation and temperature as input data.
In the next section, we introduce and describe the weather and production data used for the modelling and forecasting parts, presented in Sections 4 and 5 respectively. For a better understanding of the forecasting results, we also analyse the predictability of solar radiation and temperature provided by weather forecasts in Sections 3.1 and 3.2. The final section provides a summary and conclusions.
2 Data
In this work a data-driven approach has been chosen, mainly due to the unavailability of detailed data about power plants and weather measurements. The effectiveness of a data-driven approach, as the name suggests, relies strongly on the appropriateness and quality of the input/output data. The input data here are weather variables, solar radiation and air temperature, while the output variable is the electricity production. Solar radiation is converted into electricity by photovoltaic modules, which makes surface incoming solar radiation the natural choice of model input. Air temperature is also an important variable: solar panel efficiency is sensitive to module temperature and, depending on the specific equipment, when it exceeds a threshold (generally about C) the panel efficiency begins to drop. For improved modelling of the module temperature the cooling effect of the wind should also be taken into account (as described in Schwingshackl et al. [11]); its inclusion is left for future work.
2.1 Meteorological Data
The solar radiation measurements used in this paper are obtained from the Satellite Application Facility on Climate Monitoring (CM-SAF) [10], part of EUMETSAT’s SAF Network. The variable considered is the surface incoming shortwave (SIS) radiation on the Meteosat (MSG) full disk. Figure 1(a) shows the average daily solar radiation and Figure 1(b) its coefficient of variation, i.e. the ratio between standard deviation and average.
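As a small illustration of the coefficient of variation mentioned above, the following sketch computes it for a single grid cell. The function name and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Ratio between the standard deviation and the average of a series."""
    avg = mean(values)
    if avg == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    # Assumption: population standard deviation (divide by N, not N - 1).
    return pstdev(values) / avg


# Hypothetical daily radiation values for one grid cell (arbitrary units).
daily_radiation = [100.0, 200.0, 300.0]
cv = coefficient_of_variation(daily_radiation)
```

Applied cell by cell to a gridded time series, this would give a map comparable in spirit to the one described for the coefficient of variation.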
For the air temperature, we instead consider the E-OBS gridded dataset [5], a land-only high-resolution temperature dataset obtained by interpolating the available meteorological stations onto a regular grid (4200 stations in the latest release, made available in October 2013).
Weather forecasts of solar radiation and temperature are provided by the ECMWF Integrated Forecasting System (IFS), which runs twice per day with a resolution of 16 km.
Table 1 summarises all the data sources used in this paper.

Variable | Observed | Forecast
2m temperature | E-OBS (~25 km) | ECMWF IFS (~16 km)
Downward solar radiation | CM-SAF (~5 km) | ECMWF IFS (~16 km)
2.2 Production Data
In this work we consider 65 different PV power plants located in different Italian regions. We divided the plants into two groups: North and South. The first group (North) contains all the PV plants above the latitude, 34 PV plants with a total of 127 MW of installed capacity. The remaining plants form the other group (South), 31 PV plants with a total of 288 MW.
For each plant we have a time series of daily power production of variable length, between 18 and 24 months (550–731 daily samples).
3 Daily Predictability of Meteorological Data
In this section we analyse the capability of the ECMWF numerical weather prediction model to forecast the two main predictors of solar power production: solar radiation and air temperature. Both meteorological variables are provided by the ECMWF global forecast model, whose data are available on a grid with a time step of 3 hours up to ten days in advance.
An assessment of the forecasting skill of the ECMWF model can be found in Richardson et al. [9]. Other studies on the use of solar radiation forecasts can be found in Lorenz et al. [6] and Mathiesen & Kleissl [7].
3.1 Solar Radiation
ECMWF operational deterministic forecasts are issued every day and provide hourly estimates of several variables up to ten days ahead. We used the surface solar radiation downwards variable, i.e. the incident shortwave radiation, accumulated over the day.
We compared the forecasts with the values measured by CM-SAF satellite data for the years 2011–2012.
In Figure 2 we can observe the spatial correlation between forecasts and satellite data on the entire domain as a function of the lead time of forecast.
To give a better idea of the forecast quality, Figure 3 shows an example forecast for a specific day with three different lead times: one day (Fig. 3(b), correlation of 0.93), five days (Fig. 3(c), correlation of 0.90) and ten days (Fig. 3(d), correlation of 0.67).
Solar radiation exhibits a clear seasonal cycle, and for this reason absolute error measures (e.g. RMSE) might be misleading. We therefore use a percentage error measure, the Median Absolute Percentage Error (MdAPE), defined as:
$$\mathrm{MdAPE} = \operatorname{median}_{t}\left(\left|\frac{y_t - \hat{y}_t}{y_t}\right|\right) \times 100 \tag{1}$$
where $y_t$ is the observed value and $\hat{y}_t$ the estimate at time $t$.
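The MdAPE defined above can be computed with a few lines of code. The sketch below is a minimal illustration using only the Python standard library; the function name, the choice to skip days with a zero observed value (to avoid division by zero) and the sample numbers are assumptions made for this example.

```python
from statistics import median


def mdape(observed, predicted):
    """Median Absolute Percentage Error, expressed in percent."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    # Assumption: samples with a zero observation are skipped.
    ape = [abs((o - p) / o) * 100.0
           for o, p in zip(observed, predicted) if o != 0]
    if not ape:
        raise ValueError("no valid samples to evaluate")
    return median(ape)


# Hypothetical daily values: absolute percentage errors are 10, 10 and 0.
observed_values = [100.0, 200.0, 400.0]
predicted_values = [110.0, 180.0, 400.0]
error = mdape(observed_values, predicted_values)
```

Because it uses the median rather than the mean, a few days with very low observed values (and hence very large percentage errors) have limited influence on the score.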
Figure 4 illustrates the MdAPE of the predicted solar radiation with respect to latitude for three lead times (1, 5 and 10 days; the other lead times have been omitted for the sake of clarity). The prediction error is evidently related to the lead time: with one day the average MdAPE on the entire domain (30–50° latitude) is , with five days it is and at ten days it is .
We can observe how the performance of the forecast decreases at high latitudes, due to the higher weather variability, as also shown in Fig. 1(b).
According to the North/South classification proposed in Section 2.2, in Figure 5 we show the density plot of solar radiation provided by CM-SAF (satellite) and by the forecast at one, five and ten days of lead time. Looking at the density plot for northern Italy (Fig. 5(a)), we can readily see the difference among the three lead times in describing the two peaks, especially the minor one. In the density comparison for southern Italy (Fig. 5(b)) the three lead times instead show a similar distribution. It can also be seen that for southern Italy the forecasts tend to underestimate the highest peak.
3.2 Air Temperature
As for the downward solar radiation, we analyse the predictability of air temperature provided by ECMWF deterministic forecasts by comparing it with the observations. As stated in Section 2.1, we used the E-OBS dataset for the years 2011–2012 as observations.
Figure 6 shows descriptive statistics of observed temperature over Italy for the years 2011–2012. The coefficient of variation (Figure 6(b)) clearly follows Italian orography, with the higher temperature variability mostly in the mountain areas. The density plot of observed and predicted temperature (Fig. 7) shows a closer correspondence between forecasts and observations than the similar plot for solar radiation in Figure 5.
4 Modelling PV production using satellite data
To perform a forecast of the solar power production we first need to find an accurate relationship between daily meteorological variables (here solar radiation and temperature) and power production. We need to find a function for each PV plant with the following form:
$$\hat{P}_i = f_i(\mathrm{SSR}_i, T_i) \tag{2}$$
with $\hat{P}_i$ the predicted power output and $\mathrm{SSR}_i$ and $T_i$ respectively the surface solar radiation and the ambient temperature available for the i-th PV plant. This function aims to model the relationship between the weather variables and the electricity produced, trying to minimise the error between observed and estimated values. A black-box approach focuses at the same time on minimising the modelling error and on maximising the generalisation, i.e. the capability of giving consistent outputs for unseen inputs. Given the absence of on-site measurements, we consider here as inputs the bilinear interpolation among the four nearest grid points of solar radiation and temperature data.
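The bilinear interpolation among the four nearest grid points can be sketched as follows. This is only an illustration on a regular longitude/latitude cell; the function name, the argument order and the coordinates and values in the example are hypothetical and not taken from the datasets used here.

```python
def bilinear_interpolate(lon, lat, lon0, lon1, lat0, lat1,
                         v00, v10, v01, v11):
    """Interpolate a gridded value at (lon, lat) inside one grid cell.

    v00 is the value at (lon0, lat0), v10 at (lon1, lat0),
    v01 at (lon0, lat1) and v11 at (lon1, lat1).
    """
    # Fractional position of the point inside the cell, between 0 and 1.
    tx = (lon - lon0) / (lon1 - lon0)
    ty = (lat - lat0) / (lat1 - lat0)
    # Interpolate along longitude on both latitude rows, then along latitude.
    lower = v00 * (1.0 - tx) + v10 * tx
    upper = v01 * (1.0 - tx) + v11 * tx
    return lower * (1.0 - ty) + upper * ty


# Hypothetical plant located at the centre of a 0.25-degree cell.
value_at_plant = bilinear_interpolate(
    11.125, 45.125, 11.0, 11.25, 45.0, 45.25,
    10.0, 20.0, 30.0, 40.0,
)
```

At a grid point the function returns that point's value exactly, and at the cell centre it returns the average of the four corners.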
Although the photovoltaic process is nonlinear, it is good practice to start with the simplest model for the function, a linear regression model with the following form:
$$\hat{P}_i = \beta_0 + \beta_1\,\mathrm{SSR}_i + \beta_2\,T_i \tag{3}$$
Minimising the error through Ordinary Least Squares, we obtain an average MdAPE of 12.4% in cross-validation. A k-fold (with ) cross-validation procedure is used here: as a first step we divide the available dataset into k subsamples of equal size, and then, k times, the chosen model is calibrated using k-1 subsets and tested on the remaining one. At the end of the k steps, the cross-validation error is given as the average of all the obtained errors.
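The cross-validation procedure described above can be sketched for the linear model as follows. This is a minimal standard-library illustration, not the code used for the study: the helper names, the interleaved assignment of days to folds, the value k = 5 and the synthetic data are all assumptions made for the example.

```python
from statistics import median


def fit_ols(rows, targets):
    """Fit P = b0 + b1*SSR + b2*T by solving the normal equations."""
    design = [[1.0, ssr, temp] for ssr, temp in rows]
    n = 3
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(x[i] * x[j] for x in design) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(design, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(coefs, row):
    """Evaluate the fitted linear model for one (SSR, T) pair."""
    return coefs[0] + coefs[1] * row[0] + coefs[2] * row[1]


def kfold_mdape(rows, targets, k):
    """Average MdAPE (percent) over k folds; sample i goes to fold i % k."""
    n = len(rows)
    fold_errors = []
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        coefs = fit_ols([rows[i] for i in train], [targets[i] for i in train])
        ape = [abs((targets[i] - predict(coefs, rows[i])) / targets[i]) * 100.0
               for i in test]
        fold_errors.append(median(ape))
    return sum(fold_errors) / k


# Synthetic, exactly linear data (hypothetical values, not plant data).
rows = [(100.0 + 10.0 * i, 5.0 + (i * 7) % 11) for i in range(20)]
targets = [2.0 + 0.5 * ssr - 0.1 * temp for ssr, temp in rows]
cv_error = kfold_mdape(rows, targets, k=5)
```

On this noise-free synthetic data the cross-validation error is numerically zero; with real production data it would measure how well the linear form generalises to days not used for calibration.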
Afterwards, we use a nonlinear model, a Support Vector Machine (SVM).
SVMs were developed by Cortes & Vapnik [2, 12] for binary classification and then extended to regression problems (Support Vector Regression). The idea behind support vector-based methods is to use a nonlinear mapping to project the data into a higher-dimensional space where solving the classification/regression task is easier than in the original space.
In our case, we used a Support Vector Regression method called ε-SVR [3], which tries to find a function that has at most ε deviation from the target values. An ε-SVR model has three parameters: the regularization parameter C, the ε value, and the width of the radial kernel γ.
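For reference, the standard textbook formulation of ε-SVR with a radial (RBF) kernel can be written as below. It is given here as general background rather than as a description of the exact solver used in this work; the symbols $w$, $b$, $\phi$, $\xi_j$, $\xi_j^*$ and the sample index $j$ are notational assumptions, with $x_j$ the input vector (solar radiation and temperature) and $y_j$ the observed production on day $j$.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad
  & \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{j=1}^{n}\left(\xi_j + \xi_j^*\right) \\
\text{subject to}\quad
  & y_j - \langle w, \phi(x_j)\rangle - b \le \varepsilon + \xi_j, \\
  & \langle w, \phi(x_j)\rangle + b - y_j \le \varepsilon + \xi_j^*, \\
  & \xi_j,\ \xi_j^* \ge 0, \qquad j = 1,\dots,n, \\
\text{with kernel}\quad
  & k(x, x') = \langle \phi(x), \phi(x')\rangle
    = \exp\!\left(-\gamma\,\lVert x - x'\rVert^2\right).
\end{aligned}
```

In this form, deviations smaller than ε are not penalised, C sets the cost of larger deviations, and γ controls how quickly the kernel similarity between two days decays with the distance between their inputs.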
For each PV plant we chose the optimal parameters of the SVR model by applying a grid search among 75 combinations of C, ε and γ. After the parameter selection, as for the linear models, we compute the cross-validation error. We obtain an average MdAPE of 7.6%, about 40% lower than in the linear case. This improvement is expected, given the higher modelling power of SVR, due to its inherent nonlinearity, with respect to linear regression.
5 Short-term forecast of solar power production
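A grid search of this kind can be sketched as an exhaustive loop over parameter combinations. The candidate values below are hypothetical (they are chosen only so that 5 x 5 x 3 = 75 combinations result, and are not the values used in the study), and `toy_cv_error` is a stand-in for the real step of training an SVR and returning its cross-validation MdAPE.

```python
from itertools import product


def grid_search(cv_error, c_values, epsilon_values, gamma_values):
    """Return the (C, epsilon, gamma) triple with the lowest error."""
    best_params, best_error = None, float("inf")
    for c, eps, gamma in product(c_values, epsilon_values, gamma_values):
        err = cv_error(c, eps, gamma)
        if err < best_error:
            best_params, best_error = (c, eps, gamma), err
    return best_params, best_error


# Hypothetical candidate values: 5 x 5 x 3 = 75 combinations.
c_grid = [0.1, 1.0, 10.0, 100.0, 1000.0]
eps_grid = [0.01, 0.05, 0.1, 0.5, 1.0]
gamma_grid = [0.01, 0.1, 1.0]
n_combinations = len(list(product(c_grid, eps_grid, gamma_grid)))


def toy_cv_error(c, eps, gamma):
    """Stand-in error surface with its minimum at (10, 0.1, 0.1).

    In a real application this function would train an SVR with the given
    parameters and return its cross-validation MdAPE.
    """
    return (c - 10.0) ** 2 + (eps - 0.1) ** 2 + (gamma - 0.1) ** 2


best_params, best_error = grid_search(toy_cv_error, c_grid, eps_grid, gamma_grid)
```

Passing the error function as an argument keeps the search independent of the model, so the same loop could be reused for each PV plant.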
In this section we assess the forecasting skill using the SVM models created in the previous section driven by predicted weather variables instead of observations.
As summarised in Table 1 and explained in Section 2.1, for the prediction we use the meteorological data coming from the ECMWF operational forecasts. As for the modelling part, for each PV plant we use bilinear interpolation of the nearest four grid points as input variables.
For each day of lead time we show the error of the power production in Figure 9(a), while the correlation between predicted and observed output is shown in Figure 9(b). The minimum error is with one day of lead time () and it grows steadily up to with ten days of lead time. In all cases the prediction for the PV plants in the South of Italy is more accurate than in the North, and the interquartile range also increases with the lead time, reflecting the higher uncertainty of the weather forecasts at longer lead times. Looking at the correlation, with one day of lead time it is in the range for both groups, while at ten days it drastically decreases below .
The error analysis can be refined by grouping the errors by season, as in Figure 10. The figure clearly shows the difference in error between spring/summer, when clear sky is common in most of the country, and autumn/winter, when the errors reach about the of MdAPE.
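Grouping the errors by season can be sketched as follows. The month-to-season mapping (meteorological seasons, with winter as December to February) is an assumption, since the text does not state how seasons were defined, and the dates and values in the example are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Assumption: meteorological seasons for the Northern Hemisphere.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_mdape(dates, observed, predicted):
    """MdAPE (percent) computed separately for each season."""
    ape_by_season = defaultdict(list)
    for day, obs, pred in zip(dates, observed, predicted):
        if obs != 0:  # skip days with no observed production
            season = SEASON_BY_MONTH[day.month]
            ape_by_season[season].append(abs((obs - pred) / obs) * 100.0)
    return {season: median(values) for season, values in ape_by_season.items()}


# Hypothetical daily production values for two winter and two summer days.
days = [date(2012, 1, 15), date(2012, 1, 16), date(2012, 7, 15), date(2012, 7, 16)]
observed_production = [10.0, 20.0, 100.0, 100.0]
predicted_production = [12.0, 16.0, 95.0, 105.0]
errors_by_season = seasonal_mdape(days, observed_production, predicted_production)
```

In this toy example the winter error is larger than the summer one, which mirrors the qualitative behaviour described above but says nothing about the actual magnitudes.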
Finally, the plot shown in Figure 11 makes evident how in both the cases the prediction densities of the three lead times provided look very similar, evidencing a general tendency to underestimate high yields.
6 Conclusions
In this paper, we have presented an assessment of the short-term predictability of daily photovoltaic power production over Italy without the use of on-site measurements. A detailed analysis of the weather forecast performance for solar radiation and temperature has been carried out, in order to get a deeper understanding of the solar PV forecast performance.
Using a Support Vector Machine model, we have analysed the modelling error of power production using solar radiation and temperature observations, respectively from satellite and weather stations. We have compared the prediction error obtained using weather forecasts as inputs for lead times between one and ten days.
The results can be outlined as follows:

Without using on-site measurements, and using instead meteorological information provided by satellite and weather stations interpolated to the PV plant location, we obtain an average cross-validation percentage error (MdAPE) of 12.4% with a linear model and 7.6% with an SVM.

Solar power production modelling over Italy was found to be more accurate during summer than in the rest of the year: the error is below 5% when we use observed meteorological data as predictors and below 12% for the entire prediction range when we use forecast predictors.

The prediction results for the PV plants in southern Italy were clearly better than those in the North, mainly due to the lower weather variability in the southern part of the country.
Uncertainty due to the absence of information related to local phenomena (e.g. orography, shading effects) certainly becomes critical in predicting PV power production, especially for the longer lead times. The uncertainty due to weather forecasts can be estimated by observing the “distance” between the modelling (Fig. 8) and prediction (Figs. 9 and 10) errors. The former represents the error due to model limitations and to observation errors arising from interpolation. When we apply the same model for forecasting, we add the error due to the weather predictions, the same error discussed in Sections 3.1 and 3.2. Figure 12 tries to represent this “uncertainty propagation”, showing the relationship between the PV production error (the same as in Fig. 9(a)) and the forecast error of the meteorological predictors used (solar radiation and temperature).
These results demonstrate the potential of a black-box approach despite the absence of on-site measurements.
7 Acknowledgments
We thank TERNA for providing photovoltaic data. EUMETSAT Satellite Application Facility on Climate Monitoring (CM SAF) intermediate products were used by permission of Deutscher Wetterdienst. We also acknowledge the E-OBS dataset from the EU-FP6 project ENSEMBLES (http://ensembles-eu.metoffice.com) and the data providers in the ECA&D project (http://www.ecad.eu).
References
[1] M. Bouzerdoum, A. Mellit, and A. Massi Pavan. A hybrid model (SARIMA–SVM) for short-term power forecasting of a small-scale grid-connected photovoltaic plant. Solar Energy, 98:226–235, 2013.
[2] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[3] H. Drucker, C.J.C. Burges, L. Kaufman, A. Smola, and V. Vapnik. Support vector regression machines. Advances in Neural Information Processing Systems, 9:155–161, 1997.
[4] GSE. Rapporto Statistico 2012 Solare Fotovoltaico (in Italian). http://www.gse.it/it/Statistiche/RapportiStatistici/Pagine/default.aspx, May 2013.
[5] M.R. Haylock, N. Hofstra, A.M.G. Klein Tank, E.J. Klok, P.D. Jones, and M. New. A European daily high-resolution gridded data set of surface temperature and precipitation for 1950–2006. Journal of Geophysical Research: Atmospheres (1984–2012), 113(D20), 2008.
[6] E. Lorenz, J. Hurka, D. Heinemann, and H.G. Beyer. Irradiance forecasting for the power prediction of grid-connected photovoltaic systems. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2(1):2–10, 2009.
[7] P. Mathiesen and J. Kleissl. Evaluation of numerical weather prediction for intra-day solar forecasting in the continental United States. Solar Energy, 85(5):967–977, 2011.
[8] H.T.C. Pedro and C.F.M. Coimbra. Assessment of forecasting techniques for solar power production with no exogenous inputs. Solar Energy, 86(7):2017–2028, 2012.
[9] D.S. Richardson, J. Bidlot, L. Ferranti, T. Haiden, T. Hewson, M. Janousek, F. Prates, and F. Vitart. Evaluation of ECMWF forecasts, including 2012–2013 upgrades. Technical Memorandum 710, ECMWF, November 2013.
[10] J. Schulz, P. Albert, H.D. Behr, D. Caprion, H. Deneke, S. Dewitte, B. Dürr, P. Fuchs, A. Gratzki, P. Hechler, et al. Operational climate monitoring from space: the EUMETSAT Satellite Application Facility on Climate Monitoring (CM-SAF). Atmospheric Chemistry & Physics, 9(5), 2009.
[11] C. Schwingshackl, M. Petitta, J.E. Wagner, G. Belluardo, D. Moser, M. Castelli, M. Zebisch, and A. Tetzlaff. Wind effect on PV module temperature: analysis of different techniques for an accurate estimation. Energy Procedia, 40:77–86, 2013.
[12] V. Vapnik. The Nature of Statistical Learning Theory. Springer, 2000.
[13] J. Zeng and W. Qiao. Short-term solar power prediction using a support vector machine. Renewable Energy, 52:118–127, 2013.