An Unsupervised Deep Learning Approach for Scenario Forecasts
Abstract
In this paper, we propose a novel scenario forecasting approach that can be applied to a broad range of power system operations (e.g., wind, solar, load) over various forecast horizons and prediction intervals. The approach is model-free and data-driven, producing a set of scenarios that represent possible future behaviors based only on historical observations and point forecasts. It first applies a newly developed unsupervised deep learning framework, generative adversarial networks, to learn the intrinsic patterns in historical renewable generation data. Then, by solving an optimization problem, we are able to quickly generate a large number of realistic future scenarios. The proposed method is applied to a wind power generation and forecasting dataset from the National Renewable Energy Laboratory. Simulation results indicate that our method is able to generate scenarios that capture spatial and temporal correlations. Our code and simulation datasets are freely available online.
I Introduction
The integration of a high penetration of renewable generation into power systems creates a growing need to model the uncertain and intermittent characteristics of these resources. An important method for characterizing the behavior of renewable resources is scenario generation, where a set of possible future realizations is provided to the system operator. Compared to deterministic point forecasts or probabilistic forecasts [1], scenario forecasts not only inform users of the uncertainty about the future, but also reflect the temporal dependence of renewable power generation [2, 3]. The information provided by these generated scenarios is valuable for a host of decision-making and stochastic optimization problems, such as the economic dispatch of renewables [4, 5], unit commitment [6, 7] and many others. Therefore, in recent years, many algorithms have been introduced for various applications, from load forecasting to wind and solar power generation.
One of the biggest challenges of scenario forecasting is the difficulty of modeling and learning the underlying stochastic processes that drive renewable power generation [8]. Previous statistical or physical methods, such as the first-order autoregressive model [9], ensemble methods and the Gaussian copula [2, 10, 11], require either strong statistical assumptions or detailed physical measurements and modeling. Moreover, most of these methods focus on capturing the marginal distribution of each individual time slot of the forecasting horizon, while paying less attention to the temporal correlations in the scenarios [12].
In [13], an unsupervised machine learning algorithm using Generative Adversarial Networks (GANs) [14] was introduced to directly generate realistic scenarios based only on historical data, without the need to fit an explicit model. In this paper, we extend the algorithm to the scenario forecasting problem. Compared to the work in [13], this paper focuses on generating scenarios conditioned on a given central forecast. Our proposed method is entirely model-free and data-driven. Based on deep learning, the GANs used in our proposed method are unsupervised learners that can directly learn and generate time series with the same properties as the training time series. The subsequent optimization step then finds a group of scenarios based on point forecasts. Our approach can be used for a variety of scenario forecasting problems, e.g., wind and solar generation, and the forecast horizon is easy to adjust (e.g., ranging from 12 hours to 12 days) with little tuning. Specifically, we make the following contributions:

Based on any provided point forecast method along with historical observations, our method is able to generate a group of short-term forecasting scenarios representing the temporal correlations and fluctuation distribution.

The proposed approach can generate scenarios without forecast horizon or number restrictions.

The proposed approach is free of statistical assumptions and ready to use in real power generation processes.
Our proposed method for scenario forecasts contains the following two components:
I-1 Generative Adversarial Networks
A generative adversarial network (GAN) is composed of two deep neural networks, the generator and the discriminator, which play a zero-sum game. GANs are provided with past observations of renewable generation processes. Suppose these samples are drawn from an unknown underlying “true” distribution. The generator has access to a well-defined noise distribution (e.g., Gaussian) and can draw i.i.d. samples from this distribution. The generator’s goal is to find a function that transforms a vector from the known noise distribution into a sample following the same distribution as the past observations. The discriminator’s goal is to distinguish the generated data from the true historical data. When the generator and the discriminator are trained to an equilibrium, the discriminator can no longer distinguish between generated and historical data, which means the generator can produce realistic time-series samples as if they came from the true distribution [14, 15]. Section II describes this approach in more detail.
I-2 Optimization of scenario forecasts
We are interested in forecasting a group of scenarios that inform system operators of the possible future realizations of the power generation process. In Section III we detail the setup of the optimization problem with a pretrained GANs model. We also show how the optimization problem can be solved iteratively to obtain high-quality scenario forecasts. Some generated scenarios with different forecast horizons are shown in Fig. 2.
Numerical simulations are performed and evaluated in Section IV. We show that our generated scenarios not only satisfy the reliability and accuracy requirements of a forecasting tool, but also capture the temporal dynamics of power generation. We also make our code for the proposed method freely available at https://github.com/chennnnnyize/ScenarioForecastsGAN, meeting the need for an efficient computational tool for generating reliable and accurate scenarios.
II Generative Adversarial Networks
In this section, we describe the setup for GANs [14]. We first formulate the training objectives for the discriminator and the generator respectively, and show that GANs are a good fit for generating a potentially unlimited number of renewable power production time series. In Section III we illustrate how this time-series generator can be used in an optimization problem to find the desired scenario forecasts.
The architecture of the GANs we use is shown in Fig. 1a. Assume observations $x_t^j$ for times $t \in T$ of renewable power production are available for each power plant $j$, $j = 1, \ldots, N$. Denote the true distribution of the observations as $\mathbb{P}_X$, which is unknown and may be difficult to model because of complex spatial and temporal correlations. Suppose we have access to a group of noise vector inputs $z$ drawn from a known distribution $\mathbb{P}_Z$ that is easily sampled from (e.g., jointly Gaussian or uniform). Given a sample $z$ drawn from $\mathbb{P}_Z$, our goal is to find a function $G$ such that, after transformation, $G(z)$ follows $\mathbb{P}_X$. This is accomplished by simultaneously training two deep neural networks: the generator network $G(z; \theta^{(G)})$ and the discriminator network $D(x; \theta^{(D)})$. Here, $\theta^{(G)}$ and $\theta^{(D)}$ denote the weights of the two neural networks, respectively. For convenience, we sometimes suppress the symbol $\theta$.
Generator: During the training process, the generator is trained to take a batch of inputs from the noise distribution $\mathbb{P}_Z$ and, through a series of upsampling operations performed by neurons of different functions, to output realistic time-series samples. Ideally, they should appear as if drawn from $\mathbb{P}_X$. Therefore, after training finishes, the output of the mapping $G(z; \theta^{(G)})$ should follow the true data distribution $\mathbb{P}_X$.
Discriminator: The discriminator is trained simultaneously with the generator. It takes input samples coming either from real historical data or from the generator. Through a series of downsampling operations performed by another deep neural network, it outputs a continuous value that measures to what extent the input samples belong to $\mathbb{P}_X$. The discriminator can be expressed as $D(x; \theta^{(D)})$, where $x$ may come from $\mathbb{P}_X$ or from the generator output $G(z)$. The discriminator is trained to distinguish samples drawn from $\mathbb{P}_X$ from generated samples $G(z)$, and thus to maximize the difference between $\mathbb{E}[D(X)]$ (from real data) and $\mathbb{E}[D(G(Z))]$.
With the objectives for the discriminator and the generator defined, we can now formulate a loss function $L_G$ for the generator and $L_D$ for the discriminator, which are used to train the two networks (i.e., to update the neural networks’ weights based on the losses). A small $L_G$ reflects that $G(z)$ is as realistic as possible from the discriminator’s perspective, i.e., the generated scenarios “look like” historical scenarios to the discriminator. Similarly, a small $L_D$ indicates that the discriminator is good at telling the difference between generated scenarios and historical scenarios, which means there is a large difference between $\mathbb{E}[D(X)]$ and $\mathbb{E}[D(G(Z))]$. Following this guideline and the loss defined in [15], we define $L_G$ and $L_D$ as:
$$L_G = -\mathbb{E}_{Z}[D(G(Z))] \tag{1a}$$
$$L_D = -\mathbb{E}_{X}[D(X)] + \mathbb{E}_{Z}[D(G(Z))] \tag{1b}$$
In the above, the expectations are taken as empirical averages based either on the historical data or on the generated data. Note that the functions $D$ and $G$ are parametrized by the weights of two distinct deep neural networks.
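We can now combine (1a) and (1b) to construct the minimax game value function $V(G, D)$ for these two players:
$$\min_{\theta^{(G)}} \max_{\theta^{(D)}} V(G, D) = \mathbb{E}_{X}[D(X)] - \mathbb{E}_{Z}[D(G(Z))] \tag{2}$$
where $V(G, D)$ is the negative of $L_D$.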
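To make the sign conventions of the two losses and the game value concrete, the following is a minimal sketch that evaluates them as empirical averages of discriminator scores, assuming the Wasserstein-style losses of [15]. The function name `wasserstein_losses` and the toy score values are illustrative assumptions, not taken from the paper's code.

```python
def wasserstein_losses(d_real, d_fake):
    """Empirical Wasserstein-style losses from discriminator scores.

    d_real: discriminator outputs on historical samples.
    d_fake: discriminator outputs on generated samples.
    Returns (generator loss, discriminator loss, game value).
    """
    mean_real = sum(d_real) / len(d_real)
    mean_fake = sum(d_fake) / len(d_fake)
    loss_g = -mean_fake              # generator wants high scores on its samples
    loss_d = -mean_real + mean_fake  # discriminator wants a large score gap
    value = -loss_d                  # game value is the negative of loss_d
    return loss_g, loss_d, value


# Early in training the discriminator separates the two sets easily:
# small discriminator loss, large generator loss and game value.
early = wasserstein_losses(d_real=[2.0, 4.0], d_fake=[-3.0, -1.0])

# Near equilibrium the scores on both sets are similar:
# the generator loss has dropped and the discriminator loss has risen.
late = wasserstein_losses(d_real=[1.0, 3.0], d_fake=[1.5, 2.5])
```

With these toy scores the generator loss falls and the discriminator loss rises between the two calls, which is the qualitative behavior expected as training approaches equilibrium.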
During the first few training iterations, $G$ generates time-series samples $G(z)$ that are totally different from samples in $\mathbb{P}_X$, and after learning from samples coming from $\mathbb{P}_X$, the discriminator is able to reject $G(z)$ with high confidence. In that case, $L_D$ is small, and $L_G$ and $V(G, D)$ are both large. The generator gradually learns to generate more realistic-looking samples, while at the same time the discriminator is also trained to distinguish these newly generated samples from $\mathbb{P}_X$. As training moves on and approaches the equilibrium, $G$ is able to generate samples that look as realistic as real power generation time series, corresponding to a small $L_G$ value, while $D$ is unable to distinguish $G(z)$ from $\mathbb{P}_X$, corresponding to a large $L_D$. Eventually, we are able to learn an unsupervised representation of the probability distribution of renewable time series. By sampling $z \sim \mathbb{P}_Z$, we get $G(z)$, which appears “as if” it was sampled from the true distribution.
More formally, the minimax objective (2) of the game can be interpreted as the dual of the so-called Wasserstein distance (Earth-Mover distance) [16]. The Wasserstein distance between two distributions measures the effort (or “cost”) needed to transport one to the other. It is shown in [15] that the losses defined in (1a) and (1b) for the generator and the discriminator respectively drive precisely these two distributions, that of the generated samples and that of the historical data, to be close to each other.
Note that unlike previous approaches for generating scenarios given historical observations, which all involve modeling the stochastic processes of renewable generation [2, 10, 11], by using GANs we bypass the step of learning or modeling the data distribution explicitly. The training algorithm of GANs for generating renewable time series is summarized in Algorithm 1. In Fig. 3, we show the evolution of the loss function of the discriminator during the training process and how the generated samples learn to mimic real historical data. In comparison, we show that the discriminator can always reliably distinguish the scenarios generated using a Gaussian copula method [2] from the true realizations.
III Scenario Forecasts Using GANs
In this section, we show that by formulating scenario forecasting as an optimization problem, a trained GANs model can be used to generate a group of scenarios given past observations and point forecasts.
III-A Mathematical Formulation
For a typical renewable power generation site, assume at timestep , we have records for actual past power outputs with . Meanwhile, we have some forecasting method to obtain the point forecasts for each lookahead time given . This forecast is denoted by , where is the forecasting horizon. Based on the historical information and the point forecast, we are interested in generating a group of scenarios , which represent the possible variations around the point forecast and accurately reflect the temporal dynamics of future generation. Note we focus on the scenario forecasting problem, so the central point forecast can be provided by any method.
Assume we have trained a GANs model based on the set of observations. Given some input noise , generates a possible realization without regarding to the historically observed data and the point forecast . Therefore, we need to constrain the possible to satisfy two conditions: 1) the part of from time index to should be close to the historical data ; 2) the part of from time index to should be realistic and respect the point forecast.
To describe these conditions, we introduce two projection operators that separate a vector into two parts. Given , we denote two projection operations and to extract former and latter dimensions of , respectively.
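The two projection operators can be pictured as a simple split of a generated series into its leading (historical) part and its trailing (forecast) part. The sketch below is only an illustration; the function name `split_history_forecast` and the `horizon` argument are hypothetical and not the paper's notation.

```python
def split_history_forecast(series, horizon):
    """Split a series into its leading (history) and trailing (forecast) parts.

    horizon is the number of trailing entries that belong to the forecast.
    """
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    cut = len(series) - horizon
    return series[:cut], series[cut:]


# A generated series of six steps, of which the last two are look-ahead steps.
generated = [0.2, 0.3, 0.5, 0.6, 0.4, 0.1]
history_part, forecast_part = split_history_forecast(generated, horizon=2)
```

The first part would be compared with the observed history, and the second part would be checked against the prediction interval around the point forecast.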
Meanwhile, we want to constrain generated scenarios do not conflict with the information provided by point forecasts (e.g., information from numerical weather prediction (NWP)). Then we can constrain latter part of so that they do not conflict with the given . Then to ensure that the first part of resembles , we use the following cost function:
(3) 
To ensure the generated scenarios are realistic, we add a loss term where is the discriminator output (recall larger discriminator ouput indicates more realistic samples). Finally, we use the point forecast by defining a prediction interval that the generated scenarios should lie in [17]. We describe this interval with an upper bound and a lower bound , controlled by a parameter (can be interpreted as the prediction confidence):
(4) 
Using all of the objectives and constraints above, given the observation and forecast vector pair , , and GANs pretrained model , , the scenario forecasts problem can be formulated as a constrained optimization problem:
(5)  
(5a)  
(5b) 
where is a weighting parameter; (5a) constrains to be within the domain of , which we take to be a hypercube ; (5b) constrains the generators’ output to be within the given prediction intervals given . By solving above optimization problem, we can obtain a forecasting scenario .
Since both the objective and the constraints in (5) are nonconvex, we handle the inequality constraints (5b) by moving them into the main objective as two log barriers. The optimization problem is then reformulated as
(6a)  
(6b) 
where is the weighting parameter for log barriers.
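The idea of replacing interval constraints by log barriers can be sketched as follows. This is a hedged illustration rather than the paper's exact objective: the squared Euclidean mismatch with history, the parameter names `gamma` and `mu`, and the toy generator and discriminator used in the usage lines are all assumptions.

```python
import math


def barrier_objective(z, generator, discriminator, history, lower, upper,
                      gamma=0.1, mu=0.01):
    """Objective with log barriers replacing the interval constraints.

    generator, discriminator: callables standing in for pretrained networks.
    history: observed past values; lower, upper: per-step interval bounds.
    Returns math.inf when a forecast step leaves its interval.
    """
    sample = generator(z)
    n_hist = len(history)
    past, future = sample[:n_hist], sample[n_hist:]
    # Mismatch with the observed history (squared Euclidean, an assumption).
    mismatch = sum((p - h) ** 2 for p, h in zip(past, history))
    # Larger discriminator output means a more realistic sample.
    realism = discriminator(sample)
    barrier = 0.0
    for value, lo, hi in zip(future, lower, upper):
        if not lo < value < hi:
            return math.inf  # outside the barrier the objective is undefined
        barrier -= math.log(hi - value) + math.log(value - lo)
    return mismatch - gamma * realism + mu * barrier


def toy_generator(z):
    # Hypothetical stand-in: the latent vector is used directly as the series.
    return list(z)


def toy_discriminator(sample):
    # Hypothetical stand-in: smoother series receive higher scores.
    return -sum(abs(b - a) for a, b in zip(sample, sample[1:]))


history = [0.4, 0.5]
lower, upper = [0.3, 0.3], [0.7, 0.7]
inside = barrier_objective([0.4, 0.5, 0.5, 0.5], toy_generator,
                           toy_discriminator, history, lower, upper)
outside = barrier_objective([0.4, 0.5, 0.9, 0.5], toy_generator,
                            toy_discriminator, history, lower, upper)
```

Because the barrier grows without bound near the interval edges, a gradient-based solver started strictly inside the interval stays inside it, which is why a feasible starting point matters.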
In the next subsection, we illustrate how we are able to find a group of solutions for the problem defined in (6) given the fixed, differentiable, yet nonconvex function .
III-B Forecasting Scenarios with Pretrained GANs
Because of the highly nonconvex nature of the generator and the discriminator, there exist many local optima in (5). The key to finding a group of solutions to (5) is to exploit that fact. Figure 4 shows the landscape of solutions in a one-dimensional illustration. To ensure that we reach a good local optimum, we add momentum to the gradient descent algorithm [18] to escape from saddle points and shallow local optima. (Footnote 2: There is a growing body of literature on the local optima of nonconvex functions; interested readers can refer to [18] and the references therein.)
Since there are multiple local optima of (5), we can start at different initial points and find distinct forecasting scenarios by solving (6) using gradient descent with momentum (MomentumGD). Because the training loss defined in (2) encourages the generator to produce diverse modes for different noise inputs, we are able to obtain a group of distinct yet realistic scenarios from different initial starting values.
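The multi-start idea can be illustrated on a toy problem. The sketch below runs gradient descent with momentum from several starting points on a one-dimensional nonconvex function with two minima; the function $(z^2 - 1)^2$, the step size, the momentum coefficient and the starting points are illustrative assumptions and have nothing to do with the actual GAN objective.

```python
def momentum_descent(grad, z0, lr=0.01, beta=0.9, steps=2000):
    """Gradient descent with momentum on a scalar variable."""
    z, velocity = z0, 0.0
    for _ in range(steps):
        # The velocity accumulates a decaying sum of past gradients.
        velocity = beta * velocity - lr * grad(z)
        z += velocity
    return z


def toy_grad(z):
    # Gradient of the nonconvex function (z**2 - 1)**2, minimized at z = -1 and z = 1.
    return 4.0 * z * (z * z - 1.0)


# Different starting points settle in different local minima.
starts = [-1.2, -0.5, 0.5, 1.2]
solutions = [momentum_descent(toy_grad, z0) for z0 in starts]
distinct = sorted({round(s, 6) for s in solutions})
```

In the same spirit, each distinct local optimum of the scenario objective would correspond to one forecast scenario.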
In order to obtain good starting points for which do not fall outside of the log barriers in (6), we first solve the following subsidiary problem:
(7)  
(7a)  
(7b) 
where is sampled uniformly at random from . In order that we can always obtain a good initial , we set in (7) to be slightly smaller than in main objective function (6). In Algorithm 2 we summarize our approach for generating a group of scenarios provided with a pretrained GANs weights as well as pairing historical measurements and point forecasts.
IV Simulation Results
In this section, we study the performance of the proposed method for scenario forecasts over various forecasting horizons. We focus on wind power production, and show that the scenarios generated by our method not only conform to the statistical properties of real measurements, but also capture the spatial and temporal correlations. We also show that our approach is flexible enough to generate scenarios for varying prediction intervals as well as prediction horizons. All experiments are implemented using Python 2.7 with the open-source deep learning package TensorFlow [19]. The GANs training procedure is accelerated by two Nvidia GeForce GTX TITAN X GPUs.
Both the generator and the discriminator deep neural networks are composed of two convolutional layers and two fully connected layers. All models in this paper are trained using the RMSProp optimizer [20], which is a self-adaptive gradient descent algorithm. Weights for neurons in both neural networks were initialized from a centered normal distribution with standard deviation of . Batch normalization is adopted before every layer except the input layer to stabilize learning by normalizing the input of every layer to have zero mean and unit variance. With the exception of the output layer, rectified linear units (ReLU) are used as the activation function in the generator, while LeakyReLU is used in the discriminator. We observed that in Algorithm 1, a setting of could get the fastest convergence rate for . Once the discriminator has converged to similar output values for historical and generated samples, the generator is able to generate realistic power generation samples.
IV-A Description of Data
In order to test the performance of our proposed framework for scenario generation, we set up our numerical simulations based on wind power data published in the NREL Wind Integration National Data Set (WIND) Toolkit (https://www.nrel.gov/grid/windintegrationdata.html). Actual power measurements have a resolution of minutes. The dataset also contains deterministic, day-ahead forecasts along with estimated and forecast quantiles. The detailed NWP-based forecasting method is described in [21]. We construct our dataset by aggregating wind turbines’ records from Jan. 1st, 2007 to Dec. 31st, 2013. All these wind turbines are located in WA, USA and are geographically close to one another. Selected wind turbines have a nominal capacity of . In total there are measurements, and we split of daily samples as the real training data for our GANs model, while the remaining samples are used only to test the performance of the proposed scenario forecasting method. All wind power measurements and forecasts are normalized to .
IV-B Validation Framework
Validating the quality of generated scenarios is more complex than evaluating the performance of point forecasts. On the one hand, generated scenarios should be realistic enough to reflect the interdependence structure of forecast values at different prediction horizons; on the other hand, they should represent all the possible future realizations given past observations at a given wind farm.
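First, we examine the temporal statistics of the generated scenarios. We calculate and compare the samples’ autocorrelation $R(\tau)$ with respect to the look-ahead time $\tau$:
$$R(\tau) = \frac{\mathbb{E}[(S_t - \mu)(S_{t+\tau} - \mu)]}{\sigma^2} \tag{8}$$
where $S$ represents a sample either of generated scenarios or of realizations, with mean $\mu$ and variance $\sigma^2$.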
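The sample autocorrelation described above can be computed directly from a series. The following is a minimal sketch using the standard normalized form; the function name and the choice of averaging over the available pairs at each lag are assumptions.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given non-negative lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    variance = sum((s - mean) ** 2 for s in series) / n
    if variance == 0.0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Average product of deviations that are `lag` steps apart.
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    ) / (n - lag)
    return covariance / variance


wind = [0.1, 0.4, 0.3, 0.8, 0.5]
r0 = autocorrelation(wind, 0)                            # lag 0 is 1 by construction
r1_alternating = autocorrelation([1.0, -1.0, 1.0, -1.0], 1)  # alternating series at lag 1
```

Comparing these curves for generated scenarios and for realizations over a range of lags gives the temporal check used in the validation.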
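We also make use of Pearson’s correlation coefficient, which is a standard method for evaluating the linear relationship of time series at various look-ahead times. Given the set of generated scenarios or realizations $S$, each term $\rho_{i,j}$ in the Pearson correlation matrix denotes the Pearson correlation between lead times $i$ and $j$, and is calculated by
$$\rho_{i,j} = \frac{\mathrm{cov}(S_i, S_j)}{\sigma_{S_i} \sigma_{S_j}} \tag{9}$$
where $\mathrm{cov}(S_i, S_j)$ is the covariance of $S_i$ and $S_j$, and $\sigma_{S_i}$, $\sigma_{S_j}$ are their standard deviations.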
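A Pearson correlation matrix across lead times can be computed from a set of scenarios as sketched below. The layout (one scenario per row, one lead time per column) and the function name are assumptions made for illustration.

```python
import math


def pearson_matrix(scenarios):
    """Pearson correlation between lead times.

    scenarios: list of equal-length series; each row is one scenario and
    each column is one lead time. Columns must not be constant.
    """
    n, horizon = len(scenarios), len(scenarios[0])
    columns = [[row[t] for row in scenarios] for t in range(horizon)]
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    matrix = []
    for i in range(horizon):
        row = []
        for j in range(horizon):
            cross = sum(a * b for a, b in zip(centered[i], centered[j]))
            row.append(cross / (norms[i] * norms[j]))
        matrix.append(row)
    return matrix


# Lead times 0 and 1 move together; lead time 2 moves in the opposite direction.
scenarios = [
    [0.1, 0.2, 0.9],
    [0.2, 0.4, 0.8],
    [0.3, 0.6, 0.7],
]
rho = pearson_matrix(scenarios)
```

Plotting such a matrix as a colormap for both generated scenarios and realizations allows a visual comparison of their temporal dependence.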
In order to verify that the group of generated scenarios is able to represent possible future realizations, the scenarios should cover the actual value of power generation (reliability), while at the same time the distance between generated scenarios should be small (sharpness). We make use of the Continuous Ranked Probability Score (CRPS) [22], which is a negatively oriented score (smaller scores are better) that jointly evaluates the reliability and sharpness of generated scenarios. The score at lead time is defined as follows:
(10) 
where is the total number of evaluated scenarios, is the cumulative distribution for normalized generated scenarios’ value at lead time , and is the indicator function to compare the normalized scenarios and measurements. Since we are not using quantile statistics to calculate , we use the discretevalued to calculate (10).
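For a finite set of scenarios, the CRPS of the empirical (step-function) distribution can be evaluated in closed form. The sketch below uses the well-known identity for the empirical CDF, mean absolute error to the observation minus half the mean absolute difference between members; whether this matches the exact discretization used in the paper is an assumption, and the function names are hypothetical.

```python
def ensemble_crps(scenario_values, observation):
    """CRPS of the empirical distribution of scenario values at one lead time."""
    m = len(scenario_values)
    # Mean distance from each scenario value to the observed value.
    to_observation = sum(abs(x - observation) for x in scenario_values) / m
    # Mean distance between all pairs of scenario values (spread of the set).
    pairwise = sum(
        abs(a - b) for a in scenario_values for b in scenario_values
    ) / (m * m)
    return to_observation - 0.5 * pairwise


def crps_by_lead_time(scenarios, realization):
    """CRPS at every lead time for scenarios stored one per row."""
    return [
        ensemble_crps([s[t] for s in scenarios], realization[t])
        for t in range(len(realization))
    ]


single = ensemble_crps([0.5], 0.2)       # reduces to the absolute error
pair = ensemble_crps([0.0, 1.0], 0.5)    # two-member set around the observation
profile = crps_by_lead_time([[0.0, 0.2], [1.0, 0.2]], [0.5, 0.2])
```

Averaging such per-lead-time scores over the test samples would give a curve of the kind compared between methods.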
IV-C Simulation Results
Recall that Fig. 3 showed the evolution of the discriminator output, where the outputs for historical and generated samples converged after about training iterations. In addition, we evaluate the quality of time series generated by the empirical Gaussian copula method [2]. In this case, the discriminator is able to distinguish the generated samples from real measurements. This observation suggests that even though the Gaussian copula method tries to model the interdependence structure of the time series, the generated scenarios are still different from actual realizations.
We then validate if scenarios coming from our method have similar temporal correlation as the actual wind power values. In Fig. 5 we plot the colormap for the covariance matrix of a group of wind turbines’ hour actual measurements, along with forecasting scenarios with scenarios for each realization. The and axes are for the prediction horizon . Similar covariance matrix element values indicate that without any model assumptions being made during training process, our proposed scenario generation method is able to capture the temporal dependency accurately.
Fig. 2 shows a group of generated scenarios with forecast lead times ranging from to . We show that by only changing the projection length , our approach is able to conveniently generate reliable and sharp scenarios for different forecast horizons. Meanwhile, the autocorrelation plots of these scenarios cover those of the realizations, which indicates that the generated scenarios are able to represent temporal dependence over any length.
In Fig. 6 we specifically select one hour sample whose point forecast is deviating a lot from the actual measurements. By selecting different prediction intervals, our proposed method could reflect the tradeoff between reliability and sharpness. When the interval level is , generated scenarios are close to point forecasts, yet fail to cover the realizations; while when , generated scenarios could cover the actual power production values, but are less concentrated.
The performance of the proposed method is also demonstrated by the CRPS. Results for our approach and the Gaussian copula method are plotted in Fig. 7. Both approaches use the same training dataset to obtain the time-series generator or to estimate the covariance matrix, and are tested on the held-out testing samples. The proposed method performs better than the Gaussian copula at different lead times. Since point forecasts normally accumulate larger errors with longer forecast horizons, both methods have CRPS values that grow with the forecasting horizon.
V Conclusion
In this paper we proposed a data-driven unsupervised machine learning approach for forecasting scenarios of renewable power generation processes. The proposed method is flexible and easily implemented in problems with a high penetration of renewables. Numerical results show that, compared with existing scenario generation approaches, the proposed method is able to generate realistic, high-quality scenarios that capture the spatio-temporal behavior of renewables without any explicit model construction.
References
 [1] M. Lei, L. Shiyan, J. Chuanwen, L. Hongling, and Z. Yan, “A review on the forecasting of wind speed and generated power,” Renewable and Sustainable Energy Reviews, vol. 13, no. 4, pp. 915–920, 2009.
 [2] P. Pinson, H. Madsen, H. A. Nielsen, G. Papaefthymiou, and B. Klöckl, “From probabilistic forecasts to statistical scenarios of short-term wind power production,” Wind Energy, vol. 12, no. 1, pp. 51–62, 2009.
 [3] Y. Wang, Y. Liu, and D. S. Kirschen, “Scenario reduction with submodular optimization,” IEEE Transactions on Power Systems, vol. 32, no. 3, pp. 2479–2480, 2017.
 [4] C. Tang, Y. Wang, J. Xu, Y. Sun, and B. Zhang, “Economic dispatch considering spatial and temporal correlations of multiple renewable power plants,” arXiv preprint arXiv:1707.00237, 2017.
 [5] Y. Gu and L. Xie, “Stochastic look-ahead economic dispatch with variable generation resources,” IEEE Transactions on Power Systems, vol. 32, no. 1, pp. 17–29, 2017.
 [6] Y. Feng, I. Rios, S. M. Ryan, K. Spürkel, J.-P. Watson, R. J.-B. Wets, and D. L. Woodruff, “Toward scalable stochastic unit commitment. Part 1: load scenario generation,” Energy Systems, vol. 6, no. 3, pp. 309–329, 2015.
 [7] G. Osório, J. Lujano-Rojas, J. Matias, and J. Catalão, “A new scenario generation-based method to solve the unit commitment problem with high penetration of renewable energies,” International Journal of Electrical Power & Energy Systems, vol. 64, pp. 1063–1072, 2015.
 [8] G. Giebel, R. Brownsword, G. Kariniotakis, M. Denhard, and C. Draxl, “The state-of-the-art in short-term prediction of wind power: A literature overview,” ANEMOS.plus, Tech. Rep., 2011.
 [9] R. Barth, L. Söder, C. Weber, H. Brand, and D. J. Swider, “Methodology of the scenario tree tool,” Wilmar Deliverable, vol. 6, 2006.
 [10] S. Delikaraoglou and P. Pinson, “High-quality wind power scenario forecasts for decision-making under uncertainty in power systems,” in 13th International Workshop on Large-Scale Integration of Wind Power into Power Systems as well as on Transmission Networks for Offshore Wind Power (WIW 2014), 2014.
 [11] T. Wang, H.-D. Chiang, and R. Tanabe, “Toward a flexible scenario generation tool for stochastic renewable energy analysis,” in Power Systems Computation Conference (PSCC), 2016. IEEE, 2016, pp. 1–7.
 [12] X.-Y. Ma, Y.-Z. Sun, and H.-L. Fang, “Scenario generation of wind power based on statistical uncertainty and variability,” IEEE Transactions on Sustainable Energy, vol. 4, no. 4, pp. 894–904, 2013.
 [13] Y. Chen, Y. Wang, D. Kirschen, and B. Zhang, “Model-free renewable scenario generation using generative adversarial networks,” arXiv preprint arXiv:1707.09676, 2017.
 [14] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
 [15] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN,” arXiv preprint arXiv:1701.07875, 2017.
 [16] C. Villani, Optimal transport: old and new. Springer Science & Business Media, 2008, vol. 338.
 [17] C. Wan, Z. Xu, P. Pinson, Z. Y. Dong, and K. P. Wong, “Probabilistic forecasting of wind power generation using extreme learning machine,” IEEE Transactions on Power Systems, vol. 29, no. 3, pp. 1033–1044, 2014.
 [18] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initialization and momentum in deep learning,” in International conference on machine learning, 2013, pp. 1139–1147.
 [19] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467, 2016.
 [20] T. Tieleman and G. Hinton, “Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning, vol. 4, no. 2, pp. 26–31, 2012.
 [21] B.-M. Hodge, “Final report on the creation of the Wind Integration National Dataset (WIND) Toolkit and API: October 1, 2013 - September 30, 2015,” National Renewable Energy Laboratory (NREL), Golden, CO (United States), Tech. Rep., 2016.
 [22] P. Pinson and R. Girard, “Evaluating the quality of scenarios of short-term wind power generation,” Applied Energy, vol. 96, pp. 12–20, 2012.