Virtual Battery Parameter Identification using Transfer Learning based Stacked Autoencoder
Recent studies have shown that the aggregated dynamic flexibility of an ensemble of thermostatic loads can be modeled in the form of a virtual battery. Existing methods for computing the virtual battery parameters require knowledge of the first-principle models and parameter values of the loads in the ensemble. In real-world applications, however, it is likely that the only available information is end-use measurements such as power consumption, room temperature, and device on/off status, while very little is known about the individual load models and parameters. We propose a transfer learning based deep network framework for calculating the virtual battery state of a given ensemble of flexible thermostatic loads from the available end-use measurements. The proposed framework extracts first-order virtual battery model parameters for the given ensemble. We illustrate the effectiveness of this novel framework on different ensembles of air conditioners (ACs) and water heaters (WHs).
In recent years, deep learning has gained popularity among researchers because it loosely imitates the learning process of the human brain. Specifically, several research works show the applicability of deep learning in capturing representative information from raw datasets using multiple nonlinear transformations, as shown in . Compared to shallow learning methods, deep learning based methods can model high-level abstractions in data by utilizing multiple processing layers. In , deep learning methods are used for simplifying a learning task from input examples. Based on scientific knowledge in the area of biology, cognitive humanoid autonomous methods with deep learning architectures have been proposed and applied over the years [11, 19, 30, 4, 28]. Deep learning replaces handcrafted feature extraction by learning unsupervised features, as shown in . However, despite this potential, applications of deep network based frameworks remain relatively sparse in power systems.
The last few years have seen a significant increase in the integration of renewables into the electricity grid, and the intermittent nature of renewables causes uncertainty in power generation. Moreover, there is increased visibility into the operations of thermostatically controllable loads (TCLs) such as air conditioners (ACs) and electric water heaters (WHs), due to advancements in power electronics and communication capabilities that enable remote monitoring and control of TCLs. With increased renewable penetration, these advancements allow TCLs to provide grid ancillary services in the form of demand response, such as frequency regulation [21, 6, 2, 16, 9, 29, 17, 22]. Thermostatic loads such as ACs and electric WHs can be characterized as ‘energy-driven’ loads, because the end-use quality of service depends on the energy consumption over a duration. The availability of these loads for demand response is strongly influenced by their dynamics. While modeling and coordinating each load individually in a power systems network is intractable for a grid operator, a reduced-order model that characterizes the dynamic flexibility of an ensemble of loads is more amenable to integration into grid operation and planning tools. To this effect, recent studies have proposed ‘virtual battery’ (or ‘generalized battery’) models for an ensemble of flexible thermostatic loads.
Recent work on virtual battery (VB) model identification can be found in [20, 9, 15, 10]. These works characterize VB models for a wide range of systems, from small residential TCLs to complex systems such as commercial building heating, ventilating, and air-conditioning loads. Similar to a real battery, the VB has self-dissipation, energy capacity, and power (charge/discharge) limits as parameters. Existing state-of-the-art methods identify the VB parameters for an ensemble of thermostatic loads either via closed-form (albeit ad hoc) expressions or via optimization-based techniques. Both approaches require knowledge of the individual load models and parameters. In real-world applications, however, it is expected that very little is known about the individual load models and parameters. The only available information is end-use measurements such as power consumption, room temperature, and device on/off status. In this work, we propose an alternative, data-driven, deep neural network framework for characterizing the aggregate flexibility of an ensemble of ACs and WHs using the available end-use measurements. Since training the deep networks is an offline process requiring high computational effort, it is not desirable to retrain the network whenever the number of TCLs in the ensemble changes over time due to the changing availability of end-use appliances. Hence, we propose a transfer learning based approach to identify the VB parameters of the new ensemble with minimal retraining.
The main contribution of this paper is a novel transfer learning based stacked autoencoder framework for representing the virtual battery state of a given ensemble of TCLs. We also propose a novel one-dimensional convolution based LSTM network for calculating the remaining VB parameters, utilizing the VB state previously calculated by the stacked autoencoder. Finally, the VB parameters are numerically identified for two different ensembles of ACs and WHs.
The organization of this paper is as follows. In Section 2, we introduce the stacked autoencoder along with the optimization problem for its training. In Section 3, we introduce the training objective of an LSTM network, and we describe the proposed one-dimensional convolution operation in Section 4. Transfer learning for the stacked autoencoder is introduced in Section 5. In Section 6, we introduce the dynamics and the power limit calculations for any given ensemble of ACs and WHs. In Sections 7 and 8, we introduce a novel process for training the convolution based LSTM network, along with a detailed description of the dataset and the whole deep network based framework. Numerical results are discussed in Section 9.
2 Description of Stacked Autoencoder
An autoencoder (AE) [1] is a type of deep neural network that is trained by restricting the output values to be equal to the input values. This implies that the input and output spaces have the same dimensionality. The reconstruction error between the input and the output of the network is used to adjust the weights of each layer. Therefore, the features learned by an AE can represent the input data space well. Moreover, the training of an AE is unsupervised, since it does not require label information.
2.1 General setup for AE
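We consider a supervised learning problem with a training set of $n$ (input, output) pairs $D_n = \{(\mathbf{x}^{(1)}, \mathbf{y}^{(1)}), \dots, (\mathbf{x}^{(n)}, \mathbf{y}^{(n)})\}$, sampled from an unknown distribution $q(X, Y)$. $X$ is a $d$-dimensional vector in $\mathbb{R}^d$. $Y \in \mathbb{R}^{d'}$ is a lower-dimensional ($d' < d$) representation of $X$. $Y$ is linked to $X$ by a deterministic mapping $f_\theta$, where $\theta$ is a vector of trainable parameters. We now briefly introduce the terminology associated with AEs.
Encoder: The encoder is a deterministic mapping $f_\theta$ that transforms an input vector $\mathbf{x}$ into a hidden representation $\mathbf{y}$:
$$\mathbf{y} = f_\theta(\mathbf{x}) = s(W\mathbf{x} + \mathbf{b}),$$
where $\theta = \{W, \mathbf{b}\}$ is the set of parameters, $W$ is a $d' \times d$ weight matrix, $\mathbf{b}$ is a bias vector of dimension $d'$, and $s$ is an activation function.
Decoder: The hidden $d'$-dimensional representation $\mathbf{y}$ is mapped back to dimension $d$ using a mapping $g_{\theta'}$ and represented as $\mathbf{z} = g_{\theta'}(\mathbf{y})$. The mapping $g_{\theta'}$ is called the decoder. Similar to $f_\theta$, $g_{\theta'}$ is also an affine nonlinear mapping, defined as
$$g_{\theta'}(\mathbf{y}) = s(W'\mathbf{y} + \mathbf{b}'),$$
where $\theta' = \{W', \mathbf{b}'\}$. Note that $\mathbf{z}$ is not an exact reconstruction of $\mathbf{x}$; rather, in probabilistic terms, it gives the parameters of a distribution $p(X \mid Z = \mathbf{z})$ that may generate $\mathbf{x}$ with high probability. Composing the encoder and the decoder gives $\mathbf{z} = g_{\theta'}(f_\theta(\mathbf{x}))$. The reconstruction error to be optimized is $L(\mathbf{x}, \mathbf{z}) \propto -\log p(\mathbf{x} \mid \mathbf{z})$. For real-valued $\mathbf{x}$, $L(\mathbf{x}, \mathbf{z}) = L_2(\mathbf{x}, \mathbf{z})$, where $L_2(\cdot, \cdot)$ represents the Euclidean distance between two variables. In other words, we use the squared error objective for training our autoencoder. In this work, we use an affine and linear encoder along with an affine and linear decoder with squared error loss.
Furthermore, autoencoder training consists of minimizing the reconstruction error by carrying out the optimization
$$\arg\min_{\theta, \theta'} \mathbb{E}_{q^0(X)}\left[L(X, Z(X))\right], \qquad Z(X) = g_{\theta'}(f_\theta(X)),$$
where $\mathbb{E}$ denotes the expectation, and $q^0$ is the empirical distribution defined by the samples in $D_n$. For the loss function defined above, the optimization problem can be rewritten as
$$\arg\min_{\theta, \theta'} \mathbb{E}_{q^0(X)}\left[L_2(X, Z(X))\right].$$
Intuitively, training an autoencoder to minimize the reconstruction error amounts to maximizing a lower bound on the information shared between $X$ and the hidden representation $Y$.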
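As an illustration of the training objective above, the following minimal NumPy sketch trains a single affine, linear autoencoder with the squared error loss by full-batch gradient descent. The function name `train_linear_autoencoder`, the learning rate, the epoch count, and the synthetic low-rank data are all assumptions made for this example; they are not the settings used in this work.

```python
import numpy as np

def train_linear_autoencoder(X, hidden_dim, lr=0.01, epochs=500, seed=0):
    """Fit y = x W_enc + b_enc, z = y W_dec + b_dec by minimizing mean ||z - x||^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, hidden_dim))
    b_enc = np.zeros(hidden_dim)
    W_dec = rng.normal(scale=0.1, size=(hidden_dim, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Y = X @ W_enc + b_enc              # encoder (linear activation)
        Z = Y @ W_dec + b_dec              # decoder (linear activation)
        err = Z - X
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        g_Z = 2.0 * err / n                # gradient of the loss w.r.t. Z
        g_Y = g_Z @ W_dec.T                # back-propagate before updating W_dec
        W_dec -= lr * (Y.T @ g_Z)
        b_dec -= lr * g_Z.sum(axis=0)
        W_enc -= lr * (X.T @ g_Y)
        b_enc -= lr * g_Y.sum(axis=0)
    return (W_enc, b_enc, W_dec, b_dec), losses

# Hypothetical data: 6 observed channels driven by 2 latent factors.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6))
X = X / X.std()
params, losses = train_linear_autoencoder(X, hidden_dim=2)
print(f"reconstruction loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because both mappings are linear, the encoded representation is unbounded, which is the property the later sections rely on when the inputs are left unnormalized.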
2.2 Stacked Autoencoder (SAE)
We stack the AEs described above to initialize a deep network, in a way similar to deep belief networks [11] or ordinary AEs [3, 25, 18]. Once the AEs have been properly stacked, the output of the innermost encoding layer, shown in the schematic in Figure 1, is taken as the virtual battery representation of the ensemble of TCLs. The number of layers in the stacked AE is chosen based on the reconstruction error of the AE, keeping in mind that a sudden change in dimension between consecutive encoding or decoding layers can make the reconstruction error of Section 2.1 difficult to minimize. The parameters of all layers are fine-tuned using a stochastic gradient descent approach [5].
3 Description of Long-Short-Term-Memory network
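We use a long short-term memory (LSTM) network [13] to learn the long-range temporal dependencies of the virtual battery state of a given ensemble (the encoded representation of the TCL states). For an LSTM cell with $N$ memory units (see Figure 2), at each time step $t$, the evolution of its parameters is determined by
$$\mathbf{i}_t = \sigma(W_i \mathbf{x}_t + R_i \mathbf{h}_{t-1} + \mathbf{p}_i \odot \mathbf{c}_{t-1} + \mathbf{b}_i),$$
$$\mathbf{f}_t = \sigma(W_f \mathbf{x}_t + R_f \mathbf{h}_{t-1} + \mathbf{p}_f \odot \mathbf{c}_{t-1} + \mathbf{b}_f),$$
$$\mathbf{z}_t = \tanh(W_z \mathbf{x}_t + R_z \mathbf{h}_{t-1} + \mathbf{b}_z),$$
$$\mathbf{c}_t = \mathbf{i}_t \odot \mathbf{z}_t + \mathbf{f}_t \odot \mathbf{c}_{t-1},$$
$$\mathbf{o}_t = \sigma(W_o \mathbf{x}_t + R_o \mathbf{h}_{t-1} + \mathbf{p}_o \odot \mathbf{c}_t + \mathbf{b}_o),$$
$$\mathbf{h}_t = \mathbf{o}_t \odot \tanh(\mathbf{c}_t),$$
where the $W$ and $R$ terms are the respective rectangular input and square recurrent weight matrices, $\mathbf{p}$ are peephole weight vectors from the cell to each of the gates, $\mathbf{b}$ are bias vectors, $\sigma$ denotes the sigmoid activation function (applied element-wise), and the $\mathbf{i}_t$, $\mathbf{f}_t$, and $\mathbf{o}_t$ equations denote the input, forget, and output gates, respectively; $\mathbf{z}_t$ is the input to the cell $\mathbf{c}_t$. The output of an LSTM cell is $\mathbf{h}_t$, and $\odot$ denotes pointwise vector products. The forget gate facilitates resetting the state of the LSTM, while the peephole connections from the cell to the gates enable accurate learning of timings.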
The goal of training an LSTM cell is to estimate the conditional probability $p(\mathbf{y} \mid \mathbf{x})$, where $\mathbf{x}$ consists of a concatenated set of variables (the virtual battery state of previous time steps and the control input) and $\mathbf{y}$ consists of the virtual battery state at the current time step. The proposed LSTM calculates this conditional probability by first obtaining a fixed-dimensional representation of the input $\mathbf{x}$, given by the hidden state $\mathbf{h}$. Subsequently, the conditional probability of $\mathbf{y}$ is calculated from the hidden state representation $\mathbf{h}$. Given a training dataset with input $\mathbf{x}$ and output $\mathbf{y}$, the proposed LSTM is trained by maximizing the log probability
$$\frac{1}{|\mathcal{S}|} \sum_{(\mathbf{x}, \mathbf{y}) \in \mathcal{S}} \log p(\mathbf{y} \mid \mathbf{x}),$$
where $\mathcal{S}$ denotes the training set. After successful training, forecasting is done by translating the trained LSTM as
$$\hat{\mathbf{y}} = \arg\max_{\mathbf{y}} p(\mathbf{y} \mid \mathbf{x}),$$
where $\hat{\mathbf{y}}$ is the LSTM based prediction of the output $\mathbf{y}$.
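To make the gate structure concrete, the sketch below implements one forward step of an LSTM cell with peephole connections in NumPy. It follows the widely used peephole formulation; the parameter names (`W_*`, `R_*`, `p_*`, `b_*`), the initialization scale, and the two-channel input (a VB state sample and a regulation signal sample) are assumptions for illustration and are not taken from the implementation used in this work.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def init_lstm_params(input_dim, n_units, seed=0):
    """Random parameters for a peephole LSTM cell with n_units memory units."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("i", "f", "o", "z"):
        p["W_" + gate] = rng.normal(scale=0.1, size=(n_units, input_dim))  # input weights
        p["R_" + gate] = rng.normal(scale=0.1, size=(n_units, n_units))    # recurrent weights
        p["b_" + gate] = np.zeros(n_units)
    for gate in ("i", "f", "o"):
        p["p_" + gate] = rng.normal(scale=0.1, size=n_units)               # peephole vectors
    return p

def lstm_step(x_t, h_prev, c_prev, p):
    """One time step: returns the new output h_t and cell state c_t."""
    i_t = sigmoid(p["W_i"] @ x_t + p["R_i"] @ h_prev + p["p_i"] * c_prev + p["b_i"])
    f_t = sigmoid(p["W_f"] @ x_t + p["R_f"] @ h_prev + p["p_f"] * c_prev + p["b_f"])
    z_t = np.tanh(p["W_z"] @ x_t + p["R_z"] @ h_prev + p["b_z"])
    c_t = i_t * z_t + f_t * c_prev          # forget gate scales the old cell state
    o_t = sigmoid(p["W_o"] @ x_t + p["R_o"] @ h_prev + p["p_o"] * c_t + p["b_o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Run the cell over a short hypothetical sequence of (VB state, regulation signal) pairs.
params = init_lstm_params(input_dim=2, n_units=4)
h, c = np.zeros(4), np.zeros(4)
for x_t in np.array([[0.5, 0.1], [0.4, -0.2], [0.6, 0.0], [0.55, 0.3], [0.3, -0.1]]):
    h, c = lstm_step(x_t, h, c, params)
```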
4 Description of ConvNet
A simple convolutional neural network (ConvNet) is a sequence of layers, and every layer of a ConvNet transforms one volume of activations to another through a differentiable function. In this work, we use two types of layers to build ConvNet architectures: a convolution layer and a pooling layer. We stack these two layers alternately to form a ConvNet. We can write the output of a one-dimensional ConvNet as follows:
One-dimensional convolution layer
Accepts a volume of size , where is the batch size of the training data set
Requires four hyperparameters: number of filters , their spatial extent , length of stride , and the amount of zero padding
Produces a volume of size , where
Each filter in a convolution layer introduces weights, and in total weights and biases
Pooling layer
Accepts a volume of size
Requires two hyperparameters: spatial extent , and stride
Produces a volume of size , where
Introduces zero weights and biases, since the pooling layer computes a fixed function of the input.
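The size bookkeeping for the two layer types can be sketched in a few lines of Python. The formulas are the standard output-length and parameter-count relations for convolution and pooling layers, written here for the one-dimensional case; the function and argument names are illustrative and do not come from this work.

```python
def conv1d_output_length(width, filter_size, stride=1, padding=0):
    """Output length of a 1-D convolution: (W - F + 2P) / S + 1."""
    span = width - filter_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("filter, stride and padding do not tile the input")
    return span // stride + 1

def pool1d_output_length(width, filter_size, stride):
    """Output length of a 1-D pooling layer: (W - F) / S + 1."""
    span = width - filter_size
    if span < 0 or span % stride != 0:
        raise ValueError("pooling window and stride do not tile the input")
    return span // stride + 1

def conv1d_parameter_count(in_channels, n_filters, filter_size):
    """Each filter has F * D weights; the layer has F * D * K weights and K biases."""
    return filter_size * in_channels * n_filters, n_filters

# Hypothetical layer: 32 samples, 3 channels, 8 filters of size 5, padding 2.
conv_len = conv1d_output_length(32, filter_size=5, stride=1, padding=2)
pool_len = pool1d_output_length(conv_len, filter_size=2, stride=2)
n_weights, n_biases = conv1d_parameter_count(in_channels=3, n_filters=8, filter_size=5)
```

The pooling layer contributes no entries to the parameter count because it computes a fixed function of its input.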
Next, for our proposed ConvNet, we will outline the architecture as applied to predict VB state and simultaneously learn and estimate VB parameters.
Input , where denotes the batch size of our training process (two hours, with second resolution).
The convolutional (CONV) layer computes the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected in the input volume. We have multiple CONV layers in our proposed ConvNet, each with a different filter size . For an input layer of size , the output of CONV layer will be .
The rectified linear unit (RELU) layer will apply an element-wise activation function, such as thresholding at zero. This leaves the size of the output volume unchanged to .
The pooling (POOL) layer will perform a down-sampling operation along the spatial dimension (width, height), resulting in an output volume such as , with a filter size of .
5 Transfer learning via Net2Net for SAE
The structure (number of input nodes) of the proposed SAE depends on the number of TCLs in the ensemble. Consequently, the SAE must be retrained if the number of TCLs in the ensemble changes. We propose to use the Net2Net strategy [7] when there is a change in the number or type of TCLs in the ensemble. In order to explain this idea in the context of VB state modeling, we define the "source system" () as an ensemble of devices (where is a defined integer)
We propose to combine two Net2Net strategies, namely Net2WiderNet and Net2DeeperNet.
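As the name suggests, Net2DeeperNet transforms a network into a deeper one. Mathematically, Net2DeeperNet replaces one layer with two layers, i.e., $\mathbf{h}^{(i)} = \phi(W^{(i)\top} \mathbf{h}^{(i-1)})$ is replaced by $\mathbf{h}^{(i)} = \phi(U^{(i)\top} \phi(W^{(i)\top} \mathbf{h}^{(i-1)}))$. The new weight matrix $U^{(i)}$ is initialized as the identity matrix and is updated during training. Moreover, the activation function must satisfy $\phi(I \phi(\mathbf{v})) = \phi(\mathbf{v})$ for all vectors $\mathbf{v}$, so that Net2DeeperNet can replace the original network with a deeper one without changing the function it computes.
Net2WiderNet allows a layer to be replaced with a wider layer, meaning a layer that has more neurons (it can also be narrower if needed). Suppose that layer $i$ and layer $i+1$ are both fully connected layers, and layer $i$ uses an elementwise non-linearity. To widen layer $i$, we replace $W^{(i)}$ and $W^{(i+1)}$. If layer $i$ has $m$ inputs and $n$ outputs, and layer $i+1$ has $p$ outputs, then $W^{(i)} \in \mathbb{R}^{m \times n}$ and $W^{(i+1)} \in \mathbb{R}^{n \times p}$. Net2WiderNet replaces layer $i$ with a layer that has $q$ outputs, with $q > n$. We introduce a random mapping function $g: \{1, \dots, q\} \to \{1, \dots, n\}$ that satisfies
$$g(j) = \begin{cases} j, & j \le n, \\ \text{a random sample from } \{1, \dots, n\}, & j > n. \end{cases}$$
The new weights in the network for the target system are given by
$$U^{(i)}_{k,j} = W^{(i)}_{k,g(j)}, \qquad U^{(i+1)}_{j,h} = \frac{1}{|\{x \mid g(x) = g(j)\}|} W^{(i+1)}_{g(j),h}.$$
Here, the first $n$ columns of $W^{(i)}$ are copied directly into $U^{(i)}$. Columns $n+1$ through $q$ of $U^{(i)}$ are created by choosing a random sample as defined by $g$. The random selection is performed with replacement, so each column of $W^{(i)}$ is potentially copied many times. For the weights in $U^{(i+1)}$, we must account for the replication by dividing each weight by the replication factor $|\{x \mid g(x) = g(j)\}|$, so that all the units have exactly the same value as the corresponding units in the network of the source system.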
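The function-preserving property of both operations can be checked numerically. The sketch below follows the Net2WiderNet and Net2DeeperNet constructions of Chen et al. for fully connected layers; the helper names (`net2wider`, `net2deeper_identity`), the ReLU activation, and the layer sizes are assumptions chosen for this example and are not the configuration used for the SAE.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def net2wider(W1, b1, W2, new_width, seed=0):
    """Widen a hidden layer from n to new_width units without changing the output.

    W1: (m, n) weights into the layer, b1: (n,) biases, W2: (n, p) weights out of it.
    """
    rng = np.random.default_rng(seed)
    n = W1.shape[1]
    if new_width < n:
        raise ValueError("new_width must be at least the current width")
    # Mapping g: the first n units map to themselves, extra units copy a random unit.
    g = np.concatenate([np.arange(n), rng.integers(0, n, size=new_width - n)])
    counts = np.bincount(g, minlength=n)      # replication factor of each original unit
    U1 = W1[:, g]                             # copy incoming columns
    c1 = b1[g]                                # copy biases
    U2 = W2[g, :] / counts[g][:, None]        # split outgoing weights among the copies
    return U1, c1, U2

def net2deeper_identity(width):
    """Identity-initialized weights and zero biases for a newly inserted layer."""
    return np.eye(width), np.zeros(width)

rng = np.random.default_rng(42)
x = rng.normal(size=(5, 4))                   # 5 samples, m = 4 inputs
W1, b1, W2 = rng.normal(size=(4, 3)), rng.normal(size=3), rng.normal(size=(3, 2))

y_orig = relu(x @ W1 + b1) @ W2
U1, c1, U2 = net2wider(W1, b1, W2, new_width=7)
y_wide = relu(x @ U1 + c1) @ U2

I, zeros = net2deeper_identity(3)
h = relu(x @ W1 + b1)
h_deep = relu(h @ I + zeros)                  # relu(I relu(v)) == relu(v)
```

After either transformation the network reproduces the outputs of the original network exactly, so training on the new ensemble starts from the previously learned function rather than from a random initialization.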
Next, we discuss the application of this transfer learning to identify Virtual Battery (VB) parameters corresponding to an ensemble of ACs or WHs. We begin with the description of the VB model.
6 Virtual Battery Model
We use the following first-order system model to describe the dynamics of a VB system,
$$\dot{x}(t) = -a\,x(t) - u(t), \qquad x(0) = x_0,$$
$$C_1 \le x(t) \le C_2, \qquad P^- \le u(t) \le P^+,$$
where $x(t)$ denotes the state of charge of the VB at time $t$, with $x_0$ the initial state of charge; $a$ denotes the self-dissipation rate; and the lower and upper energy limits of the VB are denoted by $C_1$ and $C_2$, respectively. The regulation signal $u(t)$ acts as an input to the VB and must always lie within the power limits $P^-$ and $P^+$. This simple first-order VB model can be applied to characterize the aggregated flexibility of many building loads and TCLs [9, 10, 15]. Note that, unlike the typical assumption of symmetrical energy limits (i.e., $C_1 = -C_2$) in existing VB identification methods, we allow the lower and upper energy limits to be different. Overall, the vector $\phi = [a, C_1, C_2, x_0, P^-, P^+]$ denotes the group of VB parameters.
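A first-order VB of this kind is straightforward to simulate. The sketch below integrates a model of the form dx/dt = -a x - u with a forward Euler step and reports the first step at which a power or energy limit is violated. The sign convention, the function name `simulate_virtual_battery`, and all numerical values are assumptions made for illustration only.

```python
def simulate_virtual_battery(u, a, x0, c_lower, c_upper, p_lower, p_upper, dt):
    """Forward-Euler simulation of dx/dt = -a*x - u.

    Returns the state trajectory and the index of the first violating step
    (None if the regulation signal is tracked without violating any limit).
    """
    x = x0
    states = [x]
    for k, u_k in enumerate(u):
        if not (p_lower <= u_k <= p_upper):
            return states, k                 # power limit violated
        x = x + dt * (-a * x - u_k)
        if not (c_lower <= x <= c_upper):
            return states, k                 # energy limit violated
        states.append(x)
    return states, None

# With zero input the state of charge decays at the self-dissipation rate.
states, violated = simulate_virtual_battery(
    u=[0.0] * 10, a=0.1, x0=1.0,
    c_lower=-2.0, c_upper=2.0, p_lower=-1.0, p_upper=1.0, dt=1.0,
)

# A request above the upper power limit is rejected at the first step.
_, violated_power = simulate_virtual_battery(
    u=[1.5, 0.0], a=0.1, x0=0.0,
    c_lower=-2.0, c_upper=2.0, p_lower=-1.0, p_upper=1.0, dt=1.0,
)
```

Note that the lower and upper energy limits are passed separately, so asymmetric limits are supported.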
In order to identify the VB parameters, synthetic data are generated on the performance of an ensemble of TCLs responding to a frequency regulation service request. Specifically, the synthetic data are generated using the simulation models of TCLs described in this section and the regulation signals from PJM [24]. The time-series evolution of each device under each regulation signal is then used to learn the first-order VB system. For the sake of completeness, we also briefly describe the hybrid dynamical models of individual TCLs, which are used in simulations to generate synthetic data for training and testing the transfer learning based deep network. Note, however, that the network itself is agnostic of the load models and parameters.
6.1 AC Model
|-||temperature inside the room|
|-||outside air temperature|
|-||set point temperature|
|-||dead band temperature|
|-||thermal capacitance of the room|
|-||thermal resistance of the room|
|-||power drawn by the AC: when ‘off’; when ‘on’.|
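For orientation, the quantities in the table above are the ingredients of a hybrid (on/off) thermal model. The sketch below simulates a commonly used first-order form of such a model, with hysteresis switching at the edges of the dead band. It is an assumed illustrative form, not necessarily the exact dynamics used in this work; the efficiency factor `eta`, the function name `simulate_ac`, and all parameter values are hypothetical.

```python
def simulate_ac(T0, T_out, T_set, deadband, C, R, P_rated, eta, dt, n_steps, on=False):
    """Simulate room temperature under thermostat control (forward Euler).

    Assumed dynamics: dT/dt = (T_out - T) / (C * R) - eta * p / C,
    with p = P_rated when the AC is on and p = 0 when it is off.
    Units assumed: C in kWh/degC, R in degC/kW, P_rated in kW, dt in hours.
    """
    T = T0
    temps, status = [], []
    for _ in range(n_steps):
        if T >= T_set + deadband / 2:
            on = True                        # too warm: switch on
        elif T <= T_set - deadband / 2:
            on = False                       # too cold: switch off
        p = P_rated if on else 0.0
        T += dt * ((T_out - T) / (C * R) - eta * p / C)
        temps.append(T)
        status.append(on)
    return temps, status

# Hypothetical parameters, 10 s time step, 10 hours of simulation.
temps, status = simulate_ac(
    T0=22.5, T_out=32.0, T_set=22.5, deadband=0.5,
    C=2.0, R=2.0, P_rated=5.6, eta=2.5, dt=10.0 / 3600.0, n_steps=3600,
)
```

The device cycles between on and off so that the room temperature stays inside the dead band around the set point, up to the overshoot of one integration step.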
6.2 WH Model
|-||water temperature inside the tank|
|-||water temperature set-point|
|-||inlet water temperature|
|-||dead band temperature|
|-||thermal conductance of the tank|
|-||power drawn by the WH: when ‘off’; when ‘on’.|
7 Two-stepped training process
Before proceeding into the detailed training process for VB state prediction, we introduce few notations for clarity. and denote the functional representation of Convolution based LSTM network after first and second training process, respectively. and denote input and output to the Convolution based LSTM network, during first step of training process. is used as a historical window for prediction. denotes the prediction of VB state after second step of training process.
7.1 First step of training process
This step of the training process involved an unsupervised learning, by utilizing historical VB state and regulation signal of size (i.e., number of historical data points consisting VB state and regulation signal at each data point). Given VB state , and regulation signal , input to the unsupervised learning is defined as, , where for all , and output to the unsupervised learning is defined as . The objective of first step of the training process is to learn while minimizing the loss, i.e., .
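The construction of training pairs from a historical window can be sketched as follows. Each input collects the VB state and regulation signal over the previous `window` samples, and the corresponding output is the VB state at the next sample. The exact layout of the input used in this work may differ; `make_windows` and the toy series are assumptions for illustration.

```python
def make_windows(x, u, window):
    """Pair each window of (VB state, regulation signal) samples with the next VB state."""
    if len(x) != len(u):
        raise ValueError("state and regulation signal must have the same length")
    if window < 1 or window >= len(x):
        raise ValueError("window must be between 1 and len(x) - 1")
    inputs, targets = [], []
    for t in range(window, len(x)):
        inputs.append([(x[k], u[k]) for k in range(t - window, t)])
        targets.append(x[t])
    return inputs, targets

# Toy series with six samples and a window of three historical points.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
u = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
inputs, targets = make_windows(x, u, window=3)
```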
7.2 Second step of training process
The purpose of the second step of the training process is to mitigate the effect of error accumulation on the forecast over time. In the second step, the prediction of the VB state from the previous time step is used in the next step, along with the other historical VB states (we are using number of historical data points). As forecasting continues into the future, the historical data window is progressively filled with forecasts from previous iterations. This causes forecasting errors to accumulate, which results in the predicted VB state diverging from the actual state. Algorithm 1, which comprises the second step, defines a novel way of mitigating this error accumulation.
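The error accumulation described above can be reproduced with a toy example. The sketch below performs a recursive multi-step forecast in which each prediction is pushed into the history window, and compares a slightly biased one-step predictor against the true dynamics. It illustrates only the problem, not the mitigation of Algorithm 1; the predictor, the toy dynamics, and the name `rollout` are hypothetical.

```python
def rollout(predict, x_hist, u_future, window):
    """Recursive forecast: every prediction replaces the oldest entry of the window."""
    if len(x_hist) < window:
        raise ValueError("need at least `window` historical states")
    hist = list(x_hist[-window:])
    preds = []
    for u_k in u_future:
        x_next = predict(hist, u_k)
        preds.append(x_next)
        hist = hist[1:] + [x_next]           # forecasts gradually fill the window
    return preds

def true_step(x, u_k):
    return 0.9 * x - u_k                     # toy first-order dynamics

def biased_predictor(hist, u_k):
    return 0.9 * hist[-1] - u_k + 0.01       # one-step model with a small bias

u_future = [0.0] * 20
x_true = [1.0]
for u_k in u_future:
    x_true.append(true_step(x_true[-1], u_k))

preds = rollout(biased_predictor, [1.0, 1.0, 1.0], u_future, window=3)
errors = [abs(p - t) for p, t in zip(preds, x_true[1:])]
```

Although the one-step error is small, the error of the recursive forecast grows with the horizon because each step inherits the error of the previous one.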
8 Data and Proposed Method Description
8.1 Data description
We identify the VB state and subsequently the VB parameters introduced in Section 6 for an ensemble of homogeneous devices (ACs, where each device satisfies the dynamics in Eq. (8), and WHs, where each device is governed by the dynamics in Eq. (9)). The regulation signals from PJM [24] are considered and scaled appropriately to match the ensemble of ACs and WHs. The devices in each ensemble have to change their state in order to follow a regulation signal. In doing so, while keeping the aggregate power of the ensemble close to the regulation signal, the switching action of the ensemble should not violate the temperature constraints of individual devices. The switching strategy is determined by the solution of an optimization problem (similar to as shown in ).
For the ensemble of AC devices, a combination of ACs is considered, and the ON and OFF devices at every time instance are identified by solving an optimization problem. This generates the temperature state of each device, , at each time iteration, for distinct regulation signals. If the ensemble fails to track a regulation signal, then the time-series data is considered up to the point where tracking fails. The outside air temperature and user set-point for each device are assumed to be the same in this analysis.
The power limits of the ensemble are computed through a one-sided binary search algorithm as described in . The AC ensemble is simulated for hours with sec time resolution, for each regulation signal. For some regulation signals, the ensemble violates the power limits and before the hours running time, and the temperature of each AC is only considered up to the point where the ensemble still satisfies the power limits. Finally, to build a suitable dataset for the SAE, we stack by column the temperature of each AC device, followed by the temperature set points of each device, and the load efficiency and thermal capacity of each AC device, and we stack by row the data points for each regulation signal. For the selected ensemble, this stacking results in a dataset of dimension . To obtain the input stack to the SAE for an ensemble of WHs, an approach similar to the one described above for AC devices is followed. While generating this data, it is assumed that water flow into the WHs is at a medium rate.
8.2 Method description
The SAE introduced in Section 2 is trained on the dataset described in Section 8.1 for an ensemble of 100 devices. The objective of training the SAE is to represent the given dimensional dataset in a dimensional encoded space, and subsequently to transform the dimensional encoded representation back into the dimensional original data space, with tolerable loss. The selected layer dimension of proposed SAE is ----------, where all the activation functions are linear. Moreover, the variables in the input dimensions are not normalized, so that the encoding represents the dependency of the VB state on the input variables. This also motivates the use of unbounded linear activation functions throughout the proposed SAE.
Moreover, when more AC devices are added to the given AC ensemble, we leverage the Net2Net framework introduced in Section 5 for retraining and subsequently representing the VB state for the dataset of the new ensemble. The most robust approach is to retrain the proposed SAE architecture from scratch on the new dataset, but that incurs higher computation cost and time. Instead, we can reuse the network already trained on the AC devices ensemble for the new ensemble dataset, which results in significant savings in computation cost and time.
Finally, we introduce a convolution based LSTM network for forecasting the evolution of the VB state under any given regulation signal. Since the SAE can only represent the VB state for the times at which the states of the TCLs are available, a deep network is required to predict the time evolution of the VB state for the ensemble of TCLs. Simultaneously, the proposed convolution based LSTM network can be used to estimate the remaining unknown parameters in the VB parameter vector of Section 6. In the next section, we demonstrate the combination of the two proposed deep network frameworks for an ensemble of , AC devices, and , WHs.
9 Results and Discussion
We evaluate the performance of our proposed deep network framework on four different ensembles, namely ensemble of and AC devices and ensemble of and WH devices. Table 1 shows the effectiveness of utilizing proposed transfer learning instead of retraining the deep network for every ensemble. Table 1 shows an average computation time
Finally, the proposed framework is shown to generalize to different types of TCLs and to be transferable across different numbers of devices in a given ensemble. Our future work will involve applying this framework to heterogeneous ensembles.
- Acknowledgment: The authors would like to thank the U.S. Department of Energy for supporting this work under the ENERGISE program (contract DE-AC02-76RL01830).
This work has been accepted in 17th IEEE International Conference on Machine Learning and Applications, Orlando, 2018. Copyright 2018 by the author(s).
- Throughout this paper, bold faced symbols denote vectors
- For clarity we discuss Net2Net in the context of homogeneous device ensemble.
- Net2WiderNet and Net2DeeperNet were first introduced by Chen et al. in [7].
- We used Epoch (number of training iterations) from Table 1 as a measure of computation time.
- [1] Pierre Baldi. Autoencoders, unsupervised learning, and deep architectures. In Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pages 37–49, 2012.
- [2] Saeid Bashash and Hosam K Fathy. Modeling and control insights into demand-side energy management through setpoint control of thermostatic loads. In American Control Conference (ACC), 2011, pages 4546–4553. IEEE, 2011.
- [3] Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems, pages 153–160, 2007.
- [4] Yoshua Bengio and Honglak Lee. Editorial introduction to the neural networks special issue on deep learning of representations. Neural Networks, 64(C):1–3, 2015.
- [5] Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010, pages 177–186. Springer, 2010.
- [6] Duncan S Callaway and Ian A Hiskens. Achieving controllability of electric loads. Proceedings of the IEEE, 99(1):184–199, 2011.
- [7] Tianqi Chen, Ian Goodfellow, and Jonathon Shlens. Net2net: Accelerating learning via knowledge transfer. arXiv preprint arXiv:1511.05641, 2015.
- [8] Ruisheng Diao, Shuai Lu, Marcelo Elizondo, Ebony Mayhorn, Yu Zhang, and Nader Samaan. Electric water heater modeling and control strategies for demand response. In Power and Energy Society General Meeting, 2012 IEEE, pages 1–8. IEEE, 2012.
- [9] He Hao, Borhan M Sanandaji, Kameshwar Poolla, and Tyrone L Vincent. Aggregate flexibility of thermostatically controlled loads. IEEE Transactions on Power Systems, 30(1):189–198, 2015.
- [10] He Hao, Di Wu, Jianming Lian, and Tao Yang. Optimal coordination of building loads and energy storage for power grid and end user services. IEEE Transactions on Smart Grid, 2017.
- [11] Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
- [12] Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
- [13] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
- [14] Justin T Hughes, Alejandro D Domínguez-García, and Kameshwar Poolla. Virtual battery models for load flexibility from commercial buildings. In 2015 48th Hawaii International Conference on System Sciences (HICSS), pages 2627–2635. IEEE, 2015.
- [15] Justin T Hughes, Alejandro D Domínguez-García, and Kameshwar Poolla. Identification of virtual battery models for flexible loads. IEEE Transactions on Power Systems, 31(6):4660–4669, 2016.
- [16] Stephan Koch, Johanna L Mathieu, and Duncan S Callaway. Modeling and control of aggregated heterogeneous thermostatically controlled loads for ancillary services. In Proc. PSCC, pages 1–7, 2011.
- [17] Soumya Kundu, Nikolai Sinitsyn, Scott Backhaus, and Ian Hiskens. Modeling and control of thermostatically controlled loads. arXiv preprint arXiv:1101.2157, 2011.
- [18] Hugo Larochelle, Yoshua Bengio, Jérôme Louradour, and Pascal Lamblin. Exploring strategies for training deep neural networks. Journal of Machine Learning Research, 10(Jan):1–40, 2009.
- [19] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.
- [20] Johanna L Mathieu, Maryam Kamgarpour, John Lygeros, Göran Andersson, and Duncan S Callaway. Arbitraging intraday wholesale energy market prices with aggregations of thermostatic loads. IEEE Transactions on Power Systems, 30(2):763–772, 2015.
- [21] Angel Molina-Garcia, François Bouffard, and Daniel S Kirschen. Decentralized demand-side contribution to primary frequency control. IEEE Transactions on Power Systems, 26(1):411–419, 2011.
- [22] Sai Pushpak Nandanoori, Soumya Kundu, Draguna Vrabie, Karan Kalsi, and Jianming Lian. Prioritized threshold allocation for distributed frequency response. Accepted in Conference on Control Technology and Applications (CCTA). IEEE, 2018.
- [23] Cristian Perfumo, Ernesto Kofman, Julio H Braslavsky, and John K Ward. Load management: Model-based control of aggregate power for populations of thermostatically controlled loads. Energy Conversion and Management, 55:36–48, 2012.
- [24] PJM. http://www.pjm.com.
- [25] M Ranzato, PE Taylor, JM House, RC Flagan, Yann LeCun, and Pietro Perona. Automatic recognition of biological particles in microscopic images. Pattern Recognition Letters, 28(1):31–39, 2007.
- [26] Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. Fitnets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550, 2014.
- [27] Hyun Ah Song and Soo-Young Lee. Hierarchical representation using NMF. In International Conference on Neural Information Processing, pages 466–473. Springer, 2013.
- [28] Gül Varol and Albert Ali Salah. Efficient large-scale action recognition in videos using extreme learning machines. Expert Systems with Applications, 42(21):8274–8282, 2015.
- [29] Wei Zhang, Jianming Lian, Chin-Yao Chang, and Karanjit Kalsi. Aggregated modeling and control of air conditioning loads for demand response. IEEE Transactions on Power Systems, 28(4):4655–4664, 2013.
- [30] Xiao-Lei Zhang and Ji Wu. Denoising deep neural networks based voice activity detection. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 853–857. IEEE, 2013.