Designing Near-Optimal Policies for Energy Management in a Stochastic Environment

Chaitanya Poolla, Abraham K. Ishihara, Rodolfo Milito

With the rapid growth in renewable energy and battery storage technologies, there exists significant opportunity to improve energy efficiency and reduce costs through optimization. However, optimization algorithms must take into account the underlying dynamics and uncertainties of the various interconnected subsystems in order to fully realize this potential. To this end, we formulate and solve an energy management optimization problem as a Markov Decision Process (MDP) consisting of battery storage dynamics, a stochastic demand model, a stochastic solar generation model, and an electricity pricing scheme. The stochastic model for predicting solar generation is constructed based on weather forecast data from the National Oceanic and Atmospheric Administration (NOAA). A near-optimal policy design is proposed based on stochastic dynamic programming. Simulation results are presented in the context of a storage and solar-integrated residential building environment. Results indicate that the near-optimal policy significantly reduces the operating costs compared to several heuristic alternatives. The framework proposed can be used to aid the design and evaluation of energy management policies with configurable demand-supply-storage parameters while accounting for weather-induced uncertainties.

Keywords: Solar Energy; Energy Storage; Energy Management; Microgrid; Smart Building; Markov Decision Process

Journal: Applied Energy

Chaitanya Poolla is with Intel Corporation. This work was done while he was a graduate student at the Department of Electrical and Computer Engineering, Carnegie Mellon University (SV).
Abraham K. Ishihara is with the Department of Electrical and Computer Engineering, Carnegie Mellon University (SV).
Rodolfo Milito is with Starflow Networks, Inc. This work was done while he was affiliated with Cisco Systems, Inc.

1 Introduction

The current electric grid was originally designed to support unidirectional power-flows emanating from a few generating stations that supplied a large number of consumers via transmission and distribution networks. Recently, the growth in distributed renewable power generation sarzynski2009impact () zervos2013renewables () has increased the need for the grid to accommodate bidirectional power-flows. While greater renewable penetration is desirable from a sustainability perspective, the fluctuations associated with it pose significant grid stability challenges battaglini2009development () painuly2001barriers (). Furthermore, the need to improve efficiency, reliability, and security while reducing the overall carbon footprint requires the present grid to transition toward the next generation, often termed the "smart grid" smartgrid ().

In this paradigm smartgrid (), information from various smart sensors is integrated via communication networks to enable intelligent real-time power-flow decisions. This allows for several improvements in the power grid including (i) reduced operation and maintenance costs via automation, (ii) increased efficiency by minimizing grid losses, (iii) increased renewable penetration via real-time power management and integrated energy storage Rodrigues2014265 (), and (iv) increased reliability. Thus, the smart grid is envisioned to integrate several forward-looking technologies while maintaining full backward compatibility with the existing grid without compromising grid stability farhangi2010 ().

The functional representation of a grid includes an interconnected network of nodes that facilitate generation, transmission, and distribution of electric power at a regional scale or larger. These nodes include distributed sources, loads, energy storage, and agents for making decisions. A functional equivalent of a grid at a smaller scale is known as a microgrid, which can be regarded as a plug-and-play unit of the grid. The microgrid is capable of functioning either as a connected unit or a disconnected unit in relation to the rest of the grid. The former mode of operation is known as the grid-connected mode and the latter mode is known as the islanded mode. The functional equivalent of a microgrid facilitating power-flows within a network at a smaller scale, such as those within a single building, may be called a nanogrid mitra2010 (). A smart nanogrid thus facilitates informed decision-making for effective energy management; here and in what follows, the word 'smart' qualifying a microgrid or a nanogrid indicates that the decisions within the corresponding grid are made based on real-time information about its nodes. In this work, we restrict our focus to the nanogrid, which represents the power system present within a single building.

The problem of making decisions for energy management is central to any network involving power-flows, ranging from the large-scale grid down to the relatively small-scale microgrids and nanogrids ZIA20181033 (). This problem reduces to determining power-flows between various nodes in the network while simultaneously achieving pre-specified objectives. In the real world, these flows are influenced not only by network-specific factors but also by external environment-specific factors such as the weather AGUERAPEREZ2018265 () hernandez2012study (). For example, consider a residential building with uncontrolled demand equipped with renewable generation and energy storage. The power-flow decision problem seeks to determine the power-flows between these nodes by accounting for the factors influencing the network. As an example, a certain decision may direct power to charge the energy storage unit instead of exporting it to the grid, on account of poor irradiance conditions and high electricity prices expected in the immediate future. Such decisions may incur a lower electricity cost than decisions that always export the renewable output to the grid. Consequently, it is of interest to determine a decision that incurs the least possible cost. Such a cost-minimizing decision is henceforth referred to as an 'optimal' decision.

In this work, we are concerned with the problem of devising optimal policies in the presence of weather-related uncertainties. (We reserve the word policy to refer to a rule that maps points in the observation space to points in the decision space.) Although related problems have been studied in recent literature AGUERAPEREZ2018265 () wang2014dynamic () kanchev2011energy () song2013development () sanseverino2011execution () AMROLLAHI201766 (), this problem has not been solved in its present form to the best of the authors' knowledge. In Wang et al. wang2014dynamic (), controllable solar generation and load models were employed within a deterministic prediction framework to obtain power flows using model predictive control (MPC) techniques. In Kanchev et al. kanchev2011energy (), non-controllable deterministic demand-supply models were considered, but without the notion of optimality. In our case, by contrast, the renewable supply and demand are modeled as non-controllable stochastic processes under the assumptions that the solar power output operates at its maximum power point and that the demand is required to be met unconditionally. In song2013development (), a discrete time PV-integrated energy storage model is considered whose state is modeled as a Markov chain to determine the storage capacity. This work, however, aims to solve for the optimal energy management policies within a Markov Decision Process (MDP) framework involving a utility pricing scheme. Further, unlike the energy management problems dealt with in kanchev2011energy () and sanseverino2011execution (), where the dynamics of the energy storage were not explicitly considered, we regard the energy storage as a dynamical system along with its charging and efficiency constraints. Also, we employ cyclo-stationary Markov chains to model the time-dependent transitions of the solar and load stochastic processes, unlike the conventionally used stationary Markov chain models ardakanian2011markovian () Muselli2001675 (). In AGUERAPEREZ2018265 (), the importance of a weather forecast-oriented approach for energy management was highlighted based on a review of the past literature. That work resulted in recommendations for incorporating weather forecast data into energy management systems. Our focus is on integrating a weather forecast-based approach for solar power prediction into a stochastic demand-supply-storage framework with the objective of designing cost-minimizing policies. A component size optimization problem is solved in AMROLLAHI201766 () with a demand response intervention using mixed integer programming in a deterministic setting. Here, however, we treat a weather forecast-integrated optimal decision problem within a stochastic environment modeled by a Markov decision process.

The main contributions of this work are (1) an integrated energy management framework consisting of energy storage dynamics, a weather forecast-driven stochastic solar generation model, a building data-driven stochastic residential demand model, and a utility pricing scheme, (2) heuristic policy design for energy management (heuristic policies are also referred to as naive policies in this work; see Section 3 for more details), and (3) an MDP formulation and solution for designing near-optimal energy management policies in an uncertain demand-supply-storage environment.

The remainder of this paper is organized as follows. Section 2 describes the models of the various components in a nanogrid along with their interconnections. Section 3 describes the formulation of naive policies based on heuristic considerations. The notion of cost is introduced in Section 4, and an optimal policy is sought within a Markov Decision Process framework as described in Section 5. The simulated performance of the various policies is presented and discussed in Section 6, followed by conclusions in Section 7. The treatment of the state and the decision constraints for determining the feasible decision space in the MDP is described in Appendix A.

2 Model Description

We are interested in analyzing the impact of power-flow decisions on energy costs within a smart building. Consequently, we develop a model of the nanogrid in the context of a smart building. This model includes the building power load, a renewable solar generator, an energy storage unit, a decision-making unit (DMU), and their interconnections. Such a model enables us to (1) study the evolution of the nanogrid in time and (2) create a framework to evaluate decisions driving power-flows within the nanogrid. In what follows, we describe the underlying aspects of the nanogrid model. The schematic of such a nanogrid connected to the grid is depicted in Figure 1.

Figure 1: Schematic of a grid-connected nanogrid

2.1 Representing Time

Let the time interval of interest be denoted by , where and denote the start and end times respectively. Also, let the sequence of points denote a partition of the interval such that the following properties hold true:


where represents the uniform width of the subintervals . Let the set of these subintervals in be denoted by . In addition to , we also construct an interval to represent time in the past for the purpose of integrating weather data from the past. Similar to the above definitions over the interval , we construct a partition and the set of corresponding subintervals over the interval . For the decision problem described in Section 3, each point in the sequence represents a decision epoch, at which instant a decision-rule (policy) is formulated. This rule, formulated at , will continue to reflect in the decisions implemented during the interval . Further, let the width of each subinterval be uniform such that . For the purpose of computation, we set this width to 3600 seconds so that the time instants in and are one hour apart in time.

2.2 Photovoltaic Generation

In the nanogrid model, let the power output from the solar generation be represented by a discrete time stochastic process with discrete states . The probability distribution of each random variable is assumed calculable at every time instant . We also assume this process to be first order Markovian and cyclo-stationary with a time period of 24 hours. Since the solar generation output is influenced by factors such as the panel characteristics and environmental conditions, a model that maps environment variables to the power output is employed sera2007pv () poolla2014neural (). In the implementation, the distributions of the random variables are obtained from weather distributions, which are constructed using time series data from the National Oceanic and Atmospheric Administration (NOAA).

2.2.1 Integrating NOAA Forecasts

NOAA provides a wide variety of weather forecast products varying in spatio-temporal resolution, prediction horizon, and update frequency noaa_ncep (). In this work, forecast data from the NOAA North American Mesoscale Forecast System (NAM) archives are leveraged to infer probability distributions of the measured solar irradiance as described below:

Let denote the irradiance data obtained from sensor measurement at a time instant in the past and let denote the forecast value available at (based on the archived data from an -hours ahead forecast). Using the historical sensor data and the forecast archives , the error can be computed as shown in equation 2.


For this work, the sensor data from Carnegie Mellon University (CMU) in Moffett Field, California was available at a sub-hourly granularity. However, the data from the NAM model was available only at a granularity of 6 hours since the NAM model runs 4 times per day at hr UTC (i.e. hr PST) forecasting up to 84 hours ahead. Every NAM model run also provides a 0-hr ahead forecast, which is regarded as the estimate of the truth provided by the NAM model based on a variety of sensor measurements. Thus the 0-hr ahead forecast may serve as a proxy of the true value for locations where sensor measurements are unavailable.

We note that obtained from equation 2 is available only when since the temporal resolution of NAM data is 6 hours. For the purpose of inference, we assume that the data is available at all instants in the past by enforcing a first-order hold over the data . Thus, the error measurements are well defined for every .
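The hold-and-difference step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: 'first-order hold' is read as piecewise-linear interpolation between consecutive 6-hourly samples, the error is taken as measurement minus forecast, and all names and numerical values are hypothetical rather than taken from the CMU or NAM datasets.

```python
def first_order_hold(samples, step):
    """Linearly interpolate samples spaced `step` slots apart onto every slot."""
    out = []
    for a, b in zip(samples[:-1], samples[1:]):
        for j in range(step):
            out.append(a + (b - a) * j / step)
    out.append(samples[-1])
    return out


# Hypothetical 6-hourly forecast irradiance values (W/m^2), illustrative only.
forecast_6h = [0.0, 600.0, 300.0]
forecast_hourly = first_order_hold(forecast_6h, step=6)

# Hypothetical hourly sensor readings at the same 13 instants.
measured_hourly = [0.0, 90.0, 210.0, 310.0, 390.0, 520.0, 610.0,
                   560.0, 480.0, 450.0, 410.0, 340.0, 300.0]

# Assumed sign convention: error = measurement - forecast.
errors = [m - f for m, f in zip(measured_hourly, forecast_hourly)]
```

With the forecast available at every hourly instant, an error value exists for every hourly measurement rather than only at the four daily NAM run times.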

Given the error population , let denote the dataset available at the time instant containing the error data , where . Here denotes the periodicity factor, which determines the grouping of the elements into for each in . In other words, the dataset is a collection of all the error data such that the corresponding time instants and leave the same remainder when divided by .
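The grouping rule above (instants that leave the same remainder modulo the periodicity factor share a dataset) may be sketched as follows. The function name, the integer indexing of time instants, and the sample values are assumptions made for illustration; with hourly data and a daily cycle, the period would presumably be 24.

```python
from collections import defaultdict


def group_errors_by_phase(errors, period):
    """Map each phase (time index mod period) to the errors observed at that phase."""
    groups = defaultdict(list)
    for k, e in enumerate(errors):
        groups[k % period].append(e)
    return dict(groups)


# Hypothetical error series; a short period of 3 keeps the example readable.
errors = [5.0, -3.0, 2.0, 4.0, -1.0, 0.0, 6.0]
groups = group_errors_by_phase(errors, period=3)
```

Each entry of `groups` then plays the role of the error dataset from which a phase-specific error distribution can be inferred.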

Based on the error datasets constructed above, we proceed to infer the error distributions. Let the error generating process be represented by a stochastic process . Let its distribution be denoted by , where refers to the underlying dataset used for the inference. We also assume that is periodic with a time period of . Since the distribution is assumed to be periodic, the underlying dataset at a given time instant in the future can be represented by a dataset in the past if and only if the following hold true:


where, is the time instant in the past whose error dataset is used to infer the error distribution of at .

Given the characterization of the stochastic process as described above, we employ a linear signal-noise regression model to predict the measured value as shown below. While this model is used to demonstrate the proof of concept, it may be noted that similar regression models incorporating weather forecasts can be used poolla2018solar ().


In this manner, the distribution of the estimated irradiance measurements is computed based on the error distribution of the stochastic process and the deterministic forecast . Thus the seasonal component in the estimation of is introduced by .
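A minimal sketch of this signal-plus-noise construction is given below, assuming an additive model (estimate = forecast + error) and an empirical error distribution built from past samples. The clipping at zero, the binning into discrete irradiance levels, and all names and values are assumptions introduced for illustration only.

```python
from collections import Counter


def irradiance_distribution(forecast, error_samples, bin_width):
    """Empirical distribution of forecast + error over levels spaced `bin_width` apart."""
    counts = Counter()
    for e in error_samples:
        value = max(0.0, forecast + e)  # assumption: irradiance cannot be negative
        counts[round(value / bin_width) * bin_width] += 1
    n = len(error_samples)
    return {level: c / n for level, c in sorted(counts.items())}


# Hypothetical forecast (W/m^2) and past errors for the matching phase of the day.
dist = irradiance_distribution(400.0, [-60.0, -20.0, 10.0, 30.0], bin_width=50)
```

The deterministic forecast shifts the location of the distribution, while the phase-specific error samples determine its spread.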

Similar to the irradiance estimation performed above, other weather variables may be used to obtain the distributions of various stochastic processes representing different aspects of weather such as temperature, cloudiness etc. With the knowledge of these weather fluctuation distributions, models that map the environment states to the solar power output can be used to obtain the solar power distribution. Although any appropriate model can be used, we resort to a linear model for the proof-of-concept illustration. The resulting solar power distributions are denoted by , where represents the time instant and represents the -states of the photovoltaic generation.

2.3 Load

In the building power system, loads could broadly be categorized into Heating Ventilation and Air Conditioning (HVAC), lighting, and plug loads. The overall load is then the aggregate over all individual loads at every time instant . In this work, hourly load data associated with San Francisco residential buildings were obtained from the openEI database openei_data (). Similar to the photovoltaic generation model, the load is modeled as a discrete time discrete state Markov chain with -states . The state transition probabilities of the model are assumed to be cyclo-stationary with a time period of 24 hours. Let the state distributions be denoted by , where denotes a state of the load variable ().
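One way to estimate a cyclo-stationary chain from a discretized hourly series is to keep a separate transition matrix for each hour of the day, as in the sketch below. The counting estimator, the uniform fallback for unvisited states, and the function name are assumptions for illustration; the source does not specify the estimation procedure.

```python
def estimate_cyclostationary_transitions(states, n_states, period=24):
    """Return P[h][i][j], an estimate of Pr(next = j | current = i, phase = h)."""
    counts = [[[0] * n_states for _ in range(n_states)] for _ in range(period)]
    for k in range(len(states) - 1):
        counts[k % period][states[k]][states[k + 1]] += 1
    P = []
    for h in range(period):
        rows = []
        for row in counts[h]:
            total = sum(row)
            if total:
                rows.append([c / total for c in row])
            else:
                # Assumption: unvisited (phase, state) pairs get a uniform row.
                rows.append([1.0 / n_states] * n_states)
        P.append(rows)
    return P


# Hypothetical two-state series with a period of 2, to keep the example small.
P = estimate_cyclostationary_transitions([0, 1, 0, 1, 0, 0], n_states=2, period=2)
```

A stationary chain corresponds to the special case `period=1`, in which a single matrix is shared by all time instants.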

2.4 Energy Storage

The objective of furthering renewable penetration and simultaneously maintaining grid stability can be practically realized by incorporating energy storage systems Rodrigues2014265 (). In a more localized environment like a micro/nano grid, storage systems also help achieve self-reliance independent of the main grid and enable reduction in cost of electricity consumption. In this work, the nanogrid storage system is represented by a dynamical system with capacity . There are several approaches to modeling the storage dynamics with varying levels of complexity he2011evaluation (). To illustrate the concept of energy management in the context of a nanogrid, we resort to a linear dynamic representation of the storage system similar to wang2014dynamic ():


where, represents the energy content (state) of the energy storage at and represents the net power output from the storage system from the current time instant until the next instant . The power flow constraints associated with the storage system are represented by . Further, the self-discharge losses Salameh124547 () are represented by the storage efficiency factor and the closed-circuit losses by the factor erdinc2009dynamic (). Note that the closed-circuit loss factor is less than one during charging and greater than one during the discharge phase. In other words, due to the losses incurred, more power is drawn from the stored energy than is delivered at the storage output, and less power is stored than is supplied at the storage input.
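The following sketch shows one linear update that is consistent with the description above. The exact functional form, the parameter names (`eta_self`, `alpha_charge`, `alpha_discharge`), and their default values are assumptions for illustration; only the qualitative behavior (self-discharge, and a loss factor below one when charging and above one when discharging) is taken from the text.

```python
def storage_step(x, u, dt_hours, capacity,
                 eta_self=0.999, alpha_charge=0.95, alpha_discharge=1.05):
    """One step of an assumed linear storage model.

    x: stored energy (kWh); u: net power output of the storage (kW),
    with u > 0 meaning discharge and u < 0 meaning charge.
    """
    alpha = alpha_discharge if u > 0 else alpha_charge
    x_next = eta_self * x - alpha * u * dt_hours
    if not 0.0 <= x_next <= capacity:
        raise ValueError("decision violates the storage capacity limits")
    return x_next


# Delivering 1 kW for one hour removes more than 1 kWh from the store ...
after_discharge = storage_step(5.0, 1.0, 1.0, 10.0, eta_self=1.0,
                               alpha_charge=0.9, alpha_discharge=1.1)
# ... while absorbing 1 kW for one hour adds less than 1 kWh.
after_charge = storage_step(5.0, -1.0, 1.0, 10.0, eta_self=1.0,
                            alpha_charge=0.9, alpha_discharge=1.1)
```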

2.5 Grid Transactions

The electric grid facilitates the transfer of power from the suppliers to the consumers over transmission and distribution networks. In this work, we consider the transactions between the grid and the nanogrid to be lossless and unconstrained but associated with a cost to illustrate the concept. The decisions driving these transactions are enabled by a Decision-Making Unit (DMU). A functional representation of the nanogrid along with its DMU is shown in Figure 1. The notations for the transactions and the associated pricing scheme used in the model are described below.

Let the power flow from the grid to the nanogrid DMU at any be denoted by ; we adopt a sign convention in which power flows into the DMU are positive. The associated monetary cost incurred by the nanogrid depends on the quantity of power transacted as well as the pricing scheme enforced by the local utility company or the energy market federal2012energy (). We consider a deterministic pricing scheme where the purchase prices () are based on schedule E6 pge_residential_e6 () and the selling prices () are based on E-SRG PPA pge_srg_ppa (). Thus, the monetary cost incurred due to the grid transactions over the horizon can be written as:


where, positive indicates the cost to be paid by the nanogrid to the grid on account of the grid transactions .
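As a hedged illustration of this cost, the sketch below applies the purchase price to imports and the selling price to exports, following the sign convention that power flowing into the DMU is positive. The function name, the units, and the flat prices are hypothetical and are not the E6 or E-SRG PPA rates.

```python
def grid_cost(grid_power, buy_price, sell_price, dt_hours=1.0):
    """Monetary cost of grid transactions; a negative total means net revenue.

    grid_power[k] > 0: power imported from the grid (kW), charged at buy_price[k].
    grid_power[k] < 0: power exported to the grid (kW), credited at sell_price[k].
    """
    total = 0.0
    for g, pb, ps in zip(grid_power, buy_price, sell_price):
        price = pb if g >= 0 else ps
        total += price * g * dt_hours
    return total


# Hypothetical two-hour example: import 2 kW, then export 1 kW.
cost = grid_cost([2.0, -1.0], buy_price=[0.30, 0.30], sell_price=[0.10, 0.10])
```

Because the selling price is typically lower than the purchase price, exporting a kilowatt-hour recovers less than importing one costs, which is what makes storage decisions economically meaningful.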

2.6 Nanogrid Topology

In a distributed power system consisting of several generation units, loads, storage systems and the main grid, the interconnection topologies play a critical role in routing power effectively. From a decision-making perspective, there exist interconnected designs with varying levels of decentralization, each differing in autonomy and complexity. In the nanogrid under consideration, we employ a centrally interconnected design wherein all the nanogrid components are connected via the DMU as depicted in Figure 1.

2.7 Nanogrid DMU

The nanogrid decision-making unit provides the interface between the nanogrid and the main grid to enable power transactions. In general, smart grid power systems consist of two interdependent networks: (1) a communication network for routing information and (2) a power network for routing power. Depending on the states of the storage system, the generation, the load, and the electricity pricing, the DMU centrally computes the power flow decisions in the network. It is assumed here that the information required for making power flow decisions is available to the DMU instantaneously via the communication network.

The DMU determines the power flows and in the power system based on various considerations. Let the generation output and the load at an instant be denoted by and load respectively. Here and represent the realizations of the stochastic process and at the time in the real world. Given the power flows and , the decisions computed at the DMU (, ) must result in the power balance at every time instant as shown:


2.8 System Parameters

Decision-making within the nanogrid model requires information about the system parameters. Let the system parameters at be denoted by . Also, let be classified into the deterministic parameters of the system (denoted by ) and the distribution of the random parameters of the system (denoted by ). The deterministic parameters consist of and the distribution parameters of the stochastic processes consist of . Thus, the system parameters can be written as .

3 Policy Formulation

The nanogrid model above enables structured power-flow decision-making. Specifically, the information about the state is useful in determining the decision pair (). The mapping between the state space and the decision space of the system is provided by a policy, denoted by . The nanogrid system policy consists of a collection of two policies, namely the storage transaction policy and the grid transaction policy . Thus, the policy map between the states () and the decisions (,) can be written as:


where, the admissible decision space is represented by and denotes the storage state space. It must be noted that the admissible decision space consists of the decisions which, when implemented, do not result in the violation of the system constraints (refer to Appendix A for details). Also, the two decisions ( and ) are not independent. Once one is chosen, the other is determined by the power balance (equation 7). In what follows, we describe heuristics-based policies, also known here as 'naive' policies:

  1. Policy 1: Exhaustive Storage Dependence Policy (): Given the non-controllable supply and demand , the policy is designed to bridge the demand-supply gap by depending on the storage transactions . When the storage resource can no longer be used to maintain power balance, the policy resorts to the grid transactions to bridge the demand-supply gap. The decision-making process for this policy () is illustrated by the flowchart in Figure 2.

  2. Policy 2: Finite Horizon Lookahead Policy with a three-hour lookahead (): This policy is designed to make informed decisions based on the present nanogrid state as well as the expected state over a finite future horizon. For numerical evaluation, a three-hour-ahead horizon is chosen. With these considerations, four scenarios are possible: (i) excess supply in the present and excess supply cumulatively expected over the horizon, (ii) excess supply in the present and deficit supply cumulatively expected over the horizon, (iii) deficit supply in the present and excess supply cumulatively expected over the horizon, and (iv) deficit supply in the present and deficit supply cumulatively expected over the horizon. In scenario (i), the storage resource is charged before depending on the grid for power balance, thereby accommodating excess generation during the present and over the future horizon by relying primarily on the storage resource. In scenario (iv), the storage resource is half-discharged to meet the deficit before depending on the grid. The rationale underlying this mechanism is that the policy-maker utilizes half the storage resource to meet the present deficit and retains the rest to meet the deficit expected over the future horizon. In the remaining scenarios, the policy relies on the generation and the grid for power balance, leaving the storage resource unused. The decision flow corresponding to the policy is illustrated by the flowchart in Figure 3.
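One plausible reading of Policy 1 is sketched below; it is not a transcription of the flowchart in Figure 2. The sketch assumes a lossless storage unit, a symmetric power limit `u_max`, and the sign convention that flows into the DMU are positive, so that the power balance reads grid + storage + generation - load = 0. All names are hypothetical.

```python
def exhaustive_storage_policy(x, s, l, capacity, u_max, dt=1.0):
    """Return (u, g): storage output and grid import, both positive into the DMU.

    x: stored energy (kWh); s: generation (kW); l: load (kW).
    """
    gap = l - s  # > 0: supply deficit, < 0: supply surplus
    if gap >= 0:
        # Discharge as far as the power limit and the stored energy allow.
        u = min(gap, u_max, x / dt)
    else:
        # Charge with the surplus as far as the power limit and headroom allow.
        u = -min(-gap, u_max, (capacity - x) / dt)
    g = l - s - u  # the grid covers whatever the storage could not
    return u, g


# Deficit of 3 kW with only 2 kWh stored: the storage gives 2 kW, the grid 1 kW.
deficit_case = exhaustive_storage_policy(x=2.0, s=1.0, l=4.0, capacity=10.0, u_max=5.0)
# Surplus of 4 kW with 1 kWh of headroom: 1 kW is stored, 3 kW is exported.
surplus_case = exhaustive_storage_policy(x=9.0, s=5.0, l=1.0, capacity=10.0, u_max=5.0)
```

Policy 2 could be obtained from the same skeleton by adding the expected cumulative gap over the lookahead horizon as an input and branching on the four scenarios listed above.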

Figure 2: Flow Chart depicting the Exhaustive Storage Dependence Policy
Figure 3: Flow Chart depicting the three-hour Lookahead Policy

4 Optimal Power Flow Problem

Given heuristics-based policies such as the ones described above, it is important to compare the performance of these policies based on cost metrics, one of which was described in equation 6. This allows the policy-maker to order and choose policies based on their cost. In this work, we attempt to design policies that result in decisions minimizing the nanogrid cost. We consider the scenario where the cost metric describes the expected monetary cost over a finite horizon . Thus, the cost metric depends on the costs associated with both the grid transactions as well as the expected storage state at the end of the horizon. The storage state at the end of the horizon is unrealized at all instants and is hence regarded as a random variable . The probability distribution of can, however, be computed with the knowledge of (i) probabilistic supply and load distributions and respectively over the instants , and (ii) the storage dynamics described in Equation 5. Therefore, the cost metric over the horizon, also known in the dynamic programming formulation of decision-making problems as the cost-to-go from stage to , can be written as:


where, denotes the indicator function, denotes the expectation operator with respect to the solar and load distributions, and denotes the system parameters at the time instant . While the above definition of cost captures the essential monetary costs, we modify the cost function to force the optimal decision-maker to ensure the storage is completely charged by the end of the horizon. This is achieved by introducing a multiplier in the terminal cost, thereby transforming it into . Thus, the redefined cost over the horizon becomes:


Given the above cost metric , the problem of interest is to compute the optimal policy that minimizes over the entire horizon . Though the optimal policy minimizes , which does not represent the monetary cost, we evaluate the policies in Section 6 based on their monetary cost .

5 Optimal Policy Computation

In order to solve for the optimal policy, we first note that the states (), the decisions , the Markovian state transitions (Equation 5), and the cost metric (Equation 10) together constitute a Markov decision process (MDP) framework. Within this framework, the policy-maker enforces a decision on the system and the system state responds by randomly transitioning into a new state while incurring a transition cost .

The problem of the policy-maker is to obtain the policy (), which, when enforced, results in decisions that minimize the cost over the optimization horizon . The optimal policy can be obtained by solving the following optimization problem:


where, denotes the random variable representing the uncertain state of the storage in the future. The corresponding optimal cost then becomes:


where, is known as the value of the state at the time instant . The function , known as the value function, maps the states and parameters at the time instant to a real value as shown:


In other words, the value function at the instant describes the optimal cost-to-go from stage through the final stage .

We address the above discrete time stochastic dynamic optimization problem by applying the principle of optimality and solving the resulting sub-problems using a dynamic programming (DP) formulation. The sequence of solutions to these sub-problems is obtained using the backward induction method, wherein the updated values at the previous time step are obtained by solving the Bellman equation. In our work, the Bellman equation is solved by interpolated value function approximations. In what follows, we describe the details of the DP formulation and the solution methodology to arrive at near-optimal solutions to the problem described in Equation 11. By applying the principle of optimality, we can transform the decision-making problem into a sequence of sub-problems as shown:


Using the definition of the value function in equation 13, we rewrite the equation 14 as:


Continuing the course of breaking down into sub-problems, we arrive at the following generalized equation:


which describes a recursive relationship between the values at consecutive time steps. This equation, known as the Bellman equation, offers a recursive method to compute the values at the current instant of time based on the knowledge of the values at the next instant of time. We apply backward induction to solve for the values in the Bellman equation 16, starting with the end of the horizon and solving backwards in time.

To solve for the value function as well as the optimal decisions, we resort to a numerical approach. First, we observe that the domain of the value function is the continuous state space . We proceed to quantize the state space only for the purposes of computing the values and the optimal decisions, thereby introducing sub-optimality into the solution. Let the discretized version of the continuous state space be represented by the finite sequence .

At the end of the horizon , the values are computed for the quantized state space :


At every previous time step , the decisions resulting in the minimum expected cost-to-go are computed by solving equation 16 across the feasible decision space (see Appendix A). Thus, the combined effect of restricting decisions to the feasible space, discretizing the state space, and approximating the value function is a suboptimal solution, which provides a sequence of suboptimal decisions that constitutes the non-stationary near-optimal policy .
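The backward induction procedure with an interpolated value function can be sketched as follows. This is a deliberately simplified illustration and not the implementation used for the results: the state is the storage energy alone, the storage is lossless, the net load (load minus generation) at each stage is drawn from a given discrete distribution, and the decision set is a small finite list that should contain 0 so that a feasible decision always exists. All names and numbers are hypothetical.

```python
from bisect import bisect_right


def interp(grid, values, x):
    """Piecewise-linear interpolation of `values` on a sorted `grid`, clamped at the ends."""
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    i = bisect_right(grid, x) - 1
    w = (x - grid[i]) / (grid[i + 1] - grid[i])
    return (1 - w) * values[i] + w * values[i + 1]


def backward_induction(grid, decisions, net_load_dist, buy, sell,
                       terminal, capacity, dt=1.0):
    """net_load_dist[k] is a list of (probability, load - generation) pairs."""
    n_stages = len(net_load_dist)
    V = [terminal(x) for x in grid]          # values at the end of the horizon
    policy = [None] * n_stages
    for k in reversed(range(n_stages)):      # solve backwards in time
        V_new, pi_k = [], []
        for x in grid:
            best_cost, best_u = float("inf"), None
            for u in decisions:
                x_next = x - u * dt          # assumption: lossless storage
                if not 0.0 <= x_next <= capacity:
                    continue                 # outside the feasible decision space
                stage = 0.0
                for p, d in net_load_dist[k]:
                    g = d - u                # grid import from the power balance
                    price = buy[k] if g >= 0 else sell[k]
                    stage += p * price * g * dt
                cost = stage + interp(grid, V, x_next)
                if cost < best_cost:
                    best_cost, best_u = cost, u
            V_new.append(best_cost)
            pi_k.append(best_u)
        V, policy[k] = V_new, pi_k
    return V, policy


# Two stages, a 1 kW deficit in each, cheap power first and expensive power second.
V0, policy = backward_induction(
    grid=[0.0, 1.0, 2.0], decisions=[-1, 0, 1],
    net_load_dist=[[(1.0, 1.0)], [(1.0, 1.0)]],
    buy=[0.1, 0.5], sell=[0.05, 0.05],
    terminal=lambda x: 0.0, capacity=2.0)
```

In this toy instance, starting from an empty storage, the computed first-stage decision is to charge while power is cheap so that the second-stage deficit can be met from storage. Refining `grid` and `decisions` trades computation time for a smaller quantization error.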

6 Results and Discussion

We analyzed the weather forecast data from the NAM model in light of both the sensor measurements as well as the true data as provided by NAM (0-hr ahead forecast). Results from the error distribution were used to determine the probability distributions of the solar generation process. The results from the NAM model are summarized below.

6.1 Weather Results

To analyze the similarity between the true data provided by NAM (0-hr ahead forecast) and the sensor measurements, we compared both these datasets over a period of two months (Aug - Oct 2014). Example comparisons at 11 AM PT and 5 PM PT are plotted in Figures 4 and 5, respectively. These figures show that the NOAA NAM model output and the sensor measurements exhibit similar trends. Further, the Root Mean Square (RMS) errors were computed based on percentage errors between NOAA and sensor measured data. The mean error was found to be along with a standard deviation of . These error characteristics can be attributed to several factors including coarse spatial granularity, lack of local information about shadows or dust patterns, modeling error, and low update frequency of the NAM model (four per day).

Figure 4: NOAA and Sensor irradiance data comparison at 11 AM, RMS Accuracy =
Figure 5: NOAA and Sensor irradiance data comparison at 5 PM, RMS Accuracy =

Since the NAM model forecasts up to 84 hours ahead, we compared the accuracy of the forecasts across several prediction window lengths, assuming that the 0-hr ahead forecasts represent the ground truth. The comparison depicting the absolute RMS errors between the NAM truth (0-hr ahead) and the NAM forecasts up to 84 hours ahead is shown in Figure 6. The corresponding percentage accuracies relative to the NAM truth data are indicated in green. The overall accuracy across the various prediction window lengths was found to be consistent, with a mean of and a standard deviation of . Thus the 84-hour ahead NAM forecast appears to provide accuracies comparable to those of a 6-hour ahead forecast for the case under consideration.

Figure 6: Comparison between NAM truth and -hour ahead forecasts:

Based on the error distributions obtained from the irradiance archives, we estimated the distributions of the measured irradiance state with the knowledge of the forecasts (equation 4).

6.2 Supply and Demand Model results

Based on the distribution of the measured irradiance estimates described above, the distributions of the stochastic solar generation were obtained. Assuming an installation capacity of under standard conditions, the expected solar generation was computed from this distribution and is shown in Figure 7. As mentioned earlier, the NAM model provides data at a temporal granularity of 6 hours (4 per day). However, the forecast as well as the error datasets were reconstructed at an hourly granularity based on a first-order hold. In this manner, we obtained the hourly resolution depicted in Figure 7.
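A first-order hold amounts to linear interpolation between consecutive samples. The sketch below shows how 6-hourly values could be reconstructed at an hourly resolution; the function name and the `factor` parameter are illustrative and do not come from the original work.

```python
def first_order_hold(samples, factor=6):
    """Linearly interpolate between consecutive samples.

    `factor` is the number of output intervals per input interval
    (6 when upsampling 6-hourly data to hourly data).
    """
    if not samples:
        return []
    out = []
    for a, b in zip(samples, samples[1:]):
        for j in range(factor):
            out.append(a + (b - a) * j / factor)
    out.append(samples[-1])  # keep the final sample itself
    return out
```

For `n` input samples, the output contains `(n - 1) * factor + 1` values, so one day of NAM data bracketed by five 6-hourly samples yields 25 hourly points.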

Figure 7: Expected Solar Generation based on NOAA NAM irradiance estimates,
Figure 8: Expected Hourly Residential Load at San Francisco openei_data (),

The distribution for the load stochastic process was computed similarly from the power consumption data of a typical residential building in San Francisco provided by the openEI database openei_data (). The expected values of the stochastic process are depicted in Figure 8. While the load variation appears relatively smooth due to the coarse temporal granularity, it must be noted that the daily load curves representing building demand vary abruptly in real time mathieu2011quantifying ().

6.3 Nanogrid Simulation

We simulated the nanogrid model under the action of both the naive and the near-optimal policies. The parameters of the energy storage system were obtained from the specification sheet in motors2015powerwall (). For ease of analysis, the storage factors were set to in the simulation (though this does not occur in practice).

The expected state evolution under the action of policy 1 is shown in Figure 9. We note that this policy was designed to rely primarily on the storage resource. In other words, we expect the policy to charge the storage device in the case of excess solar generation and to discharge it during deficit generation prior to any dependence on the grid. This behavior can be observed in the bottom subplot of Figure 9. There exists deficit generation between the time periods and hours during which the policy discharges the storage, as evident from the positive output power and battery decrements in the other subplots. Similarly, excess generation during hours results in the policy driving decisions to charge the storage resource.
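A storage-first rule of this kind can be sketched as a single-step function. This is an illustration under assumptions rather than the authors' code: the storage is assumed lossless, the battery output power is taken as positive when discharging, and the default capacity and power limits are arbitrary placeholders, not values from the specification sheet.

```python
def storage_first_step(soc, solar, load, soc_min=0.0, soc_max=10.0,
                       p_max=2.0, dt=1.0):
    """One step of a storage-first heuristic (sketch).

    soc          : stored energy in kWh
    solar, load  : realized powers in kW
    soc_max, p_max are placeholder limits chosen for illustration.
    Returns (next_soc, battery_output_kw, grid_kw), where a positive
    grid value is a purchase and a negative value is a sale.
    """
    deficit = load - solar  # positive when demand exceeds generation
    if deficit >= 0:
        # Discharge as much as the power and energy limits allow.
        battery_out = min(deficit, p_max, (soc - soc_min) / dt)
    else:
        # Charge with the excess, limited by power and remaining capacity.
        battery_out = -min(-deficit, p_max, (soc_max - soc) / dt)
    grid = deficit - battery_out  # the grid covers whatever remains
    return soc - battery_out * dt, battery_out, grid
```

The grid term is nonzero only when the battery limits bind, which reflects the stated intent of using the storage before depending on the grid.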

Figure 9: System under the action of Policy 1
Figure 10: System under the action of Policy 2

In the case of the finite-horizon look-ahead policy design, the decisions are influenced by both the current demand-supply offset and the expected offset over the three-hour finite horizon, as described in Figure 3. These decision-making factors are shown in the bottom subplots of Figure 10. To analyze the results, we observe three scenarios in relation to the sign of the imbalance (solar-load) between the present and the expected future. Specifically, the first scenario consists of hour time-interval during which excess generation is expected in the present and the future. In this case, the look-ahead policy decides to charge the storage, as evident in the Power output from Battery kW subplot in Figure 10. Similarly, in the second scenario ( hours) both the present and the future expect a generation deficit, in which case the policy decides to discharge the battery to address the deficit; this behavior is evident in the above-mentioned subplot. In the third scenario, wherein the sign of the present imbalance differs from that of the expected future ( hours), the policy is designed to leave the storage state unchanged, and this behavior can be observed in the same subplot of Figure 10.
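The three scenarios can be expressed as a sign comparison between the present imbalance and the expected future imbalance. The sketch below is an assumption-laden illustration: the function name, the use of the summed expected imbalance over the look-ahead window, the rule of absorbing or covering the present imbalance up to the battery limits, and the placeholder limits are all hypothetical.

```python
def lookahead_decision(soc, solar_now, load_now, expected_solar,
                       expected_load, soc_min=0.0, soc_max=10.0,
                       p_max=2.0, dt=1.0):
    """Look-ahead heuristic based on imbalance signs (sketch).

    expected_solar / expected_load hold the expected powers over the
    look-ahead window (three hourly values in the setting above).
    Returns (battery_output_kw, grid_kw); battery output is positive
    when discharging and grid is positive when purchasing.
    """
    now = solar_now - load_now
    future = sum(s - l for s, l in zip(expected_solar, expected_load))
    if now > 0 and future > 0:
        # Scenario 1: excess now and expected excess later, so charge.
        battery_out = -min(now, p_max, (soc_max - soc) / dt)
    elif now < 0 and future < 0:
        # Scenario 2: deficit now and expected deficit later, so discharge.
        battery_out = min(-now, p_max, (soc - soc_min) / dt)
    else:
        # Scenario 3: the signs differ, so leave the storage unchanged.
        battery_out = 0.0
    grid = -now - battery_out  # the grid settles the remaining imbalance
    return battery_out, grid
```

In the third scenario the entire present imbalance is exchanged with the grid, since the storage state is held constant.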

The near-optimal policy described in Section 5 was computed by backward induction. By using the values computed at the forward time instant , near-optimal decisions and current values at that minimize the expected cost were computed using the Bellman equation (equation 16). The near-optimal policy along with its impact on state evolution is shown in Figure 11. These near-optimal decisions depend on several factors that combine non-trivially to yield the least expected cost-to-go over the optimization horizon. These factors include the solar generation, the demand, the cost price, the selling price, the current state of the system, and the distributions of the solar and load processes based on the weather forecasts.

It can be observed that the near-optimal policy makes the decision to sell power to the grid when selling prices are highest ( hours). Further, the policy ensures full charge both by the end of the optimization horizon (prior to the 23rd hour) and by noon (12th hour). The former can be attributed to the large terminal cost that would be incurred if the battery were not fully charged. In the latter case, however, the charging process occurs gradually up to the 12th hour despite the usual cost price. This behavior can be understood by noting that the policy charges the battery by the 12th hour in anticipation of a maximal profit (minimal cost) from discharging it into the grid when the selling prices are highest during hours. Therefore, the additional complexity involved in computing the near-optimal policy results in cost-aware decisions aimed at realizing a near-optimal cost.

Figure 11: System under the action of near-optimal Policy
Figure 12: Cost-to-go under various Policies

The cost-to-go values corresponding to the near-optimal and the naive policies are shown in Figure 12. It can be observed that the value function in the top subplot (shown in green) is not necessarily monotonic, owing to the reversal of the instantaneous grid transaction costs (shown in the bottom subplot) when switching between purchasing and selling power. Further, we note that the values of the value function are greater in magnitude than their heuristic counterparts over certain intervals.

This does not contradict the definition of the value function or the principle of optimality, for the following reasons. The value function is a function of the state, and therefore its value can only be compared against the costs arising from the implementation of the other policies from the same state. However, as evident from Figures 9, 10, and 11, the states at each point in time are not necessarily the same in the state evolution under the action of each policy. Therefore, the principle of optimality and the definition of the value function are not violated on this account. For a correct comparison of the values from the value function to the costs-to-go from other policies, all the initial conditions of the storage state were set to , and hence, as we expect from the principle of optimality, the computed near-optimal cost-to-go at the start is less than the costs-to-go incurred by the naive policies. The cost results from the simulation are thus consistent with the definition of the value function and the principle of optimality.

The performance of the policies was simulated over a month and compared to the monthly average residential electricity bill in San Francisco. These results are shown in the bar chart in Figure 13, wherein the red bar depicts the average monthly residential electric bill in San Francisco with a value of based on the pricing data from sfo_elec_price (). The orange bar depicts the cost based on simulating the system model with only the load, solar, and pricing models. The resulting cost was evaluated to be . Though the system model is not an accurate representation of the underlying reality, this comparison indicates that the results obtained by simulating the model with the chosen parameters are in reasonable agreement (within error margin) with the monthly average electricity cost in San Francisco.

After including the storage resource, the monthly operational costs under the action of policy 1 and policy 2 were found to be comparable to each other at and respectively. As expected, the near-optimal policy outperforms its heuristic counterparts, resulting in a cost of (profit of ) over the month. In other words, the near-optimal policy is expected to generate operating rewards while the naive policies incur relatively tangible operational costs. However, the reader may note that, apart from the modeling inaccuracies, the other significant costs incurred in installation and maintenance must be considered for an accurate overall cost comparison to a monthly electricity bill. Nevertheless, the above results indicate that the payback period under near-optimal policies is expected to be appreciably shorter than that under their heuristic counterparts.

Figure 13: Monthly Cost Comparison

7 Conclusion

The problem of designing optimal policies for energy management was treated in the context of uncertain renewable supply and demand. The uncertain power flows from the solar and load sources were modeled as first-order Markov processes in discrete time and space. Probability distributions of these solar and load processes were obtained using historical weather data and residential demand data, respectively. Specifically, the power distributions of the solar Markov model were obtained using the forecast and error distributions inferred from the NOAA NAM data. The power distributions of the load Markov model were inferred from the openEI residential dataset for San Francisco. Along with a first-order dynamical storage model, the demand-supply-storage framework was formulated, within which decision problems were examined. Naive policies were proposed based on heuristic considerations, and their performance was evaluated using a cost function with appropriate pricing models. Thereafter, the optimal decision problem was posed and a cost-minimizing policy was sought within the Markov Decision Process (MDP) framework using stochastic dynamic programming (SDP). The SDP approach was implemented by discretizing the state space, constraining the computable feasible decision space (Appendix A), and approximating the value function, thereby resulting in sub-optimality of the solution. The near-optimal policy was thereafter computed using backward induction. The resulting simulations suggest that the near-optimal policy outperforms naive policies by with respect to operating costs over a monthly optimization horizon. Future work can investigate the use of continuous state-time formulations, nonlinear battery models, stochastic pricing schemes, partially controllable supply-demand processes, partially observable storage states, grid constraints, and problems pertaining to a network of interconnected nanogrids.


The authors would like to thank Cisco Systems, Inc. for its support.

Appendix A Handling constraints in the decision problem

The constraints in the decision problem include the following: (i) the power limits during the storage charge-discharge process, (ii) the energy storage state limits, and (iii) the power balance constraint (equation 7).

The decisions that do not violate the above constraints are referred to here as the feasible decisions. Similarly, the policies that result in such feasible decisions shall be known as the feasible policies. We now attempt to determine the feasible decision space, as it is required to solve the Bellman equation 16. In what follows, and represent a realization of the stochastic processes and respectively at the time instant and represents the storage state at .

A.1 Handling State Constraints

Given the dynamics of the battery (equation 5) and the state constraint kWh at , the following can be stated by taking advantage of the discrete time formulation. If the state constraint needs to be satisfied at any time step assuming it holds at , then the following holds true:


Let the decision constraint in equation A.1 be written as . It is easy to observe that, if and , then the above derivation implies that the state constraint at is satisfied.

A.2 Handling Power Constraints

Since the storage power flow must be within the limits kW, we can use equation 7 to make the following claim:


Let the decision constraint in equation A.2 be written as . It is easy to observe that, if , then the battery power constraints are satisfied as shown above.

Since the equations A.1 and A.2 constrain the same decision variable , the feasible decision space can be obtained by the intersection of these constrained spaces. Let the intersection be represented by . Thus can be written as,


From equation 20, we observe the following:

  1. The decision space is guaranteed to have a positive Lebesgue measure, since and but both cannot take the value 0 simultaneously. Thus the existence of a non-empty feasible decision space is guaranteed by definition.

  2. The bounds of the decision space depend on the realizations and of the stochastic processes and respectively. However, during the optimal policy design phase, the realizations of the stochastic processes and are unknown until the time instant occurs in the real world.

  3. Despite the guaranteed existence of a feasible decision space, it is unknown on account of the uncertainty in the generation and load processes.

In order to eliminate the dependence of the feasible decision space on the unknown solar and load in the real world at , we use the information about the bounds on the realizations of the stochastic processes and . Let the range space of these random variables at time instant be represented by: (i) for the solar generation, and (ii) for the load demand. Since the solar generation and load are bounded in the real world, the bounds are physically well-defined. Using these bounds, we construct a subset of the feasible space and call it the computable feasible decision space () as follows:


It is easy to verify that is constructed by the intersection of the feasible decision spaces across all sample paths with non-zero probability. In other words, , where .
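The construction above can be illustrated numerically. The sketch below rests on assumptions that are not taken from the original equations: the storage is lossless, the decision variable is the grid power `g` (positive when purchasing), and the power balance gives the charging power as `s + g - l` for a solar realization `s` and a load realization `l`. Under these assumptions both the state and power constraints bound `g` by an interval that shifts with `l - s`, so the intersection over all realizations is attained at the corners of the range boxes.

```python
from itertools import product


def feasible_interval(x, s, l, x_min, x_max, p_max, dt=1.0):
    """Feasible grid decisions for one realization (s, l) (sketch).

    Assumes charging power c = s + g - l and next state x + c * dt.
    """
    c_lo = max((x_min - x) / dt, -p_max)  # state limit vs. power limit
    c_hi = min((x_max - x) / dt, p_max)
    return c_lo + (l - s), c_hi + (l - s)


def computable_feasible_interval(x, s_range, l_range, x_min, x_max,
                                 p_max, dt=1.0):
    """Intersect the feasible intervals over the realization bounds.

    s_range and l_range are (min, max) pairs. Returns None when the
    intersection is empty, i.e. no single decision is safe for every
    possible realization.
    """
    corners = [feasible_interval(x, s, l, x_min, x_max, p_max, dt)
               for s, l in product(s_range, l_range)]
    lo = max(c[0] for c in corners)
    hi = min(c[1] for c in corners)
    return (lo, hi) if lo <= hi else None
```

In this simplified setting the intersection is non-empty exactly when the width of the admissible charging-power interval is at least the combined spread of the solar and load ranges, which mirrors the role of the existence conditions discussed next.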

Though the computable feasible decision space introduces sub-optimality, it is nevertheless a sufficient condition to ensure that the system constraints are upheld under all possible realizations of the stochastic processes. However, its existence is contingent on the measure of being well-defined. Therefore, the necessary conditions for the existence of a computable feasible decision space of non-zero measure are:


It is easy to verify that, if , then inequality 21 holds true. Hence, the inequality provides a stronger condition for the existence of a non-empty feasible decision space . Progressively stronger sufficiency conditions can be derived as follows:


Note that the left-hand side (LHS) of equation 23 represents the maximum possible demand-supply offset gap at the instant , the LHS of equation 24 represents the maximum demand-supply sum at the instant , and the LHS of equation A.2 represents the maximum possible demand-supply sum over the horizon . In each of these sufficiency conditions, the right-hand side is a time-independent expression that depends on a subset of the storage parameters .

Equations 21-A.2 represent the worst-case sufficiency conditions that ensure the existence of a corresponding computable feasible decision space despite the unknown realizations of the solar and load processes. Further, given any grid transaction decision , it is ensured that the corresponding feasible battery decision space is the same as the space defined by the battery constraint since is equivalent to equation A.2 by definition.

Let the parameters satisfying the sufficiency condition in equation 21 belong to the space , where refers to the dimension of . (Since the conditions represented by equations 21-A.2 are progressively stronger, satisfying any of these equations ensures that equation 21 is satisfied.) We call this space the computable feasible configuration space of the system, for which a computable feasible decision space exists. Therefore, satisfying the sufficiency condition guarantees the existence of such computable feasible decision spaces, which are required to design the near-optimal policy. Only decisions belonging to the computable feasible decision space are considered admissible for computing the near-optimal policy; hence we also refer to this decision space as the admissible decision space for the optimal decision-making problem.

In summary, a feasible decision space is guaranteed to exist for every realization of the stochastic processes (equation 20), but is unknown at the time of designing the optimal policy. This is because such design (equation 16) involves computing the expected state at the next instant of time , and thus requires that all possible realizations of be pre-computed, accounting for every possible realization of the load and solar stochastic processes. In the pre-computation step, the realization of (= ) can be represented as a function of the decision . Thus the state constraints translate into corresponding control constraints given the realizations of the load and solar stochastic processes. In conclusion, designing a near-optimal policy using stochastic dynamic programming is feasible only if the inequalities A.2-A.2 are satisfied, thereby ensuring a non-empty admissible decision space ().

