Designing Near-Optimal Policies for Energy Management in a Stochastic Environment
Abstract
With the rapid growth in renewable energy and battery storage technologies, there exists a significant opportunity to improve energy efficiency and reduce costs through optimization. However, optimization algorithms must take into account the underlying dynamics and uncertainties of the various interconnected subsystems in order to fully realize this potential. To this end, we formulate and solve an energy management optimization problem as a Markov Decision Process (MDP) consisting of battery storage dynamics, a stochastic demand model, a stochastic solar generation model, and an electricity pricing scheme. The stochastic model for predicting solar generation is constructed based on weather forecast data from the National Oceanic and Atmospheric Administration (NOAA). A near-optimal policy design is proposed based on stochastic dynamic programming. Simulation results are presented in the context of a residential building environment with integrated storage and solar generation. Results indicate that the near-optimal policy significantly reduces the operating costs compared to several heuristic alternatives. The proposed framework can be used to aid the design and evaluation of energy management policies with configurable demand-supply-storage parameters while accounting for weather-induced uncertainties.
Keywords:
Solar Energy; Energy Storage; Energy Management; Microgrid; Smart Building; Markov Decision Process
1 Introduction
The current electric grid was originally designed to support unidirectional power flows emanating from a few generating stations that supplied a large number of consumers via transmission and distribution networks. Recently, the growth in distributed renewable power generation sarzynski2009impact () zervos2013renewables () has increased the need for the grid to accommodate bidirectional power flows. While greater renewable penetration is desirable from a sustainability perspective, the associated fluctuations pose significant grid stability challenges battaglini2009development () painuly2001barriers (). Furthermore, the need to improve efficiency, reliability, and security, and to reduce the overall carbon footprint, requires the present grid to transition toward the next generation, often termed the "smart grid" smartgrid ().
In this paradigm smartgrid (), information from various smart sensors is integrated via communication networks to enable intelligent real-time power-flow decisions. This allows for several improvements in the power grid, including (i) reduced operation and maintenance costs via automation, (ii) increased efficiency by minimizing grid losses, (iii) increased renewable penetration via real-time power management and integrated energy storage Rodrigues2014265 (), and (iv) increased reliability. Thus, the smart grid is envisioned to integrate several forward-looking technologies while maintaining full backward compatibility with the existing grid and without compromising grid stability farhangi2010 ().
The functional representation of a grid includes an interconnected network of nodes that facilitate the generation, transmission, and distribution of electric power at a regional scale or larger. These nodes include distributed sources, loads, energy storage, and decision-making agents. A functional equivalent of a grid at a smaller scale is known as a microgrid, which can be regarded as a plug-and-play unit of the grid. The microgrid is capable of functioning either connected to or disconnected from the rest of the grid. The former mode of operation is known as the grid-connected mode and the latter as the islanded mode. The functional equivalent of a microgrid that facilitates power flows at an even smaller scale, such as within a single building, may be called a nanogrid mitra2010 (). A smart nanogrid thus facilitates informed decision-making for effective energy management; here and in what follows, the word 'smart' qualifying a microgrid or a nanogrid indicates that the decisions within the corresponding grid are made based on real-time information about its nodes. In this work, we restrict our focus to the nanogrid, which represents the power system within a single building.
The problem of making decisions for energy management is central to any network involving power flows, ranging from the large-scale grid down to relatively small-scale microgrids and nanogrids ZIA20181033 (). This problem reduces to determining the power flows between the various nodes in the network while simultaneously achieving prespecified objectives. In the real world, these flows are influenced not only by network-specific factors but also by external, environment-specific factors such as the weather AGUERAPEREZ2018265 () hernandez2012study (). For example, consider a residential building with uncontrolled demand that is equipped with renewable generation and energy storage. The power-flow decision problem seeks to determine the power flows between these nodes by accounting for the factors influencing the network. For instance, a decision may direct power to charge the energy storage unit instead of exporting it to the grid, on account of poor irradiance conditions and high electricity prices expected in the immediate future. Such decisions may incur a lower electricity cost than decisions that always export the renewable output to the grid. Consequently, it is of interest to determine the decision that incurs the least possible cost. Such a cost-minimizing decision is henceforth referred to as an 'optimal' decision.
In this work, we are concerned with the problem of devising optimal policies in the presence of weather-related uncertainties; we reserve the word policy to refer to a rule that maps points in the observation space to points in the decision space. Although related problems have been studied in recent literature AGUERAPEREZ2018265 () wang2014dynamic () kanchev2011energy () song2013development () sanseverino2011execution () AMROLLAHI201766 (), to the best of the authors' knowledge this problem has not been solved in its present form. In Wang et al. wang2014dynamic (), controllable solar generation and load models were employed within a deterministic prediction framework to obtain power flows using model predictive control (MPC) techniques. In Kanchev et al. kanchev2011energy (), non-controllable deterministic demand-supply models were considered, but without a notion of optimality. In our case, by contrast, the renewable supply and the demand are modeled as non-controllable stochastic processes, under the assumptions that the solar generator operates at its maximum power point and that the demand must be met unconditionally. In song2013development (), a discrete-time PV-integrated energy storage model is considered whose state is modeled as a Markov chain in order to determine the storage capacity. In contrast, this work solves for optimal energy management policies within a Markov Decision Process (MDP) framework that includes a utility pricing scheme. Further, unlike the energy management problems addressed in kanchev2011energy () and sanseverino2011execution (), where the dynamics of the energy storage were not explicitly considered, we treat the energy storage as a dynamical system with its own charging and efficiency constraints.
We also employ cyclostationary Markov chains to model the time-dependent transitions of the solar and load stochastic processes, unlike the conventionally used stationary Markov chain models ardakanian2011markovian (), Muselli2001675 (). In AGUERAPEREZ2018265 (), the importance of a weather forecast-oriented approach to energy management was highlighted based on a review of the past literature, and that review recommended incorporating weather forecast data into energy management systems. Our focus is on incorporating a weather forecast-based approach for solar power prediction into a stochastic demand-supply-storage framework, with the objective of designing cost-minimizing policies. In AMROLLAHI201766 (), a component sizing problem with a demand response intervention is solved using mixed-integer programming in a deterministic setting. Here, in contrast, we treat a weather forecast-integrated optimal decision problem within a stochastic environment modeled by a Markov decision process.
The main contributions of this work are (1) an integrated energy management framework consisting of energy storage dynamics, a weather forecast-driven stochastic solar generation model, a building data-driven stochastic residential demand model, and a utility pricing scheme; (2) the design of heuristic policies for energy management (also referred to as naive policies in this work; see Section 3 for details); and (3) an MDP formulation and solution for designing near-optimal energy management policies in an uncertain demand-supply-storage environment.
The remainder of this paper is organized as follows. Section 2 describes the models of the various components of a nanogrid along with their interconnections. Section 3 describes the formulation of naive policies based on heuristic considerations. The notion of cost is introduced in Section 4, and an optimal policy is sought within a Markov Decision Process framework in Section 5. The simulated performance of the various policies is presented and discussed in Section 6, followed by conclusions in Section 7. The treatment of the state and decision constraints used to determine the feasible decision space of the MDP is described in Appendix A.
2 Model Description
We are interested in analyzing the impact of power-flow decisions on energy costs within a smart building. Consequently, we develop a model of the nanogrid in the context of a smart building. This model includes the building power load, a renewable solar generator, an energy storage unit, a decision-making unit (DMU), and their interconnections. Such a model enables us to (1) study the evolution of the nanogrid in time and (2) create a framework to evaluate decisions driving power flows within the nanogrid. In what follows, we describe the underlying aspects of the nanogrid model. The schematic of such a nanogrid connected to the grid is depicted in Figure 1.
2.1 Representing Time
Let the time interval of interest be denoted by $\mathcal{T} = [t_0, t_N]$, where $t_0$ and $t_N$ denote the start and end times, respectively. Also, let the sequence of points $t_0, t_1, \dots, t_N$ denote a partition of the interval $\mathcal{T}$ such that the following properties hold true:
$$t_0 < t_1 < \cdots < t_N, \qquad t_{k+1} - t_k = \Delta t, \quad k = 0, 1, \dots, N-1, \tag{1}$$
where $\Delta t$ represents the uniform width of the subintervals $T_k = [t_k, t_{k+1})$. Let the set of these subintervals in $\mathcal{T}$ be denoted by $\mathbb{T} = \{T_0, \dots, T_{N-1}\}$. In addition to $\mathcal{T}$, we also construct an interval $\mathcal{T}_p = [t_{-M}, t_0]$ to represent time in the past for the purpose of integrating weather data from the past. Similar to the above definitions over $\mathcal{T}$, we construct a partition $t_{-M} < \cdots < t_{-1} < t_0$ and the set of corresponding subintervals $\mathbb{T}_p$ over $\mathcal{T}_p$. For the decision problem described in Section 3, each point $t_k$ in the sequence represents a decision epoch, at which instant a decision rule (policy) is formulated. This rule, formulated at $t_k$, continues to govern the decisions implemented during the interval $T_k$. Further, let the width of each subinterval in $\mathbb{T}_p$ also be uniform, such that $t_{j+1} - t_j = \Delta t$ for $j = -M, \dots, -1$. For the purpose of computation, we set $\Delta t = 3600$ seconds so that the time instants in $\mathcal{T}$ and $\mathcal{T}_p$ are one hour apart.
2.2 Photovoltaic Generation
In the nanogrid model, let the power output from the solar generation be represented by a discrete-time stochastic process $\{S_k\}$ with discrete states $s \in \mathcal{S} = \{s^1, \dots, s^{n_S}\}$. The probability distribution of each random variable $S_k$ is assumed to be calculable at every time instant $t_k$. We also assume this process to be first-order Markovian and cyclostationary with a period of 24 hours. Since the solar generation output is influenced by factors such as the panel characteristics and environmental conditions, a model that maps environmental variables to the power output is employed sera2007pv () poolla2014neural (). In the implementation, the distributions of the random variables $S_k$ are obtained from weather distributions, which are constructed using time series data from the National Oceanic and Atmospheric Administration (NOAA).
2.2.1 Integrating NOAA Forecasts
NOAA provides a wide variety of weather forecast products varying in spatio-temporal resolution, prediction horizon, and update frequency noaa_ncep (). In this work, forecast data from the NOAA North American Mesoscale Forecast System (NAM) archives are leveraged to infer probability distributions of the measured solar irradiance, as described below.
Let $I(t)$ denote the irradiance obtained from sensor measurement at a time instant $t \in \mathcal{T}_p$ in the past, and let $F_h(t)$ denote the forecast value for $t$ (based on the archived data from an $h$-hours ahead forecast). Using the historical sensor data $I(t)$ and the forecast archives $F_h(t)$, the error $e(t)$ can be computed as shown in Equation 2:
$$e(t) = I(t) - F_h(t), \qquad t \in \mathcal{T}_p. \tag{2}$$
For this work, the sensor data from Carnegie Mellon University (CMU) in Moffett Field, California, was available at a sub-hourly granularity. However, the NAM data was available only at a granularity of 6 hours, since the NAM model runs 4 times per day at 00, 06, 12, and 18 hr UTC (i.e., 16, 22, 04, and 10 hr PST), forecasting up to 84 hours ahead. Every NAM model run also provides a 0-hr ahead forecast, which is regarded as the NAM model's estimate of the truth based on a variety of sensor measurements. Thus, the 0-hr ahead forecast may serve as a proxy for the true value at locations where sensor measurements are unavailable.
We note that $e(t)$ obtained from Equation 2 is available only when $t$ coincides with a NAM model run, since the temporal resolution of the NAM data is 6 hours. For the purpose of inference, we assume that the data is available at all instants $t_j \in \mathcal{T}_p$ in the past by enforcing a first-order hold over the data $e(t)$. Thus, the error measurements $e_j = e(t_j)$ are well defined for every $t_j \in \mathcal{T}_p$.
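The first-order hold can be sketched as linear interpolation between consecutive NAM samples. This is a minimal sketch, assuming time is measured in hours and that the hold is applied offline to archived past data (so interpolating toward the next sample is permissible); the function name `first_order_hold` and the sample values are hypothetical.

```python
import numpy as np


def first_order_hold(t_samples, values, t_query):
    """Linearly interpolate between consecutive samples (first-order hold).

    t_samples: increasing sample times in hours (e.g. NAM runs, 6 h apart)
    values:    error values e(t) at those times
    t_query:   times (e.g. hourly) at which the held signal is needed
    Queries outside the sampled range are clamped to the end values.
    """
    return np.interp(t_query, t_samples, values)


# Hypothetical 6-hourly error samples (W/m^2) reconstructed at hourly resolution.
t6 = np.array([0.0, 6.0, 12.0, 18.0, 24.0])
e6 = np.array([0.0, 60.0, -30.0, 0.0, 12.0])
hourly = first_order_hold(t6, e6, np.arange(0.0, 25.0))
```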
Given the error population $\{e_j\}$, let $\mathcal{D}_k$ denote the dataset available at the time instant $t_k \in \mathcal{T}_p$ containing the error data $e_j$, where $(t_j - t_k) \bmod P = 0$. Here $P$ denotes the periodicity factor, which determines the grouping of the elements $e_j$ into $\mathcal{D}_k$ for each $t_k$ in $\mathcal{T}_p$. In other words, the dataset $\mathcal{D}_k$ is a collection of all the error data $e_j$ such that the corresponding time instants $t_j$ and $t_k$ leave the same remainder when divided by $P$.
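The grouping rule for the datasets $\mathcal{D}_k$ can be illustrated with a short sketch. It assumes integer time stamps in hours measured from a midnight reference, so that with $P = 24$ the remainder is simply the hour of day; `group_by_period` and the synthetic errors are hypothetical.

```python
from collections import defaultdict


def group_by_period(times_h, errors, period_h=24):
    """Group error samples so that times with equal remainder mod P share a dataset."""
    datasets = defaultdict(list)
    for t, e in zip(times_h, errors):
        datasets[int(t) % period_h].append(e)
    return dict(datasets)


# Three days of synthetic hourly errors: dataset 5 collects hours 5, 29, and 53.
times = range(72)
errs = [float(t) for t in times]
D = group_by_period(times, errs)
```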
Based on the error datasets constructed above, we proceed to infer the error distributions. Let the error-generating process be represented by a stochastic process $\{E_k\}$. Let its distribution be denoted by $p_{E_k}(\cdot \mid \mathcal{D})$, where $\mathcal{D}$ refers to the underlying dataset used for the inference. We also assume that $\{E_k\}$ is periodic with period $P$. Since the distribution is assumed to be periodic, the underlying dataset at a given time instant $t_k \in \mathcal{T}$ in the future can be represented by a dataset $\mathcal{D}_j$ in the past if and only if the following hold true:
$$t_j \in \mathcal{T}_p, \qquad (t_k - t_j) \bmod P = 0, \tag{3}$$
where $t_j$ is the time instant in the past whose error dataset $\mathcal{D}_j$ is used to infer the error distribution of $E_k$ at $t_k$.
Given the characterization of the stochastic process $\{E_k\}$ described above, we employ a linear signal-noise regression model to predict the measured value, as shown below. While this model is used to demonstrate the proof of concept, similar regression models incorporating weather forecasts can also be used poolla2018solar ().
$$\hat{I}_k = F_h(t_k) + E_k, \qquad E_k \sim p_{E_k}(\cdot \mid \mathcal{D}_j). \tag{4}$$
In this manner, the distribution of the estimated irradiance $\hat{I}_k$ is computed from the error distribution of the stochastic process $\{E_k\}$ and the deterministic forecast $F_h(t_k)$. Thus, the seasonal component in the estimate $\hat{I}_k$ is introduced by the forecast $F_h(t_k)$.
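Equations 3 and 4 can be combined into a small sketch that returns an empirical, equally weighted distribution of the estimated irradiance at a future instant. The function name, the hour-based time stamps, and the toy dataset are assumptions made for illustration; the text does not prescribe a particular density estimator.

```python
import numpy as np


def irradiance_samples(t_future_h, forecast, datasets, period_h=24):
    """Empirical distribution of I_hat = F + E at a future time (hypothetical helper)."""
    # Condition (3): reuse the past dataset whose time index has the same remainder mod P.
    errors = np.asarray(datasets[int(t_future_h) % period_h], dtype=float)
    # Equation (4): each error sample shifts the deterministic forecast.
    samples = forecast + errors
    weights = np.full(samples.shape, 1.0 / samples.size)  # equal weight per sample
    return samples, weights


# Toy dataset for hour-of-day 5; hour 29 maps to it because 29 mod 24 = 5.
samples, w = irradiance_samples(29, 500.0, {5: [-10.0, 0.0, 10.0]})
```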
Similar to the irradiance estimation performed above, other weather variables may be used to obtain the distributions of various stochastic processes representing different aspects of the weather, such as temperature and cloudiness. With knowledge of these weather fluctuation distributions, models that map the environmental states to the solar power output can be used to obtain the solar power distribution. Although any appropriate model can be used, we resort to a linear model for the proof-of-concept illustration. The resulting solar power distributions are denoted by $p^S_k(s)$, where $k$ indexes the time instant $t_k$ and $s \in \mathcal{S}$ represents the states of the photovoltaic generation.
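As an illustration of the final mapping step, the sketch below converts irradiance samples into a probability mass function over discrete PV power states. The linear model $P = P_{\text{rated}}\,I/1000$ (power proportional to irradiance relative to the 1000 W/m² standard test condition), the clipping of negative irradiance samples to zero, and nearest-state binning are all hypothetical choices, not necessarily the model used in this work.

```python
import numpy as np


def pv_state_distribution(irradiance_samples, p_rated_kw, states_kw):
    """Empirical pmf of PV output over discrete states under a hypothetical linear model."""
    irr = np.clip(np.asarray(irradiance_samples, dtype=float), 0.0, None)  # no negative irradiance
    power = p_rated_kw * irr / 1000.0          # assumed linear irradiance-to-power map
    states = np.asarray(states_kw, dtype=float)
    # Assign each power sample to the nearest discrete state.
    idx = np.abs(power[:, None] - states[None, :]).argmin(axis=1)
    return np.bincount(idx, minlength=len(states)) / len(power)


states = [0.0, 1.0, 2.0, 3.0, 4.0]
pmf = pv_state_distribution([0.0, 500.0, 1000.0, 1000.0], p_rated_kw=4.0, states_kw=states)
expected_kw = float(np.dot(states, pmf))  # expected PV output under this pmf
```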
2.3 Load
In the building power system, loads can broadly be categorized into Heating, Ventilation, and Air Conditioning (HVAC), lighting, and plug loads. The overall load is then the aggregate of all individual loads at every time instant $t_k$. In this work, hourly load data associated with San Francisco residential buildings were obtained from the openEI database openei_data (). Similar to the photovoltaic generation model, the load is modeled as a discrete-time, discrete-state Markov chain $\{L_k\}$ with states $l \in \mathcal{L} = \{l^1, \dots, l^{n_L}\}$. The state transition probabilities of the model are assumed to be cyclostationary with a period of 24 hours. Let the state distributions be denoted by $p^L_k(l)$, where $l$ denotes a state of the load variable ($l \in \mathcal{L}$).
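A cyclostationary Markov chain can be estimated by counting transitions separately for each phase of the period (each hour of day when the period is 24 hours). The sketch below assumes the load series has already been mapped to state indices and starts at phase 0; the function name is hypothetical, and rows with no observed transitions are filled with a uniform distribution, which is an assumption rather than part of the method described here.

```python
import numpy as np


def cyclostationary_transitions(state_idx, n_states, period=24):
    """Estimate phase-dependent transition matrices P[h, i, j] from a state-index series."""
    counts = np.zeros((period, n_states, n_states))
    for k in range(len(state_idx) - 1):
        # Transition from step k to k+1 is attributed to phase h = k mod period.
        counts[k % period, state_idx[k], state_idx[k + 1]] += 1
    totals = counts.sum(axis=2, keepdims=True)
    safe = np.where(totals > 0, totals, 1.0)
    # Normalise observed rows; unobserved rows default to uniform (assumption).
    return np.where(totals > 0, counts / safe, 1.0 / n_states)


# Toy example with period 2: state 0 at even steps always moves to 1, and vice versa.
P = cyclostationary_transitions([0, 1, 0, 1, 0, 1], n_states=2, period=2)
```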
2.4 Energy Storage
The objective of furthering renewable penetration while simultaneously maintaining grid stability can be practically realized by incorporating energy storage systems Rodrigues2014265 (). In a more localized environment such as a micro- or nanogrid, storage systems also help achieve self-reliance independent of the main grid and enable a reduction in the cost of electricity consumption. In this work, the nanogrid storage system is represented by a dynamical system with capacity $B_{\max}$. There are several approaches to modeling storage dynamics with varying levels of complexity he2011evaluation (). To illustrate the concept of energy management in the context of a nanogrid, we resort to a linear dynamic representation of the storage system similar to wang2014dynamic ():
$$b_{k+1} = \eta\, b_k - \gamma_k\, u_k\, \Delta t, \qquad 0 \le b_k \le B_{\max}, \tag{5}$$
where $b_k$ represents the energy content (state) of the energy storage at $t_k$ and $u_k$ represents the net power output from the storage system from the current time instant $t_k$ until the next instant $t_{k+1}$. The power-flow constraints associated with the storage system are represented by $u_{\min} \le u_k \le u_{\max}$. Further, the self-discharge losses Salameh124547 () are represented by the storage efficiency factor $\eta$, and the closed-circuit losses erdinc2009dynamic () by the factor $\gamma_k$. Note that $\gamma_k \le 1$ during charging ($u_k < 0$) and $\gamma_k \ge 1$ during the discharge phase ($u_k > 0$). In other words, because of the losses incurred, more energy is drawn from the storage system than it delivers as output, and less energy is stored than it receives as input.
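A single step of the storage model in Equation 5, together with its power and capacity constraints, might be coded as follows. This sketch assumes energy in kWh, power in kW, and time in hours, uses the convention that $u_k > 0$ means discharging into the DMU, and selects $\gamma_k$ from two hypothetical constants (`gamma_c` for charging, `gamma_d` for discharging).

```python
def storage_step(b, u, dt_h, b_max, u_min, u_max, eta=1.0, gamma_c=1.0, gamma_d=1.0):
    """One step of b_{k+1} = eta * b_k - gamma_k * u_k * dt (u > 0: discharge)."""
    if not (u_min <= u <= u_max):
        raise ValueError("storage power outside [u_min, u_max]")
    gamma = gamma_d if u > 0 else gamma_c     # discharge vs. charge loss factor
    b_next = eta * b - gamma * u * dt_h
    if not (0.0 <= b_next <= b_max):
        raise ValueError("storage energy would leave [0, b_max]")
    return b_next


# Discharging 2 kW for 1 h with gamma_d = 1.25 removes 2.5 kWh from a 5 kWh store.
b_after_discharge = storage_step(5.0, 2.0, 1.0, b_max=10.0, u_min=-5.0, u_max=5.0, gamma_d=1.25)
```

Raising an exception keeps the sketch simple; in a policy, infeasible decisions would instead be excluded beforehand, as the admissible decision space does.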
2.5 Grid Transactions
The electric grid facilitates the transfer of power from the suppliers to the consumers over transmission and distribution networks. In this work, we consider the transactions between the grid and the nanogrid to be lossless and unconstrained but associated with a cost to illustrate the concept. The decisions driving these transactions are enabled by a DecisionMaking Unit (DMU). A functional representation of the nanogrid along with its DMU is shown in Figure 1. The notations for the transactions and the associated pricing scheme used in the model are described below.
Let the power flow from the grid to the nanogrid DMU at any $t_k$ be denoted by $g_k$; we adopt a sign convention that treats power flows into the DMU as positive. The associated monetary cost incurred by the nanogrid depends on the quantity of power transacted as well as the pricing scheme enforced by the local utility company or the energy market federal2012energy (). We consider a deterministic pricing scheme where the purchase prices ($c^b_k$) are based on schedule E-6 pge_residential_e6 () and the selling prices ($c^s_k$) are based on the E-SRG PPA pge_srg_ppa (). Thus, the monetary cost incurred due to the grid transactions over the horizon $\mathcal{T}$ can be written as:
$$C(g) = \sum_{k=0}^{N-1} \big[c^b_k\,\mathbb{1}(g_k > 0) + c^s_k\,\mathbb{1}(g_k < 0)\big]\, g_k\,\Delta t, \tag{6}$$
where $\mathbb{1}(\cdot)$ denotes the indicator function, and a positive $C(g)$ indicates the cost to be paid by the nanogrid to the grid on account of the grid transactions $g = (g_0, \dots, g_{N-1})$.
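The grid cost in Equation 6 reduces to a sum over time steps in which the purchase price applies to imports ($g_k > 0$) and the selling price to exports ($g_k < 0$). A minimal sketch, assuming prices in dollars per kWh, power in kW, and a time step in hours; the prices in the usage line are made up.

```python
def grid_cost(g, buy, sell, dt_h):
    """Sum of price * power * dt: buy price on imports, sell price on exports."""
    total = 0.0
    for g_k, cb_k, cs_k in zip(g, buy, sell):
        price = cb_k if g_k > 0 else cs_k   # g_k == 0 contributes nothing either way
        total += price * g_k * dt_h
    return total


# Import 2 kW, export 1 kW, then idle, each for one hour.
cost = grid_cost([2.0, -1.0, 0.0], buy=[0.3] * 3, sell=[0.1] * 3, dt_h=1.0)
```

A positive result is money paid to the utility; export revenue enters as a negative contribution.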
2.6 Nanogrid Topology
In a distributed power system consisting of several generation units, loads, storage systems, and the main grid, the interconnection topology plays a critical role in routing power effectively. From a decision-making perspective, there exist interconnected designs with varying levels of decentralization, each differing in autonomy and complexity. In the nanogrid under consideration, we employ a centrally interconnected design wherein all the nanogrid components are connected via the DMU, as depicted in Figure 1.
2.7 Nanogrid DMU
The nanogrid decision-making unit provides the interface between the nanogrid and the main grid to enable power transactions. In general, smart-grid power systems consist of two interdependent networks: (1) a communication network for routing information and (2) a power network for routing power. Depending on the states of the storage system, the generation, the load, and the electricity pricing, the DMU centrally computes the power-flow decisions in the network. It is assumed here that the information required for making power-flow decisions is available to the DMU instantaneously via the communication network.
The DMU determines the power flows $u_k$ and $g_k$ in the power system based on various considerations. Let the generation output and the load at an instant $t_k$ be denoted by $s_k$ and $l_k$, respectively. Here $s_k$ and $l_k$ represent the realizations of the stochastic processes $\{S_k\}$ and $\{L_k\}$ at time $t_k$ in the real world. Given $s_k$ and $l_k$, the decisions computed at the DMU ($u_k$, $g_k$) must result in power balance at every time instant, as shown:
$$s_k - l_k + u_k + g_k = 0. \tag{7}$$
2.8 System Parameters
Decision-making within the nanogrid model requires information about the system parameters. Let the system parameters at $t_k$ be denoted by $\theta_k$. Also, let $\theta_k$ be classified into the deterministic parameters of the system (denoted by $\theta^d_k$) and the distributions of the random parameters of the system (denoted by $\theta^r_k$). The deterministic parameters consist of $\theta^d_k = \{B_{\max}, u_{\min}, u_{\max}, \eta, \gamma_k, c^b_k, c^s_k\}$, and the distribution parameters of the stochastic processes consist of $\theta^r_k = \{p^S_k, p^L_k\}$. Thus, the system parameters can be written as $\theta_k = (\theta^d_k, \theta^r_k)$.
3 Policy Formulation
The nanogrid model above enables structured power-flow decision-making. Specifically, the information about the state $b_k$ is useful in determining the decision pair ($u_k$, $g_k$). The mapping between the state space and the decision space of the system is provided by a policy, denoted by $\pi$. The nanogrid system policy consists of a collection of two policies, namely the storage transaction policy $\pi^u$ and the grid transaction policy $\pi^g$. Thus, the policy map between the states ($b_k$) and the decisions ($u_k$, $g_k$) can be written as:
$$\pi_k : \mathcal{B} \to \mathcal{A}_k, \qquad b_k \mapsto (u_k, g_k) = \big(\pi^u_k(b_k; \theta_k),\; \pi^g_k(b_k; \theta_k)\big), \tag{8}$$
where the admissible decision space is represented by $\mathcal{A}_k$ and $\mathcal{B} = [0, B_{\max}]$ denotes the storage state space. It must be noted that the admissible decision space consists of the decisions which, when implemented, do not violate the system constraints (refer to Appendix A for details). Also, the two decisions ($u_k$ and $g_k$) are not independent: once one is chosen, the other is determined by the power balance (Equation 7). In what follows, we describe heuristics-based policies, also referred to here as 'naive' policies:

Policy 1: Exhaustive Storage Dependence Policy ($\pi^1$): Given the non-controllable supply $s_k$ and demand $l_k$, the policy is designed to bridge the demand-supply gap by relying on the storage transactions $u_k$. When the storage resource can no longer be used to maintain power balance, the policy resorts to the grid transactions $g_k$ to bridge the demand-supply gap. The decision-making process for this policy ($\pi^1$) is illustrated by the flowchart in Figure 2.
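Policy 1 can be sketched by combining the storage limits implied by Equation 5 with the power balance of Equation 7. This is an illustrative sketch rather than the implementation behind Figure 2: the function and parameter names are hypothetical, units are kW, kWh, and hours, and the storage is charged or discharged as far as its power and capacity limits allow before the remainder is exchanged with the grid.

```python
def policy_storage_first(s, l, b, dt_h, b_max, u_min, u_max,
                         eta=1.0, gamma_c=1.0, gamma_d=1.0):
    """Sketch of Policy 1: close the supply-demand gap with storage first, then the grid.

    Returns (u, g) with u > 0 meaning discharge and g > 0 meaning grid import,
    so that s - l + u + g = 0 (power balance).
    """
    gap = l - s                                   # > 0: deficit, <= 0: surplus
    if gap > 0:
        # Largest discharge allowed by the power limit and by b_{k+1} >= 0.
        u_cap = min(u_max, eta * b / (gamma_d * dt_h))
        u = min(gap, u_cap)
    else:
        # Largest charge (most negative u) allowed by the power limit and b_{k+1} <= b_max.
        u_floor = max(u_min, -(b_max - eta * b) / (gamma_c * dt_h))
        u = max(gap, u_floor)
    g = l - s - u                                 # remainder exchanged with the grid
    return u, g


deficit_case = policy_storage_first(1.0, 4.0, 2.0, 1.0, b_max=10.0, u_min=-5.0, u_max=5.0)
# -> (2.0, 1.0): the 2 kWh store covers 2 kW, the grid supplies the remaining 1 kW
surplus_case = policy_storage_first(6.0, 1.0, 8.0, 1.0, b_max=10.0, u_min=-5.0, u_max=5.0)
# -> (-2.0, -3.0): 2 kW fills the store to capacity, 3 kW is exported
```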

Policy 2: Finite Horizon Lookahead Policy with a three-hour lookahead ($\pi^2$): This policy is designed to make informed decisions based on the present nanogrid state as well as the expected state over a finite future horizon. For numerical evaluation, a three-hour horizon is chosen. With these considerations, four scenarios are possible: (i) excess supply in the present and excess supply cumulatively expected over the horizon, (ii) excess supply in the present and deficit supply cumulatively expected over the horizon, (iii) deficit supply in the present and excess supply cumulatively expected over the horizon, and (iv) deficit supply in the present and deficit supply cumulatively expected over the horizon. In scenario (i), the storage resource is charged before depending on the grid for power balance, so that excess generation in the present and over the future horizon is absorbed primarily by the storage resource. In scenario (iv), the storage resource is half-discharged to meet the deficit before depending on the grid. The rationale underlying this mechanism is that the policymaker uses half of the storage resource to meet the present deficit and retains the rest to meet the deficit expected over the future horizon. In the remaining scenarios, (ii) and (iii), the policy relies on the generation and the grid for power balance, leaving the storage resource unused. The decision flow corresponding to policy $\pi^2$ is illustrated by the flowchart in Figure 3.
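A sketch of Policy 2 under similar assumptions follows. Here `expected_surplus` is a hypothetical input holding the expected net surplus (expected generation minus expected load) for each of the next three hours; a net surplus of exactly zero is treated as a deficit, and "half-discharged" is interpreted as using at most half of the energy currently available, both of which are our assumptions.

```python
def policy_lookahead(s, l, b, expected_surplus, dt_h, b_max, u_min, u_max,
                     eta=1.0, gamma_c=1.0, gamma_d=1.0):
    """Sketch of Policy 2: storage use depends on the present and expected future surplus."""
    surplus_now = s - l
    surplus_ahead = sum(expected_surplus)         # e.g. three hourly expected surpluses
    if surplus_now > 0 and surplus_ahead > 0:
        # Scenario (i): absorb the present surplus in storage before exporting.
        u_floor = max(u_min, -(b_max - eta * b) / (gamma_c * dt_h))
        u = max(-surplus_now, u_floor)
    elif surplus_now <= 0 and surplus_ahead <= 0:
        # Scenario (iv): cover the deficit with at most half of the available energy.
        u_cap = min(u_max, 0.5 * eta * b / (gamma_d * dt_h))
        u = min(-surplus_now, u_cap)
    else:
        # Scenarios (ii) and (iii): leave the storage idle.
        u = 0.0
    g = l - s - u                                 # grid closes the power balance
    return u, g


scenario_iv = policy_lookahead(0.0, 4.0, 6.0, [-1.0, -1.0, -1.0], 1.0,
                               b_max=10.0, u_min=-5.0, u_max=5.0)
# -> (3.0, 1.0): half of the 6 kWh store is used, the grid supplies the rest
```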
4 Optimal Power Flow Problem
Given heuristics-based policies such as the ones described above, it is important to compare their performance using cost metrics, one of which was described in Equation 6. This allows the policymaker to rank and choose policies based on their cost. In this work, we attempt to design policies that result in decisions minimizing the nanogrid cost. We consider the scenario where the cost metric describes the expected monetary cost over a finite horizon $[t_k, t_N]$. The cost metric thus depends on the costs associated with the grid transactions as well as on the expected storage state at the end of the horizon. The storage state at the end of the horizon is unrealized at all instants $t_j < t_N$ and is hence regarded as a random variable $B_N$. The probability distribution of $B_N$ can, however, be computed with knowledge of (i) the probabilistic supply and load distributions $p^S_j$ and $p^L_j$, respectively, over the instants $t_j \in [t_k, t_N)$, and (ii) the storage dynamics described in Equation 5. Therefore, the cost metric over the horizon $[t_k, t_N]$, which in dynamic programming formulations is also known as the cost-to-go from stage $k$ to $N$, can be written as:
$$J_k(b_k, \pi) = \mathbb{E}\Big[\sum_{j=k}^{N-1} \big[c^b_j\,\mathbb{1}(g_j > 0) + c^s_j\,\mathbb{1}(g_j < 0)\big]\, g_j\,\Delta t + \phi(B_N) \;\Big|\; b_k, \theta_k\Big], \tag{9}$$
where $\mathbb{1}(\cdot)$ denotes the indicator function, $\mathbb{E}$ denotes the expectation operator with respect to the solar and load distributions, $\phi(\cdot)$ denotes the terminal cost assigned to the storage state at the end of the horizon, and $\theta_k$ denotes the system parameters at the time instant $t_k$. While the above definition captures the essential monetary costs, we modify the cost function to force the optimal decision-maker to ensure that the storage is completely charged by the end of the horizon. This is achieved by introducing a multiplier $\lambda$ in the terminal cost, thereby transforming it into $\lambda\,\phi(B_N)$. Thus, the redefined cost over the horizon becomes:
$$J^{\lambda}_k(b_k, \pi) = \mathbb{E}\Big[\sum_{j=k}^{N-1} \big[c^b_j\,\mathbb{1}(g_j > 0) + c^s_j\,\mathbb{1}(g_j < 0)\big]\, g_j\,\Delta t + \lambda\,\phi(B_N) \;\Big|\; b_k, \theta_k\Big]. \tag{10}$$
Given the above cost metric $J^{\lambda}$, the problem of interest is to compute the optimal policy $\pi^*$ that minimizes $J^{\lambda}_0$ over the entire horizon $\mathcal{T}$. Note that although the optimal policy minimizes $J^{\lambda}$, which does not represent the monetary cost, we evaluate the policies in Section 6 based on their monetary cost $C$ (Equation 6).
5 Optimal Policy Computation
In order to solve for the optimal policy, we first note that the states ($b_k$), the decisions ($u_k$, $g_k$), the Markovian state transitions (Equation 5), and the cost metric (Equation 10) together constitute a Markov decision process (MDP). Within this framework, the policymaker enforces a decision on the system, and the system state responds by randomly transitioning into a new state while incurring a transition cost, namely the grid-transaction cost of the current stage.
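The problem of the policymaker is to obtain the policy $\pi^*$ which, when enforced, results in decisions that minimize the cost over the optimization horizon $\mathcal{T}$. The optimal policy can be obtained by solving the following optimization problem, subject to the storage dynamics (Equation 5), the power balance (Equation 7), and $(u_k, g_k) \in \mathcal{A}_k$:
$$\pi^* = \arg\min_{\pi}\; \mathbb{E}\Big[\sum_{k=0}^{N-1} \big[c^b_k\,\mathbb{1}(g_k > 0) + c^s_k\,\mathbb{1}(g_k < 0)\big]\, g_k\,\Delta t + \lambda\,\phi(B_N) \;\Big|\; b_0, \theta_0\Big], \tag{11}$$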
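where $B_N$ denotes the random variable representing the uncertain state of the storage in the future (at $t_N$). The corresponding optimal cost then becomes:
$$J^{\lambda}_0(b_0, \pi^*) = \min_{\pi} J^{\lambda}_0(b_0, \pi) = V_0(b_0, \theta_0), \tag{12}$$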
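where $V_0(b_0, \theta_0)$ is known as the value of the state $b_0$ at the time instant $t_0$. The function $V_k$, known as the value function, maps the state and parameters at the time instant $t_k$ to a real value, as shown:
$$V_k(b_k, \theta_k) = \min_{\pi} J^{\lambda}_k(b_k, \pi), \qquad V_k : \mathcal{B} \times \Theta \to \mathbb{R}, \tag{13}$$
where $\Theta$ denotes the space of system parameters.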
In other words, the value function at the instant $t_k$ describes the optimal cost-to-go from stage $k$ through the final stage $N$.
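We address the above discrete-time stochastic dynamic optimization problem by applying the principle of optimality and solving the resulting subproblems using a dynamic programming (DP) formulation. The sequence of solutions to these subproblems is obtained using the backward induction method, wherein the values at each time step are obtained from those at the next time step by solving the Bellman equation. In our work, the Bellman equation is solved using interpolated value function approximations. In what follows, we describe the details of the DP formulation and the solution methodology used to arrive at near-optimal solutions to the problem described in Equation 11. By applying the principle of optimality, we can transform the decision-making problem into a sequence of subproblems as shown:
$$V_0(b_0, \theta_0) = \min_{(u_0, g_0) \in \mathcal{A}_0} \mathbb{E}\Big[c_0(g_0) + \min_{\pi} \mathbb{E}\Big[\sum_{k=1}^{N-1} c_k(g_k) + \lambda\,\phi(B_N) \;\Big|\; B_1, \theta_1\Big]\Big], \tag{14}$$
where $c_k(g_k) = \big[c^b_k\,\mathbb{1}(g_k > 0) + c^s_k\,\mathbb{1}(g_k < 0)\big]\, g_k\,\Delta t$ denotes the grid-transaction cost incurred during the subinterval $T_k$.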
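Using the definition of the value function in Equation 13, we rewrite Equation 14 as:
$$V_0(b_0, \theta_0) = \min_{(u_0, g_0) \in \mathcal{A}_0} \mathbb{E}\big[c_0(g_0) + V_1(B_1, \theta_1)\big]. \tag{15}$$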
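Continuing to break the problem down into subproblems, we arrive at the following generalized equation:
$$V_k(b_k, \theta_k) = \min_{(u_k, g_k) \in \mathcal{A}_k} \mathbb{E}\big[c_k(g_k) + V_{k+1}(B_{k+1}, \theta_{k+1})\big], \qquad k = N-1, \dots, 0, \tag{16}$$
which describes a recursive relationship between the values at consecutive time steps; here $c_k(g_k)$ is the grid-transaction cost incurred during $T_k$ (the $k$-th term of Equation 6), and $B_{k+1}$ evolves according to Equation 5. This equation, known as the Bellman equation, offers a recursive method to compute the values at the current time step from the values at the next time step. We apply backward induction to solve for the values in the Bellman equation (Equation 16), starting at the end of the horizon $t_N$ and solving backwards in time.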
To solve for the value function as well as the optimal decisions, we resort to a numerical approach. First, we observe that the domain of the value function is the continuous state space $\mathcal{B} = [0, B_{\max}]$. We quantize the state space only for the purpose of computing the values and the optimal decisions, which introduces suboptimality into the solution. Let the discretized version of the continuous state space be represented by the finite sequence $\{b^1, b^2, \dots, b^{n_b}\} \subset \mathcal{B}$.
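At the end of the horizon $t_N$, the values are computed for the quantized state space as
$$V_N(b^i, \theta_N) = \lambda\,\phi(b^i), \qquad i = 1, \dots, n_b, \tag{17}$$
where $\lambda\,\phi(\cdot)$ is the weighted terminal cost of Equation 10.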
At every previous time step $t_k$, $k = N-1, \dots, 0$, the decisions resulting in the minimum expected cost-to-go are computed for each quantized state $b^i$ by solving Equation 16 over the feasible decision space (see Appendix A), with $V_{k+1}$ evaluated between grid points by interpolation. Thus, the feasible decision space constraints, the discretization of the state space, and the approximation of the value function together yield a suboptimal solution: a sequence of suboptimal decisions that constitutes the non-stationary near-optimal policy $\hat{\pi}^*$.
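The backward induction over a quantized storage grid can be sketched as below. This simplified illustration rests on several assumptions not stated in the text: decisions are searched over a finite grid of storage powers `u_grid` (which must contain 0 so that some decision is always feasible), $S_k$ and $L_k$ are treated as independent discrete random variables, the grid power follows from the power balance as $g_k = l_k - s_k - u_k$, the sorted state grid `b_grid` is taken to end at $B_{\max}$, and $V_{k+1}$ is evaluated between grid points by linear interpolation (`numpy.interp`). All function and argument names are hypothetical.

```python
import numpy as np


def stage_cost(g, buy, sell, dt_h):
    """Grid cost for one step: buy price on imports (g > 0), sell price on exports."""
    return np.where(g > 0, buy, sell) * g * dt_h


def backward_induction(b_grid, u_grid, supply, load, buy, sell, terminal,
                       dt_h, eta=1.0, gamma_c=1.0, gamma_d=1.0):
    """Approximate V_k on b_grid and the minimising storage power U_k for k = N-1..0.

    supply[k], load[k]: (values, probabilities) of S_k and L_k, assumed independent.
    terminal: V_N evaluated on b_grid (e.g. lambda * phi(b)).
    """
    b_grid = np.asarray(b_grid, dtype=float)
    N, b_max = len(supply), b_grid[-1]
    V = np.empty((N + 1, len(b_grid)))
    U = np.empty((N, len(b_grid)))
    V[N] = terminal
    for k in range(N - 1, -1, -1):
        s_v, s_p = (np.asarray(a, dtype=float) for a in supply[k])
        l_v, l_p = (np.asarray(a, dtype=float) for a in load[k])
        net = l_v[None, :] - s_v[:, None]          # l - s for every outcome pair
        prob = s_p[:, None] * l_p[None, :]         # joint probability (independence)
        for i, b in enumerate(b_grid):
            best_cost, best_u = np.inf, 0.0
            for u in u_grid:
                gamma = gamma_d if u > 0 else gamma_c
                b_next = eta * b - gamma * u * dt_h              # storage dynamics
                if b_next < -1e-9 or b_next > b_max + 1e-9:
                    continue                                     # infeasible transition
                expected = np.sum(prob * stage_cost(net - u, buy[k], sell[k], dt_h))
                total = expected + np.interp(b_next, b_grid, V[k + 1])
                if total < best_cost:
                    best_cost, best_u = total, u
            V[k, i], U[k, i] = best_cost, best_u
    return V, U


# One-step toy problem: no PV, 1 kW load, hypothetical terminal cost 0.3 * (2 - b).
b_grid = np.array([0.0, 1.0, 2.0])
V, U = backward_induction(b_grid, u_grid=[-1.0, 0.0, 1.0],
                          supply=[([0.0], [1.0])], load=[([1.0], [1.0])],
                          buy=[0.5], sell=[0.1], terminal=0.3 * (2.0 - b_grid), dt_h=1.0)
```

In this toy case the minimiser discharges whenever stored energy is available, because importing at 0.5 per kWh costs more than the terminal penalty of 0.3 per kWh of missing charge.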
6 Results and Discussion
We analyzed the weather forecast data from the NAM model against both the sensor measurements and the true data as provided by NAM (0-hr ahead forecast). The resulting error distributions were used to determine the probability distributions of the solar generation process. The results from the NAM model are summarized below.
6.1 Weather Results
To analyze the similarity between the true data provided by NAM (0-hr ahead forecast) and the sensor measurements, we compared both datasets over a period of two months (Aug - Oct 2014). Example comparisons at 11 PM PT and 5 PM PT are plotted in Figures 4 and 5, respectively. These figures show that the NOAA NAM model output and the sensor measurements exhibit similar trends. Further, the Root Mean Square (RMS) errors were computed based on percentage errors between the NOAA and sensor-measured data. The mean error was found to be along with a standard deviation of . These error characteristics can be attributed to several factors, including coarse spatial granularity, lack of local information about shadows or dust patterns, modeling error, and the low update frequency of the NAM model (four runs per day).
Since the NAM model forecasts up to 84 hours ahead, we compared the accuracy of the forecasts across several prediction window lengths, treating the 0-hr ahead forecasts as the ground truth. The absolute RMS errors between the NAM truth (0-hr ahead) and the NAM forecasts up to 84 hours ahead are shown in Figure 6, with the corresponding percentage accuracies relative to the NAM truth data indicated in green. The overall accuracy across prediction window lengths was found to be consistent, with a mean of and a standard deviation of . Thus, the 84-hour ahead NAM forecast appears to provide accuracies comparable to those of a 6-hour ahead forecast for the case under consideration.
Based on the error distributions obtained from the irradiance archives, we estimated the distributions of the measured irradiance state conditioned on the forecasts (equation 4).
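As an illustration of how archived errors can be turned into a discrete conditional distribution, the sketch below assumes a multiplicative percentage-error model (measured = forecast × (1 + e/100)) and fixed-width irradiance bins. Both choices, and the helper `conditional_irradiance_pmf`, are assumptions for illustration; equation 4 of the paper may define the relationship differently.

```python
from collections import Counter


def conditional_irradiance_pmf(forecast, pct_errors, bin_width=50.0):
    """Empirical P(measured irradiance state | forecast).

    Hypothetical model: measured = forecast * (1 + e / 100) for each archived
    percentage error e, clipped at zero and rounded to the nearest multiple of
    `bin_width` (W/m^2) to obtain discrete states.
    """
    if not pct_errors:
        raise ValueError("need at least one error sample")
    counts = Counter()
    for e in pct_errors:
        measured = max(0.0, forecast * (1.0 + e / 100.0))
        state = bin_width * round(measured / bin_width)
        counts[state] += 1
    n = len(pct_errors)
    return {state: c / n for state, c in sorted(counts.items())}


# Hypothetical forecast of 500 W/m^2 and four archived errors (%).
print(conditional_irradiance_pmf(500.0, [-10.0, 0.0, 0.0, 10.0]))
```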
6.2 Supply and Demand Model results
Based on the distributions of the measured irradiance estimated above, the distributions of the stochastic solar generation were obtained. Assuming an installation capacity of under standard conditions, the expected solar generation was computed from these distributions and is shown in Figure 8. As mentioned earlier, the NAM model provides data at a temporal granularity of 6 hours (four times per day). However, both the forecast and the error datasets were reconstructed at an hourly granularity using a first-order hold, yielding the hourly resolution depicted in Figure 8.
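The two steps in this paragraph can be sketched as below. The first-order hold is interpreted here as linear interpolation between consecutive 6-hourly samples, and solar power is assumed to scale linearly with irradiance relative to a 1000 W/m² standard-condition reference; the helper names and the 5 kW capacity in the usage lines are hypothetical.

```python
def first_order_hold(samples, step=6):
    """Upsample coarse samples (one every `step` hours) to hourly values.

    Interpreted here as linear interpolation between consecutive samples.
    """
    if len(samples) < 2:
        return list(samples)
    hourly = []
    for a, b in zip(samples, samples[1:]):
        hourly.extend(a + (b - a) * k / step for k in range(step))
    hourly.append(samples[-1])
    return hourly


def expected_solar_kw(irradiance_pmf, capacity_kw, stc_irradiance=1000.0):
    """Expected PV output for a PMF over irradiance states (W/m^2).

    Assumes output scales linearly with irradiance relative to the standard
    condition value; temperature and inverter effects are ignored.
    """
    return sum(p * capacity_kw * g / stc_irradiance for g, p in irradiance_pmf.items())


# Hypothetical usage: 6-hourly irradiance -> hourly, then a 5 kW array.
hourly_irr = first_order_hold([0.0, 600.0, 300.0])
print(len(hourly_irr), expected_solar_kw({0.0: 0.25, 800.0: 0.75}, capacity_kw=5.0))
```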
The distribution of the load stochastic process was computed similarly from the power consumption data of a typical residential building in San Francisco provided by the openEI database openei_data (). The expected values of this stochastic process are depicted in Figure 8. While the load variation appears relatively smooth because of the coarse temporal granularity, it must be noted that the daily load curves representing building demand vary abruptly in real time mathieu2011quantifying ().
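A minimal sketch of an empirical approach for the load, assuming an hourly consumption series that starts at midnight and a hypothetical 0.25 kW bin width, groups samples by hour of day and normalizes the counts. The helpers `hourly_load_pmfs` and `expected_value` are illustrative and do not reproduce the openEI processing used in the paper.

```python
from collections import Counter, defaultdict


def hourly_load_pmfs(hourly_kw, bin_kw=0.25):
    """Empirical PMF of load for each hour of day (0-23).

    hourly_kw: hourly demand series assumed to start at 00:00.
    Each value is rounded to the nearest multiple of bin_kw (hypothetical bin width).
    """
    buckets = defaultdict(Counter)
    for i, kw in enumerate(hourly_kw):
        state = bin_kw * round(kw / bin_kw)
        buckets[i % 24][state] += 1
    pmfs = {}
    for hour, counts in sorted(buckets.items()):
        n = sum(counts.values())
        pmfs[hour] = {s: c / n for s, c in sorted(counts.items())}
    return pmfs


def expected_value(pmf):
    """Mean of a discrete distribution given as {state: probability}."""
    return sum(s * p for s, p in pmf.items())


# Hypothetical two-day series: a flat 1 kW day followed by a flat 2 kW day.
pmfs = hourly_load_pmfs([1.0] * 24 + [2.0] * 24)
print(pmfs[7], expected_value(pmfs[7]))
```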
6.3 Nanogrid Simulation
We simulated the nanogrid model under both the naive and the near-optimal policies. The parameters of the energy storage system were obtained from the specification sheet in motors2015powerwall (). For ease of analysis, the storage factors were set to in the simulation (although this does not occur in practice).
The expected state evolution under policy is shown in Figure 10. We note that the policy was designed to rely primarily on the storage resource. In other words, we expect the policy to charge the storage device when there is excess solar generation and to discharge it during deficit generation before drawing on the grid. This behavior can be observed in the bottom subplot of Figure 10. A generation deficit exists between and hours, during which policy discharges the storage, as evident from the positive output power and the decreasing battery state in the other subplots. Similarly, excess generation during hours results in the policy charging the storage resource.
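The storage-first behavior described above can be sketched as a one-step rule. The sketch assumes lossless storage, an hourly time step, the sign conventions stated in the comments, and that any residual imbalance is bought from or sold to the grid; `storage_first_step`, `simulate`, and the numbers in the example are hypothetical, not the Powerwall parameters used in the simulation.

```python
def storage_first_step(soc_kwh, solar_kw, load_kw, cap_kwh, p_max_kw, dt_h=1.0):
    """One step of a storage-first heuristic (illustrative sketch).

    Assumed conventions: battery power b > 0 discharges, b < 0 charges;
    grid power g > 0 is a purchase, g < 0 a sale; solar + b + g = load.
    Storage is treated as lossless.
    """
    deficit = load_kw - solar_kw          # > 0: generation deficit, < 0: excess
    if deficit >= 0:
        # Discharge first, limited by the power rating and the stored energy.
        b = min(deficit, p_max_kw, soc_kwh / dt_h)
    else:
        # Charge first, limited by the power rating and the remaining headroom.
        b = -min(-deficit, p_max_kw, (cap_kwh - soc_kwh) / dt_h)
    g = deficit - b                       # the grid covers whatever is left
    return soc_kwh - b * dt_h, b, g


def simulate(soc0, solar, load, cap_kwh, p_max_kw):
    """Roll the heuristic forward over aligned hourly solar/load series."""
    soc, trace = soc0, []
    for s, l in zip(solar, load):
        soc, b, g = storage_first_step(soc, s, l, cap_kwh, p_max_kw)
        trace.append((soc, b, g))
    return trace


# Hypothetical 10 kWh / 5 kW battery starting half full.
for soc, b, g in simulate(5.0, [0.0, 0.0, 8.0], [3.0, 4.0, 1.0], 10.0, 5.0):
    print(soc, b, g)
```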
In the case of the finite-horizon look-ahead policy, decisions are influenced both by the current demand-supply offset and by the expected offset over the three-hour finite horizon, as described in Figure 3. These decision-making factors are shown in the bottom subplots of Figure 10. To analyze the results, we consider three scenarios defined by the sign of the imbalance (solar − load) in the present and in the expected future. The first scenario consists of the hour time interval during which excess generation is expected both in the present and in the future. In this case, the look-ahead policy charges the storage, as evident in the "Power output from Battery (kW)" subplot of Figure 10. Similarly, in the second scenario ( hours), a generation deficit is expected both in the present and in the future, so the policy discharges the battery to address the deficit, as seen in the same subplot. In the third scenario ( hours), the sign of the present imbalance differs from that of the expected future, and the policy is designed to leave the storage state unchanged, which can again be observed in the same subplot of Figure 10.
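The three-scenario rule can be written compactly. This is a sketch under assumptions: the future offset is taken as the summed expected imbalance over the next three hours, the charge or discharge magnitude is the present imbalance clipped to the battery limits, and storage is lossless; `lookahead_step` is a hypothetical name.

```python
def lookahead_step(soc_kwh, solar_kw, load_kw, exp_solar, exp_load,
                   cap_kwh, p_max_kw, horizon=3, dt_h=1.0):
    """Sign-based look-ahead rule (illustrative sketch).

    exp_solar / exp_load: expected values for the upcoming hours.
    Conventions: b > 0 discharges the battery; g > 0 is a grid purchase.
    """
    now = solar_kw - load_kw                                     # present imbalance
    future = sum(exp_solar[:horizon]) - sum(exp_load[:horizon])  # expected imbalance
    if now > 0 and future > 0:       # scenario 1: surplus now and later -> charge
        b = -min(now, p_max_kw, (cap_kwh - soc_kwh) / dt_h)
    elif now < 0 and future < 0:     # scenario 2: deficit now and later -> discharge
        b = min(-now, p_max_kw, soc_kwh / dt_h)
    else:                            # scenario 3: signs differ (or zero) -> hold
        b = 0.0
    g = (load_kw - solar_kw) - b     # grid covers the residual
    return soc_kwh - b * dt_h, b, g


# Hypothetical case: deficit now but surplus expected, so the battery is held.
print(lookahead_step(5.0, 1.0, 3.0, [5.0] * 3, [1.0] * 3, 10.0, 5.0))
```

Holding the battery in the mixed-sign case preserves stored energy (or headroom) for the period in which the expected imbalance points the other way.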
The near-optimal policy described in Section 5 was computed by backward induction. Using the values computed at the next time instant , the near-optimal decisions and current values at that minimize the expected cost were computed using the Bellman equation (equation 16). The near-optimal policy and its impact on state evolution are shown in Figure 12. These near-optimal decisions depend on several factors that combine nontrivially to yield the least expected cost-to-go over the remaining optimization horizon. These factors include the solar generation, demand, purchase price, selling price, the current state of the system, and the distributions of the solar and load processes based on the weather forecasts.
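To make the backward-induction step concrete, the sketch below solves a small finite-horizon problem on a discretized state-of-charge grid. For simplicity it treats the battery power as the decision (so the next state is deterministic and the grid absorbs the random residual), uses a linear shortfall penalty as the terminal cost, and represents each step's uncertainty by a short list of (probability, solar, load) scenarios. These modeling choices and the function `backward_induction` are assumptions; the paper's own Bellman equation (equation 16) and decision variable may differ.

```python
import math


def backward_induction(T, soc_levels, scenarios, buy, sell, p_max_kw,
                       terminal_penalty, dt_h=1.0):
    """Finite-horizon stochastic DP over a discretized SOC grid (sketch).

    soc_levels: sorted SOC values in kWh (index = state).
    scenarios[k]: list of (prob, solar_kw, load_kw) for step k.
    buy[k], sell[k]: $/kWh. Returns V[k][i] and policy[k][i] (battery kW, > 0 = discharge).
    """
    n = len(soc_levels)
    cap = soc_levels[-1]
    V = [[0.0] * n for _ in range(T + 1)]
    policy = [[0.0] * n for _ in range(T)]
    # Terminal cost: penalize any shortfall from a full battery.
    V[T] = [terminal_penalty * (cap - x) for x in soc_levels]
    for k in range(T - 1, -1, -1):
        for i, x in enumerate(soc_levels):
            best_cost, best_b = math.inf, 0.0
            for j in range(n):                  # candidate next state on the grid
                b = (x - soc_levels[j]) / dt_h  # battery power that reaches it
                if abs(b) > p_max_kw + 1e-9:    # enforce the power limit
                    continue
                stage = 0.0
                for p, s, l in scenarios[k]:
                    g = l - s - b               # grid absorbs the residual
                    stage += p * (buy[k] * max(g, 0.0) - sell[k] * max(-g, 0.0)) * dt_h
                total = stage + V[k + 1][j]     # Bellman recursion
                if total < best_cost:
                    best_cost, best_b = total, b
            V[k][i] = best_cost
            policy[k][i] = best_b
    return V, policy


# Hypothetical 2-step example with a higher price in the second step.
scen = [[(0.5, 1.0, 0.5), (0.5, 0.0, 0.5)], [(1.0, 0.0, 0.5)]]
V, policy = backward_induction(2, [0.0, 1.0, 2.0], scen, buy=[0.1, 0.4],
                               sell=[0.05, 0.3], p_max_kw=2.0, terminal_penalty=0.2)
print(V[0], policy[0])
```

Enumerating next states on the grid, rather than decisions directly, keeps every transition on the discretized state space, so no interpolation of the value function is needed in this toy version.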
It can be observed that the near-optimal policy decides to sell power to the grid when selling prices are highest ( hours). Further, the policy ensures a full charge both by the end of the optimization horizon (prior to the 23rd hour) and by noon (the 12th hour). The former can be attributed to the large terminal cost that would be incurred if the battery were not fully charged. In the latter case, however, charging proceeds gradually up to the 12th hour despite incurring the regular purchase price. This behavior can be understood by noting that the policy charges the battery by the 12th hour in anticipation of maximal profit (minimal cost) from discharging it into the grid when selling prices are highest during hours. Therefore, the additional complexity of computing the near-optimal policy results in cost-aware decisions that achieve nearly the least (near-optimal) cost.
The cost-to-go values corresponding to the near-optimal and the naive policies are shown in Figure 12. It can be observed that the value function in the top subplot (shown in green) is not necessarily monotonic, owing to sign reversals in the instantaneous grid transaction costs (shown in the bottom subplot) as the system switches between purchasing and selling power. Further, we note that over certain intervals the value function is greater in magnitude than the costs-to-go of its heuristic counterparts.
This does not contradict the definition of the value function or the principle of optimality, for the following reasons. The value function is a function of the state, so its value can only be compared against the costs arising from implementing other policies starting from the same state. However, as evident from Figures 10 and 12, the states at a given time are not necessarily the same across the state evolutions under each policy. Therefore, the principle of optimality and the definition of the value function are not violated on this account. For a valid comparison between the value function and the costs-to-go of the other policies, the initial storage state was set to for all policies; hence, as the principle of optimality predicts, the computed near-optimal cost-to-go at the start is less than the costs-to-go incurred by the naive policies. The cost results from the simulation are therefore consistent with the definition of the value function and the principle of optimality.
The performance of each policy was simulated over a month and compared with the average monthly residential electricity bill in San Francisco. These results are shown in the bar chart in Figure 13, wherein the red bar depicts the average monthly residential electricity bill in San Francisco, with a value of based on the pricing data from sfo_elec_price (). The orange bar depicts the cost obtained by simulating the system model with only the load, solar, and pricing models. The resulting cost was evaluated to be . Although the system model is not an accurate representation of the underlying reality, this comparison indicates that the results obtained by simulating the model with the chosen parameters are in reasonable agreement (within the error margin) with the average monthly electricity cost in San Francisco.
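The monthly figures above amount to summing per-step grid transaction costs. A minimal sketch, assuming hourly steps, a purchase price applied to imports, and a selling price applied to exports, is shown below; `grid_cost` and the prices in the example are hypothetical and are not the PG&E tariffs used in the paper.

```python
def grid_cost(grid_kw, buy_price, sell_price, dt_h=1.0):
    """Total grid transaction cost over a simulation ($).

    grid_kw[k] > 0 is power bought during step k, < 0 is power sold.
    buy_price / sell_price are per-step $/kWh; a negative total is a net profit.
    """
    total = 0.0
    for g, p_buy, p_sell in zip(grid_kw, buy_price, sell_price):
        price = p_buy if g > 0 else p_sell
        total += price * g * dt_h
    return total


# Hypothetical usage: 30 days of a repeating daily grid profile with a flat buy
# price and a higher afternoon selling price (illustrative numbers only).
daily_grid = [0.5] * 12 + [-1.0] * 6 + [0.5] * 6
daily_buy = [0.25] * 24
daily_sell = [0.10] * 12 + [0.30] * 6 + [0.10] * 6
print(round(grid_cost(daily_grid * 30, daily_buy * 30, daily_sell * 30), 2))
```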
After the storage resource was included, the monthly operating costs under policy 1 and policy 2 were found to be comparable, at and respectively. As expected, the near-optimal policy outperforms its heuristic counterparts, resulting in a cost of (a profit of ) over the month. In other words, the near-optimal policy is expected to generate operating revenue, while the naive policies incur noticeable operating costs. However, apart from modeling inaccuracies, the significant installation and maintenance costs must also be considered for an accurate overall comparison with a monthly electricity bill. Nevertheless, these results indicate that the payback period under the near-optimal policy is expected to be noticeably shorter than under its heuristic counterparts.
7 Conclusion
The problem of designing optimal policies for energy management was treated in the context of uncertain renewable supply and demand. The uncertain power flows from the solar and load sources were modeled as first-order Markov processes in discrete time and space. Probability distributions of these solar and load processes were obtained using historical weather data and residential demand data, respectively. Specifically, the power distributions of the solar Markov model were obtained using the forecast and error distributions inferred from the NOAA NAM data, and the power distributions of the load Markov model were inferred from the openEI residential dataset for San Francisco. Together with a first-order dynamical storage model, these formed the demand-supply-storage framework within which decision problems were examined. Naive policies were proposed based on heuristic considerations, and their performance was evaluated using a cost function with appropriate pricing models. Thereafter, the optimal decision problem was posed, and a cost-minimizing policy was sought within the Markov Decision Process (MDP) framework using stochastic dynamic programming (SDP). The SDP approach was implemented by discretizing the state space, restricting decisions to the computable feasible decision space (Appendix A), and approximating the value function, which results in a suboptimal solution. The near-optimal policy was then computed using backward induction. The resulting simulations suggest that the near-optimal policy outperforms the naive policies by with respect to operating costs over a monthly optimization horizon. Future work can investigate continuous state-time formulations, nonlinear battery models, stochastic pricing schemes, partially controllable supply-demand processes, partially observable storage states, grid constraints, and problems pertaining to networks of interconnected nanogrids.
Acknowledgments
The authors would like to thank Cisco Systems, Inc. for its support.
Appendix A Handling constraints in the decision problem
The constraints in the decision problem include the following: (i) the power limits during the storage charge-discharge process, (ii) the energy storage state limits, and (iii) the power balance constraint (equation 7).
The decisions that do not violate the above constraints are referred to here as feasible decisions. Similarly, policies that result in such feasible decisions are called feasible policies. We now determine the feasible decision space, since it is required to solve the Bellman equation (equation 16). In what follows, and represent realizations of the stochastic processes and , respectively, at the time instant , and represents the storage state at .
A.1 Handling State Constraints
Given the battery dynamics (equation 5) and the state constraint kWh at , the following can be stated by exploiting the discrete-time formulation. If the state constraint is to be satisfied at any time step assuming it holds at , then the following holds:
(18) 
Let the decision constraint in equation 18 be written as . It is easy to see that, if and , then the above derivation implies that the state constraint at is satisfied.
A.2 Handling Power Constraints
Since the storage power flow must be within the limits kW, we can use equation 7 to make the following claim:
(19) 
Let the decision constraint in equation 19 be written as . It is easy to see that, if , then the battery power constraints are satisfied, as shown above.
Since equations 18 and 19 constrain the same decision variable , the feasible decision space can be obtained as the intersection of these constrained spaces. Let the intersection be represented by . Thus can be written as,
(20) 
From equation 20, we observe the following:

The decision space is guaranteed to have a positive Lebesgue measure, since and , but both cannot take the value 0 simultaneously. Thus the existence of a non-empty feasible decision space is guaranteed by definition.

The bounds of the decision space depend on the realizations and of the stochastic processes and respectively. However, during the optimal policy design phase, the realizations of the stochastic processes and are unknown until the time instant occurs in the real world.

Despite the guaranteed existence of a feasible decision space, it is unknown on account of the uncertainty in the generation and load processes.
In order to eliminate the dependence of the feasible decision space on the solar generation and load realized at , which are unknown in advance, we use information about the bounds on the realizations of the stochastic processes and . Let the range spaces of these random variables at time instant be represented by: (i) for the solar generation, and (ii) for the load demand. Since the solar generation and load are bounded in the real world, these bounds are physically well-defined. Using these bounds, we construct a subset of the feasible space, called the computable feasible decision space (), as follows:
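The intersection described in this appendix can be illustrated with a simplified model. The sketch assumes lossless storage (next state = current state − battery power × Δt), a power balance solar + battery + grid = load, and the grid transaction as the decision variable; these are assumptions rather than the paper's equations, and the function `feasible_grid_interval` is hypothetical. Under this model the interval is never empty when the current state lies within the storage limits, in line with the observations above, but its endpoints shift with the realized solar and load.

```python
def feasible_grid_interval(soc, solar, load, cap_kwh, p_max_kw, dt_h=1.0):
    """Interval of grid decisions g that keep the battery within its limits.

    Hypothetical model: battery power b = load - solar - g (> 0 discharges),
    next SOC = soc - b * dt_h, with 0 <= SOC <= cap_kwh and |b| <= p_max_kw.
    """
    if not 0.0 <= soc <= cap_kwh:
        raise ValueError("current state must satisfy the storage limits")
    # State limits: 0 <= soc - b*dt <= cap  ->  (soc - cap)/dt <= b <= soc/dt.
    # Power limits: -p_max <= b <= p_max. Feasible b is the intersection.
    b_lo = max((soc - cap_kwh) / dt_h, -p_max_kw)
    b_hi = min(soc / dt_h, p_max_kw)
    net = load - solar
    # g = net - b is decreasing in b, so [b_lo, b_hi] maps to [net - b_hi, net - b_lo].
    return net - b_hi, net - b_lo


# The same state with two different realizations gives two different intervals.
print(feasible_grid_interval(2.0, 1.0, 4.0, 10.0, 5.0))
print(feasible_grid_interval(2.0, 3.0, 1.0, 10.0, 5.0))
```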
(21) 
It is easy to verify that is constructed as the intersection of the feasible decision spaces across all sample paths with non-zero probability. In other words, , where .
Although restricting decisions to the computable feasible decision space introduces suboptimality, it is nevertheless sufficient to ensure that the system constraints are upheld under all possible realizations of the stochastic processes. However, its existence is contingent on the measure of being well-defined. Therefore, the necessary conditions for the existence of a computable feasible decision space of non-zero measure are:
(22)  
It is easy to verify that, if , then inequality 21 holds true. Hence, the inequality provides a stronger condition for the existence of a non-empty feasible decision space . Progressively stronger sufficient conditions can be derived as follows:
(23)  
(24)  
(25) 
Note that the left-hand side (LHS) of equation 23 represents the maximum possible demand-supply offset gap at the instant , the LHS of equation 24 represents the maximum demand-supply sum at the instant , and the LHS of equation 25 represents the maximum possible demand-supply sum over the horizon . In each of these sufficient conditions, the right-hand side is a time-independent expression that depends on a subset of the storage parameters .
The conditions derived above represent the worst-case sufficient conditions that ensure the existence of a corresponding computable feasible decision space despite the unknown realizations of the solar and load processes. Further, given any grid transaction decision , the corresponding feasible battery decision space is guaranteed to be the same as the space defined by the battery constraint, since is equivalent by definition to the constraint derived in Section A.2.
Let the parameters satisfying the sufficiency condition in equation 21 belong to the space , where refers to the dimension of . (Since the conditions derived above are progressively stronger, satisfying any one of them ensures that equation 21 is satisfied.) We refer to as the computable feasible configuration space of the system, i.e., the set of configurations for which a computable feasible decision space exists. Therefore, satisfying the sufficiency condition guarantees the existence of the computable feasible decision spaces required to design the near-optimal policy. Only decisions belonging to the computable feasible decision space are considered admissible for computing the near-optimal policy; hence we also refer to this decision space as the admissible decision space for the optimal decision-making problem.
In summary, a feasible decision space is guaranteed to exist for every realization of the stochastic processes (equation 20), but it is unknown at the time the optimal policy is designed. This is because such a design (equation 16) involves computing the expected state at the next time instant , and thus requires that all possible realizations of be precomputed, accounting for every possible realization of the load and solar stochastic processes. In the precomputation step, the realization of (= ) can be represented as a function of the decision . Thus the state constraints translate into corresponding control constraints given the realizations of the load and solar stochastic processes. In conclusion, designing a near-optimal policy using stochastic dynamic programming is feasible only if the inequalities derived in Section A.2 are satisfied, thereby ensuring a non-empty admissible decision space ().
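Continuing with the same kind of simplified lossless model (an assumption, not the paper's equations), the computable feasible interval is the intersection of the per-realization intervals over the solar and load ranges. Under this model it is non-empty for every admissible storage state whenever the combined spread of the ranges does not exceed min(capacity/Δt, power limit), a time-independent bound on storage parameters. The hypothetical helpers below check this worst-case condition and a more conservative variant based on the maximum demand-supply sum, mirroring the progression of conditions in the text.

```python
def computable_grid_interval(soc, s_range, l_range, cap_kwh, p_max_kw, dt_h=1.0):
    """Grid decisions feasible for every solar in s_range and load in l_range.

    Hypothetical lossless model: battery b = load - solar - g, next SOC = soc - b*dt.
    Returns (lower, upper), or None if the intersection is empty.
    """
    s_min, s_max = s_range
    l_min, l_max = l_range
    b_lo = max((soc - cap_kwh) / dt_h, -p_max_kw)  # from state and power limits
    b_hi = min(soc / dt_h, p_max_kw)
    lower = (l_max - s_min) - b_hi   # largest per-realization lower bound
    upper = (l_min - s_max) - b_lo   # smallest per-realization upper bound
    return (lower, upper) if lower <= upper else None


def gap_condition(s_range, l_range, cap_kwh, p_max_kw, dt_h=1.0):
    """Worst-case (over all SOC) sufficient condition under this model."""
    gap = (l_range[1] - l_range[0]) + (s_range[1] - s_range[0])
    return gap <= min(cap_kwh / dt_h, p_max_kw)


def sum_condition(s_range, l_range, cap_kwh, p_max_kw, dt_h=1.0):
    """More conservative check; implies gap_condition when s_min, l_min >= 0."""
    return l_range[1] + s_range[1] <= min(cap_kwh / dt_h, p_max_kw)


# Hypothetical ranges: solar in [0, 2] kW, load in [1, 2] kW, 10 kWh / 5 kW battery.
print(gap_condition((0.0, 2.0), (1.0, 2.0), 10.0, 5.0),
      [computable_grid_interval(float(x), (0.0, 2.0), (1.0, 2.0), 10.0, 5.0)
       for x in (0, 5, 10)])
```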
References
 (1) A. Sarzynski, The impact of solar incentive programs in ten states, George Washington Institute of Public Policy, Washington, DC.
 (2) A. Zervos, Renewables 2013 global status report, Renewable Energy Policy Network for the 21st Century. Paris, France.
 (3) A. Battaglini, J. Lilliestam, A. Haas, A. Patt, Development of super-smart grids for a more efficient utilisation of electricity from renewable sources, Journal of cleaner production 17 (10) (2009) 911–918.
 (4) J. P. Painuly, Barriers to renewable energy penetration; a framework for analysis, Renewable energy 24 (1) (2001) 73–89.
 (5) U.S. Department of Energy, Smart grid: An introduction. URL https://www.smartgrid.gov/files/sg_introduction.pdf
 (6) E. Rodrigues, R. Godina, S. Santos, A. Bizuayehu, J. Contreras, J. Catalão, Energy storage systems supporting increased penetration of renewables in islanded systems, Energy 75 (2014) 265–280. doi:http://dx.doi.org/10.1016/j.energy.2014.07.072. URL http://www.sciencedirect.com/science/article/pii/S0360544214008949
 (7) H. Farhangi, The path of the smart grid, Power and Energy Magazine, IEEE 8 (1) (2010) 18–28.
 (8) J. Mitra, S. Suryanarayanan, System analytics for smart microgrids, in: Power and Energy Society General Meeting, 2010 IEEE, 2010, pp. 1–4. doi:10.1109/PES.2010.5589700.
 (9) M. F. Zia, E. Elbouchikhi, M. Benbouzid, Microgrids energy management systems: A critical review on methods, solutions, and prospects, Applied Energy 222 (2018) 1033–1055. doi:https://doi.org/10.1016/j.apenergy.2018.04.103. URL http://www.sciencedirect.com/science/article/pii/S0306261918306676
 (10) A. Agüera-Pérez, J. C. Palomares-Salas, J. J. G. de la Rosa, O. Florencias-Oliveros, Weather forecasts for microgrid energy management: Review, discussion and recommendations, Applied Energy 228 (2018) 265–278. doi:https://doi.org/10.1016/j.apenergy.2018.06.087. URL http://www.sciencedirect.com/science/article/pii/S0306261918309565
 (11) L. Hernández, C. Baladrón, J. M. Aguiar, L. Calavia, B. Carro, A. Sánchez-Esguevillas, D. J. Cook, D. Chinarro, J. Gómez, A study of the relationship between weather variables and electric power demand inside a smart grid/smart world framework, Sensors 12 (9) (2012) 11571–11591.
 (12) T. Wang, D. O'Neill, H. Kamath, Dynamic control and optimization of distributed energy resources in a microgrid, IEEE Transactions on Smart Grid 6 (6) (2015) 2884–2894. doi:10.1109/TSG.2015.2430286.
 (13) H. Kanchev, D. Lu, F. Colas, V. Lazarov, B. Francois, Energy management and operational planning of a microgrid with a PV-based active generator for smart grid applications, Industrial Electronics, IEEE Transactions on 58 (10) (2011) 4583–4592.
 (14) J. Song, V. Krishnamurthy, A. Kwasinski, R. Sharma, Development of a Markov-chain-based energy storage model for power supply availability assessment of photovoltaic generation plants, IEEE Transactions on Sustainable Energy 4 (2) (2013) 491–500.
 (15) E. R. Sanseverino, M. L. Di Silvestre, M. G. Ippolito, A. De Paola, G. L. Re, An execution, monitoring and replanning approach for optimal energy management in microgrids, Energy 36 (5) (2011) 3429–3436.
 (16) M. H. Amrollahi, S. M. T. Bathaee, Techno-economic optimization of hybrid photovoltaic/wind generation together with energy storage system in a stand-alone microgrid subjected to demand response, Applied Energy 202 (2017) 66–77. doi:https://doi.org/10.1016/j.apenergy.2017.05.116. URL http://www.sciencedirect.com/science/article/pii/S0306261917306207
 (17) O. Ardakanian, S. Keshav, C. Rosenberg, Markovian models for home electricity consumption, in: Proceedings of the 2nd ACM SIGCOMM workshop on Green networking, ACM, 2011, pp. 31–36.
 (18) M. Muselli, P. Poggi, G. Notton, A. Louche, First order Markov chain model for generating synthetic "typical days" series of global irradiation in order to design photovoltaic stand alone systems, Energy Conversion and Management 42 (6) (2001) 675–687. doi:http://dx.doi.org/10.1016/S0196-8904(00)00090-X. URL http://www.sciencedirect.com/science/article/pii/S019689040000090X
 (19) D. Sera, R. Teodorescu, P. Rodriguez, PV panel model based on datasheet values, in: Industrial Electronics, 2007. ISIE 2007. IEEE International Symposium on, IEEE, 2007, pp. 2392–2396.
 (20) C. Poolla, A. Ishihara, S. Rosenberg, R. Martin, A. Fong, S. Ray, C. Basu, Neural network forecasting of solar power for NASA Ames Sustainability Base, in: Computational Intelligence Applications in Smart Grid (CIASG), 2014 IEEE Symposium on, IEEE, 2014, pp. 1–8.
 (21) National Oceanic and Atmospheric Administration, NOAA's National Centers for Environmental Prediction. URL http://www.nco.ncep.noaa.gov/
 (22) C. Poolla, A. K. Ishihara, Localized solar power prediction based on weather data from local history and global forecasts, in: Proceedings of the 45th IEEE Photovoltaic Specialists Conference (PVSC), IEEE, June 2018.
 (23) OpenEI, Commercial and residential hourly load profiles for all TMY3 locations in the United States (2013). URL http://en.openei.org/datasets/dataset/commercialandresidentialhourlyloadprofilesforalltmy3locationsintheunitedstates
 (24) H. He, R. Xiong, J. Fan, Evaluation of lithium-ion battery equivalent circuit models for state of charge estimation by an experimental approach, Energies 4 (4) (2011) 582–598.
 (25) Z. M. Salameh, M. A. Casacca, W. A. Lynch, A mathematical model for lead-acid batteries, IEEE Transactions on Energy Conversion 7 (1) (1992) 93–98. doi:10.1109/60.124547.
 (26) O. Erdinc, B. Vural, M. Uzunoglu, A dynamic lithium-ion battery model considering the effects of temperature and capacity fading, in: Clean Electrical Power, 2009 International Conference on, IEEE, 2009, pp. 383–386.
 (27) Federal Energy Regulatory Commission, Energy primer, a handbook of energy market basics (2012).
 (28) Pacific Gas and Electric, Electric schedule E-6. URL http://www.pge.com/tariffs/tm2/pdf/ELEC_SCHEDS_E6.pdf
 (29) Pacific Gas and Electric, Small renewable generator power purchase agreement. URL http://www.pge.com/includes/docs/pdfs/b2b/energysupply/wholesaleelectricsuppliersolicitation/standardcontractsforpurchase/ELEC_FORMS_791103_Nov2012.pdf
 (30) J. L. Mathieu, P. N. Price, S. Kiliccote, M. A. Piette, Quantifying changes in building electricity use, with application to demand response, Smart Grid, IEEE Transactions on 2 (3) (2011) 507–518.
 (31) Tesla Motors, Powerwall Tesla home battery. URL https://www.tesla.com/powerwall
 (32) National Renewable Energy Laboratory (NREL), Electricity Local San Francisco rates (data cited from NREL). URL https://www.electricitylocal.com/states/california/sanfrancisco/