Online Stochastic Control of Discrete Loads in Distribution Grids
Abstract
This paper considers power distribution grids with distributed energy resources (DERs) and flexible controllable loads, and designs an incentivebased algorithm to coordinate the network operator and customers to pursue personal and systemwide operational and economic objectives. Heterogeneous DERs and controllable loads with continuous and discrete control commands and various dynamics are considered. We first relax the discrete feasible sets into their convex hull and formulate a welldefined socialwelfare optimization problem with general objective functions and constraints. We then propose an offline distributed stochastic dual algorithm for solving the relaxed problem while recovering original discrete set points at every iteration. We further implement our design in an online setting with timevarying optimal operation points, nonlinear power flow equations and device dynamics, and asynchronous updates from devices. Stability and performance characterization of both offline and online algorithms are analytically established and numerically corroborated.
Discrete decision variable, realtime pricing, asynchronous updates, stochastic dual algorithm, distribution networks, timevarying optimization.
I Introduction
Power grids are enjoying an increasing control flexibility in both power supply and demand sides thanks to growing penetration of distributed energy resources (DERs) and controllable loads on the distribution level, including rooftop photovoltaic (PV) panels, electrical vehicle (EV) batteries, thermostatically controlled loads (TCLs, e.g., water heaters and airconditioners (A/Cs)), etc. While these devices of flexibility may potentially provide ancillary services for the grids [dall2017unlocking], coordinating millions of prospective controllable devices of various dynamics and constraints with potential discrete decision variables to achieve networkoriented goals, e.g., voltage regulation, frequency control, and economic dispatch, is extremely challenging. Moreover, unlike traditional facilities owned by utility companies, the mass customerowned devices are not naturally subject to control of network operators. This motivates us to design a distributed algorithm that can bring selfinterested customers into the control loop such that networkoriented goals and constraints are guaranteed through inducing the desired customer behaviors by appropriate market mechanisms.
Ia Marketbased Distribution Networks Control
Marketbased algorithms have been recently developed to explore control flexibility of distributed energy assets owned by customers. The objective is to incentivize customers to provide ancillary services to the grids based on maximizing their own economic benefits and performance objectives [mohsenian2010autonomous, Maharjan13, Pedram14]. For example, customers may be incentivized to adjust the output powers of DERs or the power consumption of controllable loads in real time to aid voltage regulation [Liu08], control the aggregate network demand [li2016market], and follow regulating signals [vrettos2016robust].
In [zhou2017incentive2], we consider a social welfare optimization problem modeling operational objectives of both network operator and customers as well as voltage constraints. The formulated problem is nonconvex due to its Stackelberg game structure, even though a linear approximation of the nonlinear powerflow equations is utilized. We reformulate the problem as a convex problem together with an optimal incentive signal design strategy, and prove that the reformulated problem is an exact relaxation of the original one, i.e., the solution of the relaxed problem is a global optima of the original nonconvex problem. We further design a distributed algorithm based on primaldual gradient descent for solving the relaxed problem, such that the network operator and the customers pursue given operational and economic objectives, while concurrently ensuring that voltages are within prescribed limits.
IB Stochastic Dual Algorithm
While primaldual gradient algorithms work well for devices with continuous decision variables and free from dynamics, like PV inverters, the design may be infeasible in the presence of more complex devices with discrete decision variables and dynamics, which are common in practice. For example, A/Cs have discrete power consumption rates (on/off) with future room temperatures to be controlled within acceptable ranges, and EV batteries pose requirement upon their state of charge (SOC) with preferred charging/discharging rates that preserve the battery lifespan.
In light of the marketbased framework in [zhou2017incentive2], we seek a more practical algorithm to manage controllable devices with various dynamics by designing distributed algorithms based on dual ascent method. In our new design, the primal updates can be conducted through natural response expected from rational customers, i.e., to maximize individual utility subject to dynamics and constraints of devices given electricity price or ancillary service rewards, while the dual updates procedures, as part of the price/reward signal design, are carried out by network operator based on systemwide operational constraints. Our design presents a scalable way to handle devices of various dynamics by keeping the computation to their owners, which meanwhile also preserves customer privacy. Dual method is also consistent with smart home energy management system (HEMS), which usually achieve an optimal tradeoff between comfort and electricity bill.
In addition, we propose a stochastic scheme to handle discrete decision variables. Particularly, we first relax the discrete feasible sets to their convex hull, and then apply aforementioned dual method to solve the relaxed problem while recovering feasible set points based on the solution of the relaxed problem at every iteration. The dual ascend method together with the stochastic scheme constitute the stochastic dual algorithm we propose in this paper.
The convergence of dual method is generally not guaranteed even for convex problems, and depends on the stepsize choice [molzahn2017survey]. The additional stochastic process we apply makes it even more challenging. In literature, diminishing stepsizes are usually necessary to guarantee the convergence [ram2009incremental, zhu2012distributed, polyak1987introduction, boyd2006subgradient, zhou2017stochastic], which is not practical especially in realtime settings. Nevertheless, we are able to leverage recent insights in dual method to provide sufficient conditions under which the dual algorithm with constant stepsize nevertheless converges to its optima, whilst the stochastic dual algorithm with constant stepsize converges to the optima on average with a radius depending on the variance of random process. To the best of our knowledge, this is leading convergence characterization for dual method applied in power system.
IC Practical Implementation
This paper next considers implementation and analytical characterization of the designed algorithm under realistic surroundings, including timevarying optimal set points, nonlinear power flow model and device dynamics, and asynchronous updates of devices.
IC1 Timevarying optimization problem
In the presence of intermittent renewable energy resources and fluctuation of uncontrollable loads, the optimal operating points varies constantly. This requires a realtime algorithm to track the optimal power set points trajectories of DERS, controllable loads, and system status like voltage levels and net powers at substation in a fast yet accurate way.
IC2 Nonlinear dynamics
While the algorithm is designed based on linearized models of both power flow equations and device dynamics for computational tractability, the actual physical systems deviate from the approximated models. This motivates us to utilize feedback control where we feed the algorithm with data measured from real systems. This provides a good tradeoff between computation complexity and performance guarantee.
IC3 Asynchrony
Last but not least, we consider asynchronous updates from controllable devices with the following realistic motivation:

Customers are not obliged to synchronize their behaviors with each other; they instead make decisions at their convenience of arbitrary time.

Heterogeneous devices have different control granularity, e.g., PV inverters can change their set points in seconds, while A/Cs should not be switched on/off too often.

Synchronous updates from mass devices may cause detrimental crash and even outrage to the grids.

Synchrony can be seen as a special case of asynchrony.
We adapt our algorithm for realistic settings 1)–3), and deliver comprehensive performance analysis.
ID Literature Review & Contributions of This Work
Related works can be categorized as follows:
ID1 Discrete Decision Variables
In literature of power system, algorithms designed for discrete decision variables take either a deterministic way [bernstein2015design, liu1992discrete] which usually generates suboptimal solutions, or a stochastic way, e.g., [kim2013scalable, macfie2010proposed] which often lacks rigorous analytical performance characterization. Other works may rely on commercial solvers that may not be scalable: large number of devices and longer dispatch horizon require much more computation, and forbid realtime implementation.
ID2 Marketbased/Demandresponse Design
Existing literature of marketbased and demandresponse problem formulation mostly focus on demand/supply balancing, without considering network structure [mohsenian2010autonomous, Pedram14, Maharjan13, Tushar14, li2016market, li2011optimal, tsui2012demand]. In those that are networkrecognizant, [li2012demand] requires complex communication and constrained protocols, and [LinaCDC15] proposes a pricing strategy to internalize externalities of voltage constraints and line loss among customers but no feedback control is utilized and only real power is considered.
ID3 Controlling Devices with Flexibility
Controllable devices with dynamics and discrete decision variables are usually studied in two ways: (1) heuristic control that is not designed based on optimization framework, e.g., [mathieu2013state] controls TCLs based on temperature status and analyze the control performance, and (2) optimizationbased control that can integrate specific objective functions and constraints, e.g., [li2011optimal] considers device dynamics but does not involve discrete variables, and [tsui2012demand] considers device dynamics and discrete decision variables, but solves with commercial solvers.
ID4 Distributed Optimization and Control for Distribution Networks
While our theoretic results apply to general operation constraints, one of the major application is voltage regulation. Decentralized voltage regulation [farivar2015local, zhou2015pseudo, zhou2016incremental, simpson2017voltage]; distributed control [baker2017network, dall2017optimal]. Online optimization with feedback control is studied in [hauswirth2017online, dallanese2016optimal, gan2016online, bolognani2015distributed], to deal with nonlinear power flow models. We also refer to [molzahn2017survey] for a recent comprehensive survey.
To sum up, this paper makes the following contributions:

We design a marketbased distributed stochastic dual algorithm to solve the social welfare optimization problem with discrete decision variables and various dynamics, and prove its stability.

We adapt the algorithm for practical settings to handle timevarying optimization, nonlinear dynamics, and asynchronous updates, and characterize its performance.
The rest of this paper is organized as follows. Section II models the networked system and individual devices and formulates a social welfare optimization problem. Section III presents exact convex relaxation and an offline distributed algorithm for solving the problem. Section IV adapts the algorithm design into more practical online settings with nonlinear dynamics and asynchronous updates. Section V provides numerical examples to illustrate analytical results, and Section VI concludes this paper and discusses future works.
Set of nodes, excluding node ;  
Set of distribution lines  
Set of controllable devices of node  
Set of controllable devices of node that update at  
Noncontrollable power injection of node at time  
Power injection from device of node at time  
(Convex hull of) Feasible set of device at node  
Status of device from node at time  
Aggregated power injections of node at time ,  
,  
,  
Actual and linearized system states of node at time  
,  
Signal for real and reactive power of node at time  
,  
System states linearization parameters  
Identity matrix  
Identity vector 
Ii System Model and Problem Formulation
Iia Network Model
Consider a distribution network with nodes collected in the set with and distribution lines collected in the set . At time , denote by and the aggregated real and reactive power injections at node . Denote by the systemstates, e.g., voltage magnitude, power at substation, etc. usually has a nonlinear relationship with and but can be approximated by with a linear function of and as
(1) 
where are linearization parameters. We refer to [baran1989network, christakou2013efficient, guggilam2016scalable, bolognani2015fast, dall2017optimal] for multiple linearization methods for voltage magnitude and power at substation (and current magnitude?).
Remark
The linear model (1) is utilized to facilitate the design of computationallyaffordable algorithms. Such approximation usually introduces a bounded error in system states (see e.g., [chiang1990existence, zhou2016vvac, swaroop2015linear, bolognani2015linear]), which further results in a bounded discrepancy between the generated and the optimal operating points. In Section IV, In Section V, we will utilize nonlinear power flow model for numerical examples.
IiB Node & Device Model
IiB1 Nodal Power Aggregation
At node , assume that the aggregated power injections and stem from two sources: noncontrollable devices and controllable devices. Denote by and the overall real and reactive power injected by all the noncontrollable devices of node . Meanwhile, assume a selfinterested and rational [fudenberg1991game] customer in charge of all controllable devices collected in a set . Denote by and the real and reactive power injections of controllable device at time . Then the power aggregation at node is cast as:
(2) 
IiB2 Controllable Device Model
At each node, we consider the following two typical categories of controllable devices: continuousrate fastupdating devices (continuous devices or fast devices) whose power injections can be chosen from continuous feasible sets, e.g., PV inverters, collected in a subset , and discreterate slowupdating devices (discrete devices or slow devices) whose power injections are subject to discrete choices, e.g., TCLs, EV battery, etc., collected in a subset . We will use the terms fast/continuous devices, as well as slow/discrete devices, interchangeably for the rest of this paper. We also assume that discrete devices only inject real power for simplicity. Following is the modeling of the controllable devices in details.
(i) Fast devices: Let denote the feasible power injection set of a fast devices at time . is assumed to be convex and compact. For example, a PV inverter has in the form of:
(3) 
where denotes the available real power from a PV system at time , and constant is the rated apparent capacity.
(ii) Slow devices: As to a discrete device , denote by its feasible set of discrete power rates together with its potential temporal dynamics and constraints, e.g., A/Cs control room temperatures to be within acceptable ranges, and batteries have required SOC. Generally, for a device , let collect its devicestates from current time till future time . The relationship between and is modeled by an affine vectorvalued function :
(4) 
The requirement upon the devicestates can be represented by a general convex vectorvalued function as:
(5) 
Substitute (4) into (5) to rewrite (5) as a convex function of :
(6) 
Additionally, the power rates are confined as
(7) 
where is a discrete set collecting the feasible power rates of device , introducing nonconvexity. Therefore, the resultant is also discrete and nonconvex. Following are two examples of slow devices:
a). For an A/C , let be the room temperature (devicestate) to be controlled at time . Dynamics (4) can be represented for as:
(8) 
where
(9) 
are constants interpreted as conjectural room temperatures with zero power consumption, is the ambient temperature, and and are parameters specifying thermal characteristics of the A/C and its operating environment; see, e.g., [li2011optimal]. Constraint (5) can be specified, e.g., as
(10) 
confining room temperatures to within acceptable ranges. We further assume this A/C can only be turned on/off, i.e.,
(11) 
with working power rate . Then this A/C’s feasible set from to is , where the bold set notation represents the Cartesian product of the timevarying feasible sets over the windows starting from , e.g., .
b). For an EV battery , let be its SOC (devicestate) at time , and (4) representing the evolution of SOC for is written as:
(12) 
where is charging/discharging efficiency. And (5) can be required SOC by certain time to guarantee performance:
(13) 
We assume a feasible power rates set for battery as
(14) 
with preferred fixed charging power rate and discharging power rate that preserve battery lifespan. This battery’s feasible set from to is then .
Remark
While the actual devices dynamics are mostly nonconvex (e.g., different charging/discharging efficiencies for batteries, dynamics of room temperature evolvement, etc.), we utilize the linear approximation (4) for computational affordability. As will be shown in the proposed algorithms, a feedback control is applied to make up for the discrepancy between linearized and actual dynamics.
IiC Problem Formulation
The overall objective is to design a strategy where the network operator and the customers pursue their own operational goals and economic objectives, while concurrently achieving global coordination to enforce system constraints. In our design, incentive signals are designed to achieve this goal.
For ease of expression, we consider the following notations:
as well as the succeeding notations for collected power injections:
1a) Customer Problem: Let and be the incentive signal vectors for time sent by the network operator to customer for real and reactive power injections.
Let collect the devicestates of all devices of node at time , and consider a cost function that captures welldefined performance objectives of all devices at node within the coming timeslots. In light of being a function of , we utilize the form of for the objective function instead of for simplicity, keeping in mind that may characterize objectives directly related to the devicestates as well.
Recalling that the slow devices have discrete decision variables, making the potential optimization problem nonconvex, we next introduce convex relaxation for the slow devices by considering the convex hull of their discrete feasible sets, defined as
(15) 
Apparently the relaxed sets is convex and compact, and so are their cartesian product. We first solve the problem within the relaxed feasible sets and use stochastic schemes to recover the original discrete feasible power rates later. Let be an identity matrix and we make the following assumption.
Assumption
Functions are strongly convex in , i.e., there exists some such that . Moreover, the gradient of with respect to is Lipschitz continuous for any .
Remark
We assume strong convexity of for the sake of the existence and uniqueness of the solution. If in practice only a (not strongly) convex cost function is available, we can add a small regularization term to enforce strong convexity while introducing some tolerable discrepancy; see, e.g., [Koshal11].
Then for the customer at node , given signals , the following costminimization problem for the upcoming timeslots is solved at time :
(16a)  
(16b) 
where and represent (potential) payment to/reward from the network operator owing to power injection by device for time . Also notice that the devicestates constraints are implicitly included in (16b) as well.
Since (16a) is strongly convex in and that the constraints are convex, given and , a unique solution exists and can be efficiently computed. We present with a vectorvalued bestresponse function as
(17) 
1b) Recovering Feasible Power Rates: For the discrete device, the solution to the relaxed problem is usually not implementable. We consider recovering and implementing its feasible value for current time with the following stochastic scheme: given , we randomly select a feasible power rate based on some corresponding probability distribution such that , where calculates the expectation.
While there can be multiple ways to determine the probability distribution of feasible set points based on , we exemplify the procedure as follows with a twopoint distribution for illustrative purpose. We select two feasible power rates and denote them by and , such that . Then the related probability of the corresponding twopoint distribution is calculated as:
(18) 
where generates the probability of certain event.
Remark
While we recover the feasible power rates only for current time for immediate implementation, we leave the future arrangement unrecovered, since the values of future power rates are not required for now, and they will be recalculated next time any ways.
2) Socialwelfare Problem: Assume the following general network operational constraints for time :
(19) 
where is affine. Based on (1)–(2) is an affine function of ; we define the vectorvalued function that is still affine in for later use. We make the following assumption.
Assumption
is an affine function of with full row rank.
Based on Assumption IIC, the Jacobian of is bounded on , i.e., there exists some constant such that
(20) 
where denotes the Frobenius norm.
We formulate the following optimization problem to be solved by network operator, which captures both social welfare and systemstates constraints of a distribution feeder, as well as the response function of all customers:
(21a)  
(21b)  
(21g)  
where the total payment from/reward to the customers cancels out the total payment received/reward paid by the network operator; thus, it is not explicitly shown in the social welfare objective function (21a).
Also note that (21g) is equivalent to embedding into , bringing nonconvexity in the mean time. This is because the form of (21g) is usually not affine. For better illustration of the nonconvexity of , consider the following simple example with single timeslot and single device with real power only. Assume a quadratic cost function and a box feasible set with upper and lower bounds for real power injections and , and we end up with a nonconvex piecewise linear function :
which will become more complex, or does not even have an explicit form, should one considers more complicated cost functions and feasible sets.
Iii Distributed Stochastic Dual Algorithm
Focusing on a particular problem instance at time , lends itself to a Stackelberg game interpretation where and are calculated via by the network operator (i.e., the leader) and broadcasted to all nodes ; subsequently, each consumer (i.e., the follower) computes the power set points from . By design, is an optimal point of .
While it is challenging for the network operator to solve because constraint (21g) introduces nonconvexity, we will first in Section IIIA introduce the exact convex relaxation of developed in our previous work [zhou2017incentive2]. Next in Section IIIB and Section IIIC we design a stochastic dual algorithm and its distributed implementation that is consistent with customers’ natural response towards incentive signals (i.e., solving ), to solve while recovering feasible power rates to implement for discrete devices at every iteration.
Iiia Convex Reformulation & Signal Design
Consider the following reformulated convex optimization problem:
(22a)  
(22b)  
(22g)  
where the nonconvex constraint (21g) in is replaced with convex constraints (22g). We assume is feasible.
Assumption
satisfies Slater’s condition.
Given the strong convexity of the objective function (22a) in and the feasibility assumption, a unique optimal solution exists for problem . We next utilize the results in to show that together with a signal design strategy is an exact convex relaxation of , i.e., the solution of gives the solution of .
Substitute (22g)–(22g) into (22g), and denote by the dual variable associated with the constraints (22g) for time . Let be the optimal systemstates produced by , and denote the optimal dual variables associated with (22g) as for time . Then, we design the signals for as:
(23a)  
(23b) 
Remark
The signals may have various realistic interpretation with different forms of . For example, when represents voltage regulation (e.g., (46)), the signals can be seen as prices of voltage violation; when is the supply/demand balance, the signals are marketclearing price; when is combination of the two aforementioned forms, the resultant signals are summation of both.
We show that the above signals are bounded, precluding the possibility of infinitely large signals.
Theorem (Theorem 1 of [zhou2017incentive2])
By examining the optimality conditions of and , we have the following result.
Theorem (Theorem 2 in [zhou2017incentive2])
The solution of problem along with the signals defined in (23) is a global optimal solution of problem , i.e., is an exact relaxation of .
From now on, we will use the optima of and interchangeably depending on the context. Next, we will first design a stochastic dual algorithm for solving while recovering feasible set points for discrete devices. Afterwards, based on Theorem IIIA, we will develop an distributed algorithm without exposing any private information of the customers to the network operator.
IiiB Stochastic Dual Algorithm
For the rest of this section, we focus on designing a stochastic dual algorithm and its distributed implementation for solving at given time , and analyzing its performance. For ease of exposition, we only consider single updating timescale for all devices, while extension to multiple updating timescales is mathematically straightforward [zhou2017stochastic]. Based on the results in this section, we will later in Section IV introduce an online algorithm implemented in more realistic settings with asynchronous updates, nonlinear power flow model, and timevarying optimization.
We first write the Lagrangian of as:
(24) 
which is obtained by keeping the constraints (22g) and implicit.
We will implement a stochastic dual algorithm to solve the following minimax problem based on :
(25a)  
(25b) 
while recovering original discrete feasible power rates for slow devices. To this end, we define
(26b)  
and a resultant dual function:
(27) 
with the corresponding dual problem:
(28) 
Lemma (Theorem 2.1 of [necoara2014rate])
We refer the proof to [necoara2014rate].
We consider the dual algorithm for solving (25) while recovering implementable feasible power rates for discrete devices at each iteration, and we have the following iterative stochastic dual algorithm:
(30a)  
(30b)  
(30c) 
where is a projection operator onto the nonnegative orthant, and the stepsize is some constant to be determined next to guarantee convergence. Also notice that in step (30c), is not the de facto gradient of (instead, is; see Lemma IIIB); but as we will show next, the algorithm converges to bounded distance from the optimal solution on average.
Proof
The proof follows [necoara2015linear]. By definition, the dual function is calculated by
(31)  
Meanwhile, the conjugate function of is defined as
(32) 
which gives us