Structure Learning and Statistical Estimation in Distribution Networks - Part II
Abstract
Part I [1] of this paper discusses the problem of learning the operational structure of the grid from nodal voltage measurements. In this work (Part II), the learning of the operational radial structure is coupled with the problem of estimating nodal consumption statistics and inferring the line parameters in the grid. Based on a Linear-Coupled (LC) approximation of the AC power flow equations, polynomial-time algorithms are designed to complete these tasks using the available nodal complex voltage measurements. The structure learning algorithm is then extended to cases with missing data, where available observations are limited to a fraction of the grid nodes. The efficacy of the presented algorithms is demonstrated through simulations on several distribution test cases.
Power Distribution Networks, Power Flows, Structure/Graph Learning, Load Estimation, Parameter Estimation, Voltage Measurements, Transmission Lines, Missing Data.
1 Introduction
The present power grid is separated into different tiers for optimizing its operations and control, namely the high voltage transmission system and the medium and low voltage distribution system. The distinction between these systems extends to their operational structure: the transmission system is a loopy graph while the distribution system operates as a radial network (set of trees). The larger volume of power transferred and the higher voltage magnitudes in the transmission network, as compared to the distribution network, have led grid security and reliability studies to focus primarily on the transmission side. Traditionally, the distribution grid has thus suffered from sparse placement of measurement devices, leading to negligible real-time observation and control efforts [2].
In Part I [1] of this paper, we study the design of low-complexity algorithms for learning the operational radial structure of the distribution grid when available metering is limited to nodal voltages. In this work, we extend the study to the problem of estimating other features of the distribution grid together with learning the operational structure. Specifically, we utilize available nodal complex voltages to learn the statistics of load profiles at the grid nodes and to estimate the complex-valued impedance parameters of the operational distribution lines. It is worth noting that line/edge based metering (line flow and breaker status measurements) is considered unavailable, as such quantities are seldom observed in real time in today’s grids. Next, we extend the problem of learning the grid structure introduced in Part I to the case with partial observability, where voltage measurements pertaining to a subset of the nodes are not observed. In essence, the results from this work can aid several areas that have gained prominence with the expansion of the smart grid. These include failure identification [3], grid reconfiguration [4], power flow optimization and generation scheduling [2, 5, 4, 6], as well as privacy-preserving grid operation [7]. Furthermore, learning under partial observability enables the quantification of measurement security necessary to prevent adversarial learning aimed at hidden topological attacks [8, 9].
‘Graph Learning’ or ‘Graphical Model Learning’ [10] is a broad area of work that has been considered in different domains. For general graphs, maximum-likelihood estimation has been employed for learning graph structures [11, 12, 13] through convex optimization as well as greedy techniques. In a learning study specific to general power grids, [14] presents a maximum likelihood structure estimator (MLE) based on electricity prices. For radial distribution grids, the authors of [15] discuss structure learning through construction of a spanning tree based on the inverse covariance matrix (or concentration matrix) of voltage measurements, while [3] studies topology identification with Gaussian loads through a maximum likelihood scheme.
In Part I [1], we presented an approach that uses provable trends in second moments of nodal voltage magnitudes to learn the grid structure. Our algorithm design in Part I assumes that all nodal loads are, in expectation, consumers of active and reactive power, which is realistic for most, if not all, current distribution grids. Here in Part II, we use a modified but non-conflicting assumption of independence of fluctuations in active and reactive loads at different nodes. As shown below, under this assumption one is able not only to reconstruct the grid structure but also to infer either the statistics of active and reactive loads at every node or the values of impedance parameters at every operational line. Then, we show how to extend our structure learning algorithm to cases with missing data, where observations from a subset of nodes are not available to the observer. As in Part I, the algorithms here (Part II) are independent of the exact probability distribution of load profiles as well as of variations in the values of line parameters, and are thus applicable to a wide range of operational conditions.
The rest of this manuscript (Part II) is organized as follows. Section 2 contains a brief review of the radial structure of the grid and of approximations of power flows, and formulates the problems considered. Section 3 contains proofs of our main results on second moments of voltage measurements in radial grids. Section 4 describes the algorithm design to learn the operational structure and estimate the statistics of load power profiles in the grid. An extension is also discussed for the problem of structure learning coupled with estimation of line impedances (instead of injection statistics). In Section 5 we present an algorithm that learns the operational radial structure in the presence of missing observations. Simulation results for our algorithms on test radial distribution cases are presented in Section 6. Finally, conclusions are discussed in Section 7.
2 Technical Preliminaries
This Section provides a brief description of the operational structure of the distribution grid, and introduces the learning problems considered in Part II. We then briefly recall the Linear Coupled Power Flow (LCPF) model (already introduced and discussed in Part I) that we rely on for analysis in later Sections.
Structure of Radial Distribution Network: A distribution grid is represented by a graph , where (of size ) is the set of nodes/buses and is the set of undirected edges/transmission lines. The complete layout of is loopy, but its operational layout (denoted by ) derived by excluding open/non-operational lines is a union of non-intersecting trees. Each grid tree in comprises a single substation feeding electricity into load nodes lined along the ‘radial’ tree. Thus, is a ‘base-constrained spanning forest’ with non-substation nodes. See Fig. in Part I [1] for an illustrative example. The set of operational edges that contribute to the structure of the forest is denoted by where . We follow the same notation as Part I, described in Table I of [1].
Summary of Learning Problems: The majority of distribution grids operational today are handicapped by limited real-time metering for breaker statuses and power flows [2], as well as infrequent updating of model parameters. The grid operator (utility company) or an external observer/adversary in such a scenario is concerned with the following three tasks:

(1) To learn the current configuration of switches that determine the ‘base-constrained spanning forest’.
(2) To learn the statistics of the power consumption profiles at the nodes.
(3) To learn the values of resistances and reactances of each operational line of the distribution system.
For all these tasks, the utility or observer relies on available nodal complex voltage (magnitude and phase) readings. Task (1) is coupled with either Task (2) or Task (3) and considered first in the situation of full observability, when complex voltage (magnitude and phase) samples are available at all the nodes of the system. In fact, we show that voltage magnitude samples are sufficient to learn the grid structure (Task (1)), while additional voltage phasor measurements are needed for the inference problems in Tasks (2) and (3). We also discuss Task (1) independently in the situation where several nodes do not offer any voltage readings. The problem formulations considered previously in Part I and here in Part II are summarized in Table 1.
Table 1: Problem formulations considered in Part I and Part II.

| Observations available | Prior information | Assumptions | Features estimated | Results used |
|---|---|---|---|---|
| Voltage magnitudes of all nodes | True second moment of nodal power injections, resistance and reactance of edges | Non-negative second moments of nodal power injections | Operational network structure | Algorithm in Part I [1] |
| Voltage magnitudes of all nodes | None | Uncorrelated nodal power injections | Operational network structure (Task (1)) | Theorem 1, Theorem 2, Algorithm |
| Voltage magnitudes and phasors of all nodes | Resistance and reactance of edges | Uncorrelated nodal power injections | Mean and variance of nodal power injections (Task (2)) | Lemma 2, Algorithm |
| Voltage magnitudes and phasors of all nodes | True variance of nodal power injection | Uncorrelated nodal power injections | Resistance and reactance of operational lines (Task (3)) | Lemma 2, Algorithm |
| Voltage magnitudes of subset of nodes | True variance of nodal power injections, resistance and reactance of edges | Uncorrelated nodal power injections; missing nodes separated by three or more hops | Operational network structure | Theorem 1, Lemma 2, Algorithm |
The physics of Power Flows (PFs) in forms the background for the learning/reconstruction problems sketched here. A variety of PF models/approximations were discussed in detail in the Appendix and Sections III-A to III-C of Part I [1]. Let us briefly recap the features of the Linear-Coupled Power Flow (LCPF) model needed for the analysis presented in the following Sections, and extend it with some new notation.
Linear Coupled Power Flow (LCPF): Let and denote the diagonal matrices representing, respectively, line resistances and reactances for operational edges in forest . Let real valued vectors and denote the active power injections, reactive power injections, voltage magnitude deviations and voltage phasors at the nonsubstation nodes, respectively. The LCPF model is given by the matrix Eqs (5,6) of Part I, where, and are edgeweighted reduced graph Laplacian matrices (after removing substation/slack buses) for forest with edge weights given by the edge conductances and susceptances respectively. is the reduced directed incidence matrix with each row corresponding to a directed edge in . In fact, is block diagonal with , where each block () corresponds to a tree in . Assuming that and in Eqs. of Part I are fluctuating, we derive the following relations involving the means , and covariance matrices for variables and .
(1)  
(2)  
(3)  
(4)  
(5) 
It is worth mentioning that the inclusion of both line resistances and reactances in the LCPF model distinguishes it from DC power flow models [16], which have limited applicability in distribution grids. In the next Section, we derive key results relating second moments of phase angles and voltage magnitudes in the LCPF model for a radial distribution grid. Versions of all subsequent results can be generated for DC power flow models by simply ignoring line resistances or reactances, as demonstrated in Part I.
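To make the origin of the four-term covariance expression concrete, the following is a minimal derivation sketch for a generic linear model. The matrices $A$ and $B$ are placeholders introduced here for illustration (they stand in for whatever matrices multiply the active and reactive injections in the LCPF model of Part I) and are not the paper's notation; the symbols $\mu$ and $\Omega$ denote means and covariances.

```latex
% Assumption: voltage magnitude deviations depend linearly on injections,
% with A and B as hypothetical placeholder matrices.
\begin{align*}
\varepsilon &= A\,p + B\,q, \\
\mu_{\varepsilon} &= \mathbb{E}[\varepsilon] = A\,\mu_p + B\,\mu_q, \\
\Omega_{\varepsilon} &= \mathbb{E}\big[(\varepsilon-\mu_{\varepsilon})(\varepsilon-\mu_{\varepsilon})^{T}\big] \\
 &= A\,\Omega_{p}A^{T} + B\,\Omega_{q}B^{T} + A\,\Omega_{pq}B^{T} + B\,\Omega_{qp}A^{T}.
\end{align*}
```

The same bilinear expansion, applied to a second linear relation for the phase angles, yields a four-term covariance for the angles and a four-term cross-covariance between magnitudes and angles.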
3 Second Moments of Voltages in Radial Grids
Consider a tree with reduced incidence matrix . Let denote the unique path from node to the slack bus of the tree , where a path between two nodes refers to the unique set of edges connecting them. As shown in Part I [1], in a radial distribution grid, has the following structure,
(6) 
Let denote the set of descendants of node within the tree where is called a descendant of , if lies on the (unique) path from to the slack bus of . We include itself in the set of its descendants. Similarly, we call the parent of within if is an immediate descendant of as illustrated in Fig. 1.
The following statement holds (see Lemma in [1] for detailed proof).
Lemma 1.
For two nodes, and its parent , in tree
(7) 
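As a numerical illustration of the tree structure used here, the sketch below assumes that Eq. (6) is the standard common-path identity for trees: the inverse of a reduced Laplacian with edge weights 1/r has entry (a, b) equal to the sum of resistances on the edges shared by the paths from a and b to the substation. It then evaluates the difference between the rows of a node and its parent, which is the kind of quantity Lemma 1 concerns. The 5-node feeder, the `parent` array encoding and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical 5-node feeder: parent[i] is the parent of node i, -1 = substation.
parent = [-1, 0, 0, 1, 1]
r = np.array([0.4, 0.3, 0.5, 0.2, 0.6])  # resistance of edge (i, parent[i])
n = len(parent)

def path_to_root(a):
    """Edges on the path from a to the substation, each labelled by its child node."""
    edges, node = set(), a
    while node != -1:
        edges.add(node)
        node = parent[node]
    return edges

# Reduced weighted Laplacian with edge weights 1/r (substation row/column removed).
H = np.zeros((n, n))
for i, p in enumerate(parent):
    H[i, i] += 1.0 / r[i]
    if p != -1:
        H[p, p] += 1.0 / r[i]
        H[i, p] -= 1.0 / r[i]
        H[p, i] -= 1.0 / r[i]

# Common-path formula: sum of resistances over edges shared by both paths.
R = np.array([[sum(r[e] for e in path_to_root(a) & path_to_root(b))
               for b in range(n)] for a in range(n)])
inverse_matches = np.allclose(np.linalg.inv(H), R)

# Row difference between node a = 1 and its parent b = 0:
# it equals r[a] on the descendants of a and vanishes elsewhere.
a, b = 1, parent[1]
diff = R[a] - R[b]
descendants_of_a = [c for c in range(n) if a in path_to_root(c)]
```

The row difference is nonzero only on the descendant set of the child node, which is what makes the later results local to a single line.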
Before the discussion of our results on trends in voltage covariances, we make the following assumption on the covariances of load consumption profiles.
Assumption : Powers at different nodes are not correlated, while active and reactive powers at the same node are positively correlated. Thus,
A few remarks are in order. First, the assumption of independence of fluctuations is realistic in general, reflecting the diversity of individual consumer behavior on relatively short time scales. Second, unless consumer-level control of reactive power [17] is implemented, fluctuations in active and reactive consumption/generation at the same node will have a strong tendency to align, giving positive correlation. Since Assumption pertains to covariances (‘centered’ second moments), it does not run counter to the assumption in Part I, where ‘non-centered’ second moments of power injections are considered to be positive. In fact, nodal loads (consumers of active and reactive power) satisfy both the assumptions given in Part I and Part II. Note that Assumption does not restrict individual nodal loads to follow any specific distribution.
The following result states that covariances of voltage magnitude deviations increase as we move farther away from the root of any tree in the grid.
Theorem 1.
If node is a descendant of node on tree in forest , then .
Proof.
is given by Eq. (3) with four non-negative terms on the right side. Let the first term be denoted by . For one-hop neighbors, node and its parent , we use Lemma 1 to get
(8)  
(9) 
Combining the inequalities, we get . Extending the same analysis to the remaining three terms in Eq. (3) and then moving from one-hop neighbors to descendants proves the theorem. ∎
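The monotonicity stated in Theorem 1 can be checked numerically. The sketch below is illustrative only: it assumes the voltage magnitude deviations follow a linear model of the form ε = R p + X q, with R and X the inverses of reduced Laplacians weighted by 1/r and 1/x, and it uses injections that are uncorrelated across nodes with positive same-node covariance. The feeder, the parameter ranges and the helper `inverse_reduced_laplacian` are hypothetical.

```python
import numpy as np

def inverse_reduced_laplacian(parent, w):
    """Inverse of the reduced Laplacian of a forest with edge weights 1/w.

    parent[i] is the parent of node i (-1 = substation); w[i] belongs to edge (i, parent[i]).
    """
    n = len(parent)
    H = np.zeros((n, n))
    for i, p in enumerate(parent):
        H[i, i] += 1.0 / w[i]
        if p != -1:
            H[p, p] += 1.0 / w[i]
            H[i, p] -= 1.0 / w[i]
            H[p, i] -= 1.0 / w[i]
    return np.linalg.inv(H)

parent = [-1, 0, 0, 1, 1, 2]          # hypothetical 6-node feeder
n = len(parent)
rng = np.random.default_rng(0)
r, x = rng.uniform(0.1, 1.0, n), rng.uniform(0.1, 1.0, n)
R, X = inverse_reduced_laplacian(parent, r), inverse_reduced_laplacian(parent, x)

# Diagonal covariances: uncorrelated across nodes, positively correlated at a node.
var_p, var_q = rng.uniform(0.5, 1.5, n), rng.uniform(0.5, 1.5, n)
Op, Oq = np.diag(var_p), np.diag(var_q)
Opq = np.diag(0.3 * np.sqrt(var_p * var_q))

# Four-term covariance of voltage magnitude deviations under the assumed model.
cov_eps = R @ Op @ R + X @ Oq @ X + R @ Opq @ X + X @ Opq @ R
var_eps = np.diag(cov_eps)

# Every node should have a larger variance than its parent.
monotone = all(var_eps[i] > var_eps[p] for i, p in enumerate(parent) if p != -1)
```

Because the inequality holds along every edge, it extends to any descendant by chaining edges along the path.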
Next, we focus on the term , which is the expected value of the squared centered difference between two node voltage deviations (). For any two nodes and that lie on tree , we have
where is composed of four terms as given by Eq. (3). Using Eq. (8) for each of the four terms within and adding them, we derive
(10) 
Lemma 2.
If is ’s parent in tree ,
(11)  
(12)  
(13) 
Eqs. (12, 13) can be derived through the same analysis as the one leading to Eq. (11). Note that for each equation in Lemma 2, the right side contains power covariance terms originating from the nodes in alone. Thus, if the covariances of all descendants are known, Eqs. (11,12,13) can be used to infer the three covariance quantities () associated with node . Furthermore, parameters () included in these equations pertain to the single operational line . For the case where injection covariances are known from historical data, we can thus estimate the parameters of line as well as , the covariance between active and reactive injections at node . We use these facts later in the text while designing our learning algorithms.
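The inference described above can be sketched as a small linear solve. This is a hedged illustration rather than the paper's exact equations: it assumes that, across a line with resistance r and reactance x, the magnitude and angle differences between a node and its parent are Δε = r·P + x·Q and Δθ = x·P − r·Q, where P and Q are the summed active and reactive injections of the node's descendants. The sign convention for θ, the function name and the synthetic data are assumptions.

```python
import numpy as np

def descendant_covariance_sums(r, x, phi_eps, phi_theta, phi_eps_theta):
    """Solve for (S_p, S_q, S_pq): variance of P, variance of Q, covariance of P and Q.

    phi_eps = Var(d_eps), phi_theta = Var(d_theta), phi_eps_theta = Cov(d_eps, d_theta),
    under the assumed relations d_eps = r*P + x*Q and d_theta = x*P - r*Q.
    """
    A = np.array([[r * r, x * x, 2 * r * x],
                  [x * x, r * r, -2 * r * x],
                  [r * x, -r * x, x * x - r * r]])
    return np.linalg.solve(A, np.array([phi_eps, phi_theta, phi_eps_theta]))

# Synthetic check: correlated aggregate injections below one line.
rng = np.random.default_rng(0)
P = rng.normal(size=5000)
Q = 0.5 * P + rng.normal(size=5000)
r, x = 0.4, 0.3
d_eps = r * P + x * Q
d_theta = x * P - r * Q

C = np.cov(np.vstack([d_eps, d_theta]))   # second moments of voltage differences
S = np.cov(np.vstack([P, Q]))             # ground-truth injection moments
est = descendant_covariance_sums(r, x, C[0, 0], C[1, 1], C[0, 1])
```

Since injections at different nodes are uncorrelated, the aggregate covariances are sums over descendants, so subtracting the sums already obtained for a node's children isolates the node's own three covariance quantities.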
Next, we prove an important inequality involving the magnitude of on the grid nodes.
Lemma 3.
Proof.
Let us first prove the Lemma for the Case . As shown in Fig. 2, one observes . Further, , where represents edges traversed along the path leading from node to the root of . Consider a node in the tree . When , one uses (6) to derive
(14) 
Similarly, for node , one obtains
(15) 
For , we arrive at
(16) 
Next, using Eqs. (14,15,16), we arrive at
(17)  
(18) 
Similar inequalities hold for as well. We can now apply Eqs. (17,18) to Eq. (10) to prove for Case .
In the case (see Fig. 2) nodes and are descendants of node . Let be the penultimate (second to the last) node lying on the path from to , and be the penultimate node on the path from to . Here, and are disjoint subsets of . Then, for any and , observe that . This results in
(19)  
(20) 
Furthermore, for ,
(21) 
Versions of Eqs. (19,20,21) for can be derived in a similar way. Using these results in Eq. (10), one arrives at for Case . This completes the proof. ∎
The following theorem follows directly from Lemma 3.
Theorem 2.
For every node with set of descendants and parent ,
Proof.
In the case of Lemma 3, the optimal node for exists on the path from node to the root. Considering case , one finds that the optimal node on that path is node ’s parent . ∎
Theorem 2 implies that among all non-descendants of a node, the minimum expected squared centered difference of voltage magnitude deviations is achieved at its parent node. In the next Section, we utilize this result to identify a node’s parent.
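The parent-identification property can be illustrated numerically. The sketch below again assumes a linear model ε = R p + X q with common-path matrices R and X and node-wise uncorrelated injections; it computes the centered squared difference φ(a, c) = Ω(a,a) + Ω(c,c) − 2Ω(a,c) and checks that, among non-descendants of each node, the minimizer is the parent. The feeder and all names are hypothetical.

```python
import numpy as np

def paths_to_root(parent):
    """For each node, the edges to the substation; edge (c, parent[c]) is labelled c."""
    paths = []
    for a in range(len(parent)):
        edges, node = set(), a
        while node != -1:
            edges.add(node)
            node = parent[node]
        paths.append(edges)
    return paths

def common_path_matrix(paths, w):
    """M[a, b] = sum of weights w over edges shared by the root paths of a and b."""
    n = len(paths)
    return np.array([[sum(w[e] for e in paths[a] & paths[b]) for b in range(n)]
                     for a in range(n)])

parent = [-1, 0, 0, 1, 1, 2, 5]       # hypothetical 7-node feeder
n = len(parent)
rng = np.random.default_rng(1)
r, x = rng.uniform(0.1, 1.0, n), rng.uniform(0.1, 1.0, n)
paths = paths_to_root(parent)
R, X = common_path_matrix(paths, r), common_path_matrix(paths, x)

var_p, var_q = rng.uniform(0.5, 1.5, n), rng.uniform(0.5, 1.5, n)
Op, Oq = np.diag(var_p), np.diag(var_q)
Opq = np.diag(0.3 * np.sqrt(var_p * var_q))
cov_eps = R @ Op @ R + X @ Oq @ X + R @ Opq @ X + X @ Opq @ R

d = np.diag(cov_eps)
phi = d[:, None] + d[None, :] - 2 * cov_eps   # centered squared differences

argmin_is_parent = {}
for a, p in enumerate(parent):
    if p == -1:
        continue  # the parent is the substation, which has no voltage deviation
    non_desc = [c for c in range(n) if a not in paths[c]]
    argmin_is_parent[a] = min(non_desc, key=lambda c: phi[a, c]) == p
```

Nodes attached directly to the substation are skipped because the substation is not among the observed non-substation nodes.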
4 Learning Grid Structure with Estimation of Load or Parameters
We first present our algorithm design for Tasks (1) and (2), structure learning coupled with estimation of nodal power injection statistics. Next, we look at solving Tasks (1) and (3), structure learning coupled with estimation of line parameters.
4.1 Learning Structure and Injection Statistics
The results of the previous Section (specifically, Theorem 1, Lemma 2 and Theorem 2) provide the machinery for the algorithm design. Algorithm learns the radial operational structure (Task (1)) and also estimates the mean and covariance of the power injections at the load nodes (Task (2)). The observer here is assumed to be aware of the load nodes that are connected directly to the grid substations. This is necessary as the assignment of substations, one per tree in forest , cannot be uniquely determined. This occurs due to the assumption of zero fluctuations of voltage magnitude and phase at substations, which makes the relations involving voltage deviations in the previous Section insensitive when the substation is the parent node. Resistance and reactance parameters of all lines (open and operational) are assumed known here.
Algorithm Overview: In each iteration, the node with the highest variance in voltage deviation among the yet undiscovered node set is selected in Step 4. Theorem 1 ensures that selecting nodes in the decreasing order of their variances leads to discovery of node only after all its descendants have been discovered previously. Set denotes the current set of leaves (previously discovered nodes with unknown parents). In Step 6, the selected node is made the parent of a node in set if the condition in Theorem 2 is satisfied. Here, each entry in the descendant covariance vectors and contains the sum of load power covariances over all descendants of each node, other than the node itself. The values of covariance matrices of power injections for are inferred in Step 8 using Lemma 2. Steps 12 and 15 are used to update the current set of leaves for use in the next iteration. Finally, in Step 18, the mean of nodal power is computed using the measurement matrix constructed from the grid structure. Note that instead of learning the covariances in sequentially through Step 8, one can use the generated measurement matrix directly to learn them together at the end.
Algorithm Complexity: Computing the empirical covariance matrix of voltage deviations is considered to be a part of preprocessing and is thus ignored in the complexity estimation. One makes iterations to select all the non-substation nodes. Within each iteration, an edge selection (Step 6) calls for a check with each node in . Thus, the worst-case complexity for learning the structure is . Computing the means and the covariances is of complexity through matrix multiplication.
Observe that learning the forest structure in Algorithm relies on voltage magnitude deviation measurements alone, and in fact does not require knowledge of line parameters in the grid. Phase measurements and values of line resistance and reactance are needed only to estimate the means and covariances of power injections.
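The structure learning idea can be sketched in a few lines. The code below is a simplified variant and not the step-by-step algorithm described above: for every node not attached to a substation, it restricts candidates to nodes with smaller voltage variance (which, by Theorem 1, excludes all descendants and retains the parent) and picks the candidate with the smallest centered squared difference (Theorem 2). For determinism, the demonstration uses an exact covariance built from an assumed linear model ε = R p + X q; in practice an empirical covariance of voltage magnitude samples would be supplied. All names and the two-tree forest are hypothetical.

```python
import numpy as np

def learn_parents(cov_eps, substation_children):
    """Return {node: parent}, with -1 marking nodes known to hang off a substation."""
    var = np.diag(cov_eps)
    phi = var[:, None] + var[None, :] - 2 * cov_eps
    learned = {}
    for a in range(len(var)):
        if a in substation_children:
            learned[a] = -1
            continue
        # Theorem 1: descendants have larger variance, the parent has smaller variance.
        candidates = [b for b in range(len(var)) if var[b] < var[a]]
        # Theorem 2: among non-descendants, the parent minimizes phi.
        learned[a] = min(candidates, key=lambda b: phi[a, b])
    return learned

def inverse_reduced_laplacian(parent, w):
    """Inverse of the reduced Laplacian of a forest with edge weights 1/w."""
    n = len(parent)
    H = np.zeros((n, n))
    for i, p in enumerate(parent):
        H[i, i] += 1.0 / w[i]
        if p != -1:
            H[p, p] += 1.0 / w[i]
            H[i, p] -= 1.0 / w[i]
            H[p, i] -= 1.0 / w[i]
    return np.linalg.inv(H)

# Hypothetical forest with two trees (nodes 0 and 4 hang off substations).
parent = [-1, 0, 0, 1, -1, 4, 4, 5]
n = len(parent)
rng = np.random.default_rng(2)
r, x = rng.uniform(0.1, 1.0, n), rng.uniform(0.1, 1.0, n)
R, X = inverse_reduced_laplacian(parent, r), inverse_reduced_laplacian(parent, x)
Op = np.diag(rng.uniform(0.5, 1.5, n))
Oq = np.diag(rng.uniform(0.5, 1.5, n))
cov_eps = R @ Op @ R + X @ Oq @ X     # same-node p-q covariance set to zero here

learned = learn_parents(cov_eps, substation_children={0, 4})
```

Like the procedure in the text, this variant needs only voltage magnitude statistics and the list of substation-connected nodes; no line parameters enter `learn_parents`.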
4.2 Learning Structure and Line Parameters
The first goal of the observer here is the same as in Section 4.1: to learn the operational grid structure. However, we consider a modified scenario where the covariances for active and reactive nodal injections ( and ) are already known from historical data and thus do not need to be estimated. Instead, the observer here aims at estimating the impedance parameters ( and ) for each operational line within the grid. Consider Eqs. (11,12,13). If matrix is also known, the observer can easily solve these linear equations with , and as the three unknowns to estimate the impedance for each operational edge. However, may be harder to obtain in reality as its computation requires time-synchronized historical samples of active and reactive injections. If information on is unavailable, variables and form three nonlinear Eqs. (11,12,13) for each edge . Note that Algorithm infers the radial grid structure iteratively upward from the descendant nodes to the parents. Therefore, we also infer line parameters () and by solving Eqs. (11,12,13) in each iteration for the newly discovered edge between node and its parent in tree . Let , and denote the expressions on the left side of Eqs. (11,12,13) respectively. From Eqs. (11,12), we derive, . We can now eliminate terms involving and to get Eq. (22), which is a quadratic expression in . We use it to infer and . To infer , we use values of for descendants of node that are determined in previous iterations.
(22) 
Every step in this algorithm, except the modified Step 8, corresponds to the respective step in Algorithm . Step 8 is modified such that Eq. (22), derived from Eq. (11), is used to obtain the line parameters and . As the formulation and analysis of this algorithm follow those of Algorithm , we omit them for brevity. In the next Section, we discuss a critical extension of the structure learning problem (Task (1)) to the case where the available nodal data is incomplete due to some missing entries.
5 Learning Base-Constrained Spanning Forest with Missing Data
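For the simpler case of Section 4.2 in which the same-node covariance between active and reactive injections is also known, the line parameters follow from a linear solve. The sketch below is illustrative and rests on assumptions: the voltage differences across a line are taken to be Δε = r·P + x·Q and Δθ = x·P − r·Q for aggregate descendant injections P and Q, and the three unknowns are taken to be r², x² and r·x. The function name and numbers are hypothetical.

```python
import numpy as np

def line_parameters(S_p, S_q, S_pq, phi_eps, phi_theta, phi_eps_theta):
    """Estimate (r, x) of one line from known aggregate injection covariances.

    S_p, S_q, S_pq: variance of P, variance of Q and covariance of P and Q,
    each summed over the descendants of the child node.
    phi_*: second moments of the voltage magnitude / angle differences across the line.
    """
    A = np.array([[S_p, S_q, 2 * S_pq],
                  [S_q, S_p, -2 * S_pq],
                  [-S_pq, S_pq, S_p - S_q]])
    r2, x2, rx = np.linalg.solve(A, np.array([phi_eps, phi_theta, phi_eps_theta]))
    return np.sqrt(r2), np.sqrt(x2)

# Synthetic check with hypothetical values.
S_p, S_q, S_pq = 2.0, 1.0, 0.5
r_true, x_true = 0.4, 0.3
phi_eps = r_true**2 * S_p + x_true**2 * S_q + 2 * r_true * x_true * S_pq
phi_theta = x_true**2 * S_p + r_true**2 * S_q - 2 * r_true * x_true * S_pq
phi_eps_theta = r_true * x_true * (S_p - S_q) + (x_true**2 - r_true**2) * S_pq

r_est, x_est = line_parameters(S_p, S_q, S_pq, phi_eps, phi_theta, phi_eps_theta)
```

Under these assumptions the determinant of the system is (S_p + S_q)·((S_p − S_q)² + 4·S_pq²), so the solve is well posed unless the aggregate active and reactive variances coincide and the cross-covariance vanishes.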
The structure learning problem discussed in the preceding Section (Task (1)) requires the observer to have voltage magnitude data for all nodes within the distribution grid. However, this may not be true in practice. In fact, loss of communication and/or synchronization troubles with meters over short periods of time, along with meter breakdowns over longer timescales, can result in missing data over a set of missing nodes in the system. We assume here that the “missing” nodes are not positioned fully arbitrarily within the grid, but satisfy the following property.
Assumption : Missing nodes in set are separated by greater than two hops in the distribution grid forest and they are not immediate children (not first descendants) of the substation nodes.
This assumption implies that there exists no observed node which is connected to more than one missing node. Note that a missing node can exist in either of two possible configurations, a leaf or an intermediary position, as illustrated in Fig. 3. Assumption guarantees that in either case, both the parent and grandparent (parent of the parent) nodes of the missing node are observed. Additionally, unlike structure learning in Task (1), in this Section we assume that information (e.g., estimated or originating from historical measurements) on the actual values of and covariance matrices and impedances of all lines is available. We now construct Algorithm to learn the operational grid structure in the presence of a missing set with nodes whose voltage magnitude deviations are unknown.