# Distributed Gauss-Newton Method for AC State Estimation Using Belief Propagation

###### Abstract

We present a detailed study on application of factor graphs and the belief propagation (BP) algorithm to the power system state estimation (SE) problem. We start from the BP solution for the linear DC model, for which we provide a detailed convergence analysis. Using insights from the DC model, we use two different approaches to derive the BP algorithm for the non-linear AC model. The first method directly applies BP methodology, however, providing only approximate BP solution for the AC model. In the second approach, we make a key further step by providing the solution in which the BP is applied sequentially over the AC model, akin to what is done by the Gauss-Newton method. The resulting BP-based Gauss-Newton algorithm has the interpretation of a fully distributed Gauss-Newton method with the same accuracy as the centralized SE, preserving a number of advantages of the BP framework. The paper provides extensive numerical study of the proposed algorithms, provides details on their convergence properties, and gives a number of useful insights for their implementation.

## I Introduction

Motivation: Electric power systems consist of generation, transmission and consumption spread over wide geographical areas and operated from the control centers by the system operators. Maintaining normal operation conditions is of the central importance for the power system operators [1, Ch. 1]. Control centers are traditionally operated in centralized and independent fashion. However, increase in the system size and complexity, as well as external socio-economic factors, lead to deregulation of power systems, resulting in decentralized structure with distributed control centers. Cooperation in control and monitoring across distributed control centers is critical for efficient system operation. Consequently, existing centralized algorithms have to be redefined based on a new requirements for distributed operation, scalability and computational efficiency [2].

The system monitoring is an essential part of the control centers, providing control and optimization functionality whose efficiency relies on accurate state estimation (SE). The core of the SE is the SE algorithm that provides an estimate of the system state (i.e., the set of all complex bus voltages) based on the network topology and available measurements. Besides the SE algorithm, the SE includes several additional routines such as network topology processors, observability analysis and bad data analysis.

The centralized SE assumes that the measurements collected across the system are available at the control center, where the centralized SE algorithm provides the system state estimate. Precisely, the centralized SE algorithm typically uses the Gauss-Newton method to solve the non-linear weighted least-squares (WLS) problem [3], [4]. In contrast, decentralized SE distributes communication and computational effort across multiple control centers to provide the system state estimate. There are two main approaches to distributed SE: i) algorithms which require a global control center to exchange data with local control centers, and ii) algorithms with local control centers only [5]. Distributed SE algorithms target the same state estimate accuracy as achievable using the centralized SE algorithms.

Literature review: The mainstream distributed SE algorithms exploit matrix decomposition techniques applied over the Gauss-Newton method. These algorithms usually achieve the same accuracy as the centralized SE algorithm and work with either global control center [6, 7, 8] or without it [9, 10, 11, 12]. Recently, the SE algorithms based on distributed optimization [13], and in particular, the alternating direction method of multipliers became very popular [14]. Examples include algorithms based on convex relaxation [15, 16], and without convex relaxation [17]. In [18], the gossip-based Gauss-Newton algorithm is proposed which provides flexible communication model, but suffers from slight performance degradation compared to the centralized SE. The work in [19] presents a fully distributed SE algorithm for wide-area monitoring which provably converges to the centralized SE. We refer the reader to [20] for a detailed survey of the distributed multi-area SE.

Belief-Propagation Approach: In this paper, we solve the SE problem using probabilistic graphical models [21], a powerful tool for modeling the dependencies among the systems of random variables. We represent the SE problem using a popular class of probabilistic graphical models called factor graphs and solve it using the belief propagation (BP) algorithm. Applying the BP algorithm on probabilistic graphical models without loops, one obtains exact marginal distributions or a mode of the joint distribution of the system of random variables [21], [22]. The BP algorithm can be also applied to graphical models with loops (loopy BP)[23], although in that case, the solution may not converge to the correct marginals/modes of the joint distribution. BP is a fully distributed algorithm suitable for accommodation of distributed power sources and time-varying loads. Moreover, placing the SE into the probabilistic graphical modelling framework enables not only efficient inference, but also, a rich collection of tools for learning parameters or structure of the graphical model from observed data [24].

The work in [25] provides the first demonstration of BP applied to the SE problem. Although this work is elaborate in terms of using, e.g., environmental correlation via historical data, it applies BP to a simple linearized DC model. The AC model is recently addressed in [26], where tree-reweighted BP is applied using preprocessed weights obtained by randomly sampling the space of spanning trees. The work in [27] investigates Gaussian BP convergence performance for the DC model. Although the above results provide initial insights on using BP for distributed SE, it is fair to say that a systematic analysis of applying the BP algorithm on the SE problem, and in particular for the AC model, is still missing. This paper intends to fill this gap.

Contributions: In this paper, we present a novel distributed BP-based Gauss-Newton algorithm. In the process of deriving the proposed algorithm, we provide a step-by-step guide for application of BP algorithm to the SE problem, giving this paper strong tutorial flavor. Our methodology is to start with the simplest linear DC model and use insights obtained therein to derive the native BP solution for the non-linear AC model. Unfortunately, as closed-form expressions for certain classes of BP messages cannot be obtained, we present suitable approximations that lead us to propose the AC-BP algorithm as an (approximate) BP solution for the AC model. Then, as a main contribution, we make a key further step where we change the perspective of our BP approach and, instead of applying the BP directly onto the non-linear AC model, we present the solution where the BP is applied sequentially over the AC model, akin to what is done by the Gauss-Newton method. The resulting Gauss-Newton BP (GN-BP) represents a BP counterpart of the Gauss-Newton method achieving the same accuracy, however, preserving a number of advantages brought in by the BP framework. For example, the proposed GN-BP algorithm is of considerably lower computational complexity, can be easily designed to provide asynchronous operation, could be integrated as part of real-time systems [28], and is flexible and easy to distribute and parallelize. Finally, this paper provides a novel and detailed convergence analysis of the DC-BP algorithm and points to extension of this analysis for the proposed GN-BP algorithm. We close this paper by extensive numerical study where we compare the proposed BP-based algorithms with the centralized SE, and provide a number of useful insights for their implementation.

Paper Organization: In Section II, we present basic concepts of factor graphs and BP. Section III reviews the centralized DC and AC SE. Section IV presents closed form BP expressions and detailed convergence analysis for the DC model. In Section V, the AC model is solved via BP-based Gauss-Newton method. Section VI presents extended numerical convergence and performance results. Concluding remarks are given in Section VII.

## Ii Factor Graphs and BP Algorithm

Factor graphs and BP algorithm are widely used tools for probabilistic inference [21], [22]. In the standard setup, the goal of the BP algorithm is to efficiently evaluate the marginals of a system of random variables described via the joint probability density function . Assuming that the function can be factorized proportionally () to a product of local functions:

(1) |

where , the marginalization problem can be efficiently solved using BP algorithm.

The first step is forming a factor graph, which is a bipartite graph that describes the structure of the factorization (1). The factor graph structure comprises the set of factor nodes , where each factor node represents local function , and the set of variable nodes . The factor node connects to the variable node if and only if [29].

The BP algorithm on factor graphs proceeds by passing two types of messages along the edges of the factor graph: i) a variable node to a factor node, and ii) a factor node to a variable node messages. Both variable and factor nodes in a factor graph process the incoming messages and calculate outgoing messages, where an output message on any edge depends on incoming messages from all other edges. BP messages represent ”beliefs” about variable nodes, thus a message that arrives or departs a certain variable node is a function (distribution) of the random variable corresponding to the variable node.

Message from a variable node to a factor node: Consider a part of a factor graph shown in Fig. 1 with a group of factor nodes that are neighbours of the variable node .

The message from the variable node to the factor node is equal to the product of all incoming factor node to variable node messages arriving at all the other incident edges:

(2) |

where represents the set of factor nodes incident to the variable node , excluding the factor node . Note that each message is a function of the variable .

Message from a factor node to a variable node: Consider a part of a factor graph shown in Fig. 2 that consists of a group of variable nodes that are neighbours of the factor node .

The message from the factor node to the variable node is defined as a product of all incoming variable node to factor node messages arriving at other incident edges, multiplied by the function associated to the factor node , and marginalized over all of the variables associated with the incoming messages:

(3) |

where is the set of variable nodes incident to the factor node , excluding the variable node .

Marginal inference: The marginal of the variable node , illustrated in Fig. 3, is obtained as the product of all incoming messages into the variable node :

(4) |

where is the set of factor nodes incident to the variable node .

In the following, we focus on the Gaussian BP (GBP) over the set of real-valued random variables where all the local functions and all the inputs to the BP algorithm represent Gaussian distributions. Due to the fact that variable node and factor node processing preserves ”Gaussianity” of the messages, the resulting GBP algorithm operates by exchanging the messages that represent Gaussian functions. Therefore, each message exchanged in GBP is completely represented using only two values: the mean and the variance [30].

## Iii SE in Electric Power Systems

The SE algorithm estimates the values of the state variables based on the knowledge of network topology and parameters, and measured values obtained from measurement devices spread across the power system. The knowledge of the network topology and parameters is provided by the network topology processor in the form of the bus/branch model, where branches of the grid are usually described using the two-port -model [1, Ch. 1,2]. The bus/branch model can be represented using a graph , where the set of nodes represents the set of buses, while the set of edges represents the set of branches of the power network.

As an input, the SE requires a set of measurements of different electrical quantities spread across the power network. Using the bus/branch model and available measurements, the observability analysis defines the measurement model [1, Ch. 4] which can be described as the system of equations[4]:

(5) |

where is the vector of the state variables, , , is the vector of measurement functions, is the vector of measurement values, and is the vector of uncorrelated measurement errors. The SE problem is commonly an overdetermined system of equations [31, Sec. 2.1].

Each measurement is associated with measured value , measurement error , and measurement function . Under the assumption that measurement errors follow a zero-mean Gaussian distribution, the probability density function associated with the i-th measurement equals:

(6) |

where is the measurement variance defined by the measurement error , and the measurement function connects the vector of state variables to the value of the i-th measurement.

The SE in electric power systems deals with the problem of determining state variables according to the noisy observed data and a prior knowledge:

(7) |

Assuming that the prior probability distribution is uniform, and given that does not depend on , the maximum a posteriori (MAP) solution of (7) reduces to the maximum likelihood solution, as given below [32]:

(8) |

One can find the solution (8) via maximization of the likelihood function , which is defined via likelihoods of independent measurements:

(9) |

It can be shown that the solution of the MAP problem can be obtained by solving the following optimization problem, known as the WLS problem [33, Sec. 9.3]:

(10) |

The state estimate representing the solution of the optimization problem (10) is known as the WLS estimator, and is equivalent to the MAP estimator (9).

### Iii-a AC State Estimation

The AC SE model is defined using the measurement functions that precisely follow the physical laws that connect the measured variables and the state variables. As a result, the system (5) in general represents the system of non-linear equations.

In a usual scenario, the AC SE model takes bus voltage magnitudes and bus voltage angles, transformer magnitudes of turns ratio and transformer angles of turns ratio as state variables . Without loss of generality, in the rest of the paper, for the AC SE we observe bus voltage angles and bus voltage magnitudes as state variables . Consequently, the number of state variables is .

The typical set of measurements in electric power systems includes: active and reactive power flow and current magnitude, , , respectively; active and reactive power injection, , ; and bus voltage angle and magnitude, , . For completeness, we provide the set of non-linear measurement functions of the AC SE model given in Appendix A, equations (62) - (65).

Based on the available set of measurements, the WLS estimator , i.e., the solution of the WLS problem (10), can be found using the Gauss-Newton method:

(11a) | |||

(11b) |

where is the iteration index, is the vector of increments of the state variables, is the Jacobian matrix of measurement functions at (see Appendix A, for details), is a diagonal matrix containing inverses of measurement variances, and is the vector of residuals[3, Ch. 10].

### Iii-B DC State Estimation

The DC model is obtained by linearisation of the AC model. In the typical operating conditions, the difference of bus voltage angles between adjacent buses is very small , which implies and . Further, all bus voltage magnitudes are , , and all shunt elements and branch resistances can be neglected. This implies that the DC model ignores the reactive powers and transmission losses and takes into account only the active powers. Therefore, the DC SE takes only bus voltage angles as state variables. Consequently, the number of state variables is .

The set of DC model measurements involves only active power flow , , active power injection , , and bus voltage angle , , measurements and the model is dealing with linear measurement functions given Appendix A.

## Iv BP-based Algorithm for the DC SE

In the SE problem defined above, each measurement function depends on a limited (typically small) subset of state variables . Hence, the likelihood function can be factorized into factors (9) affecting small subsets of state variables. This motivates solving the SE problem scalably and efficiently using probabilistic graphical models. The solution involves defining the factor graph corresponding to (9), and deriving expressions for BP messages exchanged over the factor graph, as detailed next.

For completeness of exposition, we first present the solution of the DC SE problem using the BP algorithm; we refer to the corresponding method as the DC-BP. Furthermore, and as a novel contribution, we provide an in-depth convergence analysis of the DC-BP algorithm, and present methods to improve its convergence. The material in this section sets the stage for the main contribution of this paper - the BP-based Gauss-Newton method for AC model presented in the following section.

### Iv-a The Factor Graph Construction

For the DC model, the set of variable nodes is defined by the state variables , i.e., . The set of factor nodes is defined by the set of measurements , with measurement functions given in Appendix A, equations (66) - (68). Measurements define likelihood functions that are in turn equal to local functions associated to factor nodes. More precisely, a factor node connects to a variable node if and only if the state variable is an argument of the corresponding measurement function .

###### Example 1 (Constructing factor graph).

In this toy example, using a simple 3-bus model presented in Fig. (a)a, we demonstrate the conversion from a bus/branch model with a given measurement configuration into the corresponding factor graph for the DC model.

The variable nodes represent state variables, i.e., . Factor nodes are defined by corresponding measurements, where in our example, measurements and are mapped into factor nodes , .

### Iv-B Derivation of BP Messages

Message from a variable node to a factor node: Let us assume that the incoming messages , , into the variable node are Gaussian and represented by their mean-variance pairs , , . According to (2), it can be shown that the message from the variable node to the factor node is proportional () to:

(13) |

with mean and variance obtained as:

(14a) | ||||

(14b) |

After the variable node receives the messages from all of the neighbouring factor nodes from the set , it evaluates the message according to (14) and sends it to the factor node .

Message from a factor node to a variable node: Due to linearity of measurement functions , closed form expressions for these messages is easy to obtain and follow a Gaussian form:

(15) |

Let us assume that the messages into factor nodes are Gaussian, denoted by:

(16) | ||||

The Gaussian function associated with the factor node is given by (Section III, equation (6)):

(17) |

The DC model contains only linear measurement functions which we represent in a general form as:

(18) |

where is the set of variable nodes incident to the factor node , excluding the variable node . From the expression (3), and using (16)-(18), it can be shown that the message from the factor node to the variable node is represented by the Gaussian function (15), with mean and variance obtained as:

(19a) | ||||

(19b) |

To summarize, after the factor node receives the messages from all of the neighbouring variable nodes from the set , it evaluates the message according to (19a) and (19b), and sends it to the variable node .

Marginal inference: According to (4), it can be shown that the marginal of the state variable is represented by:

(20) |

with the mean value and variance :

(21a) | ||||

(21b) |

Finally, the mean-value is adopted as the estimated value of the state variable .

### Iv-C Iterative DC-BP Algorithm

The SE scenario is in general an instance of loopy BP since the corresponding factor graph usually contains cycles. Loopy BP is an iterative algorithm, with an iteration index , and requires a message-passing schedule. The scheduling where messages from variable to factor nodes, and messages from factor nodes to variable nodes, are updated in parallel in respective half-iterations, is known as synchronous scheduling. Synchronous scheduling updates all messages in a given iteration using the output of the previous iteration as an input [34].

To present the algorithm precisely, we need to introduce different types of factor nodes. The indirect factor nodes correspond to measurements that measure state variables indirectly. In the DC scenario, this includes active power flow and power injection measurements. The direct factor nodes correspond to the measurements that measure state variables directly. For our choice of state variables for the DC scenario, an example includes measurements of angles.

Besides direct and indirect factor nodes, we define three additional types of singly-connected factor nodes. The slack factor node corresponds to the slack or reference bus where the voltage angle has a given value. The initialization factor node is required to initialize the algorithm at certain variable nodes. Finally, the virtual factor node is a is singly-connected factor node used if the variable node is not directly measured. Both initialization and virtual factor nodes take the value of ”flat start” with variance or a priori given mean value and variance of state variables.
We refer to direct factor nodes and three additional types of singly-connected factor nodes^{1}^{1}1We note that each variable node in the initialization step of the algorithm has the corresponding local factor node attached. Local factor nodes only send, but do not receive, the message to/from incident variable nodes. Direct and virtual factor nodes repeatedly transmit the same message to the corresponding variable node throughout BP iterations. as local factor nodes .

###### Example 2 (Different types of factor nodes).

In this example, we consider the bus/branch model with four measurements illustrated in Fig. (a)a that we use to describe different types of factor nodes.

The corresponding factor graph is given in Fig. (b)b, with indirect factor nodes (red squares), direct factor nodes (orange squares), the initialization factor node (green square) and the virtual factor node (blue square).

The DC-BP algorithm is presented in Algorithm 1, and it is an instance of a loopy GBP applied over a linear model defined by linear measurement functions . It is well known that, if loopy GBP applied over a linear model converges, it will converge to a fixed point representing a solution of an equivalent WLS problem (12) [35]. Unlike means, the variances of GBP messages need not converge to correct values. In the following, it is sufficient to investigate the conditions for the DC-BP convergence as in case of convergence, the obtained mean values are the WLS solution.

### Iv-D Convergence of DC-BP Algorithm

In this part, we present convergence analysis of DC-BP algorithm with synchronous scheduling. In the following, it will be useful to consider a subgraph of the factor graph that contains the set of variable nodes , the set of indirect factor nodes , and a set of edges connecting them. The number of edges in this subgraph is . Within the subgraph, we will consider a factor node connected to its neighboring set of variable nodes by a set of edges , where is the degree of . Next, we provide results on convergence of both variances and means of DC-BP messages, respectively.

Convergence of the Variances: From equations (14b) and (19b), we note that the evolution of the variances is independent of mean values of messages and measurements. Let denote a vector of variance values of messages from indirect factor nodes to variable nodes . Note that this vector can be decomposed as:

(22) |

where the -th element is equal to:

(23) |

Substituting (14b) in (19b), the evolution of variances is equivalent to the following iterative equation:

(24) |

More precisely, using simple matrix algebra, one can obtain the evolution of the variances in the following matrix form:

(25) |

where

(26a) | ||||

(26b) |

Note that in (25), the dependance on is hidden in matrix , or more precisely, in matrix . Next, we briefly describe both the matrices and matrix-operators involved in (25).

The operator , where is the -th diagonal entry of the matrix . The unit vector is of dimension and is equal to . The diagonal matrix is obtained as .

The matrix contains diagonal entries of the Jacobian non-zero elements, where -th element . The matrix contains indirect factor node variances, with the -th entry .

The matrix contains inverse variances from singly-connected factor nodes to a variable node, if such nodes exist, where the -th element . For example, equals:

(27) |

The matrix , , is a block-diagonal matrix in which the -th element is a block matrix , where the matrix is block matrix of ones, and is identity matrix. The matrix is of the following block structure:

(28) |

where is a block matrix of zeros, and with the -th entry:

(29) |

Note that the following holds: .

###### Theorem 1.

The variances from indirect factor nodes to variable nodes always converge to a unique fixed point for any initial point .

###### Proof.

Convergence of the Means: Equations (14a) and (19a) show that the evolution of the mean values depends on the variance values. Due to Theorem 1, it is possible to simplify evaluation of mean values from indirect factor nodes to variable nodes by using the fixed-point values of . The evolution of means becomes a set of linear equations:

(30) |

where

(31a) | ||||

(31b) | ||||

(31c) | ||||

(31d) |

Note that the vector of means can be decomposed as:

(32) |

where the -th element is equal to:

(33) |

The vector contains means of indirect factor nodes, where . The diagonal matrix is obtained as . The vector contains means from direct and virtual factor nodes to a variable node, if such nodes exist, where the -th element . For example, the element of is equal to:

(34) |

###### Theorem 2.

The means from indirect factor nodes to variable nodes converge to a unique fixed point :

(35) |

for any initial point if and only if the spectral radius .

###### Proof.

The proof steps follow the proof of Theorem 5.2, [36]. ∎

To summarize, the convergence of the DC-BP algorithm depends on the spectral radius of the matrix:

(36) |

If the spectral radius , the DC-BP algorithm will converge and the resulting vector of mean values will be equal to the solution of the MAP estimator.

### Iv-E Convergence of DC-BP with Randomized Damping

In this section, we propose an improved DC-BP algorithm that applies synchronous scheduling with randomized damping. Several previous works reported that damping the BP messages improves the convergence of BP[37, 38]. Here, we propose a different randomized damping approach, where each mean value message from indirect factor node to a variable node is damped independently with probability , otherwise, the message is calculated as in the standard DC-BP algorithm. The damped message is evaluated as a linear combination of the message from the previous and the current iteration, with weights and , respectively. In Section VI, we demonstrate that the DC-BP with randomized damping dramatically improves convergence as compared to the standard DC-BP.

In the proposed damping, the equation (30) is redefined as:

(37) |

where is the weighting coefficient, and . In the above expression, and are obtained as:

(38a) | ||||

(38b) |

where diagonal matrices and are defined as , , and , respectively, where is a Bernoulli random variable with probability independently sampled for each mean value message.

Substituting (38a) and (38b) in (37), we obtain:

(39) |

Note that . In a more compact form, equation (39) can be written as follows:

(40) |

where

(41a) | ||||

(41b) |

###### Theorem 3.

The means from indirect factor nodes to variable nodes converge to a unique fixed point for any initial point if and only if the spectral radius . For the resulting fixed point, it holds that .

## V BP-Based Distributed Gauss-Newton Method for AC SE

The rest of the paper focuses on the more challenging AC model, where our goal is to derive the BP-based AC SE algorithm. However, due to non-linearity of measurement functions, the closed-form expressions for certain classes of BP messages cannot be obtained. Using approximations, we derive the AC-BP algorithm as an (approximate) BP solution for the AC SE problem. Due to space constraints, we refer the reader to Appendix C, for a detailed derivation of the AC-BP algorithm. Unfortunately, due to approximations, the AC-BP algorithm does not match the performance of the centralized AC SE based on Gauss-Newton method (as we show in Section VI).

As the main contribution of this paper, we adopt different methodology to derive efficient BP-based AC SE method. Namely, instead of using the AC-BP method, we present the solution where BP is applied sequentially over the AC model, akin to what is done by the Gauss-Newton method. The resulting GN-BP algorithm represents a BP counterpart of the Gauss-Newton method, achieving exactly the same accuracy as the Gauss Newton method.

### V-a Gauss-Newton Method as a Sequential MAP Problem

Consider the Gauss-Newton method (11) where, at each iteration step , the algorithm returns a new estimate of denoted as . Note that, after a given iteration, an estimate is a vector of known (constant) values. If the Jacobian matrix has a full column rank, the equation (11a) represents the linear WLS solution of the minimization problem [39]:

(46) |

Hence, at each iteration , the Gauss-Newton method produces WLS solution of the following system of linear equations:

(47) |

where comprises linear functions, while is the vector of measurement errors. The equation (11a) is the weighted normal equation for the minimization problem defined in (46), or alternatively, equation (11a) is a WLS solution of (47). Consequently, the probability density function associated with the i-th measurement (i.e., the i-th residual component ) at any iteration step :

(48) |

The MAP solution of (9) can be redefined as an iterative optimization problem where, instead of solving (11), we solve the MAP (sub)problem:

(49a) | ||||

(49b) |

Next, we show that the solution of the above MAP sub-problem (49a) over increment variables can be efficiently obtained using the BP algorithm applied over the underlying factor graph.

Note that, in general, the resulting BP algorithm represents an instance of a loopy GBP over a linear model defined by linear functions . Thus if it converges, it provides a solution equal to the linear WLS solution of (11a). The BP solution of in each iteration (outer iteration loop) is obtained via iterative BP algorithm (inner iteration loops). Every inner BP iteration outputs , where is the number of inner BP iterations within the outer iteration .

### V-B The Factor Graph Construction

From the factorization of the likelihood expression (49a), one easily obtains the factor graph corresponding to the GN-BP method as follows. The increments of state variables determine the set of variable nodes and each likelihood function represents the local function . Since the residual equals , in general, the set of factor nodes is defined by the set of measurements . The factor node connects to the variable node iff the increment of the state variable is an argument of the corresponding function