A Learning-to-Infer Method for Real-Time
Power Grid Multi-Line Outage Identification
Abstract
Identifying a potentially large number of simultaneous line outages in power transmission networks in real time is a computationally hard problem, because the number of hypotheses grows exponentially with the network size. A new “Learning-to-Infer” method is developed for efficient inference of every line status in the network. Optimizing the line outage detector is transformed into, and solved as, a discriminative learning problem based on Monte Carlo samples generated with power flow simulations. A major advantage of the developed Learning-to-Infer method is that the labeled data used for training can be generated in an arbitrarily large amount rapidly and at very little cost. As a result, the power of offline training is fully exploited to learn very complex classifiers for effective real-time multi-line outage identification. The proposed methods are evaluated in the IEEE 30, 118 and 300 bus systems. Excellent performance in identifying multi-line outages in real time is achieved with a reasonably small amount of data.
I Introduction
Lack of situational awareness in abnormal system conditions is a major cause of blackouts in power networks [3]. Network component failures such as transmission line outages, if not rapidly identified and contained, can quickly escalate to cascading failures. In particular, when line failures happen, the power network topology changes instantly, newly stressed areas can unexpectedly emerge, and subsequent failures may be triggered that lead to increasingly complex network topology changes. While the power system is usually protected against the so-called “$N-1$” failure scenarios (i.e., only one component fails), as failures accumulate, effective automatic protection is no longer guaranteed. Thus, when cascading failures start developing, effective protective actions and interventions critically depend on correct and timely knowledge of the network status. Indeed, without accurate knowledge of the line outages, protective control methods have been observed to further aggravate the failure scenarios [4]. Real-time line outage identification is therefore essential to all network control decisions for mitigating failures. In particular, since the first few line outages may have already escaped the operators’ attention, the ability to identify in real time the network topology with an arbitrary number of line outages becomes critical to prevent system collapse.
Real-time line outage identification is, however, a very challenging problem, especially when unknown line outages in the network quickly accumulate, as in scenarios that cause large-scale blackouts [3]. The number of possible outage hypotheses grows exponentially with the number of line outages, making real-time multi-line outage identification fundamentally hard. Other practical limitations, such as the behavior of human operators under time pressure, missing and contradicting information, and privacy concerns over data sharing, can make this problem even harder. Assuming a small number of line failures, exhaustive search methods have been developed in [5], [6], [7] and [8] based on hypothesis testing, and in [9] based on logistic regression. To overcome the prohibitive computational complexity of exhaustive search methods, [10] has developed sparsity-exploiting outage identification methods with overcomplete observations to identify sparse multi-line outages. Without assuming sparsity of line outages, a graphical-model-based approach has been developed for identifying arbitrary multi-line outages [11]. A sequential line outage detection method has also been proposed [12].
On a related note, non-real-time power grid topology identification has also been extensively studied: the underlying topology stays the same, while a large amount of data is collected over a relatively long period of time before the topology can be identified [13, 14, 15]. A variety of data types have been exploited for addressing this problem, e.g., data of power injections [16], voltage correlation [17], and energy prices [18]. For power distribution systems in particular, various graph learning approaches have also been developed [19, 20].
In this paper, we focus on real-time identification of a potentially large number of simultaneous line outages based on a set of measurements collected at any one point of time in the power system. We start with a probabilistic model of the variables in a power system (line statuses, power injections, voltages, power flows, currents, etc.) and in its monitoring system (sensor measurements on all kinds of physical quantities). We then formulate the multi-line outage identification problem in a Bayesian inference framework, where we aim to compute the posterior probabilities of the post-outage topologies given any measurements at any one point of time.
To overcome the fundamental computational complexity due to the exponentially large number of possible post-outage topologies, we develop a learning-based framework inspired by (but different from) variational inference, in which we aim to approximate the desired posterior probabilities using models that allow computationally easy marginal inference of line statuses. Importantly, we develop “end-to-end” predictor models for multi-line outage identification, and allow arbitrary model structures and complexities. In order to find effective end-to-end predictor models, we transform optimizing a predictor model into a discriminative learning problem leveraging a Monte Carlo approach: a) based on full-blown power flow equations, data samples of network topology, network states, and sensor measurements in the network can be efficiently generated according to a generative model of these quantities, and b) with these simulated data, discriminative models are learned offline, which then offer real-time prediction of the line outages based on newly observed measurements from the real network. We thus term the proposed method “Learning-to-Infer”. It is important to note that this Learning-to-Infer method is not limited by any potential lack of real-world data, as the offline training procedure can be conducted entirely based on simulated data.
A major strength of the proposed Learning-to-Infer method is that the labeled data set for training the predictor model can be generated in an arbitrarily large amount, at very little cost. As such, we can fully exploit the benefit of offline model training in order to get accurate online multi-line outage identification performance. The proposed approach is also not restricted to specific models and learning methods, but can exploit any powerful models such as deep neural networks [21]. As a result, predictor models of very high complexity can be adopted without worrying about overfitting, since more labeled training data can always be generated if overfitting is observed.
The developed Learning-to-Infer method is evaluated in the IEEE 30, 118, and 300 bus systems [22] for identifying an arbitrary number of line outages. It is demonstrated that, even with relatively simple predictor models and a reasonably small amount of data, the performance is surprisingly good for this very challenging task.
The remainder of the paper is organized as follows. Section II introduces the system model, and formulates real-time multi-line outage identification as a Bayesian inference problem. Section III develops the Learning-to-Infer method. Section IV discusses the architectures of neural networks employed in this study. Section V presents the results from our numerical experiments. Section VI concludes the paper.
II Problem Formulation
II-A Power Flow Models
We consider a power system with $N$ buses, and its baseline topology (i.e., the network topology when there is no line outage) with $L$ lines. We denote the incidence matrix of the baseline topology by $\mathbf{M} \in \{-1, 0, 1\}^{N \times L}$ [23]. We use a binary variable $s_l$ to denote the status of a line $l$, with $s_l = 1$ for a connected line $l$, and $s_l = 0$ otherwise. The actual topology of the network can then be represented by $\mathbf{s} = [s_1, \ldots, s_L]^T$. With a slight abuse of notation, we also use $s_{mn}$ to denote whether two buses $m$ and $n$ are connected by a line (for simplicity, we assume that any two buses are connected by at most one line). Given a network topology $\mathbf{s}$, the system’s bus admittance matrix $\mathbf{Y}$ is determined by the physical parameters of the system [24]: $\mathbf{Y} = \mathbf{G} + j\mathbf{B}$, where $\mathbf{G}$ and $\mathbf{B}$ denote conductance and susceptance, respectively. Note that, when two buses $m$ and $n$ are not connected, $Y_{mn} = 0$.
We denote the real and reactive power injections at all the buses by $\mathbf{P}, \mathbf{Q} \in \mathbb{R}^N$, and the voltage magnitudes and phase angles by $\mathbf{V}, \boldsymbol{\theta} \in \mathbb{R}^N$. Given the bus admittance matrix $\mathbf{Y}$, the nodal power injections and the nodal voltages satisfy the following AC power flow equations [24]:
(1) $P_m = V_m \sum_{n=1}^{N} V_n \left( G_{mn} \cos(\theta_m - \theta_n) + B_{mn} \sin(\theta_m - \theta_n) \right), \quad Q_m = V_m \sum_{n=1}^{N} V_n \left( G_{mn} \sin(\theta_m - \theta_n) - B_{mn} \cos(\theta_m - \theta_n) \right),$
where a subscript $m$ denotes the $m$-th component of a vector. In particular, given the network topology $\mathbf{s}$ and a set of controlled input values $\{\mathbf{P}^I, \mathbf{Q}^I, \mathbf{V}^I\}$ (where $\mathbf{P}^I, \mathbf{Q}^I$ and $\mathbf{V}^I$ consist of some subsets of $\mathbf{P}, \mathbf{Q}$ and $\mathbf{V}$, respectively), the remaining values of $\{\mathbf{P}, \mathbf{Q}, \mathbf{V}, \boldsymbol{\theta}\}$ can be determined by solving (1). Typically, apart from a slack bus, most buses are “PQ buses” at which the real and reactive power injections are controlled inputs, and the remaining buses are “PV buses” at which the real power injection and voltage magnitude are controlled inputs [24]. We refer the readers to [24] for more details on solving AC power flow equations.
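A useful approximation of the AC power flow model is the DC power flow model: under a topology $\mathbf{s}$, the nodal real power injections $\mathbf{P}$ and voltage phase angles $\boldsymbol{\theta}$ approximately satisfy the following equation [24]:
(2) $\mathbf{P} = \mathbf{M} \mathbf{S} \boldsymbol{\Gamma} \mathbf{M}^T \boldsymbol{\theta},$
where $\mathbf{M}$ is the incidence matrix of the baseline topology, $\mathbf{S} = \mathrm{diag}(s_1, \ldots, s_L)$, $\boldsymbol{\Gamma} = \mathrm{diag}(1/x_1, \ldots, 1/x_L)$, and $x_l$ is the reactance of line $l$. We note that, in the DC power flow model, reactive power is not considered, and all voltage magnitudes are approximated by a constant.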
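The DC power flow relation above can be illustrated with a minimal sketch. It assumes a hypothetical 3-bus, 3-line network whose line list, reactances and phase angles are made up for illustration, and it accumulates the product of the incidence matrix, line statuses and inverse reactances line by line instead of forming the matrices explicitly.

```python
# Toy DC power flow on a hypothetical 3-bus, 3-line network.
# All numbers are illustrative assumptions, not values from the paper.

def dc_injections(lines, status, reactance, theta):
    """Return nodal real power injections under the DC power flow model.

    lines     : list of (from_bus, to_bus) pairs of the baseline topology
    status    : list of 0/1 line statuses (1 = connected, 0 = outage)
    reactance : list of line reactances
    theta     : list of bus voltage phase angles in radians
    """
    injections = [0.0] * len(theta)
    for (m, n), s, x in zip(lines, status, reactance):
        flow = s * (theta[m] - theta[n]) / x  # flow from bus m to bus n
        injections[m] += flow
        injections[n] -= flow
    return injections


lines = [(0, 1), (1, 2), (0, 2)]
reactance = [0.1, 0.2, 0.25]
theta = [0.0, -0.02, -0.05]

p_all_connected = dc_injections(lines, [1, 1, 1], reactance, theta)
p_line_2_out = dc_injections(lines, [1, 1, 0], reactance, theta)
```

Setting a status entry to zero simply removes that line's contribution, which is how an outage enters the model.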
II-B Observation Models
To monitor the power system, we consider real-time measurements taken by sensors measuring nodal voltage magnitudes and phase angles, current magnitudes and phase angles on lines, real and reactive power flows on lines, nodal real and reactive power injections, etc. In general, the observation model can be written as follows,
(3) $\mathbf{y} = \mathbf{h}(\mathbf{s}, \mathbf{P}^I, \mathbf{Q}^I, \mathbf{V}^I) + \mathbf{v},$
where a) $\mathbf{y}$ collects all the noisy measurements, b) $\mathbf{h}(\mathbf{s}, \mathbf{P}^I, \mathbf{Q}^I, \mathbf{V}^I)$ denotes the noiseless values of the measured quantities, and the form of $\mathbf{h}$ depends on the specific locations and types of the sensors, and c) $\mathbf{v}$ denotes the measurement noise.
Remark 1
A noiseless measurement function can be an implicit function without a closed form expression. For example, given the topology $\mathbf{s}$ and the controlled inputs $\{\mathbf{P}^I, \mathbf{Q}^I, \mathbf{V}^I\}$, while the nodal voltage magnitude and phase angle at a particular bus can be solved from (1), such a solution can only be obtained using numerical methods, and a closed form expression is not available. For discussions on the existence and uniqueness of the solution to the power flow equations (1), we refer the readers to [25].
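The observation models can be significantly simplified under the approximate DC power flow model (2). For example, measurements of the phase angles $\boldsymbol{\theta}$ provided by phasor measurement units (PMUs) located at a subset of the buses $\mathcal{K}$ can be modeled as
(4) $\mathbf{y}_{\mathcal{K}} = \boldsymbol{\theta}_{\mathcal{K}} + \mathbf{v}_{\mathcal{K}},$
where $\boldsymbol{\theta}_{\mathcal{K}}$ is formed by entries of $\boldsymbol{\theta}$ from buses in $\mathcal{K}$. From the DC power flow model (2), we have
(5) $\boldsymbol{\theta} = \left( \mathbf{M} \mathbf{S} \boldsymbol{\Gamma} \mathbf{M}^T \right)^{+} \mathbf{P},$
where $\mathbf{M} \mathbf{S} \boldsymbol{\Gamma} \mathbf{M}^T$ is the matrix relating $\boldsymbol{\theta}$ to $\mathbf{P}$ in (2), and $(\cdot)^{+}$ denotes the pseudoinverse. (Footnote 1: For a connected network, the solution of $\boldsymbol{\theta}$ given $\mathbf{P}$ is made unique by setting the phase angle at a reference bus to be zero.) We note that, while the noiseless voltage phase angle measurements enjoy a closed form (5) and are linear in the power injections $\mathbf{P}$, they are not linear in the line statuses $\mathbf{s}$.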
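To make the last observation concrete, the following sketch solves the DC model for the phase angles on the same hypothetical 3-bus network under two topologies. It is an illustration under stated assumptions: instead of an explicit pseudoinverse it uses the reference-bus convention from the footnote (bus 0 fixed at zero angle) and a small Gauss-Jordan solver, and all network values are made up.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def dc_angles(n_bus, lines, status, reactance, p):
    """Phase angles for injections p, with bus 0 as the zero-angle reference."""
    b = [[0.0] * n_bus for _ in range(n_bus)]
    for (m, n), s, x in zip(lines, status, reactance):
        w = s / x  # an outaged line contributes nothing
        b[m][m] += w
        b[n][n] += w
        b[m][n] -= w
        b[n][m] -= w
    reduced = [row[1:] for row in b[1:]]  # drop the reference bus
    return [0.0] + solve(reduced, p[1:])


lines = [(0, 1), (1, 2), (0, 2)]
reactance = [0.1, 0.2, 0.25]
p = [0.4, -0.05, -0.35]

theta_full = dc_angles(3, lines, [1, 1, 1], reactance, p)
theta_out = dc_angles(3, lines, [1, 1, 0], reactance, p)
```

With the same injections, removing the line between buses 0 and 2 changes every non-reference angle, and the change is not a linear function of the status vector because the statuses enter through a matrix inverse.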
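II-C Multi-Line Outage Identification as Bayesian Inference
We are interested in identifying the post-outage network topology $\mathbf{s}$ in real time based on instant measurements $\mathbf{y}$ collected in the power system. We formulate this multi-line outage identification problem as a Bayesian inference problem. First, we model $\mathbf{s}$, $\{\mathbf{P}^I, \mathbf{Q}^I, \mathbf{V}^I\}$ and $\mathbf{y}$ with a joint probability distribution,
(6) $p(\mathbf{s}, \mathbf{P}^I, \mathbf{Q}^I, \mathbf{V}^I, \mathbf{y}) = p(\mathbf{s}, \mathbf{P}^I, \mathbf{Q}^I, \mathbf{V}^I) \, p(\mathbf{y} \mid \mathbf{s}, \mathbf{P}^I, \mathbf{Q}^I, \mathbf{V}^I).$
It is important to note that, given $\{\mathbf{s}, \mathbf{P}^I, \mathbf{Q}^I, \mathbf{V}^I\}$, the noiseless measurements $\mathbf{h}(\mathbf{s}, \mathbf{P}^I, \mathbf{Q}^I, \mathbf{V}^I)$ (cf. (3)) can be exactly computed by solving the AC power flow equations (1). Adding the noise $\mathbf{v}$ to $\mathbf{h}$ then leads to $\mathbf{y}$.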
Remark 2 (Generative Model)
(6) represents a generative model [26] with which a) the topology $\mathbf{s}$ and the controlled inputs of power injections and voltage magnitudes $\{\mathbf{P}^I, \mathbf{Q}^I, \mathbf{V}^I\}$ are generated according to a prior distribution $p(\mathbf{s}, \mathbf{P}^I, \mathbf{Q}^I, \mathbf{V}^I)$, and b) all the quantities measured in the system can then be computed by solving the power flow equations (1), based on which the actual noisy measurements $\mathbf{y}$ follow the conditional probability distribution $p(\mathbf{y} \mid \mathbf{s}, \mathbf{P}^I, \mathbf{Q}^I, \mathbf{V}^I)$. We note that, as in many Bayesian inference problems, an accurate prior distribution may be difficult to obtain in practice. Nonetheless, a sharp concentration of the posterior distribution on the true post-outage network topology allows effective inference of multi-line outages even in the absence of accurate knowledge of the prior.
Our objective is to infer the topology $\mathbf{s}$ of the power grid given the observed measurements $\mathbf{y}$. Thus, under a Bayesian inference framework, we are interested in computing the posterior conditional probabilities: $\forall \mathbf{s}$,
(7) $p(\mathbf{s} \mid \mathbf{y}) = \frac{p(\mathbf{s}, \mathbf{y})}{p(\mathbf{y})}.$
Given the observations $\mathbf{y}$, a maximum a posteriori probability (MAP) detector would pick $\arg\max_{\mathbf{s}} p(\mathbf{s} \mid \mathbf{y})$ as the topology/multi-line outage identification decision, which minimizes the identification error probability [27]. However, as the number of hypotheses of $\mathbf{s}$ grows exponentially with the number of unknown line statuses, performing such hypothesis testing based on an exhaustive search becomes computationally intractable. In general, as there are up to $2^L$ possibilities for $\mathbf{s}$, computing, or even listing, the probabilities $p(\mathbf{s} \mid \mathbf{y})$ has exponential complexity.
Posterior Marginal Probabilities
As an initial step towards addressing the fundamental challenge of computational complexity, instead of computing $p(\mathbf{s} \mid \mathbf{y})$, we focus on computing the posterior marginal conditional probabilities $p(s_l \mid \mathbf{y})$, $l = 1, \ldots, L$. We note that the posterior marginals are characterized by just $L$ numbers, $p(s_l = 1 \mid \mathbf{y})$, $l = 1, \ldots, L$, as opposed to the $2^L - 1$ numbers required for characterizing $p(\mathbf{s} \mid \mathbf{y})$. Accordingly, the hypothesis testing problem on $\mathbf{s}$ is decoupled into $L$ separate binary hypothesis testing problems: for each line $l$, the MAP detector identifies $\arg\max_{s_l \in \{0, 1\}} p(s_l \mid \mathbf{y})$. As a result, instead of minimizing the identification error probability of the vector $\mathbf{s}$, the binary MAP detectors minimize the identification error probability of each line status $s_l$.
Although listing the posterior marginals is tractable, computing them still remains intractable. In particular, even with $p(\mathbf{s} \mid \mathbf{y})$ given, summing out all $s_k$, $k \neq l$, to obtain $p(s_l \mid \mathbf{y})$ still requires exponential computational complexity [28]. As a result, even a binary MAP detection decision on $s_l$ cannot be made in a computationally tractable way. This challenge will be addressed by a novel method we will develop in the next section.
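The exponential cost of exact marginalization can be seen in a brute-force sketch. The likelihood used here is a deliberately simple stand-in (a scalar measurement equal to a weighted sum of line statuses plus Gaussian noise, with a uniform prior), not the power flow model of this paper; the function name, weights and noise level are hypothetical.

```python
import itertools
import math


def posterior_marginals(log_joint, num_lines):
    """Exact posterior marginals P(s_l = 1 | y) by enumerating all topologies.

    log_joint(s) must return log p(s, y) up to an additive constant.
    The loop visits 2**num_lines status vectors.
    """
    total = 0.0
    connected_mass = [0.0] * num_lines
    for s in itertools.product((0, 1), repeat=num_lines):
        weight = math.exp(log_joint(s))
        total += weight
        for l, s_l in enumerate(s):
            if s_l:
                connected_mass[l] += weight
    return [mass / total for mass in connected_mass]


# Stand-in model: y = sum(a_l * s_l) + Gaussian noise, uniform prior on s.
weights = [1.0, 2.0, 4.0]
y_observed = 5.1
sigma = 0.3


def toy_log_joint(s):
    residual = y_observed - sum(a * s_l for a, s_l in zip(weights, s))
    return -residual ** 2 / (2.0 * sigma ** 2)


marginals = posterior_marginals(toy_log_joint, len(weights))
num_terms = 2 ** len(weights)
```

For three lines the sum has only 8 terms, but the count doubles with every additional line, which is why this enumeration cannot be used for networks with hundreds of lines.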
III A Learning-to-Infer Method
III-A A Variational Inference-Inspired Framework
In this section, we develop a variational inference-inspired method for approximate inference of the posterior marginal conditional probabilities $p(s_l \mid \mathbf{y})$, $l = 1, \ldots, L$. The general idea is to find a conditional distribution $q(\mathbf{s} \mid \mathbf{y})$ that
1) approximates the original $p(\mathbf{s} \mid \mathbf{y})$ very closely, and
2) offers fast and accurate multi-line outage identification results based on easily computable marginals $q(s_l \mid \mathbf{y})$.
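In particular, we consider that $q(\mathbf{s} \mid \mathbf{y})$ is modeled by some parametric form (e.g., neural networks), and is hence chosen from some family of parametrized conditional probability distributions $\{q_\beta(\mathbf{s} \mid \mathbf{y})\}$, where $\beta$ is a vector of model parameters. It is worth noting that $q_\beta(\mathbf{s} \mid \mathbf{y})$ is a function of both $\mathbf{s}$ and $\mathbf{y}$, and the parameters $\beta$ associate both $\mathbf{s}$ and $\mathbf{y}$ with the probability value $q_\beta(\mathbf{s} \mid \mathbf{y})$, for all possible $\mathbf{s}$ and $\mathbf{y}$.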
To achieve the two goals above, we aim to choose a family of probability distributions $\{q_\beta(\mathbf{s} \mid \mathbf{y})\}$ that satisfies the following:
1) The parametric form of $q_\beta(\mathbf{s} \mid \mathbf{y})$ has sufficient expressive power to represent very complicated functions, so that our approximation to the true $p(\mathbf{s} \mid \mathbf{y})$ can be made sufficiently precise.
2) It is easy to compute the marginal $q_\beta(s_l \mid \mathbf{y})$, so that we can use it to infer $s_l$ with low computational complexity in real time based on the observed $\mathbf{y}$.
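From a family of parametrized distributions $\{q_\beta(\mathbf{s} \mid \mathbf{y})\}$, we would like to choose a $q_\beta(\mathbf{s} \mid \mathbf{y})$ that approximates $p(\mathbf{s} \mid \mathbf{y})$ as closely as possible. For this, we employ the Kullback-Leibler (KL) divergence as a metric of closeness between two probability distributions,
(8) $D(p \,\|\, q_\beta) \triangleq \sum_{\mathbf{s}} p(\mathbf{s} \mid \mathbf{y}) \log \frac{p(\mathbf{s} \mid \mathbf{y})}{q_\beta(\mathbf{s} \mid \mathbf{y})}.$
Note that, for any particular realization of observations $\mathbf{y}$, a KL divergence can be computed. Thus, $D(p \,\|\, q_\beta)$ can be viewed as a function of $\mathbf{y}$. Since we would like the parametrized conditional $q_\beta(\mathbf{s} \mid \mathbf{y})$ to closely approximate $p(\mathbf{s} \mid \mathbf{y})$ for all $\mathbf{y}$, we would like to minimize the expected KL divergence as follows:
(9) $\min_\beta \; \mathbb{E}_{\mathbf{y}} \left[ D(p \,\|\, q_\beta) \right],$
where the expectation is taken with respect to the true distribution $p(\mathbf{y})$.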
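As a small numerical illustration of the KL divergence used as the closeness metric, the sketch below evaluates it for one fixed observation on a hypothetical two-line network. The two distributions over the four possible status vectors are made-up numbers, and the function name is an assumption for illustration.

```python
import math


def kl_divergence(p, q):
    """KL divergence D(p || q) between two distributions over topologies.

    p and q map each status vector (a tuple of 0/1) to its probability.
    Terms with p(s) = 0 contribute zero, by the usual convention.
    """
    return sum(
        prob * math.log(prob / q[s]) for s, prob in p.items() if prob > 0.0
    )


# Hypothetical true posterior and a crude uniform approximation.
p_true = {(1, 1): 0.7, (1, 0): 0.2, (0, 1): 0.1, (0, 0): 0.0}
q_uniform = {s: 0.25 for s in p_true}

divergence = kl_divergence(p_true, q_uniform)
self_divergence = kl_divergence(p_true, p_true)
```

The divergence is zero only when the approximation matches the true posterior, and it would have to be averaged over observations to obtain the expected divergence being minimized.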
Table I: Summary of the Learning-to-Infer method.
Offline computation:
1. Generate a labeled data set $\{\mathbf{s}^t, \mathbf{y}^t\}$ using Monte Carlo simulations with the full-blown power flow and sensor models.
2. Select a parametrized predictor model $q_\beta(\mathbf{s} \mid \mathbf{y})$.
3. Train the model parameters $\beta$ using the generated data set.
Online inference (in real time):
1. Collect instant measurements $\mathbf{y}$ from the system.
2. Compute the approximate posterior marginals $q_{\beta^*}(s_l \mid \mathbf{y})$ and infer the line statuses $\hat{s}_l$.
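III-B From Generative Model to Discriminative Learning
Evaluating $\mathbb{E}_{\mathbf{y}} \left[ D(p \,\|\, q_\beta) \right]$ is, however, very difficult, primarily because it again requires the summation of an exponentially large number of terms. To address this, we first note that the term $\mathbb{E}_{\mathbf{s}, \mathbf{y}} \left[ \log p(\mathbf{s} \mid \mathbf{y}) \right]$ in the expected KL divergence does not depend on $\beta$, so that minimizing (9) is equivalent to maximizing $\mathbb{E}_{\mathbf{s}, \mathbf{y}} \left[ \log q_\beta(\mathbf{s} \mid \mathbf{y}) \right]$. The key step forward is that we can approximate this expectation by the empirical mean of $\log q_\beta(\mathbf{s} \mid \mathbf{y})$ over a large number of Monte Carlo samples, generated according to (ideally) the true joint probability $p(\mathbf{s}, \mathbf{P}^I, \mathbf{Q}^I, \mathbf{V}^I, \mathbf{y})$ (cf. (6)). We denote the relevant Monte Carlo samples by $\{\mathbf{s}^t, \mathbf{y}^t;\, t = 1, \ldots, T\}$. Accordingly, (9) is approximated by the following,
(10) $\max_\beta \; \frac{1}{T} \sum_{t=1}^{T} \log q_\beta(\mathbf{s}^t \mid \mathbf{y}^t).$
With a data set $\{\mathbf{s}^t, \mathbf{y}^t\}$ generated using Monte Carlo simulations, (10) can then be solved as a deterministic optimization problem. The optimal solution of the model parameters $\beta^*$ approaches that of the original problem (9) as $T \to \infty$.
In fact, the problem (10) can be viewed as an empirical risk minimization problem in machine learning [29], as it trains a discriminative model $q_\beta(\mathbf{s} \mid \mathbf{y})$ with a data set $\{\mathbf{s}^t, \mathbf{y}^t\}$ generated from a generative model (cf. Remark 2). As a result of this offline learning/training process (10), an approximate posterior function $q_{\beta^*}(\mathbf{s} \mid \mathbf{y})$ is obtained. Furthermore, it can be shown that (10) is equivalent to finding the maximum likelihood estimate of $\beta$ on the data set $\{\mathbf{s}^t, \mathbf{y}^t\}$.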
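The empirical objective can be sketched as follows, under the additional assumption (not stated in this section) that the predictor factorizes over lines, so that the log-probability of a full status vector is the sum of per-line log-probabilities. In that case the negated objective is the familiar average binary cross-entropy. The function name and the sample values are hypothetical.

```python
import math


def neg_log_likelihood(predicted, labels):
    """Average negative log-probability of the labeled topologies.

    predicted[t][l] is the model's probability that line l is connected
    in sample t; labels[t][l] is the true 0/1 status. Assumes the model
    treats lines as conditionally independent given the measurements.
    """
    total = 0.0
    for prob_row, status_row in zip(predicted, labels):
        for prob, status in zip(prob_row, status_row):
            total -= math.log(prob if status == 1 else 1.0 - prob)
    return total / len(labels)


labels = [[1, 0], [1, 1]]
rough_model = [[0.9, 0.2], [0.6, 0.7]]
sharper_model = [[0.99, 0.01], [0.95, 0.97]]

loss_rough = neg_log_likelihood(rough_model, labels)
loss_sharp = neg_log_likelihood(sharper_model, labels)
```

Minimizing this loss over the model parameters is the same as maximizing the average log-probability of the simulated labels, which is why the training step can be read as maximum likelihood estimation.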
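III-C Offline Learning for Online Inference
It is important to note that
1) the training process to obtain the function $q_{\beta^*}(\mathbf{s} \mid \mathbf{y})$ is conducted completely offline;
2) the use of the trained function $q_{\beta^*}(\mathbf{s} \mid \mathbf{y})$ is, however, in real time, i.e., online.
In particular, in real time, given any newly observed measurements $\mathbf{y}$ of the system, the approximate posterior marginals $q_{\beta^*}(s_l \mid \mathbf{y})$ are computed from $q_{\beta^*}(\mathbf{s} \mid \mathbf{y})$. Based on such instantly computed $q_{\beta^*}(s_l \mid \mathbf{y})$, a detection decision of whether line $l$ ($l = 1, \ldots, L$) is connected or not in the current topology is made. For example, a MAP detector would make the following decision,
(11) $\hat{s}_l = \arg\max_{s_l \in \{0, 1\}} q_{\beta^*}(s_l \mid \mathbf{y}).$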
Accordingly, we name our proposed methodology “Learning-to-Infer”: To perform real-time inference of multi-line outages, we exploit offline learning to train a detector based on labeled data simulated from the full-blown physical model of the power system. The methodology is summarized in Table I. A system diagram is plotted in Figure 1.
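The online decision step is computationally trivial once the approximate marginals are available. A minimal sketch, assuming the trained predictor has already produced a list of probabilities that each line is connected (the numbers below are hypothetical):

```python
def map_line_statuses(marginals, threshold=0.5):
    """Binary MAP decision per line from approximate posterior marginals.

    marginals[l] is the predicted probability that line l is connected.
    Returns 1 (connected) when that probability is at least the
    threshold, and 0 (outage) otherwise.
    """
    return [1 if q >= threshold else 0 for q in marginals]


marginals = [0.98, 0.03, 0.61, 0.40]
statuses = map_line_statuses(marginals)
outaged_lines = [l for l, s in enumerate(statuses) if s == 0]
```

With two hypotheses per line, picking the more probable one is the same as thresholding at 0.5; a different threshold could be used if missed outages and false alarms carry different costs.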
Remark 3 (Training Binary Classifiers)
Any detector that identifies the status of a line $l$ (e.g., a binary MAP detector) can also be viewed as a binary classifier $\hat{s}_l(\mathbf{y})$: For each possible realization of $\mathbf{y}$, this classifier outputs an inferred status of line $l$. From this perspective, solving (10) is exactly a supervised learning process based on a labeled data set, $\{\mathbf{y}^t, \mathbf{s}^t\}$, where $\mathbf{s}^t$ are the output labels that correspond to the input data $\mathbf{y}^t$. As a result, the rich literature on supervised learning for training binary classifiers directly applies to our problem under this Learning-to-Infer framework.
Remark 4 (Difference from Variational Inference)
It is worth noting the fundamental difference between the proposed Learning-to-Infer method and variational inference methods. Importantly, for every new inference instance given a new observation $\mathbf{y}$, variational inference methods need to call an optimization procedure to solve for a new variational model. In contrast, Learning-to-Infer only trains the predictor once in an offline fashion, and simply calls the trained $q_{\beta^*}(\mathbf{s} \mid \mathbf{y})$ for any new inference instance given a new observation $\mathbf{y}$. As such, the online computation time needed by Learning-to-Infer is very little (e.g., performing a forward pass in a neural network), whereas that needed by variational inference methods is much more significant. In essence, Learning-to-Infer exploits the underlying lower dimensional structure of to achieve generalizability of the trained predictor to all possible new observations $\mathbf{y}$.
III-D Advantages of the Proposed Method
One great advantage of this Learning-to-Infer method is that we can generate labeled data very efficiently. Specifically, we can efficiently sample from the generative model $p(\mathbf{s}, \mathbf{P}^I, \mathbf{Q}^I, \mathbf{V}^I, \mathbf{y})$ (cf. (6)) as long as we have some prior $p(\mathbf{s}, \mathbf{P}^I, \mathbf{Q}^I, \mathbf{V}^I)$ that is easy to sample from. While historical data and expert knowledge would surely help in forming such priors, simple uninformative priors can already suffice, as will be shown later in the numerical examples. As a result, we can obtain an arbitrarily large set of data at very little cost to train the discriminative model. This is quite different from the typical situations encountered in machine learning problems, where obtaining a large amount of labeled data is usually expensive as it requires extensive human annotation effort.
Furthermore, once the approximate posterior distribution $q_{\beta^*}(\mathbf{s} \mid \mathbf{y})$ is learned, it can be deployed to infer the multi-line outages in real time, as the computational complexity of evaluating $q_{\beta^*}(s_l \mid \mathbf{y})$ is very low by design. This is especially important in monitoring large-scale power grids in real time: although training could take a reasonably long time, inference is very fast. Therefore, the learned predictor can be used in real time with low-cost hardware.
Limitations of Historical Data and Power of Simulated Data
In overcoming the computational complexity challenges of real-time multi-line outage identification, it is particularly worth noting the fundamental limitation of using real historical data. Even with the explosion of data available from pervasive sensors in power systems, the data are often collected under a very limited set of system scenarios. For example, most historical data are collected under normal system topologies. Even with data collected under slowly updated systems or faulty systems, the underlying topologies in these real-world cases only represent an extremely small fraction of the entire, exponentially large set of all topologies. Consequently, historical data are fundamentally insufficient for real-time multi-line outage identification, especially under rare failure events.
Simulated data, as evidenced in the proposed Learning-to-Infer framework, offer great potential beyond what historical data can offer. A set of scenarios that is orders of magnitude richer can be generated, and a learning procedure based on these simulated data can provide very powerful classifiers for identifying arbitrary multi-line outages that may appear in the future but have never appeared in the past, not even among the simulated scenarios. Last but not least, it is important to note that the simulated scenarios needed for the proposed Learning-to-Infer method would still be a very small fraction of the entire, exponentially large model space, as will be demonstrated later in the numerical experiments. As such, it is the good generalizability of the classifiers trained using the simulated data that enables effective outage inference under new failure events.
Remark 5 (Learning from the Physical Model)
In the proposed Learning-to-Infer method, the training process is at heart learning from the underlying power system physical model. Instead of manually deriving outage detection rules from analyzing the physical model, the proposed method uses a training procedure to learn such rules from massive data generated according to the physical model. As such, the rich information embedded in the physical model is carried by the data simulated with it, and then learned by the predictor from training with these simulated data. The Learning-to-Infer method is thus a systematic “indirect” way of learning and using the information from the physical model.
Remark 6 (Side Information and Change of Settings)
An interesting question on generalizing the Learning-to-Infer method is how additional information (other than the observed $\mathbf{y}$) may be incorporated. For example, the system operator may receive the side information that certain lines are active for sure. Furthermore, there can also be more systematic changes in what information is collected, notably, a change of the measurement set due to the installation of additional sensors. For incorporating additional information, one way is to introduce additional inputs to the predictor during the offline training process. For example, we can let each line have a “prior” (even though in reality it can come from a posterior knowledge source) which is fed into the predictor. The data set generation and training would then need to include varying values of these priors. Furthermore, a systematic way of dealing with slowly updating priors as well as changes in the measurement sets is to employ “Transfer Learning”. Specifically, the changes in the measurement sets tend not to be dramatic over a short period of time. Thus, the previously trained neural network can serve as a good initial point when we tune the neural network for an updated measurement set. The additional training time needed would be much shorter than if we train from scratch. These extensions are, however, out of the scope of this paper, and are left for future investigations.
IV Neural Network Architectures for Learning Classifiers
To perform binary MAP inference of each line status, the decision boundary of the MAP detector is highly nonlinear (cf. Remark 3). We investigate classifiers based on neural networks to capture such complex nonlinear decision boundaries. In other words, we employ neural networks as the parametric models $q_\beta(\mathbf{s} \mid \mathbf{y})$: given the input data $\mathbf{y}$, the output layer of the neural network produces the probabilities $q_\beta(s_l \mid \mathbf{y})$, $l = 1, \ldots, L$ (based on which identification decisions are then made).
In particular, we employ a neural network architecture that allows classifiers for different lines to share features. Specifically, instead of training $L$ separate neural networks each with one node in its output layer, we train one neural network whose output layer consists of $L$ nodes each predicting a different line’s status. An illustration of this architecture is depicted in Figure 2: a) the input layer of the neural network consists of $\mathbf{y}$, b) the hidden layers of neurons compute a number of nonlinear features of the input $\mathbf{y}$, and c) the output layer applies $L$ binary classifiers to these features to predict $s_l$, $l = 1, \ldots, L$. Specifically, $L$ logistic functions are employed in the output layer whose outputs correspond to $q_\beta(s_l = 1 \mid \mathbf{y})$. As a result, the features computed by the hidden layers can all be used in classifying any line’s status. The intuition behind using shared features is that certain common features may provide good predictive power in inferring many different lines’ statuses in a power network. For training and testing, we randomly generate labeled data $\{\mathbf{y}^t, \mathbf{s}^t\}$ that satisfy the power flow equations and the observation models. Each $\mathbf{s}^t$ then consists of $L$ labels used by the $L$ output classifiers respectively.
With the proposed Learning-to-Infer method, since labeled data can be generated in an arbitrarily large amount using Monte Carlo simulations, whenever overfitting is observed, it can in principle always be overcome by generating more labeled data for training. Thus, as long as the computation time allows, we can use neural network models of very high complexity for approximating the binary MAP detectors, without worrying about overfitting.
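The shared-feature architecture described above can be sketched as a single forward pass: one hidden layer computes features from the measurements, and every output node applies its own logistic function to the same features. The weights below are hand-picked hypothetical values for a two-measurement, two-line toy case, not trained parameters, and ReLU is used for the hidden layer as in the experiments reported later.

```python
import math


def forward(y, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer network with shared features.

    Returns one probability per line, each from its own logistic output
    node applied to the same hidden features.
    """
    hidden = [
        max(0.0, sum(w * v for w, v in zip(row, y)) + b)  # ReLU feature
        for row, b in zip(w_hidden, b_hidden)
    ]
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


y = [1.0, -2.0]                                  # two measurements
w_hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three shared features
b_hidden = [0.0, 0.0, 0.5]
w_out = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0]]      # one row per line
b_out = [0.0, 0.0]

line_probabilities = forward(y, w_hidden, b_hidden, w_out, b_out)
```

Because all output nodes read the same hidden vector, adding a line adds only one output row, while the cost of computing the features is shared across lines.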
V Numerical Experiments
We evaluate the proposed Learning-to-Infer method for multi-line outage identification with three benchmark systems of increasing sizes, the IEEE 30, 118, and 300 bus systems, as the baseline topologies. As opposed to considering only a small number of simultaneous line outages as in existing works, we allow any number of line outages, and investigate whether the learned discriminative classifiers can successfully recover the post-outage topologies in real time.
V-A Data Set Generation
In our experiments, the data sets are primarily generated with the DC power flow model (2). Here, our focus is to examine whether the proposed Learning-to-Infer method can effectively overcome the fundamental challenge of exponential computational complexity due to the potentially large number of simultaneous line outages. For this, the DC power flow model offers sufficient modeling detail. At the end of the section, we then run experiments with data sets generated with the AC power flow model (1), and verify that the lessons learned from the DC power flow experiments continue to hold.
With the DC power flow model, the set of controlled inputs reduces to $\{\mathbf{P}\}$, and the generative model (6) reduces to $p(\mathbf{s}, \mathbf{P}, \mathbf{y}) = p(\mathbf{s}, \mathbf{P}) \, p(\mathbf{y} \mid \mathbf{s}, \mathbf{P})$. To generate a data set $\{\mathbf{s}^t, \mathbf{P}^t, \mathbf{y}^t\}$, we assume the prior distribution factors as $p(\mathbf{s}, \mathbf{P}) = p(\mathbf{s}) \, p(\mathbf{P})$. As such, we generate the post-outage network topologies and the power injections independently:
1) We generate the line statuses using independent and identically distributed (IID) Bernoulli random variables, so that the average numbers of line outages are and for the IEEE 30, 118 and 300 bus systems, respectively. These numbers of simultaneous line outages are significantly higher than those typically assumed in sparse line outage studies. We do not consider disconnected networks in this study, and exclude the line status samples if they lead to disconnected networks. As such, considering that some lines must always be connected to ensure network connectivity, after some network reduction, the equivalent networks for the IEEE 30, 118, and 300 bus systems have , and lines that can possibly be in outage, respectively.
2) We would like our predictor to be able to identify multi-line outages for arbitrary values of power injections as opposed to fixed ones. Accordingly, we generate $\mathbf{P}$ using the following procedure: For each data sample, we first generate bus voltage phase angles as IID uniformly distributed random variables in , and then compute $\mathbf{P}$ according to (2) under the baseline topologies. We note that the spread of the phase angles in the generated data sets can cover nearly all possible power injection cases in real power transmission networks.
With each pair of generated $\mathbf{s}$ and $\mathbf{P}$, we consider two types of measurements that constitute $\mathbf{y}$: nodal voltage phase angle measurements and nodal power injection measurements. For these, a) we generate IID Gaussian voltage phase angle measurement noises with a standard deviation of degree, the state-of-the-art PMU accuracy [30], and b) we assume power injections are measured accurately. In the following experiments, we consider that measurements of voltage phase angles and power injections are collected at all the buses. The effect of the number and locations of sensors will be discussed later in this section.
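The topology sampling step of the procedure above (IID Bernoulli line statuses, with disconnected networks rejected) can be sketched as follows. The network, the outage probability of 0.3 and the random seed are hypothetical choices for illustration; the connectivity check uses a simple union-find structure.

```python
import random


def is_connected(n_bus, lines, status):
    """Check whether the buses form one connected network (union-find)."""
    parent = list(range(n_bus))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (m, n), s in zip(lines, status):
        if s:
            parent[find(m)] = find(n)
    return len({find(i) for i in range(n_bus)}) == 1


def sample_topology(n_bus, lines, outage_prob, rng):
    """Draw IID Bernoulli line statuses, rejecting disconnected networks."""
    while True:
        status = [0 if rng.random() < outage_prob else 1 for _ in lines]
        if is_connected(n_bus, lines, status):
            return status


lines = [(0, 1), (1, 2), (0, 2)]
rng = random.Random(0)
samples = [sample_topology(3, lines, 0.3, rng) for _ in range(20)]
```

Each accepted status vector would then be paired with independently generated power injections and passed through the power flow and noise models to produce one labeled training sample.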
Table II
The (reduced) IEEE 30 bus system with lines
  Number of all possible post-outage topologies
  Number of topologies with line outages
  The generated data set
The (reduced) IEEE 118 bus system with lines
  Number of all possible post-outage topologies
  Number of topologies with disconnected lines
  The generated data set
The (reduced) IEEE 300 bus system with lines
  Number of all possible post-outage topologies
  Number of topologies with disconnected lines
  The generated data set
In this study, we generate , , and data samples for the IEEE 30, 118, and 300 bus systems, respectively. These data are further divided into , , and samples for training, validation, and testing, respectively. We note that over of the generated 30-bus multi-line outages are distinct from each other, as are those of the generated 118-bus multi-line outages and those of the 300-bus multi-line outages. As a result, these generated data sets are well suited to evaluating the generalizability of the trained classifiers, as (almost) all data samples in the test set have post-outage topologies unseen in the training set.
Furthermore, we would like to compare the size of the generated data set to the total number of possible outage hypotheses, as highlighted in Table II. Clearly, a) it is computationally prohibitive to perform line outage inference based on exhaustive search, and b) the generated and data sets are only a tiny fraction of the entire space of all multi-line outages. Yet, we will show that the classifiers trained with the generated data sets exhibit excellent inference performance and generalizability.
VB Neural Network Structure and Training
We employ three-layer (i.e., one hidden layer) fully connected neural networks with the feature sharing architecture (cf. Figure 2). Rectified Linear Units (ReLUs) are employed as the activation functions in the hidden layer. In training the classifiers, we use stochastic gradient descent (SGD) with momentum update and Nesterov's acceleration [31]. While this optimization algorithm works sufficiently well for our experiments, we note that other algorithms may further accelerate the training procedure [32].
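A minimal NumPy sketch of such a network and its training step is given below. It assumes one sigmoid output per line on top of a shared ReLU hidden layer and a binary cross-entropy loss; these choices, the toy data, the layer sizes, and the hyperparameters are assumptions of this sketch rather than details confirmed by the text. For brevity it uses full-batch gradients instead of mini-batches.

```python
import numpy as np

def init_params(n_in, n_hidden, n_lines, rng):
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, np.sqrt(1.0 / n_hidden), (n_hidden, n_lines)),
        "b2": np.zeros(n_lines),
    }

def forward(params, X):
    # Shared hidden features (ReLU), then one sigmoid output per line.
    H = np.maximum(0.0, X @ params["W1"] + params["b1"])
    logits = H @ params["W2"] + params["b2"]
    return H, 1.0 / (1.0 + np.exp(-logits))

def loss_and_grads(params, X, S):
    # Binary cross-entropy averaged over all samples and lines.
    H, P = forward(params, X)
    eps = 1e-12
    loss = -np.mean(S * np.log(P + eps) + (1 - S) * np.log(1 - P + eps))
    dlogits = (P - S) / S.size
    dH = dlogits @ params["W2"].T
    dH[H <= 0] = 0.0  # ReLU passes gradient only where the unit is active
    grads = {"W2": H.T @ dlogits, "b2": dlogits.sum(axis=0),
             "W1": X.T @ dH, "b1": dH.sum(axis=0)}
    return loss, grads

def nesterov_step(params, velocity, X, S, lr=0.1, momentum=0.9):
    # Evaluate the gradient at the look-ahead point, then update velocity and weights.
    lookahead = {k: params[k] + momentum * velocity[k] for k in params}
    loss, grads = loss_and_grads(lookahead, X, S)
    for k in params:
        velocity[k] = momentum * velocity[k] - lr * grads[k]
        params[k] = params[k] + velocity[k]
    return loss  # loss at the look-ahead point

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))                          # toy measurements
S = (X @ rng.normal(size=(5, 3)) > 0).astype(float)   # toy line statuses
params = init_params(5, 16, 3, rng)
velocity = {k: np.zeros_like(v) for k, v in params.items()}
initial_loss, _ = loss_and_grads(params, X, S)
for _ in range(300):
    nesterov_step(params, velocity, X, S)
final_loss, _ = loss_and_grads(params, X, S)
```

At inference time only `forward` is needed, which is a pair of matrix multiplications; thresholding each output at 0.5 gives the per-line status decision.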
VC Evaluation Results
VC1 Performance of the Learning-to-Infer Method
We employ 300, 1000 and 3000 neurons in the hidden layer for the IEEE 30, 118 and 300 bus systems, respectively. For all three systems, we plot in Figure 3 the achieved training and validation losses for every epoch, and in Figure 3 the achieved testing accuracies for every epoch. The training and validation losses stay very close to each other for all three systems, and thus no overfitting is observed. Moreover, very high testing accuracies of 0.989, 0.990 and 0.997 are achieved for the IEEE 30, 118 and 300 bus systems, respectively.
The testing accuracies can be equivalently understood by the average numbers of misidentified line statuses, plotted in Figure 3. We observe that, at the beginning of the training procedures, the average numbers of misidentified line statuses are , and for the IEEE 30, 118 and 300 bus systems, which are exactly the average numbers of disconnected lines in the respective generated data sets (cf. Section VA). Indeed, this coincides with the result from a naive identification decision rule of always claiming all the lines as connected (i.e., a trivial majority guess). As the training procedures progress, the average numbers of misidentified line statuses are drastically reduced to eventually , and . In other words, for the IEEE 300 bus system for example, facing on average simultaneous line outages, only line status would be misidentified on average by the learned classifier. We note that such a performance is achieved with outage identification decisions made in real time, in under a millisecond. Although the training process can be time-consuming, it is done completely offline.
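The relation between accuracy, the average number of misidentified line statuses, and the naive all-connected rule can be checked with a short sketch. The status encoding (1 for connected, 0 for outage) and the toy status matrix are assumptions made for this example.

```python
import numpy as np

def avg_misidentified(S_true, S_pred):
    """Average number of wrongly identified line statuses per sample."""
    return float(np.mean(np.sum(S_true != S_pred, axis=1)))

def accuracy(S_true, S_pred):
    """Fraction of correctly identified line statuses over all samples and lines."""
    return float(np.mean(S_true == S_pred))

# Toy data: 3 samples, 4 lines; 1 = connected, 0 = outage (hypothetical encoding).
S_true = np.array([[1, 1, 0, 1],
                   [1, 0, 0, 1],
                   [1, 1, 1, 1]])
S_naive = np.ones_like(S_true)  # naive rule: claim every line is connected

avg_outages = float(np.mean(np.sum(S_true == 0, axis=1)))
naive_errors = avg_misidentified(S_true, S_naive)
num_lines = S_true.shape[1]
```

Here `naive_errors` equals `avg_outages`, and in general the average number of misidentified statuses equals `(1 - accuracy) * num_lines`.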
It is worth noting that we have generated the training, validation and testing data sets with uniformly random voltage phase angles, and hence considerably variable power injections. In practice, there is often more informative prior knowledge about the power injections based on historical data and load forecasts. With such information, the model can be trained with much less variable samples of power injections, and the outage identification performance can be further improved.
VC2 Model Size, Sample Complexity, and Scalability
In the proposed Learning-to-Infer method, obtaining labeled data is not an issue since data can be generated in an arbitrarily large amount using Monte Carlo simulations. This leads to two questions that are of particular interest: to learn a good classifier, a) what size of neural network is needed, and b) how much data need to be generated? To answer these questions, we vary the sizes of the hidden layer of the neural networks as well as the training data size, and evaluate the learned classifiers for the three benchmark systems. We plot the testing results for the IEEE 30, 118 and 300 bus systems in Figure 4, 4 and 4, respectively. It is observed that the best performance is achieved with data and with 300/1000/3000 neurons for the 30/118/300 bus systems, respectively. Further increasing the data size or the neural network size yields rapidly diminishing returns.
Based on all these experiments, we now examine the scalability of the proposed Learning-to-Infer method as the problem size increases. We observe that training data sizes of and and neural network models of sizes 300, 1000 and 3000 ensure very high and comparable performance with no overfitting for the IEEE 30, 118 and 300 bus systems, respectively. When these data sizes are halved, some overfitting appears for these models in all three systems. We plot the training data sizes compared to the problem sizes for the three systems in Figure 5. We observe that the required training data size increases approximately linearly with the problem size. This linear scaling behavior implies that the proposed Learning-to-Infer method can be effectively implemented for large-scale systems with reasonable computation resources.
VC3 Effect of Number and Locations of Sensors
We now discuss the effect of sensor placement in real-time multi-line outage identification. The performance of line outage identification clearly depends on where and what types of sensor measurements are collected. Given limited sensing resources, optimizing the sensor placement is a hard problem that many studies have addressed (see, e.g., [7] among others). Here, we present the results from a case study on the IEEE 30 bus system, for which voltage phase angles are collected only at buses (as opposed to all the buses as in the previous experiments), as depicted in Figure 6. Interestingly, the achieved average identification accuracy only drops to (from when all the buses are monitored). This translates to on average only misidentified line statuses among a total of lines. A more comprehensive study of sensor placement for real-time multi-line outage identification is left for future work.
VD Experiments with the AC Power Flow Model
We close this section by verifying the performance of the proposed Learning-to-Infer method with data generated from the AC power flow model. Specifically, we consider the IEEE 118-bus system with 18 generators and 99 loads. Similar to the earlier data set generation process with the DC power flow model, we randomly generate distinct connected post-outage topologies with an average number of line outages. We then significantly and randomly vary the power generation and loads in the system with standard deviations equal to 50% of the means, and generate distinct generation and load profiles.
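Generating distinct connected post-outage topologies can be sketched as rejection sampling with a connectivity check. The independent per-line outage probability, the function names, and the 4-bus toy network below are assumptions of this sketch; the text does not specify the sampling distribution at this point.

```python
import random

def is_connected(num_buses, lines):
    """Union-find check that the given lines connect all buses."""
    parent = list(range(num_buses))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    components = num_buses
    for u, v in lines:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components == 1

def sample_connected_outages(num_buses, lines, outage_prob, num_samples, rng,
                             max_tries=100000):
    """Draw distinct line-status vectors (1 = in service) that keep the grid connected."""
    seen = set()
    for _ in range(max_tries):
        if len(seen) == num_samples:
            break
        status = tuple(0 if rng.random() < outage_prob else 1 for _ in lines)
        remaining = [ln for ln, s in zip(lines, status) if s == 1]
        if is_connected(num_buses, remaining):
            seen.add(status)  # the set keeps only distinct topologies
    return sorted(seen)

# Hypothetical 4-bus network: a ring plus one chord.
lines = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
topologies = sample_connected_outages(4, lines, outage_prob=0.3,
                                      num_samples=5, rng=random.Random(0))
```

Each accepted status vector would then be paired with a randomly drawn generation and load profile before the power flow equations are solved.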
For each data point, which includes a post-outage topology and a generation and load profile, we solve the AC power flow equations (IIA). To have a consistent comparison with the earlier experiments with the DC power flow model, we continue to rely on measurements of nodal voltage phase angles, real power generation, and real power loads to infer the multi-line outages in real time. We will demonstrate that, with the AC power flow model, very high performance similar to that with the DC power flow model can be achieved. Other types of measurements (e.g., voltage magnitudes, reactive power) may be used to further improve the performance, which is left for future investigation.
The data are divided into , , and for training, validation, and testing, respectively. Similarly to the DC power flow experiments, we employ a two-layer fully connected neural network with 1000 neurons in the hidden layer for learning to infer multi-line outages. The same training algorithm is applied. We plot the training and testing accuracies for every epoch in Figure 7. We observe that a testing accuracy is achieved (recall that the same accuracy, , is achieved in the earlier experiments on the 118-bus system with the DC power flow model). This translates to on average misidentified line statuses.
Furthermore, we looked into the types of misidentification errors, and observed that a) the rate of missed detection (i.e., missing a line outage when it actually occurred among other simultaneous line outages) is , and b) the rate of false alarm (i.e., identifying a line as in outage when it is in fact connected) is a much lower . As a result, we observe that nearly of the on average misidentified line statuses come from failing to detect of the on average simultaneous line outages, resulting in missed line outages.
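The two error rates defined above can be computed from true and predicted line statuses as in the following sketch. The encoding (1 for connected, 0 for outage), the function name, and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def error_rates(S_true, S_pred):
    """Return (missed detection rate, false alarm rate) over all samples and lines."""
    outage = (S_true == 0)
    connected = ~outage
    missed = outage & (S_pred == 1)          # outage declared as connected
    false_alarm = connected & (S_pred == 0)  # connected line declared as outage
    return missed.sum() / outage.sum(), false_alarm.sum() / connected.sum()

# Toy data: 2 samples, 4 lines; 1 = connected, 0 = outage (hypothetical encoding).
S_true = np.array([[1, 0, 0, 1],
                   [1, 1, 0, 1]])
S_pred = np.array([[1, 0, 1, 1],
                   [0, 1, 0, 1]])
miss_rate, fa_rate = error_rates(S_true, S_pred)
```

Because connected lines far outnumber outaged lines in each sample, the two rates are normalized by different counts, so a low false alarm rate can still account for a noticeable share of the misidentified statuses.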
VE On Computation Times for Data Generation and Training
As discussed above, a major advantage of the Learning-to-Infer method is that offline computation is exploited for achieving fast and accurate online inference. Specifically, the offline computation consists of two components: a) data generation based on the physical model, and b) predictor training based on the generated data. In the following, we discuss several aspects of the offline computation times for data generation and predictor training.
The time consumed for generating the 1M data samples with the AC power flow on the IEEE 118 bus system (cf. Section VD) is a little over an hour using MATPOWER [33]. The training time with 2000 epochs on these data is a little over two hours. Both are run on a laptop with an Intel Core i7 3.1GHz CPU and 8 GB of RAM. Various approaches can be applied to reduce both times. On the one hand, data generation can be trivially parallelized and thereby significantly accelerated. It is worth reemphasizing that data generation via simulations, although it may still take a nontrivial amount of time for large systems, is nonetheless many orders of magnitude faster than collecting and manually labeling historical data from real-world systems. On the other hand, the experiments conducted in this section have achieved very high identification accuracies around or above . In practice, if the performance requirement is not as high (e.g., ), then a significantly smaller amount of data (cf. Figures 4, 4 and 4) and fewer training epochs (cf. Figure 3) would be sufficient. The sizes of the neural networks can also be reduced, which will lead to faster training. Leveraging the above approaches, much shorter computation times can be achieved for offline data generation and training.
VI Conclusion
We have developed a new Learning-to-Infer method for real-time multi-line outage identification in power grids. The computational complexity due to the exponentially large number of outage hypotheses is overcome by efficient marginal inference with optimized predictor models. Optimization of the predictor model is transformed to and solved as a discriminative learning problem, based on Monte Carlo samples efficiently generated with full-blown power flow models. The developed Learning-to-Infer method has the major advantages that a) the training process takes place completely offline, and b) labeled data sets can be generated in an arbitrarily large amount quickly and at very little cost. As a result, very complex predictor models can be employed without worrying about overfitting, as more labeled training data can always be generated should overfitting be observed. The classifiers learned offline are then used in real time, and outage identification decisions are made in under a millisecond. We have evaluated the proposed method with the IEEE 30, 118 and 300 bus systems. It has been demonstrated that arbitrary multi-line outages can be identified in real time with excellent performance using classifiers trained with a reasonably small amount of generated data.
References
 [1] Y. Zhao, J. Chen, and H. V. Poor, “Learning to infer: A new variational inference approach for power grid topology identification,” in Proc. IEEE Workshop on Statistical Signal Processing, Jun. 2016, pp. 1–5.
 [2] ——, “Efficient neural network architecture for topology identification in smart grid,” in Proc. IEEE Global Conference on Signal and Information Processing (GlobalSIP), Dec. 2016, pp. 811–815.
 [3] U.S.-Canada Power System Outage Task Force, Final Report on the August 14, 2003 Blackout in the United States and Canada, 2004.
 [4] Arizona-Southern California Outages on September 8, 2011: Causes and Recommendations. FERC, NERC, 2012.
 [5] J. E. Tate and T. J. Overbye, “Line outage detection using phasor angle measurements,” IEEE Transactions on Power Systems, vol. 23, no. 4, pp. 1644–1652, Nov. 2008.
 [6] ——, “Double line outage detection using phasor angle measurements,” in Proc. IEEE Power and Energy Society General Meeting, Jul. 2009.
 [7] Y. Zhao, J. Chen, A. Goldsmith, and H. V. Poor, “Identification of outages in power systems with uncertain states and optimal sensor locations,” IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 6, pp. 1140–1153, Dec. 2014.
 [8] Y. Zhao, A. Goldsmith, and H. V. Poor, “On PMU location selection for line outage detection in wide-area transmission networks,” in Proc. IEEE Power and Energy Society General Meeting, Jul. 2012, pp. 1–8.
 [9] M. Garcia, T. Catanach, S. Vander Wiel, R. Bent, and E. Lawrence, “Line outage localization using phasor measurement data in transient state,” IEEE Transactions on Power Systems, vol. 31, no. 4, pp. 3019–3027, 2016.
 [10] H. Zhu and G. B. Giannakis, “Sparse overcomplete representations for efficient identification of power line outages,” IEEE Transactions on Power Systems, vol. 27, no. 4, pp. 2215–2224, Nov. 2012.
 [11] J. Chen, Y. Zhao, A. Goldsmith, and H. V. Poor, “Line outage detection in power transmission networks via message passing algorithms,” in Proc. 48th Asilomar Conference on Signals, Systems and Computers, 2014, pp. 350–354.
 [12] J. Heydari and A. Tajer, “Quickest localization of anomalies in power grids: A stochastic graphical framework,” IEEE Transactions on Smart Grid, 2017.
 [13] X. Li, H. V. Poor, and A. Scaglione, “Blind topology identification for power systems,” in Proc. the IEEE International Conference on Smart Grid Communications, 2013, pp. 91–96.
 [14] Y. Yuan, O. Ardakanian, S. Low, and C. Tomlin, “On the inverse power flow problem,” arXiv preprint arXiv:1610.06631, 2016.
 [15] I. Gera, Y. Yakoby, and T. Routtenberg, “Blind estimation of states and topology (BEST) in power systems,” in Proc. IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2017, pp. 1080–1084.
 [16] M. He and J. Zhang, “A dependency graph approach for fault detection and localization towards secure smart grid,” IEEE Transactions on Smart Grid, vol. 2, no. 2, pp. 342–351, Jun. 2011.
 [17] S. Bolognani, N. Bof, D. Michelotti, R. Muraro, and L. Schenato, “Identification of power distribution network topology via voltage correlation analysis,” in Proc. IEEE Conference on Decision and Control, 2013, pp. 1659–1664.
 [18] V. Kekatos, G. B. Giannakis, and R. Baldick, “Online energy price matrix factorization for power grid topology tracking,” IEEE Transactions on Smart Grid, vol. 7, no. 3, pp. 1239–1248, 2016.
 [19] Y. Weng, Y. Liao, and R. Rajagopal, “Distributed energy resources topology identification via graphical modeling,” IEEE Transactions on Power Systems, vol. 32, no. 4, pp. 2682–2694, 2017.
 [20] D. Deka, M. Chertkov, and S. Backhaus, “Structure learning in power distribution networks,” IEEE Transactions on Control of Network Systems, 2017.
 [21] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
 [22] Power Systems Test Case Archive, University of Washington Electrical Engineering, https://www.ee.washington.edu/research/pstca/.
 [23] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 1993.
 [24] J. D. Glover, M. Sarma, and T. Overbye, Power System Analysis & Design. Cengage Learning, 2011.
 [25] R. Baldick, Applied Optimization: Formulation and Algorithms for Engineering Systems. Cambridge University Press, 2006.
 [26] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
 [27] H. V. Poor, An Introduction to Signal Detection and Estimation. Springer-Verlag, New York, 1994.
 [28] M. Mezard and A. Montanari, Information, physics, and computation. Oxford University Press, 2009.
 [29] V. Vapnik, Statistical Learning Theory. Wiley, New York, 1998.
 [30] A. von Meier, D. Culler, A. McEachern, and R. Arghandeh, “Micro-synchrophasors for distribution systems,” in Proc. IEEE Innovative Smart Grid Technologies (ISGT), 2013.
 [31] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Springer Science & Business Media, 2013, vol. 87.
 [32] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
 [33] R. D. Zimmerman, C. E. Murillo-Sánchez, and R. J. Thomas, “MATPOWER: Steady-state operations, planning and analysis tools for power systems research and education,” IEEE Transactions on Power Systems, vol. 26, no. 1, pp. 12–19, Feb. 2011.