Sample-Derived Disjunctive Rules for Secure Power System Operation
Abstract
Machine learning techniques have been used in the past on Monte Carlo samples to construct predictors of the dynamic stability of power systems. In this paper, we move beyond the task of prediction and propose a comprehensive approach to use predictors, such as Decision Trees (DT), within a standard optimization framework for pre- and post-fault control purposes. In particular, we present a generalizable method for embedding rules derived from DTs in an operation decision-making model. We begin by pointing out the specific challenges entailed when moving from a prediction to a control framework. We proceed by introducing the solution strategy based on generalized disjunctive programming (GDP), as well as a two-step search method for identifying optimal hyper-parameters for balancing cost and control accuracy. Using a case study on the IEEE 39-bus system, we showcase how the proposed approach constructs security proxies that cover multiple contingencies while facing high-dimensional uncertainty with respect to operating conditions. The method is shown to achieve efficient system control at a marginal increase in system cost compared to an oracle model.
I Introduction
The increasing uncertainty that surrounds system operation renders the adoption of probabilistic security assessment frameworks a high priority for many Transmission System Operators (TSOs) worldwide. In the past, large-scale Monte Carlo techniques have been proposed that involve the high-density sampling of operating points and post-fault stability assessment via time-domain simulations (e.g., [1, 2, 3]). Machine learning can subsequently be applied to the Monte Carlo samples to build rules that predict post-fault stability for unseen operating points. For this predictive task, most researchers have adopted decision trees (DT) or DT ensembles.
Beyond the prediction of post-fault stability, such sample-derived rules can also be used as a control method to delineate the system's pre-fault stable operating region. By embedding appropriate constraints in a TSO's operation and scheduling tools, post-fault stability can, in theory, be achieved. In general, two different approaches can be used to infer suitable control actions from sample-derived rules: a heuristic search strategy and an optimization-based approach. In the heuristic followed by [4, 5], when a post-fault unstable operating point is encountered, the DT's decision path of generator power bounds is followed upwards from the terminal node to the corresponding parent node. Subsequently, the generators are redispatched according to the threshold of this parent node to shift the operation to the child node encapsulating mostly post-fault stable operating points. The second approach, followed by, e.g., [6, 7], solves an Optimal Power Flow (OPF) problem for each terminal node encapsulating mostly post-fault stable points, where the corresponding decision path has been added in the form of inequality constraints. After solving all constrained OPFs, the solution with minimal operating cost is selected. Recently, the authors of [8] published a data-based Security-Constrained OPF (SCOPF) considering line flow limits as features. A slightly different approach, but one that still considers the complete decision path, is proposed in [9]: post-fault stable generation redispatch is achieved by first finding the most effective generators and then restricting the assumed post-fault stable region with adapted bounds for the generator powers.
In practice, two challenges arise: (i) the online computation of current control approaches using rules from Monte Carlo samples entails a high computational burden; (ii) the very nature of cost optimality drives system operation right onto the limiting rule [6], thus potentially leading to post-fault unstable operation even for DTs with very high prediction accuracy.
Apart from describing the challenges of using sample-derived rules in a control setting, the contributions of this paper are twofold:

- introducing the disjunctive formulation of the control approach to reduce computational complexity;

- proposing a procedure to select parameters that improve the accuracy of rules in a control setting.
A novel approach to embedding disjunctive rules for feasible operation is introduced to address the issue of computational complexity. In particular, the standard corrective operation problem can be extended using generalized disjunctive programming (GDP) [10], resulting in a mixed integer linear problem (MILP) in the DC case or a mixed integer nonlinear problem (MINLP) in the AC case. Convex-hull and big-M reformulations are considered to account for the disjunction of learned convex regions. The computational benefit of the proposed approach results from the GDP formulation, which enables solvers to use branch-and-bound search to efficiently identify the global optimum with respect to the implemented rules, instead of evaluating the disjunctive convex regions one by one.
Moreover, a two-step parameter search is proposed to address the challenge of rule inaccuracy: k-fold cross-validation is used to balance under- and overfitting, and a safety margin is computed to finally ensure rule accuracy across the complete uncertainty spectrum.
The specific challenges and the elements of the approach are illustrated using an IEEE 39-bus case study. In this paper, a steady-state DC analysis is used instead of time-domain AC simulations. This allows comparing the proposed approach against a globally cost-optimal reference point (obtained from the SCOPF solution), thus simplifying reproducibility. The case study is used to compare both the computational benefits and the cost-effectiveness.
The rest of the paper is structured as follows. In Section II, the objective of the control application and two specific challenges are illustrated. Subsequently, in Section III, the disjunctive approach is introduced, including the parameter search for correcting the inaccuracy of disjunctive rules. The case study is carried out in Section IV, and Section V concludes the paper.
II Embedding Rules in Control
II-A Objective
In broad terms, the objective is to build a control approach to finding an acceptable and cost-optimal operating point of the power system given training data $\mathcal{D}$ containing $N$ samples of operating points, each with a feature vector $x_i$ and a class label $y_i \in \{0, 1\}$, where $1$ and $0$ correspond to acceptable and unacceptable operating points, respectively. $\mathcal{D}$ is assumed to be generated randomly using a Monte Carlo sampling process and to be representative of expected operating conditions. The binary acceptable/unacceptable class label can be obtained from simulations and corresponds to a user-specified stability criterion.
The objective of this paper is to find rules from $\mathcal{D}$ in the form of inequality constraints limiting a control approach to acceptable operation. These rules will be embedded in the OPF optimization
(1)  $\min_{\tilde{x}, \tilde{z}} \; f(\tilde{x}, \tilde{z})$
s.t.  $g(\tilde{x}, \tilde{z}) = 0, \quad h(\tilde{x}, \tilde{z}) \leq 0$
where $\tilde{x}$ contains all power system variables, such as generator powers, line flows, phase angles and bus voltages, and $\tilde{z}$ are any other auxiliary variables. $f$ comprises the system's operating cost, $g$ are the typical equality constraints, such as node balances and line flow equations, and $h$ are the inequality constraints for the physical limitations of the system. Note, the tilde symbol denotes the operational uncertainty surrounding the system (e.g., load injections can vary from sample to sample). In the AC-OPF case, variables and constraints for active and reactive power are considered, whereas in the DC-OPF case only constraints involving active power must be considered. We would like to emphasize: the objective is not to provide a full control approach involving real post-fault stability assessments and an AC setting, but rather to obtain the rules from $\mathcal{D}$ in an efficient and accurate way with respect to the purpose of controlling the system. Through this purpose, and through the fact of straddling both domains, mathematical optimization and machine learning, specific challenges arise.
II-B Challenges
Current methods for learning Monte Carlo sample-derived rules for control purposes face two particular challenges that do not arise when using rules solely for prediction: the online computational complexity and the inaccuracy of the rules.
II-B1 Computational complexity
Current prediction-oriented approaches typically comprise an offline training part, where a group of classifiers is trained on the data, and an online part, where the current operating point of interest is evaluated using the pre-trained classifiers. Usually, one DT is trained per fault, since each fault can have specific characteristics (e.g., individual critical features and an individual nonlinear boundary). The complexity of evaluating one operating point in one fault-specific tree is $\mathcal{O}(d)$, where $d$ is the maximal tree depth; consequently, evaluating against $C$ potential faults involves $\mathcal{O}(C\,d)$ computations. This computational overhead of evaluating an operating point is negligible since it consists of evaluating simple algebraic equations.
In the online part (as in, e.g., [6, 7]), the computations are more costly, since many optimizations (one per acceptable terminal node) have to be solved, and achieving close to real-time performance can pose substantial challenges. In particular, learning multiple trees (one tree per fault) and combining the rules across all trees (e.g., as in [5]) results in high computational complexity, with exponential growth in the number of optimizations to be solved. For example, the case in Fig. 1(a) involves two unacceptable faults in terms of pre-fault operation state variables, and the final acceptable feasible operating region (shaded blue area) is the combination of the region pairs {(r1,r3),(r1,r4),(r2,r3),(r2,r4)}; this requires solving four optimization problems, one for each pair. In contrast, the computational complexity is reduced to one optimization per acceptable terminal node when using one global tree for overall acceptability. In the example, the four linear rules from Fig. 1(a) are reduced to a single linear rule for the blue shaded area in Fig. 1(b), resulting in a single optimization to be solved. A single global DT is also used in [1] and [7, 9]; in [1, p. 205] this resulted in a rough reduction of the total number of terminal nodes to be considered. Despite this reduction, solving one optimization per acceptable terminal node still represents a significant computational burden for the online workflow.
II-B2 Accuracy of rules for control
The goal of using rules for control is to minimize the number of operating points obtained by the control model that are falsely classified as acceptable (let us call this the control error), without unnecessarily restricting the operating region. When learning a tree, the algorithm aims to minimize the test error for the given training population by using, e.g., the Gini impurity. However, it is very unlikely that this test error equals the control error (as assumed, e.g., in [4]), since both metrics refer to radically different populations. In fact, the operating points in the control problem are the result of the optimization stated in Section II-A, where the rules are implemented as linear inequality constraints. As such, these operating points are locally concentrated in specific areas of the state space, since the optimal point in a convex linear problem always lies on one of the vertices of the feasible region, or on the line segment between two vertices if both are optimal (fundamental theorem of linear programming). For instance, consider Fig. 2: even if the test error for region r1 (enveloped by vertices V1-V4) is low, because only a few of its points are wrongly labelled by the rules as acceptable, the local error at vertex V1 can be far higher.
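This vertex-concentration effect can be checked with a tiny linear program; the cost vector and box-shaped rule region below are purely illustrative stand-ins for a learned terminal-node rule.

```python
# Minimal sketch: the cost-optimal point of a linear program over a
# hypothetical rule region x1 in [0, 2], x2 in [0, 1] lands on a vertex,
# never in the interior, so control errors concentrate at the vertices.
from scipy.optimize import linprog

res = linprog(c=[-1.0, -1.0],                 # maximize x1 + x2
              bounds=[(0.0, 2.0), (0.0, 1.0)],
              method="highs")
print(res.x)                                   # the optimum is the vertex (2, 1)
```

Any test-error estimate averaged over the whole box says little about the label at that single vertex, which is exactly where the optimizer operates.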
To deal with this challenge, [4] proposed to heuristically determine a new unit commitment, without cost considerations, after the redispatched point is assessed as unacceptable. [8, 9] proposed to address this issue in the DT learning process by asymmetrically adjusting the weights of the observations' prior probabilities, resulting in a conservative shift of the rules towards the acceptable region; this might lead to over- or underfitting the data (in particular, DTs are known to need tuning to avoid overfitting [11]). In fact, large weighting factors were used in [9] for the unacceptable and acceptable classes, and no methodology for deriving these numbers has been presented. As in [5], the challenge of rule inaccuracy can also be addressed in the online workflow by incrementally scanning along a margin to the offline-learned rule until an acceptable operating point is identified. This online scan requires evaluating the label of the operating points for all faults and would result in a prohibitively high computational burden for online control applications when post-fault stability is the acceptability criterion.
III Disjunctive Approach for Control
III-A Obtaining rules using machine learning
For binary classification, the CART learning algorithm [12] successively splits the feature space into two half-spaces in each iteration based on the training data $\mathcal{D}$. Considering a univariate DT, let $w_n$ be the single-entry vector of each branch node $n$ corresponding to the split position in the feature vector and $v_n$ be the split threshold of $n$. The example of Fig. 3 illustrates the splitting scheme. The algorithm starts by finding the first best split for the initial branch node n1 with threshold $v_{n1}$. Consequently, the two resulting regions contain mostly the unacceptable and acceptable classes, respectively. The algorithm terminates in the latter region, since the purity criterion (e.g., the Gini impurity is less than a user-defined threshold) is met, leading to a terminal node t1. In contrast, the former region is not pure enough, so the algorithm continues to split this branch in the same way until a stopping criterion holds (e.g., limit of tree depth, pureness of nodes, etc.). In total, this example tree has two branch nodes n1, n2, two acceptable terminal nodes t1, t3 and one unacceptable terminal node t2, corresponding to the three regions in Fig. 2(b).
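The splitting scheme can be reproduced in a few lines with scikit-learn's CART implementation; the 2-D features and the acceptability criterion below are toy assumptions, not the paper's data.

```python
# Toy CART illustration: fit a shallow DT and list its branch nodes
# (split feature and threshold v_n) and terminal nodes (class labels).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 2))      # operating-point features (toy)
y = (X[:, 0] > 0.4).astype(int)               # 1 = acceptable (toy criterion)

dt = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
t = dt.tree_
for n in range(t.node_count):
    if t.children_left[n] == -1:              # -1 marks a terminal node
        print(f"terminal node {n}: class {int(t.value[n].argmax())}")
    else:
        print(f"branch node {n}: split x[{t.feature[n]}] at v={t.threshold[n]:.3f}")
```

Each printed branch node carries exactly the $(w_n, v_n)$ pair that later becomes one inequality constraint of a rule.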
To balance under- and overfitting, a hyper-parameter grid search is proposed using an n-fold cross-validation learning procedure (Fig. 3(a)). Many DTs are trained through CART for many hyper-parameter combinations using the given training population $\mathcal{D}$. These hyper-parameters may include the maximal tree depth, the minimal number of samples in one terminal node, the terminal node's Gini impurity and/or a maximal number of terminal nodes. The best-performing DT is selected through out-of-sample testing by the use of a user-specified criterion (e.g., typically the 'f1' score or, alternatively, the classical test error).
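A minimal sketch of this cross-validated grid search follows; the grid values, the synthetic data and the random seed are illustrative placeholders rather than the paper's settings.

```python
# Hedged sketch of the n-fold cross-validation hyper-parameter grid search,
# selecting the best DT by the 'f1' score.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(400, 3))
y = (X[:, 0] + 0.2 * X[:, 1] > 0.6).astype(int)   # toy acceptability label

grid = {"max_depth": [2, 4, 8], "max_leaf_nodes": [4, 16, 64]}
search = GridSearchCV(
    DecisionTreeClassifier(class_weight="balanced", random_state=0),
    grid, scoring="f1", cv=5,                     # 'f1' as selection criterion
).fit(X, y)
print(search.best_params_)
```

`search.best_estimator_` is then the tree whose rules are carried forward to the optimization stage.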
III-B Accounting for disjunctive rules in the optimization
To reduce the online computations, the necessity of solving multiple optimizations is avoided (e.g., [6, 7] solve one optimization for each terminal node encapsulating the acceptable class). Here, the rules are accounted for in a single optimization as a disjunction. The mutually exclusive disjunction contains one rule for each terminal node of the acceptable class. Each rule corresponds to all inequality constraints of all branch nodes on the path from the initial node to the terminal node. Consequently, in the example of Fig. 3, the two optimizations that take into account either the rule for t1 or the rule for t3 are replaced by one optimization accounting for the disjunction of both rules.
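Extracting these per-terminal-node rules from a fitted tree amounts to collecting, for each acceptable leaf, the branch nodes passed on the way down; the sketch below does this on a toy tree (the data and acceptability criterion are assumptions for illustration).

```python
# Sketch: map each terminal node of the acceptable class to its rule,
# expressed as the sets of left- and right-branch ancestor nodes.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(300, 2))
y = (X[:, 0] > 0.5).astype(int)               # 1 = acceptable (toy criterion)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y).tree_

def acceptable_leaf_rules(tree, acceptable_class=1):
    """Return {leaf: (left ancestors, right ancestors)} for acceptable leaves."""
    rules = {}
    def walk(node, left, right):
        if tree.children_left[node] == -1:    # terminal node reached
            if int(tree.value[node].argmax()) == acceptable_class:
                rules[node] = (tuple(left), tuple(right))
            return
        walk(tree.children_left[node], left + [node], right)   # x[f] <= v_n
        walk(tree.children_right[node], left, right + [node])  # x[f] >  v_n
    walk(0, [], [])
    return rules

print(acceptable_leaf_rules(tree))
```

Each dictionary entry is one disjunct: the conjunction of all the inequalities named by its two ancestor sets.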
To reformulate disjunctions for optimization, two different approaches from GDP can be adopted, both involving binary variables. Whereas the so-called big-M reformulation results in fewer variables and constraints, the convex-hull reformulation [13] results in a relaxed linear problem with a feasible region at least as tight as the one from the big-M reformulation [10]. Since in this univariate DT case the disjunctions are all simple and all variables are bounded, the big-M formulation results in the same tight relaxation. Let us define $\mathcal{N}_t^{L}$ as the set of ancestor branch nodes whose left branch has been followed on the path from the initial node to terminal node $t$. Similarly, $\mathcal{N}_t^{R}$ is the set of right-branch ancestors. For the example of Fig. 3, t1 has the single ancestor n1, while t3 has the ancestors n1 and n2, each assigned to the left- or right-branch set according to the path followed. The reformulation is presented below:
(2a)  $w_n^\top \tilde{x} \le v_n + M_n (1 - y_t) \quad \forall n \in \mathcal{N}_t^{L}, \; \forall t \in \mathcal{T}^{acc}$
(2b)  $w_n^\top \tilde{x} > v_n - M'_n (1 - y_t) \quad \forall n \in \mathcal{N}_t^{R}, \; \forall t \in \mathcal{T}^{acc}$
where $y_t$ is a binary optimization variable for each terminal node encapsulating the acceptable class, $t \in \mathcal{T}^{acc}$. Further, the optimization is enforced to assign exactly one accepted terminal node by
(3)  $\sum_{t \in \mathcal{T}^{acc}} y_t = 1$
Since the use of a strict inequality in Equation (2b) is not possible in mixed integer optimizations, we propose to add a small value $\epsilon$ to obtain the non-strict inequality:
(4)  $w_n^\top \tilde{x} \ge v_n + \epsilon - M'_n (1 - y_t) \quad \forall n \in \mathcal{N}_t^{R}, \; \forall t \in \mathcal{T}^{acc}$
Note, $\epsilon$ should be selected in accordance with the solver tolerances, since selecting it too small could cause numerical instabilities in the solver. The vectors of big constants $M_n$ and $M'_n$ have to be selected as small as possible so that the relaxed problem has a small feasible region, in order to speed up computations. The smallest possible big constants are obtained by
(5a)  $M_n = w_n^\top \tilde{x}^{\max} - v_n$
(5b)  $M'_n = v_n + \epsilon - w_n^\top \tilde{x}^{\min}$
where all considered variables are assumed to be bounded, $\tilde{x}^{\min} \le \tilde{x} \le \tilde{x}^{\max}$ (element-wise).
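Because each split vector has a single non-zero entry, the tightest constants of Eq. (5) reduce to the distance between the threshold and the relevant variable bound; the bounds and threshold below are assumed values for illustration.

```python
# Illustrative computation of the tightest big-M constants from assumed
# variable bounds, following Eq. (5) with a one-hot split vector w_n.
import numpy as np

x_min = np.array([0.0, 0.0])                  # assumed lower variable bounds
x_max = np.array([2.0, 1.5])                  # assumed upper variable bounds
eps = 1e-6                                    # epsilon from Eq. (4)

def tightest_big_m(feature, threshold):
    # left branch,  Eq. (2a): M_n  = max over x of (x[f] - v_n) = x_max[f] - v_n
    # right branch, Eq. (4) : M'_n = v_n + eps - x_min[f]
    return x_max[feature] - threshold, threshold + eps - x_min[feature]

M, M_prime = tightest_big_m(feature=0, threshold=0.8)
print(M, M_prime)
```

With $y_t = 1$ the big-M terms vanish and the rule binds; with $y_t = 0$ these constants make the constraint vacuous over the whole bounded range, which is exactly what keeps the relaxation tight.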
The computational benefit of this formulation results from the following: (i) only one model must be initialized, (ii) only one presolve step is required, (iii) the solver can make use of branch-and-bound search, (iv) the solver can further accommodate speed-up methods (e.g., decomposition or batch methods). In addition, a reduction in computation is achieved by learning a single tree for global acceptability instead of learning multiple fault-specific trees, reducing the total number of terminal nodes.
III-C Correction of control-oriented rules
To address the issue of rule accuracy in a control setting (as discussed in Section II-B2), a safety margin $\epsilon_s$ is used to shift the rules towards the acceptable region, as illustrated in Fig. 5. For increasing $\epsilon_s$, the two regions r1 and r2 of the figure are narrowed, and the adapted rules therefore provide more reliability, since the edges lie further inside the acceptable region. To account for the safety margin, the two inequality constraints (2a) and (4) are adapted to
(6a)  $w_n^\top \tilde{x} \le v_n - \epsilon_s + M_n (1 - y_t) \quad \forall n \in \mathcal{N}_t^{L}, \; \forall t \in \mathcal{T}^{acc}$
(6b)  $w_n^\top \tilde{x} \ge v_n + \epsilon_s + \epsilon - M'_n (1 - y_t) \quad \forall n \in \mathcal{N}_t^{R}, \; \forall t \in \mathcal{T}^{acc}$
The safety margin $\epsilon_s^*$ that guarantees acceptable operation for the complete range of uncertainty is searched offline (Fig. 3(a)). Hence, the critical computations of evaluating the acceptability label of an operating point are shifted from online (e.g., as in [5]) to offline. For this parameter search, different constrained optimization models with varying $\epsilon_s$ are initialized, and each is assessed by the use of a set of samples drawn from the distribution of uncertain operation. For those samples, the dispatch decisions are computed by solving all created models. For each of the resulting dispatch decisions, the acceptability label is assessed. Finally, the best model with the respective $\epsilon_s^*$ is selected by the TSO in consideration of the minimization of total operating cost (e.g., lost load and generation cost). Note, the selected model is the only one used in the online computations, and the scale of the search for $\epsilon_s$ depends on the user's experience of the specific accuracy of the learning algorithm. In this paper, an exhaustive search is used for the purpose of studying the effects of $\epsilon_s$ in the case study.
IV Case study
IV-A Test system and solution strategy
TABLE I: Generator upper power bounds and cost coefficients

Upper bound [MW]:   1040  725  652  508  687  580  564  865  1100  4000
Cost coefficient:     30   24   26   32   34   42   44   46    48    35
The objective of this study is to demonstrate the methodology presented in this paper. The focus lay on the aspects relevant to the challenges caused by straddling both domains, machine learning and mathematical optimization. Consequently, a DC power flow approximation was used, and an operating point was considered to be acceptable if no post-fault loss of load occurs following any of the possible faults, while the list of faults analyzed included all line outages (similar assumptions have been made by several authors, e.g., [6]). These assumptions allowed a comparison with a reference solution (a global cost minimum among acceptable points) obtained using a direct solution of the SCOPF problem.
The IEEE 39-bus system was used in this case study. The network connectivity is shown in Fig. 6, and data such as nominal loads and line reactances were taken from [14]. The system was modified in some aspects: to ensure the feasibility of N-1 secure solutions for the complete uncertainty spectrum, all generators allowed for post-fault corrective redispatches and all line flow limits were adjusted accordingly.
To generate the training set, operational uncertainty was considered in all loads with a fixed relative deviation from the nominal loads. Load samples were generated from a multivariate Gaussian distribution with a uniform Pearson correlation coefficient between each load pair. The inverse transformation method was used to convert the sampled values to a marginal Kumaraswamy distribution with probability density function
$f(x; a, b) = a\,b\,x^{a-1} (1 - x^a)^{b-1}, \quad x \in [0, 1],$
where $a$ and $b$ are shape parameters. Finally, the sampled values were scaled to the desired range of load variation. The generator power levels were randomly sampled from an uncorrelated uniform distribution within their specific operating ranges (the lower bound was identical for all generators and the upper bounds are shown in Table I). Since no power losses were assumed, the total power of the loads must equal the total power generation. Any mismatch was distributed over all generators (positive or negative) by weighting based on the generators' capacities. All samples that led to a physical inconsistency were disregarded.
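The correlated-load sampling step can be sketched with a Gaussian copula followed by the inverse Kumaraswamy CDF; the correlation, shape parameters and load range below are illustrative placeholders, not the paper's values.

```python
# Hedged sketch of the load-sampling procedure: a Gaussian copula imposes a
# common pairwise correlation, and the inverse-transform method maps the
# resulting uniforms to Kumaraswamy(a, b) marginals.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n_loads, rho, a, b = 5, 0.7, 2.0, 2.0         # assumed parameter values

cov = rho * np.ones((n_loads, n_loads)) + (1.0 - rho) * np.eye(n_loads)
z = rng.multivariate_normal(np.zeros(n_loads), cov, size=1000)
u = norm.cdf(z)                               # correlated uniform samples
# invert F(x) = 1 - (1 - x^a)^b  =>  x = (1 - (1 - u)^(1/b))^(1/a)
x = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)
loads = 0.8 + 0.4 * x                         # scale to an assumed load range
print(loads.shape)
```

Because the copula only reshuffles ranks, the marginal of each load stays exactly Kumaraswamy while the pairwise dependence is preserved.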
In the hyper-parameter grid search, under- and overfitting were balanced for the DT (Fig. 4) via 5-fold cross-validation. The features of the training set consisted of all generator, load and line flow power levels. Each individual DT for global acceptability was learned using the CART algorithm [12] (as implemented in the package scikit-learn 0.18.1 [15] with Python 3.5.2). The best split was selected successively based on minimizing the Gini impurity; the selected non-default settings were class weights to balance the different population sizes of the two classes, and the parameters involved in the hyper-parameter grid search were the maximal DT depth and the maximal number of terminal nodes. The best DT was selected using the 'f1' score. The final resulting classifier contained both acceptable and unacceptable terminal nodes.
The safety margin $\epsilon_s$ was searched to balance acceptability and cost (Fig. 4). Here, an exhaustive search was undertaken by varying the safety margin over a fixed range with a constant step size to study its effect, resulting in one constrained model per margin value. Note, this search scale is not required in a realistic control scenario. To convert the strict inequality (2b) into the non-strict inequality (4), a small $\epsilon$ was used. In each model, the objective function to be minimized was the linear cost function of power generation, with the cost coefficients of Table I. Apart from the disjunctive constraints, the objective function was subject to all node balances and line flow constraints of the DC approximation. For this second offline parameter search, operating points were sampled from the correlated load distribution, and for all models the corresponding MILPs were solved to compute the output of the control approach (dispatch decisions). Finally, the true label was evaluated based on the dispatch decisions. All MILPs were implemented using Pyomo 5.1.1 [16] and Gurobi 7.02 [17] was used as the solver.
IV-B Computational complexity
Regarding the online computational complexity, the computation time of solving one single MILP with the proposed disjunctive approach was on average lower than that of solving all LPs with separate constraints from the terminal nodes. The main computational benefit results from training the safety margin offline and thereby avoiding the online computation of the labels (e.g., with dynamic simulations as in [5]).
IV-C Rule accuracy
Regarding rule accuracy, Fig. 7 shows the mismatch between the control error and the test error based on the training population. For instance, without a safety margin, the average test error was very low, so the DT was almost perfect as a predictor; however, when the rules were used for control, the actual average control error was substantially higher. The figure also shows the ability of the proposed method to reach operating points of the acceptable class by increasing $\epsilon_s$. Specifically, with a sufficiently large safety margin added to the rules, acceptable operation could be guaranteed for correlated uncertainty in all loads.
IV-D Cost-effectiveness
For all samples, the cost of the proposed sample-derived disjunctive approach was compared against the SCOPF solution. The average relative cost difference (in blue) is presented in Fig. 7. Note that only operating points that lead to an acceptable solution are used for this comparison. It can be seen that the economic cost difference to the SCOPF solution was generally small, growing only moderately with the safety margin. This was because all cost coefficients lie in a very narrow range (Table I). A discontinuous jump in the cost difference resulted from a particularly cost-effective terminal node being excluded beyond a certain margin. For the smallest margin at which the operator could guarantee acceptable operation under all potential faults for the complete uncertainty spectrum, the average Euclidean distance of the generator powers to the SCOPF dispatch solution amounted to a small fraction of the average total power.
V Conclusion
We presented the specific challenges that arise when sample-derived rules are embedded in optimal decision-making for the control of a power system. Particularly challenging are the computational performance and the accuracy of the rules. We introduced a novel disjunctive approach to deal with the computational challenge, and a grid search strategy using a safety margin to deal with the inaccuracy of the rules. We studied the challenges and the solution strategies using the IEEE 39-bus network. The proposed disjunctive approach is able to secure the system's dispatch decisions against a user-specified (stability) criterion for a wide range of operational uncertainty. In a steady-state comparison, the approach resulted in only marginally higher costs than an oracle model, and the online computational cost is low since online simulations are avoided. Moreover, the proposed sample-derived disjunctive approach provides a framework capable of accommodating a wide variety of linear and ensemble classifiers. In future work, the implementation of ensembles could be studied, and reductions in operational and online computational cost could be achieved by learning safety margins individually for the terminal nodes instead of a generalized margin. Offline computations could be further reduced by using importance sampling techniques.
References
 [1] L. A. Wehenkel, Automatic Learning Techniques in Power Systems. Kluwer Academic Publishers, 1998.
 [2] V. Krishnan, J. D. McCalley, S. Henry, and S. Issad, “Efficient database generation for decision tree based power system security assessment,” IEEE Transactions on Power Systems, vol. 26, no. 4, pp. 2319–2327, 2011.
 [3] I. Konstantelos, G. Jamgotchian, S. Tindemans, P. Duchesne, S. Cole, C. Merckx, G. Strbac, and P. Panciatici, “Implementation of a massively parallel dynamic security assessment platform for large-scale grids,” IEEE Transactions on Smart Grid, vol. PP, no. 99, pp. 1–1, 2016.
 [4] E. S. Karapidakis and N. D. Hatziargyriou, “Online preventive dynamic security of isolated power systems using decision trees,” IEEE Transactions on Power Systems, vol. 17, no. 2, pp. 297–304, 2002.
 [5] Y. Xu, Z. Y. Dong, R. Zhang, and K. Po Wong, “A decision tree-based online preventive control strategy for power system transient instability prevention,” International Journal of Systems Science, vol. 45, no. 2, pp. 176–186, 2014.
 [6] I. Genc, R. Diao, V. Vittal, S. Kolluri, and S. Mandal, “Decision tree-based preventive and corrective control applications for dynamic security enhancement in power systems,” IEEE Transactions on Power Systems, vol. 25, no. 3, pp. 1611–1619, 2010.
 [7] D. C. L. Costa, M. V. A. Nunes, J. P. A. Vieira, and U. H. Bezerra, “Decision tree-based security dispatch application in integrated electric power and natural-gas networks,” Electric Power Systems Research, vol. 141, pp. 442–449, 2016.
 [8] F. Thams, L. Halilbašić, P. Pinson, S. Chatzivasileiadis, and R. Eriksson, “Data-driven security-constrained OPF,” in 10th Bulk Power Systems Dynamics and Control Symposium, 2017, Conference Proceedings.
 [9] C. Liu, K. Sun, Z. H. Rather, Z. Chen, C. L. Bak, P. Thøgersen, and P. Lund, “A systematic approach for dynamic security assessment and the corresponding preventive control scheme based on decision trees,” IEEE Transactions on Power Systems, vol. 29, no. 2, pp. 717–730, 2014.
 [10] R. Raman and I. E. Grossmann, “Modelling and computational techniques for logic based integer programming,” Computers & Chemical Engineering, vol. 18, no. 7, pp. 563–578, 1994.
 [11] G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning: with Applications in R. Springer Publishing Company, Incorporated, 2014.
 [12] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, “Classification and regression trees,” Wadsworth & Brooks Monterey, CA, 1984.
 [13] E. Balas, “Disjunctive programming and a hierarchy of relaxations for discrete optimization problems,” SIAM Journal on Algebraic Discrete Methods, vol. 6, no. 3, pp. 466–486, 1985.
 [14] A. Pai, Energy function analysis for power system stability. Springer Science & Business Media, 2012.
 [15] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, and V. Dubourg, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, no. Oct, pp. 2825–2830, 2011.
 [16] W. E. Hart, C. D. Laird, J.-P. Watson, D. L. Woodruff, G. A. Hackebeil, B. L. Nicholson, and J. D. Siirola, Pyomo: Optimization Modeling in Python. Springer Science & Business Media, 2017, vol. 67.
 [17] Gurobi Optimization, Inc., “Gurobi optimizer reference manual,” 2016.