Learning to Solve LargeScale SecurityConstrained Unit Commitment Problems
Abstract
SecurityConstrained Unit Commitment (SCUC) is a fundamental problem in power systems and electricity markets. In practical settings, SCUC is repeatedly solved via MixedInteger Linear Programming, sometimes multiple times per day, with only minor changes in input data. In this work, we propose a number of machine learning (ML) techniques to effectively extract information from previously solved instances in order to significantly improve the computational performance of MIP solvers when solving similar instances in the future. Based on statistical data, we predict redundant constraints in the formulation, good initial feasible solutions and affine subspaces where the optimal solution is likely to lie, leading to significant reduction in problem size. Computational results on a diverse set of realistic and largescale instances show that, using the proposed techniques, SCUC can be solved on average 4.3x faster with optimality guarantees, and 10.2x faster without optimality guarantees, but with no observed reduction in solution quality. Outofdistribution experiments provides evidence that the method is somewhat robust against dataset shift.
Keywords:
SecurityConstrained Unit Commitment MixedInteger Linear Programming Machine Learning1 Introduction
SecurityConstrained Unit Commitment (SCUC) is one of the most fundamental optimization problems in power systems and electricity markets, being solved a number of times daily by major reliability coordinators, independent system operators (ISOs), and regional transmission organizations (RTOs) to schedule 3.8 trillion kWh of energy production and clear a $400 billion electricity market annually in the United States (EiaWebsite). The problem asks for the most costefficient schedule for power generation units under a large number of physical, operational and economic constraints. Among other requirements, a feasible generation schedule must not only guarantee that enough power is being produced during each hour to satisfy the demand, but also that no transmission lines are overloaded during the delivery of the electric power. Furthermore, the schedule must be robust against small variations in demands and against the unexpected loss of a small number of network elements, such as generation units and transmission lines.
The computational performance of SCUC is an extremely important practical issue, given the very short clearing windows enforced in most electricity markets. For instance, in Midcontinent Independent System Operator (or MISO, one of the largest electricity markets in North America), systems operators only have 3 or 4 hours, after receiving all bids and offers, to produce a lowcost and fair generation schedule. During this short time window, SCUC must be solved multiple times, under a number of different scenarios. Using stateofart software, realistic largescale instances can be usually solved to 0.1% optimality within approximately 20 minutes of running time (ChenCastoWangWangWangWan2016). Improvements in computational performance would allow markets to implement numerous enhancements that could bring significant economic benefits and improved market efficiency, such as more accurate modelling of enhanced combined cycle units, higher resolution for production cost curves, subhourly dispatch, longer planning horizons, among others. Alternatively, a simple reduction of the optimality gap to 0.05% without sacrificing computational running times could lead to significant cost savings, considering the $400 billion market size in U.S. alone.
In the past, SCUC has been solved using various optimization techniques including priority lists (BurnsGibson1975), dynamic programming (Lowery1966) and Lagrangian relaxation (MerlinSandrin1983). Nowadays, SCUC is most commonly formulated as a MixedInteger Linear Programming (MIP). The introduction of MIP solvers allowed system operators to easily model more realistic constraints that had been traditionally hard to enforce with previous methods, and have benefited from the constantly advancing capabilities of modern solvers. Since the first MIP formulations, proposed in the 1960s (Garver1962), a number of alternative formulations and polyhedral results have been described (AtakanLulliSen2018; OstrowskiAnjosVannelli2012; RajanTakriti2005; GentileMoralesEspanaRamos2017; DamciKurtKucukyavuzRajanAtamturk2016; espana2015tight; lee2004min; pan2016polyhedral; Knueven2018). There has also been active research on more practical techniques for improving the computational performance of the MIP approach, such as fast identification of redundant constraints (ZhaiGuanChengWu2010; ArdakaniBouffard2015; XavierFengWangThimmapuram2018), decomposition methods (FeizollahiCostleyAhmedGrijalva2015; Kim18TemporalDecomp), among others.
One aspect of SCUC that has been overlooked in previous research is that, in most practical settings, this problem is repeatedly solved with only minor changes in input data, often multiple times per day. Most of the system characteristics, such as the parameters describing the generation units or the topology of the transmission network, are kept almost entirely unchanged from solve to solve. While modern MIP solvers have the ability to reuse previous solutions as warm starts, it is well known that, for SCUC, providing the previousday optimal solution brings only minor performance improvements (ChenCastoWangWangWangWan2016). In this work, we propose the use of machine learning to effectively extract information from previously solved instances, and show how to leverage this information to significantly accelerate the solution of similar instances in the future.
Machine learning (ML) is a discipline that studies how algorithms can automatically learn from large amounts of data and subsequently use the acquired knowledge to make decisions quickly and accurately. In the 1990s and early 2000s, researchers experimented using artificial neural networks (ANN) to solve simpler variants of SCUC on smallscale instances (SasakiWatanabeKubokawaYorinoYokoyama1992; WangShahidehpour1993; WalshOMalley1997; LiangKang2000). Even when considering very simplified versions of the problem, obtaining sufficiently highquality solutions to SCUC via ANN proved very challenging, and the approach failed to replace existing methods. Similar explorations with evolutionary algorithms proved equally challenging (JusteKitaTanakaHasegawa1999).
In more recent years, there has been a growing interest, within the broader mathematical optimization community, in applying ML techniques to enhance the performance of current MIP solvers. ML has been used, for example, to automatically construct branching strategies (AlvarezLouveauxWehenkel2017; KhalilLeBodicSongNemhauserDilkina2016; LodiGiulia2017), to better parallelize branchandbound methods (AlvarezWehenkelLouveaux2014), to decide when to run primal heuristics (KhalilDilkinaNemhauserAhmedShao2017), to predict resolution outcome (FischettiLodiZarpellon2019), to decide solver parameters (BonamiLodiZarpellon2018) and to automatically construct decompositions (BassoCeselliTettamanzi2017). More broadly, there has been a renewed interest in using ML to tackle hard combinatorial problems. To give a few examples, deep and reinforcement learning have been used to produce heuristic solutions for optimization problems over graphs (KhalilDaiZhangDilkina2017; kool2018attention; MichelPierreAlexandreYossiriLouisMartin2018) and for stochastic integer programs (nair2018learning). We refer to (bengio2018machine) for a more complete survey. From a more abstract perspective, there has been research on integrating the fields of predictive, prescriptive and descriptive analytics (lombardi2017empirical; bertsimas2019predictive). More directly related to our work, statistical learning has also been applied to the DC Optimal Power Flow Problem (misra2018learning).
In this work, instead of completely replacing existing MIP methods by ML models, as done in previous research with artificial neural networks and genetic algorithms, we propose the usage of ML techniques to significantly enhance the warmstart capabilities of modern MIP solvers when solving SCUC. More specifically, we develop three ML models that can automatically identify different types of patterns in previously solved instances and subsequentially use these patterns to improve MIP performance. The first model is designed to predict, based on statistical data, which constraints are necessary in the formulation and which constraints can be safely omitted. The second model constructs, based on a large number of previously obtained optimal solutions, a (partial) solution which is likely to work well as warm start. The third model identifies, with very high confidence, a smallerdimensional affine subspace where the optimal solution is likely to lie, leading to the elimination of a large number of decision variables and significantly reducing the complexity of the problem.
We start, in Section 2, by presenting the MIP formulation of SCUC and some brief introduction to the ML concepts we use. In Section 3, we formalize the setting in which the learning takes place and we describe the proposed learning strategies. In Section 4, we evaluate the practical impact of the proposed ML approach by performing extensive computational experiments on a diverse set of largescale and realistic SCUC instances, containing up to 6515 buses and 1388 units, under realistic uncertainty scenarios. Using the proposed technique, we show that SCUC can be solved on average 4.3x faster than conventional methods with optimality guarantees and 10.2x faster without optimality guarantees, but no observed reduction in solution quality. We also conduct some experiments where the training and test samples are drawn from similar, but not exactly the same distribution, to provide some evidence that the method is somewhat robust against moderate dataset shifts. Finally, in Section LABEL:sec:limitations, we discuss some limitations of the method, as well as directions for future work. Although the techniques are presented in the context of one particular application, they can be easily adapted to other challenging optimization problems.
2 Background
2.1 MixedInteger Linear Programming Formulation of SCUC
Unit Commitment (UC) refers to a broad class of optimization problems dealing with the scheduling of power generating units (CohenSherkat87). For each hour in the planning horizon, the problem is to decide which units should be operational and how much power should they produce (OstrowskiAnjosVannelli2012). Each unit typically has minimum and maximum power outputs, a quadratic production cost curve, and fixed startup and shutdown costs. Other constraints typically include ramping rates, which restrict the maximum variation in power production from hour to hour, and minimumup and minimumdown constraints, which prevents units from starting and shutting down too frequently. The SecurityConstrained Unit Commitment (SCUC) problem is a subclass of UC that, in addition to all the constraints previously mentioned, also guarantees the deliverability of power, by enforcing that the power flow in each transmission line does not exceed its safe operational limits (Shaw95). To guarantee sufficient security margin in case of component failure, the problem considers not only transmission constraints under normal operating conditions, but also when there is one transmission line failure in the system (socalled N1 transmission contingency) (BatutRenaud92). In this way, even if a transmission line unexpectedly fails and the power flow in the network changes, there will be no violations of the transmission line thermal limits, which is a reliability requirement enforced by North American Electric Reliability Corporation (NERCwebsite).
A number of MIP formulations and strong valid inequalities for SCUC have been proposed. In this work, we use the formulation proposed in (MoralesEspanaLatorreRamos2013), since it presents good computational performance even when transmission and N1 security constraints are present. The full MIP formulation is shown in Appendix A. Here, we present a simplified summary, containing key decision variables and constraints that are most influential to the computational performance of the problem. We note, however, that the methods we present later sections make very few assumptions about the formulation, and can still be used with additional decision variables and constraints.
Consider a power system composed by a set of buses, a set of generators and a set of transmission lines. Let be the set of hours within the planning horizon. For each generator and hour , let be a binary decision variable indicating if the generator is operational, and be a continuous decision variable indicating the amount of power being produced. The problem can then be formulated as
minimize  (1)  
subject to  (2)  
(3)  
(4)  
(5)  
(6)  
(7) 
In the objective function, is a piecewiselinear function which includes the startup, shutdown and production costs for generator . The notation is a shorthand for the vector , where . Constraint (2) enforces a number of constraints which are local for each generator, including production limits, minimumup and minimumdown running times, ramping rates, among others. Constraint (3) enforces that the total amount of power produced equals the sum of the demands at each bus. Constraints (4) enforces the deliverability of power under normal system conditions. The set denotes the set of generators attached to bus . We recall that power flows are governed by physical laws, i.e., Ohm’s laws and Kirchhoff’s Laws. Using the DC linearization of these laws, it is possible to express the thermal limits for each transmission line as (4), where are real numbers known as Injection Shift Factors (ISFs). Similarly, constraints (5) enforce the deliverability of power under the scenario that line has suffered an outage. Note that the bounds and the ISFs may be different under each outage scenario.
To improve the strength of the linear relaxation, the full formulation contains additional auxiliary binary decision variables. However, once is determined, all the other binary variables are immediately implied. Furthermore, once all the variables are fixed, the problem reduces to a linear programming problem (known as Economic Dispatch) which can be solved efficiently.
Constraints (4) and (5) have a significant impact on the computational performance of SCUC. The total number of such constraints is quadratic on the number of transmission lines. Since large power systems can have more than 10,000 lines, the total number of constraints can easily exceed hundreds of millions. To make matters worse, these constraints are typically very dense, which makes adding them all to the formulation impractical for largescale instances, not only to due to degraded MIP performance, but even due to memory requirements. Fortunately, it has been observed that enforcing only a very small subset of these constraints is already sufficient to guarantee that all the remaining ones are automatically satisfied (BouffardGalianaArroyo2005). Identifying this critical subset of constraints can be very challenging, and usually involves either solving a relaxation of SCUC multiple times (TejadaSanchezRamos2018; XavierFengWangThimmapuram2018) or solving auxiliary optimization problems (ArdakaniBouffard2015; ZhaiGuanChengWu2010). In practice, system operators still rely on experience and external processes to preselect a subset of constraints to include in the optimization problem.
2.2 Machine Learning Classifiers
In this work, we employ two classical supervised learning methods for binary classification: Nearest Neighbors (NN) and Support Vector Machines (SVM). Given a training dataset containing labeled examples , each method constructs a function which can be used to label new samples . In the following, we present a brief description of each method. For a more complete description, we refer to Alpaydin2014.
NN works by identifying the most similar examples in the training dataset and conducting a simple majority vote. More precisely, given a point , the method first sorts the labeled examples by increasing distance , according to some norm, to obtain a sequence . Then, it assigns label to if , and label otherwise. More efficient implementations, which avoid sorting the training dataset for each evaluation, have also been described. In Subsections 3.1 and 3.2, we use variations of this method for prediction of redundant constraints and warm starts.
Support Vector Machines construct a hyperplane that best separates training examples into two convex regions, according to their labels, and use this hyperplane for further classification. More precisely, the method first finds a hyperplane by solving the quadratic optimization problem
subject to  
where is a configurable penalty term. Then, it assigns label to if and otherwise. This description is commonly referred to SVMs with linear kernels. Nonlinear extensions have also been described. In Subsection 3.3, we use SVMs to predict affine subspaces.
2.3 Model Evaluation and CrossValidation
To evaluate the performance of a prediction method on a specific training dataset, we employ fold cross crossvalidation. Given a training dataset , first we partition into disjoint subsets of equal size (we assume that is divisible by ) and train predictors , using the set as training dataset for predictor . The performance of the method on is then defined as
where is a performance measure. The idea, for each sample , is to compare the actual label against the predicted label , produced by a predictor that did not have access to the pair during training. In this work, we use two classical performance measures: precision and recall, given, respectively, by
3 Setting and Learning Strategies
In this section, we formalize the setting in which the learning takes place, in addition to our proposed learning strategies. The theoretical setting below was designed to match the realistic conditions that energy market operators face on a daily basis.
Assume that every instance of SCUC is completely described by a vector of realvalued parameters . These parameters include only the data that is unknown to the market operators the day before the market clearing process takes places. For example, they may include the peak system demand and the production costs for each generator. In this work, we assume that network topology and the total number of generators is known ahead of time, so there is no need to encode them via parameters. We may, therefore, assume that , where is fixed. We are given a set of training instances , which are similar to the instance we expect to solve during actual market clearing. These training instances can either come directly from historical data, if enough similar instances have been solved in the past, or they can be artificially generated, by sampling the probability distribution of . Knowledge about this distribution is not an unreasonable assumption, since market operators have been solving SCUC daily for years and have large amounts of accumulated data which can be readily analyzed.
We also assume that we have at our disposal a customized MIP solver which can receive, in addition to the instance to be solved, a vector of hints. These hints may affect the performance of the MIP solver and will, hopefully, improve its running time. The hints may also affect the quality or optimality of the solutions produced by the MIP solver. Examples of hints may include a list of warm starts, a list variables that should be fixed, or a list of initial constraints to enforce.
The day before market clearing, a training phase takes place. During this phase, our task is to construct a prediction function which maps each possible parameter into a vector of hints. In order to build this function, we solve the training instances , capturing any data we may consider useful. During actual market clearing, when we are under time pressure to solve new instances of the problem, the test phase begins. In this phase, one particular vector of parameters is given, the instance is constructed, and our goal is to solve very quickly. To do this, the vector of hints is computed and the pair is given to the MIP solver. The total running time is measured, including the time taken to construct the vector of hints from the parameters, as well as the time taken by the MIP solver to solve the instance.
In the following, we propose three different ML models, which target different challenges of solving SCUC. The first model is focused on predicting violated transmission and N1 security constraints. The second model focuses on producing initial feasible solutions, which can be used as warm starts. The third model aims at predicting an affine subspace where the solution is likely to lie.
3.1 Learning Violated Transmission Constraints
As explained in Subection 2.1, one of the most complicating factors when solving SCUC is handling the large number of transmission and security constraints that need to be enforced. Our first model is designed to predict which transmission constraints should be initially added to the relaxation and which constraints can be safely omitted.
As baseline, we use the iterative contingency screening algorithm presented in (XavierFengWangThimmapuram2018), which was independently implemented and evaluated at MISO, where it presented better performance than internal methods. The method works by initially solving a relaxed version of SCUC where no transmission or N1 security constraints are enforced. At each iteration, all violated constraints are identified, but only a small and carefully selected subset is added to the relaxation. The method stops when no further violations are detected, in which case the original problem is solved.
In the following, we modify this baseline method to work with hints from a machine learning predictor. Let be the set of transmission lines. We recall that each transmission constraint takes the form
We assume that the customized MIP solver accepts a vector of hints
indicating whether the thermal limits of transmission line , under contingency scenario , at time , is likely to be binding in the optimal solution. Given these hints, the solver proceeds as described in Algorithm 1. In the first iteration, only the transmission constraints specified by the hints are enforced. For the remaining iterations, the solver follows exactly the same strategy as the baseline method, and keeps adding small subsets of violated constraints until no further violations are found. If the predictor is successful and correctly predicts a superset of the binding transmission constraints, then only one iteration is necessary for the algorithm to converge. If the predictor fails to identify some necessary constraints, the algorithm may require additional iterations, but still returns the correct solution. The predictor should still strive to keep a low false positive rate, since the inclusion of too many redundant constraints during the first iteration of the procedure may significantly slow down the MIP optimization time.
For this prediction task, we propose a simple learning strategy based on Nearest Neighbors. Let be the training parameters. During training, the instances are solved using Algorithm 1, where the initial hints are set to RELAX for all constraints. During the solution of each training instance , we record the set of constraints that was added to the relaxation by Algorithm 1. During test, given a vector of parameters , the predictor finds a list of the vectors which are the closest to and sets if appears in at least of the sets , where is a hyperparameter. When , this strategy corresponds to the traditional NN algorithm with simple majority voting. Note that the cost of incorrectly identifying a constraint as necessary (false positive) is typically much smaller than the cost of incorrectly identifying a constraint as redundant (false negative): in the former case, the dimension of the problem simply increases a little, while in the latter, an additional iteration may be necessary. For this reason, in our experiments we try to avoid false negatives (at the cost of more false positives) by setting . That is, a constraint is added if is necessary for at least 10% of the nearest training examples.
Note that this NN description generalizes a number of other very simple strategies. For example, when , the predictor looks for the most similar instance in the training data. When , the strategy is equivalent to counting how many times each transmission constraint was necessary during the solution of the entire training dataset. When and , the predictor simply memorizes all constraints that were ever necessary during training. In Subsection 4.3, we run experiments for several of these variations. As we discuss later, these simple NN strategies already performed very well in our experiments, which is the reason we do not recommend more elaborated methods.
We note that this NN transmission predictor is also very suitable for online learning, and can be used even when only a very small number of training samples is available. During the solution of the training instances, for example, after a small number of samples has been solved, subsequent samples can already use hints from this predictor to accelerate the solution process.
3.2 Learning Warm Starts
Another considerably challenging aspect of solving largescale SCUC instances is obtaining highquality initial feasible solutions to the problem. Over the last decades, researchers have proposed a number of strong mixedinteger linear programming formulations for SCUC, leading to very strong dual bounds being obtained fairly quickly during the optimization process. For challenging instances, however, we have observed that a significant portion of the running time is usually spent in finding a feasible solution that has an objective value close to this dual bound. These solutions are currently typically found through generic primal heuristics included in the MIP solver, such as feasibility pump (FischettiGloverLodi2005) or large neighborhood search (Shaw1998). Modern MIP solvers also allow users to specify userconstructed solutions, which can be used by the solver to accelerate various parts of the branchandcut process. Very importantly, the solutions provided by the user do not need to be complete, or even feasible. Modern solvers can solve restricted subproblems to find values of missing variables, or even repair infeasible solutions using various heuristics. The strategy we present in this subsection takes advantage of these capabilities, instead of trying to replace them. Our aim is to produce, based on a large number of previous optimal solution, a (partial) solution which works well as warm start.
For this prediction task we propose, once again, a strategy based on Nearest Neighbors. Let be the training parameters and let be their optimal solutions. In principle, the entire list of solutions could be provided as warm starts to the solver; however, processing each solution requires a nonnegligible amount of time, as we show in Subsection LABEL:subsec:eval_ws. We propose instead the usage of NN to construct a single (partial and possibly infeasible) solution, which should be repaired by the solver. The idea is to set the values only for the variables which we can predict with high confidence, and to let the MIP solver determine the values for all the remaining variables. To maximize the likelihood of our warm starts being useful, we focus on predicting values for the binary variables ; all continuous variables are determined by the solver. Given a test parameter , first we find the solutions corresponding to the nearest training instances . Then, for each variable , we verify the average of the values . If , where is a hyperparameter, then the predictor sets to one in the warm start solution. If , then the value is set to zero. In any other case, the value is left undefined.
When , this strategy corresponds to the traditional NN method, where all variables are defined, even if the neighboring solutions significantly disagree. As increases, a higher threshold for a consensus is required, and variables for which consensus is not reached are left undefined. More complete warm starts can be processed significantly faster by the solver, but have higher likelihood of being suboptimal or irreparably infeasible. In Subsection LABEL:subsec:eval_ws, we evaluate several choices of to quantify this tradeoff. In our computational experiments, provided the best results.
Like the transmission predictor presented previously, this warmstart predictor is also very suitable for online learning, and can be used with a very small number of training samples.
3.3 Learning Affine Subspaces
Through their experience, market operators have naturally learned a number of patterns in the optimal solutions of their own power systems. For example, they know that certain power generating units (known as base units) will almost certainly be committed throughout the day, while other units (known as peak units) typically come online only during peak demand. These and many other intuitive patterns, however, are completely lost when SCUC is translated into a mathematical problem. Given the fixed characteristics of a particular power system and a realistic parameter distribution, new constraints could probably be safely added to the MIP formulation of SCUC to restrict its solution space without affecting its set of optimal solutions. In this subsection, we develop a ML model for finding such constraints automatically, based on statistical data.
More precisely, let be a vector of parameters. Our goal is to develop a model which predicts a list of hyperplanes such that, with very high likelihood, the optimal solution of satisfies , for . In this work, we restrict ourselves to hyperplanes that can be written as
(8)  
(9)  
(10) 
where and . In other words, the predictor tries to determine whether each variable should be fixed to zero, to one, or to the next variable . Furthermore, to prevent conflicting assignments the predictor makes at most one of these three recommendations for each variable .
Let be the set of all hyperplanes considered for inclusion. During the training phase, our goal is to construct, for each hyperplane , a prediction function , which receives a vector of parameters and returns the label ADD to indicate that the equality constraint is very likely to be satisfied at the optimal solution, and SKIP otherwise. Given these hints and a vector of parameteres , our customized MIP solver simply adds to the relaxation all the equality constraints such that . These constraints are added at the very beginning of the optimization process and kept until the final solution is obtained; that is, they are never removed. While, in principle, adding these constraints could lead to suboptimal solutions, in our computational experiments we have observed that, by using a reasonably large number of training samples and by carefully constructing highprecision predictors, as described in detail below, this strategy does not lead to any noticeable degradation in solution quality, while providing significant speedups.
Next we describe how can such highprecision predictors can be constructed in practice. Let . Furthermore, let be the training instances and let be their respective optimal solutions. For every , let be the label indicating whether solution lies in the hyperplane or not. That is, if , and otherwise. We also denote by the average value of the variables .
The proposed algorithm for constructing the function is the following. First, if is very close to one, then the predictor always suggests adding this hyperplane to the formulation. More precisely, if , where is a fixed hyperparameter, then the prediction function returns ADD for every input. Next, the the algorithm verifies whether the labels are sufficiently balanced to reliably perform supervised training. More precisely, if , where and are hyperparameters, then the prediction function always returns SKIP. Finally, if , then the algorithm constructs a binary classifier and evaluates its precision and recall on the training set using fold cross validation. If the binary classifier proves to have sufficiently high precision and recall, the prediction function returns its predictions during the test phase. Otherwise, the algorithm discards the binary classifier, and the function returns SKIP for every input. The minimum acceptable recall is given by a fixed hyperparameter . The minimum acceptable precision is computed by the expression
where is a hyperparameter. Intuitively, the algorithm only accepts the trained binary classifier if it significantly outperforms a dummy classifier which always returns the same label for every input. When , the trained binary classifier is acceptable as long as it has same precision as the dummy classifier. When , the classifier is only acceptable if it has perfect precision.
Table 1 shows the precise hyperparameters used in our experiments, for each type of hyperplane. Due to the high dimensionality of the parameters, we do not train the binary classifiers directly the original vector of parameters . Instead, we propose using as training features only the following subset of parameters: the peak system load, the hourly system loads, the average production cost of generator and the average production costs of the remaining generators. We propose the usage of support vector machines (with linear kernels), since, in our experiments, these models performed better than logistic regression and random decision forests models, while remaining computationally friendly. Neural networks were not considered given the small number of training samples available.
Hyperplane  

1.000  0.250  0.750  0.90  0.90  
1.000  0.250  0.750  0.75  0.75  
0.975  0.025  0.975  0.50  0.75 
Unlike the previous two predictors, the affine subspace predictor described in this subsection requires a substantial number of samples in order to give reliable predictions, making it less suitable for online learning.
4 Computational Experiments
In this section we evaluate the practical impact of the ML models introduced in Section 3 by performing extensive computational experiments on a diverse set of largescale and realistic SCUC instances. In Subsections 4.1 and 4.2, we describe the our implementation of the proposed methods, the computational environment, and the instances used for benchmark. In Subsections 4.3, LABEL:subsec:eval_ws and LABEL:subsec:eval_aff we evaluate, respectively, the performance of the three predictors proposed in Section 3, in addition to several variations. In Subsection LABEL:subsec:ood, we evaluate the outofdistribution performance of these predictors, to measure their robustness against moderate dataset shift.
4.1 Computational Environment and Instances
The three proposed machinelearning models described in Section 3 were implemented in Julia 1.2 (julialang) and scikitlearn (scikitlearn). Package ScikitLearn.jl (scikitlearnjl) was used to access scikitlearn predictors from Julia, while package DataFrames.jl (dataframesjl) was used for tabular data manipulation. IBM ILOG CPLEX 12.8.0 (cplex) was used as the MIP solver. The code responsible for loading the instances, constructing the MIP, querying the machine learning models and translating the given hints into instructions for CPLEX was also written in Julia, using JuMP (JuMP) as modeling language. Solver features not available through JuMP were accessed through CPLEX’s C API. Training instances were generated and solved at the Bebop cluster at Argonne National Laboratory (Intel Xeon E52695v4 2.10GHz, 18 cores, 36 threads, 128GB DDR4). During the training phase, multiple training instances were solved in parallel at a time, on multiple nodes. To accurately measure the running times during the test phase, a dedicated node at the same cluster was used. CPLEX was configured to use a relative MIP gap tolerance of 0.1% during test and 0.01% during training.
Instance  Buses  Units  Lines 

case1888rte  1,888  297  2,531 
case1951rte  1,951  391  2,596 
case2848rte  2,848  547  3,776 
case3012wp  3,012  502  3,572 
case3375wp  3,374  596  4,161 
case6468rte  6,468  1,295  9,000 
case6470rte  6,470  1,330  9,005 
case6495rte  6,495  1,372  9,019 
case6515rte  6,515  1,388  9,037 
A total of nine realistic instances from MATPOWER (matpower) were selected to evaluate the effectiveness of the method. These instances correspond to realistic, largescale European test systems. Table 2 presents their main characteristics, including the number of buses, units and transmission lines. Some generator data necessary for SCUC was missing in these instances, and was artificially generated based on publicly available data from PjmWebsite and MisoWebsite.
4.2 Training and Test Samples
During training, 300 variations for each of the nine original instances were generated and solved. We considered four sources of uncertainty: i) uncertain production and startup costs; ii) uncertain geographical load distribution; iii) uncertain peak system load; and iv) uncertain temporal load profile. The precise randomization scheme is described below. The specific parameters were chosen based on our analysis of publicly available bid and hourly demand data from PjmWebsite, corresponding to the month of January, 2017.

Production and startup costs. In the original instances, the production cost for each generator is modelled as a convex piecewiselinear function described by the parameters , the cost of producing the minimum amount of power, and , the costs of producing each additional MW of power within the piecewise interval . In addition, each generator has a startup cost . In our data analysis, we observed that the daily changes in bid prices rarely exceeded . Therefore, random numbers were independently drawn from the uniform distribution in the interval , for each generator , and the costs of were linearly scaled by this factor. That is, the costs for generator were set to .

Geographical load distribution. In the original instances, each bus is responsible for a certain percentage of the total system load. To generate variations of these parameters, random numbers were independently drawn from the uniform distribution in the interval . The percentages were then linearly scaled by the factors and later normalized. More precisely, the demand for each bus was set to .

Peak system load and temporal load profile. For simplicity, assume . Let denote the systemwide load during each hour of operation, and let for , be the hourly variation in system load. In order to generate realistic distribution for these parameters, we analyzed hourly demand data from PjmWebsite. More specifically, we computed the mean and variance of each parameter , for . To generate instance variations, random numbers were then independently drawn from the Gaussian distribution . Note that the parameters only specify the variation in system load from hour to hour, and are not sufficient to determine . Therefore, in addition to these parameters, a random number , corresponding to the peak system load , was also generated. In the original instances, the peak system load is always 60% of the total capacity. Based on our data analysis, we observed that the actual peak load rarely deviates more than from the dayahead forecast. Therefore, to generate instance variations, was sampled from the uniform distribution in the interval , where is the total capacity. Note that the and parameters are now sufficient to construct . Figure 1 shows a sample of some artificially generated load profiles.
In this work, we do not consider changes to the network topology, since this information is assumed to be know by the operators the day before the market clearing takes places. If the network topology is uncertain, an additional parameter can be added for each transmission line , indicating whether the line is available or not.
During the test phase, 50 additional variations of each original instance were generated. In Subsection 4.3, LABEL:subsec:eval_ws and LABEL:subsec:eval_aff, the test samples were generated using the same randomization scheme outline above, but with a different random seed. In Subsection LABEL:subsec:ood, as described in more detailed in that section, the test samples were generated based on a similar, but different distribution.
4.3 Evaluation of Transmission Predictor
We start by evaluating the performance of different strategies for predicting necessary transmission and N1 security constraints. For this prediction task, we compared seven different strategies. Strategy zero corresponds to the case where no machine learning is used. In strategy tr:nearest, we add to the initial relaxation all constraints that were necessary during the solution of the most similar training variation. In tr:all, we add all constraints that were ever necessary during training. Strategies tr:knn: correspond to the method outlined in Subsection 3.1, where is the hyperparameter describing the number of neighbors. We recall that a constraint is initially added to the relaxation it was necessary for at least 10% of the nearest neighbors. Since there are only 300 variations for each instance, strategy tr:knn:300 is equivalent to simply counting, over all variations, how many times each constraint was necessary. Finally, strategy tr:perf shows the performance of the theoretically perfect predictor, which can predict the exact set of constraints found by the iterative method, with perfect accuracy, under zero running time. To implement this predictor, we gave it access to the optimal solution produced by the zero strategy. For each strategy, Table 4.3 shows the average wallclock running time (in seconds), the average number of constraints added to the relaxation per time period, as well as the average number of iterations required by Algorithm 1 to converge. Table LABEL:table:trSpeed shows a more detailed breakdown of the average running times per instance.
Method  Iterations  Violations  Time (s)  Speedup 

\csvreader[ head to column names, late after line=  
, late after last line=  
, filter expr= test 