An Active Set Algorithm for Robust Combinatorial Optimization Based on Separation Oracles
C. Buchheim and M. De Santis
Fakultät für Mathematik
TU Dortmund
Vogelpothsweg 87, 44227 Dortmund, Germany
Dipartimento di Ingegneria Informatica, Automatica e Gestionale
Sapienza, Università di Roma
Via Ariosto 25, 00185 Roma, Italy
email (Buchheim): christoph.buchheim@math.tu-dortmund.de
email (De Santis): marianna.desantis@uniroma1.it
Abstract
We address combinatorial optimization problems with uncertain coefficients varying over ellipsoidal uncertainty sets. The robust counterpart of such a problem can be rewritten as a second-order cone program (SOCP) with integrality constraints. We propose a branch-and-bound algorithm where dual bounds are computed by means of an active set algorithm. The latter is applied to the Lagrangian dual of the continuous relaxation, where the feasible set of the combinatorial problem is supposed to be given by a separation oracle. The method benefits from the closed-form solution of the active set subproblems and from a smart update of pseudoinverse matrices. We present numerical experiments on randomly generated instances and on instances from different combinatorial problems, including the shortest path and the traveling salesman problem, showing that our new algorithm consistently outperforms the state-of-the-art mixed-integer SOCP solver of Gurobi.
Keywords. Robust Optimization, Active Set Methods, SOCP
1 Introduction
We address combinatorial optimization problems given in the general form
(CP)  $\min\ c^\top x \quad \text{s.t.}\ x \in X \cap \mathbb{Z}^n,$
where $X \subseteq \mathbb{R}^n$ is a compact convex set and the objective function vector $c \in \mathbb{R}^n$ is assumed to be uncertain. This setting appears in many applications where the feasible set is certain, but the objective function coefficients may have to be estimated or result from imprecise measurements. As an example, when searching for a shortest path in a road network, the topology of the network is usually considered fixed, but the travel times may vary depending on the traffic conditions.
A classical way of dealing with uncertain optimization problems is the strictly robust optimization approach, introduced in [3] for linear programming and in [2] for general convex programming; we also refer the reader to the book by Ben-Tal and Nemirovski [4]. In strictly robust optimization, we look for a worst-case solution: the uncertain parameter $c$ is assumed to belong to a bounded set $U \subseteq \mathbb{R}^n$, called the uncertainty set, and the goal of the robust counterpart is to compute the solution of the following min-max problem:
(RP)  $\min_{x \in X \cap \mathbb{Z}^n}\ \max_{c \in U}\ c^\top x.$
A natural choice in this approach are ellipsoidal uncertainty sets, defined as
$U = \{ c \in \mathbb{R}^n : (c - c_0)^\top M (c - c_0) \le 1 \},$
where $M \in \mathbb{R}^{n \times n}$ is a symmetric positive definite matrix and $c_0 \in \mathbb{R}^n$ is the center of the ellipsoid. Assuming that the uncertain vector $c$ in ((CP)), considered as a random variable, follows a normal distribution, we can interpret the ellipsoid $U$ as a confidence set of $c$; in this case, $M$ is the inverse covariance matrix of $c$ and $c_0$ is its expected value. Unfortunately, for ellipsoidal uncertainty sets, the robust counterpart ((RP)) is usually much harder to solve than the original problem ((CP)): it is known that Problem ((RP)) is NP-hard in this case for the shortest path problem, for the minimum spanning tree problem, and for the assignment problem [10], as well as for the unconstrained binary optimization problem [7].
Even in the case of a diagonal matrix $M$, i.e., when ignoring correlations and only taking variances into account, no polynomial-time algorithm for the robust shortest path problem is known. There exists, however, an FPTAS for the diagonal case whenever the underlying problem ((CP)) admits an FPTAS [15], and polynomial-time algorithms for the minimum spanning tree problem and the unconstrained binary problem have been devised for the diagonal case.
For general ellipsoids $U$, most exact solution approaches for ((RP)) are based on solving SOCPs. In fact, it is easy to see that the optimal solution of the inner maximization problem
$\max\ \{ c^\top x : c \in U \}$
for fixed $x \ne 0$ is given by
$c^* = c_0 + \tfrac{1}{\sqrt{x^\top M^{-1} x}}\, M^{-1} x.$
Therefore, Problem ((RP)) is equivalent to the integer nonlinear problem
(P)  $\min\ c^\top x + \sqrt{x^\top \Sigma x} \quad \text{s.t.}\ x \in X \cap \mathbb{Z}^n,$
where $\Sigma := M^{-1}$ is the symmetric and positive definite inverse of $M$ and we replace $c_0$ by $c$ for ease of notation. Note that, when addressing so-called value-at-risk models
$\min\ z \quad \text{s.t.}\ \operatorname{Prob}(\tilde c^\top x > z) \le \varepsilon,\ x \in X \cap \mathbb{Z}^n,$
we arrive at essentially the same formulation ((P)), assuming normally distributed coefficients $\tilde c$ again; see, e.g., [15].
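The closed form of the inner maximization makes the robust objective cheap to evaluate. The following minimal NumPy sketch (our own illustration; the names `c0` and `M` follow the notation above) computes the worst-case scenario $c^*$ and the corresponding objective value:

```python
import numpy as np

def worst_case_cost(x, c0, M):
    """Evaluate max{c^T x : c in U} for the ellipsoid
    U = {c : (c - c0)^T M (c - c0) <= 1} in closed form."""
    Minv_x = np.linalg.solve(M, x)        # M^{-1} x
    r = np.sqrt(x @ Minv_x)               # sqrt(x^T M^{-1} x)
    if r == 0.0:                          # x = 0: every c in U is optimal
        return c0 @ x, c0.copy()
    c_star = c0 + Minv_x / r              # worst-case scenario c*
    return c0 @ x + r, c_star
```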
In the following, we assume that the convex set $X$ is given by a separation algorithm, i.e., an algorithm that decides whether a given point $\bar x \in \mathbb{R}^n$ belongs to $X$ or not and, in the negative case, provides an inequality valid for $X$ but violated by $\bar x$. Even in cases where the underlying problem ((CP)) is tractable, the polytope $X$ may have an exponential number of facets, so that a full linear description cannot be used efficiently. This is true, e.g., for the standard formulation of the spanning tree problem. However, we do not require that a complete linear description of $\operatorname{conv}(X \cap \mathbb{Z}^n)$ be known; it suffices to have an integer linear description, i.e., we allow $X \supsetneq \operatorname{conv}(X \cap \mathbb{Z}^n)$. In particular, our approach can also be applied when the underlying problem is NP-hard, e.g., when ((CP)) models the traveling salesman problem.
As soon as $X$ is given explicitly by linear constraints $Ax \le b$ with $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, the continuous relaxation of Problem ((P)) reduces to an SOCP of the form
(R2)  $\min\ c^\top x + \sqrt{x^\top \Sigma x} \quad \text{s.t.}\ Ax \le b.$
Such SOCPs can be solved efficiently using interior point algorithms [14], and popular solvers for SOCPs such as SeDuMi [17] or MOSEK [1] are based on interior point methods. However, in our branch-and-bound algorithm, we need to address a sequence of closely related SOCPs. Compared with interior point methods, active set methods have the advantage of allowing warm starts.
For this reason, in order to solve the SOCP relaxations of Problem ((RP)), we devised the active set algorithm EllAS. It is applied to the Lagrangian dual of ((R2)) and exploits the fact that the active set subproblems can be solved by closed-form expressions. For this, the main ingredient is the pseudoinverse of the matrix of active constraints. Since this matrix is updated in each iteration of the active set method, an incremental update of the pseudoinverse is crucial for the running time of EllAS. Altogether, we achieve a running time of $O(n^2)$ per iteration. Combined with an intelligent embedding into the branch-and-bound scheme, we obtain an algorithm that consistently outperforms the MISOCP solver of Gurobi 7.5.1, where the latter is either applied to a full linear description of $X$ or, in case a compact linear description does not exist, uses the same separation oracle as EllAS.
The rest of the paper is organized as follows: the Lagrangian dual of ((R2)) is derived in Section 2. The closed-form solution of the resulting active set subproblems is developed in Section 3. The active set algorithm EllAS is detailed and analyzed in Sections 4 and 5. In Section 6, we discuss how to embed EllAS into a branch-and-bound algorithm. Numerical results for random integer instances as well as instances of different combinatorial optimization problems are reported in Section 7. Section 8 concludes.
2 Dual problem
The algorithm we propose for solving Problem ((RP)) uses the Lagrangian dual of relaxations of the form ((R2)). Let $L \colon \mathbb{R}^n \times \mathbb{R}^m_{\ge 0} \to \mathbb{R}$ be the Lagrangian function associated to ((R2)):
$L(x, \lambda) = c^\top x + \sqrt{x^\top \Sigma x} + \lambda^\top (Ax - b).$
The Lagrangian dual of Problem ((R2)) is then
(1)  $\max_{\lambda \ge 0}\ \min_{x \in \mathbb{R}^n}\ L(x, \lambda).$
After applying the bijective transformation $y := \Sigma^{1/2} x$, the inner minimization problem of (1) becomes
$\min_{y \in \mathbb{R}^n}\ \big( \Sigma^{-1/2} (A^\top \lambda + c) \big)^\top y + \lVert y \rVert_2 - b^\top \lambda$
for fixed $\lambda \ge 0$. It is easy to see that
$\min_{y \in \mathbb{R}^n}\ \big( \Sigma^{-1/2} (A^\top \lambda + c) \big)^\top y + \lVert y \rVert_2 = 0$
if $\lVert \Sigma^{-1/2} (A^\top \lambda + c) \rVert_2 \le 1$ and $-\infty$ otherwise. Therefore, Problem (1) reduces to
(D)  $\max\ -b^\top \lambda \quad \text{s.t.}\ (A^\top \lambda + c)^\top \Sigma^{-1} (A^\top \lambda + c) \le 1,\ \lambda \ge 0.$
Theorem 1. If Problem ((R2)) is feasible, then strong duality holds between ((R2)) and ((D)); in particular, an optimal primal-dual pair exists.
In order to solve Problem ((R2)), we have devised the dual active set algorithm EllAS detailed in Section 4. Along its iterations, EllAS produces dual feasible solutions of Problem ((D)) converging to a KKT point of Problem ((R2)), therefore also producing a primal optimal solution when terminating.
3 Solving the Active Set Subproblem
At every iteration, the active set algorithm EllAS presented in the subsequent sections fixes certain dual variables to zero while leaving the remaining variables unconstrained. In the primal problem, this corresponds to choosing a set of valid linear constraints for $X$ and replacing these inequalities by equations. We thus need to solve primal-dual pairs of problems of the following type:
(PAS)  $\min\ c^\top x + \sqrt{x^\top \Sigma x} \quad \text{s.t.}\ \bar A x = \bar b$
(DAS)  $\max\ -\bar b^\top \lambda \quad \text{s.t.}\ (\bar A^\top \lambda + c)^\top \Sigma^{-1} (\bar A^\top \lambda + c) \le 1,$
where $\bar A \in \mathbb{R}^{\bar m \times n}$ and $\bar b \in \mathbb{R}^{\bar m}$. For the efficiency of our algorithm, it is crucial that this pair of problems can be solved in closed form. For this, the pseudoinverse $\bar A^+ \in \mathbb{R}^{n \times \bar m}$ of $\bar A$ will play an important role. It can be used to compute orthogonal projections onto the kernel and onto the range of $\bar A$ as follows: we have
(2)  $p_{\ker(\bar A)}(x) = (I - \bar A^+ \bar A)\, x \quad \text{for all } x \in \mathbb{R}^n$
and
(3)  $p_{\operatorname{range}(\bar A)}(y) = \bar A \bar A^+ y \quad \text{for all } y \in \mathbb{R}^{\bar m};$
see, e.g., [11]. We later explain how to update the pseudoinverse incrementally instead of computing it from scratch in every iteration, which would take $O(n^3)$ time; see Section 5.2.
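For illustration, the two projections (2) and (3) can be sketched in a few lines of NumPy (our own illustration; computing the pseudoinverse from scratch via `np.linalg.pinv`, as done here, is exactly the expensive step that the incremental updates of Section 5.2 avoid):

```python
import numpy as np

def kernel_projection(A, x):
    """Orthogonal projection of x onto ker(A), cf. (2): (I - A^+ A) x."""
    return x - np.linalg.pinv(A) @ (A @ x)

def range_projection(A, y):
    """Orthogonal projection of y onto range(A), cf. (3): A A^+ y."""
    return A @ (np.linalg.pinv(A) @ y)
```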
In the following, we assume that the dual problem ((DAS)) admits a feasible solution; this will be guaranteed in every iteration of our algorithm; see Lemma 1 below.
3.1 Dual Unbounded Case
If $\bar b \notin \operatorname{range}(\bar A)$ or, equivalently, if $\bar b$ is not orthogonal to $\ker(\bar A^\top)$, then the dual problem ((DAS)) is unbounded, and the corresponding primal problem ((PAS)) is infeasible. When this case occurs, EllAS uses an unbounded direction of ((DAS)) to continue. The set of unbounded directions of ((DAS)) is $\{ d \in \mathbb{R}^{\bar m} : \bar A^\top d = 0,\ \bar b^\top d < 0 \}$. Consequently, the unbounded direction with steepest ascent can be obtained by projecting the gradient $-\bar b$ of the objective function onto $\ker(\bar A^\top)$. According to (2), this projection is
$d = -(I - \bar A \bar A^+)\, \bar b.$
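In code, this direction follows directly from the projection formulas above (again a sketch in our notation; inside EllAS the pseudoinverse would of course be reused, not recomputed):

```python
import numpy as np

def unbounded_ascent_direction(A, b):
    """Steepest ascent among the unbounded directions of (DAS):
    project the dual gradient -b onto ker(A^T), i.e. d = -(I - A A^+) b.
    The result is nonzero if and only if b is not in range(A)."""
    return -(b - A @ (np.linalg.pinv(A) @ b))
```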
3.2 Bounded Case
If $\bar b \in \operatorname{range}(\bar A)$, we first consider the special case $\bar b = 0$. As we assume ((DAS)) to be feasible, its optimal value is thus $0$. Therefore, the corresponding primal problem ((PAS)) admits $x^* = 0$ as optimal solution. In the following, we may thus assume $\bar b \ne 0$. The feasible set of problem ((DAS)) consists of all $\lambda \in \mathbb{R}^{\bar m}$ such that
$\lVert \Sigma^{-1/2} (\bar A^\top \lambda + c) \rVert_2 \le 1,$
i.e., such that the image of $\lambda$ under $W := \Sigma^{-1/2} \bar A^\top$ belongs to the unit ball $B$ centered at $-\Sigma^{-1/2} c$. Consider the orthogonal projection of $-\Sigma^{-1/2} c$ onto the subspace $\operatorname{range}(W)$, which by (3) is
$\bar z = -W W^+ \Sigma^{-1/2} c.$
If $\lVert \bar z + \Sigma^{-1/2} c \rVert_2 > 1$, then the intersection $B \cap \operatorname{range}(W)$ is empty, so that Problem ((DAS)) is infeasible, contradicting our assumption. Hence, we have that this intersection is a ball within $\operatorname{range}(W)$ with center $\bar z$ and radius
$r := \sqrt{1 - \lVert \bar z + \Sigma^{-1/2} c \rVert_2^2},$
and $\lambda$ is feasible for ((DAS)) if and only if $\lVert W \lambda - \bar z \rVert_2 \le r$. Since $\bar b \in \operatorname{range}(\bar A) = \ker(\bar A^\top)^\perp$, we have $\bar b^\top \lambda = \bar b^\top W^+ W \lambda$. This allows us to rewrite the objective function of ((DAS)) in terms of $\mu := W \lambda$ only, as $-h^\top \mu$ with $h := (W^+)^\top \bar b$. We can thus first compute the optimal solution $\mu^*$ of
$\max\ -h^\top \mu \quad \text{s.t.}\ \lVert \mu - \bar z \rVert_2 \le r,\ \mu \in \operatorname{range}(W),$
which is unique since $h \in \operatorname{range}(W) \setminus \{0\}$, and then solve $W \lambda = \mu^*$. We obtain
(4)  $\mu^* = \bar z - \tfrac{r}{\lVert h \rVert_2}\, h, \qquad \lambda^* := W^+ \mu^*,$
so that we can state the following
Proposition 1. The vector $\lambda^*$ defined in (4) is the minimal-norm optimal solution of Problem ((DAS)).
From $\lambda^*$, it is possible to compute an optimal solution of the primal problem ((PAS)), as explained in the following result.
Theorem 2. Let $\lambda^*$ be an optimal solution of ((DAS)) and set $v := -\Sigma^{-1} (\bar A^\top \lambda^* + c)$. Then an optimal solution of ((PAS)) is given by $x^* = t v$ with $t \ge 0$, where
(a) if $\bar b^\top \lambda^* \ne 0$, then $t = \dfrac{-\bar b^\top \lambda^*}{c^\top v + \sqrt{v^\top \Sigma v}}$;
(b) otherwise, $t$ is uniquely determined by $\bar A (t v) = \bar b$.
Proof.
Let $(x^*, \lambda^*)$ be a primal-dual optimal pair, which exists by Theorem 1. Since $\bar A x^* = \bar b$ and $\bar b \ne 0$, it follows that $x^* \ne 0$. The gradient equation yields
$c + \tfrac{1}{\sqrt{{x^*}^\top \Sigma x^*}}\, \Sigma x^* + \bar A^\top \lambda^* = 0,$
which is equivalent to
$x^* = -\sqrt{{x^*}^\top \Sigma x^*}\; \Sigma^{-1} (\bar A^\top \lambda^* + c)$
and hence to $x^* = t v$
for some $t \ge 0$. Since $x^* \ne 0$, we have $t > 0$. By strong duality, we then obtain
$t \left( c^\top v + \sqrt{v^\top \Sigma v} \right) = c^\top x^* + \sqrt{{x^*}^\top \Sigma x^*} = -\bar b^\top \lambda^*.$
Now if $\bar b^\top \lambda^* \ne 0$, also the right hand side of this equation is nonzero, and we obtain $t$ as claimed. Otherwise, it still holds that there exists $t \ge 0$ such that $x^* = t v$ is optimal. In particular, $x^*$ is primal feasible and hence $\bar A (t v) = \bar b$. As $\bar b \ne 0$, we derive $t \ne 0$, as well as $\bar A v \ne 0$. This in particular shows that $t$ is uniquely defined by $\bar A (t v) = \bar b$. ∎
Note that the proof (and hence the statement) for case (b) in Theorem 2 is formally applicable also in case (a). However, in the much more relevant case (a), we are able to derive a closed formula for $t$ in a more direct way.
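Putting the bounded case together, the following NumPy sketch solves ((DAS)) in closed form along the lines derived above (our own illustration of the reconstruction; a Cholesky factor of $\Sigma$ takes the role of $\Sigma^{1/2}$, which changes the coordinates $\mu$ but not the resulting $\lambda^*$):

```python
import numpy as np

def solve_das(A_bar, b_bar, c, Sigma):
    """Closed-form solution of (DAS) in the bounded case
    (b_bar in range(A_bar), b_bar != 0)."""
    L = np.linalg.cholesky(Sigma)            # Sigma = L L^T
    W = np.linalg.solve(L, A_bar.T)          # plays the role of Sigma^{-1/2} A_bar^T
    Wp = np.linalg.pinv(W)
    z_c = -np.linalg.solve(L, c)             # center of the unit ball B
    z_bar = W @ (Wp @ z_c)                   # projection onto range(W), cf. (3)
    dist2 = (z_bar - z_c) @ (z_bar - z_c)
    assert dist2 <= 1.0, "(DAS) infeasible -- excluded inside EllAS (Lemma 1)"
    r = np.sqrt(1.0 - dist2)                 # radius of the intersected ball
    h = Wp.T @ b_bar                         # objective of (DAS) in mu is -h^T mu
    mu_star = z_bar - r * h / np.linalg.norm(h)
    return Wp @ mu_star                      # minimal-norm solution of W lam = mu*, cf. (4)
```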
4 The Dual Active Set Method EllAS
Like all active set methods, our algorithm EllAS tries to forecast the set of constraints that are active at the optimal solution of the primal-dual pair ((R2)) and ((D)), adapting this forecast iteratively: starting from a subset of primal constraints $A_0 x \le b_0$, with $A_0 \in \mathbb{R}^{m_0 \times n}$ and $b_0 \in \mathbb{R}^{m_0}$, one constraint is removed or added per iteration by performing a dual or a primal step; see Algorithm 1. We assume that a corresponding dual feasible solution $\lambda^0$ is given when starting the algorithm; we explain below how to obtain this initial solution.
At every iteration $k$, in order to decide whether to perform the primal or the dual step, the dual subproblem is addressed, namely Problem ((D)) where only the subset of active constraints is taken into account. This leads to the following problem:
(DASk)  $\max\ -b_k^\top \lambda \quad \text{s.t.}\ (A_k^\top \lambda + c)^\top \Sigma^{-1} (A_k^\top \lambda + c) \le 1,$
where $A_k \in \mathbb{R}^{m_k \times n}$ and $b_k \in \mathbb{R}^{m_k}$ collect the currently active constraints. The solution of Problem ((DASk)) has been explained in Section 3. Note that formally Problem ((DASk)) is defined in a lower-dimensional space than Problem ((D)), but its solutions can also be considered as elements of $\mathbb{R}^m$ by setting the remaining variables to zero.
In case the dual step is performed, the solution $\hat\lambda$ of Problem ((DASk)) gives an ascent direction $\hat\lambda - \lambda^k$ along which we move in order to produce a new dual feasible point with better objective function value. We set
$\lambda^{k+1} := \lambda^k + t_k (\hat\lambda - \lambda^k),$
where the steplength $t_k$ is chosen to be the largest value for which nonnegativity is maintained at all entries. Note that the feasibility with respect to the ellipsoidal constraint in ((D)), i.e.,
$(A_k^\top \lambda^{k+1} + c)^\top \Sigma^{-1} (A_k^\top \lambda^{k+1} + c) \le 1,$
is guaranteed by how $\hat\lambda$ is computed, using convexity. Therefore, $t_k$ can be derived by considering only the negative entries of $\hat\lambda$. In order to maximize the increase of the dual objective, we ask $t_k$ to be as large as possible subject to maintaining nonnegativity; see Steps 9–10 in Algorithm 2.
The constraint index $j$ computed in Step 9 of Algorithm 2 corresponds to the primal constraint that needs to be released from the active set. The new iterate is then obtained from $\lambda^k + t_k (\hat\lambda - \lambda^k)$ by dropping the $j$-th entry.
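A sketch of this ratio test in NumPy (our own illustration; `lam` and `lam_hat` stand for $\lambda^k$ and $\hat\lambda$):

```python
import numpy as np

def dual_step(lam, lam_hat):
    """Move from lam towards lam_hat as far as nonnegativity allows and
    return the new (shortened) iterate and the index of the blocking entry."""
    d = lam_hat - lam
    neg = np.where(d < 0)[0]          # only decreasing entries can block
    ratios = -lam[neg] / d[neg]       # steps at which entries hit zero
    j = neg[np.argmin(ratios)]        # constraint to release from the active set
    t = min(1.0, float(ratios.min()))
    return np.delete(lam + t * d, j), j
```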
The primal step is performed in case the solution $\hat\lambda$ of Problem ((DASk)) is itself dual feasible. Starting from this dual feasible solution, we compute a corresponding primal solution $\hat x$ according to the formula in Theorem 2. If $\hat x$ belongs to $X$, we are done: $(\hat x, \hat\lambda)$ is a KKT point of Problem ((R2)) and, by convexity of Problem ((R2)), $\hat x$ is its global optimum. Otherwise, we compute a cutting plane violated by $\hat x$ that can be considered active and will then be taken into account in defining the dual subproblem ((DASk)) at the next iteration. The new iterate is obtained from $\hat\lambda$ by appending an additional entry set to zero.
Theorem 3.
Whenever Algorithm EllAS terminates, the result is correct.
Proof.
If Algorithm EllAS stops at the primal step, the optimality of the resulting primal-dual pair follows from the discussion in Section 3. If Algorithm EllAS stops at the dual step, the ascent direction computed is a feasible unbounded direction for Problem ((D)), so that Problem ((D)) is unbounded and hence Problem ((R2)) is infeasible. ∎
It remains to describe how to initialize EllAS. For this, we use the assumption of boundedness of $X$, say $l \le x \le u$ for all $x \in X$, and construct $A_0$, $b_0$, and $\lambda^0$ as follows: for each $i = 1, \dots, n$, we add the constraint $x_i \le u_i$ if $c_i \le 0$, with corresponding entry $\lambda^0_i := -c_i$, and the constraint $-x_i \le -l_i$ otherwise, with $\lambda^0_i := c_i$. These constraints are valid since we assumed $X \subseteq [l, u]$, and it is easy to check that $A_0^\top \lambda^0 + c = 0$ by construction, so that $\lambda^0$ is dual feasible for ((D)). Moreover, we can easily compute $A_0^+$ in this case, as $A_0$ is a diagonal matrix with entries $\pm 1$: this implies $A_0^+ = A_0$.
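This construction is only a few lines of code (a sketch under the reconstruction above; the bounds `lo` and `up` are any box enclosing $X$):

```python
import numpy as np

def initial_active_set(c, lo, up):
    """Initial constraints A0 x <= b0 and dual feasible point lam0:
    the signs are chosen so that A0^T lam0 + c = 0, which makes lam0
    feasible for (D).  Note that A0^+ = A0 for this diagonal +/-1 matrix."""
    A0 = np.diag(np.where(c <= 0, 1.0, -1.0))   # x_i <= u_i  or  -x_i <= -l_i
    b0 = np.where(c <= 0, up, -lo)
    lam0 = np.abs(c)
    assert np.allclose(A0.T @ lam0 + c, 0.0)
    return A0, b0, lam0
```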
5 Analysis of the Algorithm
In this section, we show that Algorithm EllAS converges in a finite number of steps if cycling is avoided. Moreover, we prove that the running time per iteration can be bounded by $O(n^2)$, if implemented properly.
5.1 Convergence Analysis
Our convergence analysis follows arguments similar to those used in [16] for the analysis of primal active set methods for strictly convex quadratic programming problems. In particular, as in [16], we assume that we can always take a nonzero steplength along the ascent direction. Under this assumption we will show that Algorithm EllAS does not cycle; in other words, this assumption prevents the algorithm from producing the same active set and dual iterate in two different iterations $k_1$ and $k_2$. As for other active set methods, it is very unlikely in practice to encounter a zero steplength. However, there are techniques to avoid cycling even in theory, such as perturbation or lexicographic pivoting rules in Step 9 of Algorithm 2.
Lemma 1.
At every iteration of Algorithm EllAS, Problem ((DASk)) admits a feasible solution.
Proof.
It suffices to show that the ellipsoidal constraint
(5)  $(A_k^\top \lambda + c)^\top \Sigma^{-1} (A_k^\top \lambda + c) \le 1$
is satisfied by each iterate $\lambda^k$. For $k = 0$, this is explicitly required for the input of Algorithm EllAS. Let $\lambda^{k+1}$ be computed from $\lambda^k$ by moving along the direction $\hat\lambda - \lambda^k$. The feasibility of $\lambda^{k+1}$ with respect to (5) then follows from the definition of $t_k$ and from the convexity of the ellipsoid. ∎
Proposition 3.
At every iteration of Algorithm EllAS, the vector $\lambda^k$ is feasible for ((D)).
Proof.
Taking into account the proof of Lemma 1, it remains to show nonnegativity of $\lambda^k$, which is guaranteed by the choice of the steplength $t_k$. ∎
Proposition 4.
Assume that the steplength $t_k$ is always nonzero in the dual step, and denote by $\theta$ the objective function of ((D)). If Algorithm EllAS does not stop at iteration $k$, then one of the following holds:
(i) $\theta(\lambda^{k+1}) > \theta(\lambda^k)$;
(ii) $\theta(\lambda^{k+1}) = \theta(\lambda^k)$ and $\lVert \lambda^{k+1} \rVert < \lVert \lambda^k \rVert$.
Proof.
In the primal step, suppose that $\hat\lambda$ solves Problem ((DASk)) and that the corresponding unique primal solution $\hat x$ satisfies $\hat x \notin X$. After adding a violated cutting plane, the optimal value of Problem ((PAS)) strictly increases, and the same is true for the optimal value of Problem ((DAS)) by strong duality. Then the direction computed at the following iteration is a strict ascent direction for $\theta$ and case (i) holds.
In the dual step, if $\hat\lambda - \lambda^k$ is an unbounded direction, case (i) holds again. Otherwise, observe that $\hat\lambda \ne \lambda^k$, as $\hat\lambda$ is not feasible with respect to the nonnegativity constraints. Then, since $\hat\lambda$ is the unique optimal solution of Problem ((DASk)) with minimal norm, $\hat\lambda - \lambda^k$ is either a strict ascent direction for $\theta$, or $\theta(\hat\lambda) = \theta(\lambda^k)$ and $\hat\lambda - \lambda^k$ is a strict descent direction for the norm $\lVert \cdot \rVert$, so that case (ii) holds. ∎
Lemma 2.
At every iteration $k$ of Algorithm EllAS, we have $m_k \le n + 1$. Furthermore, if Algorithm EllAS terminates at iteration $k$ with an optimal primal-dual pair, then $m_k \le n$.
Proof.
As only violated cuts are added, the primal constraints either form an infeasible system or are linearly independent. If $m_k > n$, the primal problem is hence infeasible. Thus Problem ((DASk)) is unbounded, so that at iteration $k$ a dual step is performed and a dependent row of $A_k$ is deleted, leading to an independent set of constraints again. ∎
Theorem 4. Assume that the steplength $t_k$ is always nonzero in the dual step. Then Algorithm EllAS terminates after a finite number of iterations.
5.2 Running time per iteration
The running time in iteration $k$ of EllAS is $O(m_k n)$ and hence linear in the size of the matrix $A_k$, if implemented properly. The main work is to keep the pseudoinverse $A_k^+$ up-to-date. Since $A_k$ is only extended or shrunk by one row in each iteration, an update of $A_k^+$ is possible in $O(m_k n)$ time by a generalization of the Sherman–Morrison formula [12]. Exploiting the fact that the matrix $A_k$ has full row rank in most iterations, we can proceed as follows. If $A_{k+1}$ is obtained from $A_k$ by adding a new row $a^\top$, we first compute the row vectors
$u := a^\top A_k^+ \quad \text{and} \quad v := a^\top - u A_k.$
Now $v \ne 0$ if and only if $A_{k+1}$ has full row rank, and in the latter case
$A_{k+1}^+ = \left[\, A_k^+ - \tfrac{1}{\lVert v \rVert^2}\, v^\top u \;\middle|\; \tfrac{1}{\lVert v \rVert^2}\, v^\top \,\right].$
Otherwise, if $v = 0$, we are adding a linearly dependent row to $A_k$, making the primal problem ((PAS)) infeasible. In this case, an unbounded direction of steepest ascent of ((DAS)) is given by $(-u, 1)^\top$, and the next step will be a dual step, meaning that a row will be removed from $A_{k+1}$ and the resulting matrix will have full row rank again. We can thus update $A_k^+$ to $A_{k+1}^+$ by first removing and then adding a row, in both cases maintaining full row rank.
It thus remains to deal with the case of deleting the $i$-th row of a matrix $A_k$ with full row rank. Here we obtain $A_{k+1}^+$ by deleting the $i$-th column in
$\left( I - \tfrac{1}{\lVert h \rVert^2}\, h h^\top \right) A_k^+,$
where $h$ is the $i$-th column of $A_k^+$.
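Both updates are cheap to implement; the following NumPy sketch (our own illustration) mirrors the two formulas and can be checked against `np.linalg.pinv`:

```python
import numpy as np

def pinv_add_row(A, Ap, a):
    """Pseudoinverse after appending row a^T to A, assuming the extended
    matrix keeps full row rank (v != 0)."""
    u = a @ Ap                        # row vector a^T A^+
    v = a - u @ A                     # row vector a^T (I - A^+ A)
    vv = v @ v
    assert vv > 0, "dependent row: (PAS) infeasible, a dual step follows"
    col = v / vv                      # appended column v^T / ||v||^2
    return np.column_stack([Ap - np.outer(col, u), col])

def pinv_delete_row(Ap, i):
    """Pseudoinverse after deleting row i from a full-row-rank matrix:
    delete column i of (I - h h^T / ||h||^2) A^+, h the i-th column of A^+."""
    h = Ap[:, i]
    G = Ap - np.outer(h, h @ Ap) / (h @ h)
    return np.delete(G, i, axis=1)
```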
Theorem 5.
The running time per iteration of Algorithm EllAS is $O(n^2)$.
Proof.
This follows directly from Lemma 2 and the discussion above. ∎
Clearly, the incremental update of the pseudoinverse may introduce numerical errors. This can be countered by recomputing the pseudoinverse from scratch after a certain number of incremental updates. Instead of using a fixed number of iterations, we recompute it whenever the primal solution computed in a primal step is infeasible, i.e., violates the current constraints, where we allow a small tolerance.
In order to avoid wrong results even when pseudoinverses are not precise, we make sure in our implementation that the dual solution remains feasible for ((DAS)) in each iteration, no matter how big the error in $A_k^+$ is. For this, we slightly change the computation of $\lambda^{k+1}$: after computing $\lambda^{k+1}$ exactly as explained, we determine the largest $t \in [0, 1]$ such that $\lambda^k + t (\lambda^{k+1} - \lambda^k)$ is dual feasible. Such a $t$ must exist since $\lambda^k$ is dual feasible, and it can easily be computed using the quadratic formula. We then replace $\lambda^{k+1}$ by $\lambda^k + t (\lambda^{k+1} - \lambda^k)$ and go on as before.
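Along the ellipsoidal constraint, this safeguard amounts to finding the larger root of a scalar quadratic, as in the following sketch (our own illustration; `Sigma_inv` denotes $\Sigma^{-1}$):

```python
import numpy as np

def safeguarded_step(lam, lam_next, A, c, Sigma_inv):
    """Largest t in [0,1] such that lam + t*(lam_next - lam) satisfies the
    ellipsoid constraint of (DAS), computed via the quadratic formula."""
    z = A.T @ lam + c                    # feasible start: z^T Sigma^{-1} z <= 1
    d = A.T @ (lam_next - lam)
    a2 = d @ Sigma_inv @ d               # (z + t d)^T Sigma^{-1} (z + t d) <= 1
    a1 = d @ Sigma_inv @ z               # reads a2 t^2 + 2 a1 t + a0 <= 1
    a0 = z @ Sigma_inv @ z
    if a2 == 0.0:                        # direction does not change A^T lam
        return 1.0
    disc = a1 * a1 - a2 * (a0 - 1.0)     # reduced discriminant, >= 0 here
    t = (-a1 + np.sqrt(max(disc, 0.0))) / a2
    return float(np.clip(t, 0.0, 1.0))
```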
6 BranchandBound Algorithm
For solving the integer Problem ((RP)), the method presented in the previous sections must be embedded into a branch-and-bound scheme. The dual bounds are computed by Algorithm EllAS and the branching is done by splitting up the domain of some variable $x_i$. Several properties of Algorithm EllAS can be exploited to improve the performance of such a branch-and-bound approach.
Warm starts
Clearly, as branching adds new constraints to the primal feasible region of the problem, while never extending it, all dual solutions remain feasible. In every node of the branch-and-bound tree, the active set algorithm can thus be warm started with the optimal set of constraints of the parent node. As in [5, 6], this leads to a significant reduction of the number of iterations compared to a cold start. Moreover, the newly introduced bound constraint is always violated and can be added directly as a new active constraint, which avoids resolving the same dual problem and hence saves one more iteration per node. Finally, the data describing the problem can either be inherited without changes or updated quickly; this is particularly important for the pseudoinverse.
Early pruning
Since we compute a valid dual bound for Problem ((RP)) in every iteration of Algorithm EllAS, we may prune a subproblem as soon as the current bound exceeds the value of the best known feasible solution.
Avoiding cycling or tailing off
Last but not least, we may also stop Algorithm EllAS at any point without compromising the correctness of the branch-and-bound algorithm. In particular, we can stop as soon as an iteration of Algorithm EllAS does not give a strict (or a significant) improvement in the dual bound; this also avoids cycling.
7 Numerical Results
To test the performance of our algorithm EllAS, we considered random binary instances with up to one million constraints (Section 7.1) as well as combinatorial instances of Problem ((RP)), where the underlying problem is the Shortest Path problem (Section 7.2), the Assignment problem (Section 7.3), the Spanning Tree problem (Section 7.4), and the Traveling Salesman problem (Section 7.5). Concerning our approach, these combinatorial problems have different characteristics: while the first two problems have compact and complete linear formulations, the standard models for the latter problems use an exponential number of constraints that can be separated efficiently. In the case of the Spanning Tree problem, this exponential set of constraints again yields a complete linear formulation, while this is not the case for the NP-hard Traveling Salesman problem. In the latter case, however, we still have a complete integer programming formulation, which suffices for the correctness of our approach.
For all problems, we consider instances where the positive definite matrix $\Sigma$ is randomly generated. For this, we chose eigenvalues $\lambda_1, \dots, \lambda_n$ uniformly at random, orthonormalized random vectors $v_1, \dots, v_n$ whose entries were chosen uniformly at random, and set $\Sigma := \sum_{i=1}^n \lambda_i v_i v_i^\top$. For the random binary instances, the entries of $c$ were chosen uniformly at random as well, while for all remaining instances the vector $c$ was uniformly one.
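A sketch of this construction (our own illustration; the sampling intervals are assumptions, as the original ranges are not reproduced here):

```python
import numpy as np

def random_sigma(n, rng=np.random.default_rng(0)):
    """Random positive definite Sigma = sum_i lambda_i v_i v_i^T from
    random eigenvalues and an orthonormalized random basis."""
    eigvals = rng.uniform(0.01, 1.0, n)                  # strictly positive
    Q, _ = np.linalg.qr(rng.uniform(-1.0, 1.0, (n, n)))  # orthonormal columns
    return (Q * eigvals) @ Q.T
```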
In the following, we present a comparison of BB-EllAS, a C++ implementation of the branch-and-bound algorithm based on EllAS, with the MISOCP solver of Gurobi 7.5.1 [9]. According to the latest benchmark results of Hans D. Mittelmann [13], Gurobi is currently the fastest solver for MISOCPs. We use Gurobi with standard settings, except that we use the same optimality tolerance as in BB-EllAS, setting the absolute optimality tolerance MIPGapAbs to this value. All other standard parameters are unchanged. In particular, Gurobi uses presolve techniques that decrease the solution times significantly. In case of the Spanning Tree problem and the Traveling Salesman problem, we apply dynamic separation algorithms using a callback adding lazy constraints.
All our experiments were carried out on Intel Xeon processors running at 2.60 GHz. All running times were measured in CPU seconds and the time limit was set to one CPU hour for each individual instance. All tables presented in this section include the following data for the comparison between BB-EllAS and Gurobi: the number of instances solved within the time limit, the average running time, and the average number of branch-and-bound nodes. For BB-EllAS, we also report the average total number of active set iterations and the average number of times the pseudoinverse is recomputed from scratch, the latter as a percentage of the number of iterations. All averages are taken over the set of instances solved within the time limit. For all applications, we also present performance profiles, as proposed in [8]. Given our set of solvers $S = \{$BB-EllAS, Gurobi$\}$ and a set of problems $P$, we compare the performance of a solver $s \in S$ on problem $p \in P$ against the best performance obtained by any solver in $S$ on the same problem. To this end, we define the performance ratio $r_{p,s} := t_{p,s} / \min \{ t_{p,s'} : s' \in S \}$, where $t_{p,s}$ is the computational time, and we consider the cumulative distribution function
$\rho_s(\tau) := \lvert \{ p \in P : r_{p,s} \le \tau \} \rvert \,/\, \lvert P \rvert.$
The performance profile for $s \in S$ is the plot of the function $\rho_s$.
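Computing $\rho_s$ from a table of running times is straightforward (a sketch; unsolved instances are encoded as infinite times):

```python
import numpy as np

def performance_profile(times, taus):
    """times: (#problems, #solvers) array of running times, np.inf for
    unsolved instances; returns rho_s(tau) for every solver and tau."""
    best = times.min(axis=1, keepdims=True)   # best time per problem
    ratios = times / best                     # performance ratios r_{p,s}
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])
```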
7.1 Random Instances
For a first comparison, we consider instances of Problem ((P)) where the objective function vector $c$ and the positive definite matrix $\Sigma$ are randomly generated as described above. The set $X$ is explicitly given as $X = \{ x : Ax \le b \}$, where $A$ and $b$ are also randomly generated: the entries of $A$ were chosen uniformly at random from a fixed range of integers, with $b$ defined accordingly. Altogether, we generated 160 different problem instances for ((P)): for each combination of $n \in \{25, 50, 100, 200\}$ and $m \in \{10^3, 10^4, 10^5, 10^6\}$, we generated 10 instances. Since the set $X$ is explicitly given here, the linear constraints are separated by enumeration in BB-EllAS. More precisely, at Step 8 of Algorithm 3, we pick the linear constraint most violated by the current primal solution; see the sketch below. We report our results in Table 1.
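The separation-by-enumeration step can be sketched as follows (our own illustration):

```python
import numpy as np

def most_violated_constraint(A, b, x_hat, tol=1e-9):
    """Return the index of the row of Ax <= b most violated by x_hat,
    or None if x_hat satisfies all constraints up to tol."""
    viol = A @ x_hat - b
    i = int(np.argmax(viol))
    return i if viol[i] > tol else None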
                     BB-EllAS                                  Gurobi

   n     m      #sol   time    nodes     iter     %ps     #sol    time     nodes
  25   10^3     10      0.00  3.9e+01   2.5e+02   0.12    10       0.76   1.3e+01
  25   10^4     10      0.03  6.5e+01   4.9e+02   0.45    10       9.39   2.1e+01
  25   10^5     10      0.60  1.0e+02   9.7e+02   1.22    10     156.63   2.8e+01
  25   10^6     10     16.91  2.5e+02   2.5e+03   0.85    10    1973.87   8.6e+01
  50   10^3     10      0.01  6.3e+01   3.5e+02   0.54    10       0.96   1.1e+01
  50   10^4     10      0.05  6.7e+01   4.4e+02   0.62    10      11.93   1.8e+01
  50   10^5     10      0.85  7.8e+01   6.8e+02   1.13    10     246.32   2.1e+01
  50   10^6     10     18.84  1.7e+02   1.7e+03   1.22     0        —         —
 100   10^3     10      0.40  1.3e+02   8.3e+02   6.79    10       2.35   2.5e+01
 100   10^4      9      0.36  1.6e+02   1.1e+03   1.78    10      27.51   7.7e+01
 100   10^5      9      4.26  2.6e+02   2.0e+03   1.60    10     761.07   2.3e+02
 100   10^6      7     94.88  4.9e+02   4.8e+03   3.68     0        —         —
 200   10^3     10      1.00  1.3e+02   7.6e+02   3.98    10       4.55   3.1e+01
 200   10^4      8      2.04  1.6e+02   1.3e+03   3.92    10      23.63   4.1e+01
 200   10^5      9     11.84  3.1e+02   2.6e+03   3.15    10     899.84   1.3e+02
 200   10^6      7     61.25  1.9e+02   1.4e+03  14.11     0        —         —
From the results in Table 1, note that the average number of branch-and-bound nodes enumerated by BB-EllAS is generally larger than the number of nodes needed by Gurobi, but always by less than a factor of 10 on average. However, in terms of running times, BB-EllAS outperforms Gurobi on all instance types except for the larger instances with a medium number of constraints, i.e., for $n \in \{100, 200\}$ and $m \in \{10^4, 10^5\}$, where Gurobi solves more instances within the time limit. On all other instance classes, BB-EllAS either solves significantly more instances than Gurobi within the time limit or is faster by orders of magnitude. This is confirmed by the performance profiles presented in Figure 1. The low number of active set iterations performed by EllAS per node (less than 10 on average) highlights the benefits of using warm starts.
7.2 Shortest Path Problem
Given a directed graph $G = (V, E)$, where $V$ is the set of vertices and $E$ is the set of edges, and weights $c_e$ associated with each edge $e \in E$, the Shortest Path problem is the problem of finding a path between two vertices $s$ and $t$ such that the sum of the weights of its constituent edges is minimized. Our approach uses the following flow-based formulation of the Robust Shortest Path problem: