Chordal decomposition in operator-splitting methods for sparse semidefinite programs

YZ and GF contributed equally. A preliminary version of part of this work appeared in ZFPGWpd2016 (); ZFPGWhsde2016 (). YZ is supported by the Clarendon Scholarship and the Jason Hu Scholarship. GF was supported in part by EPSRC grant EP/J010537/1. AP was supported in part by EPSRC grants EP/J010537/1 and EP/M002454/1.
We employ chordal decomposition to reformulate a large and sparse semidefinite program (SDP), either in primal or dual standard form, into an equivalent SDP with smaller positive semidefinite (PSD) constraints. In contrast to previous approaches, the decomposed SDP is suitable for the application of first-order operator-splitting methods, enabling the development of efficient and scalable algorithms. In particular, we apply the alternating directions method of multipliers (ADMM) to solve decomposed primal- and dual-standard-form SDPs. Each iteration of such ADMM algorithms requires a projection onto an affine subspace, and a set of projections onto small PSD cones that can be computed in parallel. We also formulate the homogeneous self-dual embedding (HSDE) of a primal-dual pair of decomposed SDPs, and extend a recent ADMM-based algorithm to exploit the structure of our HSDE. The resulting HSDE algorithm has the same leading-order computational cost as those for the primal or dual problems only, with the advantage of being able to identify infeasible problems and produce an infeasibility certificate. All algorithms are implemented in the open-source MATLAB solver CDCS. Numerical experiments on a range of large-scale SDPs demonstrate the computational advantages of the proposed methods compared to common state-of-the-art solvers.
Keywords: sparse SDPs · chordal decomposition · operator splitting · first-order methods
MSC: 90C06 · 90C22 · 90C25 · 49M27 · 49M29
Semidefinite programs (SDPs) are convex optimization problems over the cone of positive semidefinite (PSD) matrices. Given $b \in \mathbb{R}^m$ and symmetric matrices $C, A_1, \ldots, A_m \in \mathbb{S}^n$, the standard primal form of an SDP is
\begin{align*}
\min_{X} \quad & \langle C, X \rangle \\
\text{subject to} \quad & \langle A_i, X \rangle = b_i, \quad i = 1, \ldots, m, \\
& X \in \mathbb{S}^n_+,
\end{align*}
while the standard dual form is
\begin{align*}
\max_{y,\,Z} \quad & \langle b, y \rangle \\
\text{subject to} \quad & Z + \sum_{i=1}^{m} y_i A_i = C, \\
& Z \in \mathbb{S}^n_+.
\end{align*}
In the above and throughout this work, $\mathbb{R}^m$ is the usual $m$-dimensional Euclidean space, $\mathbb{S}^n$ is the space of $n \times n$ symmetric matrices, $\mathbb{S}^n_+$ is the cone of PSD matrices, and $\langle \cdot, \cdot \rangle$ denotes the inner product in the appropriate space, i.e., $\langle x, y \rangle = x^T y$ for $x, y \in \mathbb{R}^m$ and $\langle X, Y \rangle = \mathrm{trace}(XY)$ for $X, Y \in \mathbb{S}^n$. SDPs have found applications in a wide range of fields, such as control theory, machine learning, combinatorics, and operations research boyd1994linear (). Semidefinite programming encompasses other common types of optimization problems, including linear, quadratic, and second-order cone programs boyd2004convex (). Furthermore, many nonlinear convex constraints admit SDP relaxations that work well in practice vandenberghe1996semidefinite ().
It is well-known that small and medium-sized SDPs can be solved to any arbitrary precision in polynomial time vandenberghe1996semidefinite () using efficient second-order interior-point methods (IPMs) alizadeh1998primal (); helmberg1996interior (). However, many problems of practical interest are too large to be addressed by the current state-of-the-art interior-point algorithms, largely due to the need to compute, store, and factorize an $m \times m$ matrix at each iteration.
A common strategy to address this shortcoming is to abandon IPMs in favour of simpler first-order methods (FOMs), at the expense of reducing the accuracy of the solution. For instance, Malick et al. introduced regularization methods to solve SDPs based on a dual augmented Lagrangian malick2009regularization (). Wen et al. proposed an alternating direction augmented Lagrangian method for large-scale SDPs in the dual standard form wen2010alternating (). Zhao et al. presented an augmented Lagrangian dual approach combined with the conjugate gradient method to solve large-scale SDPs zhao2010newton (). More recently, O’Donoghue et al. developed a first-order operator-splitting method to solve the homogeneous self-dual embedding (HSDE) of a primal-dual pair of conic programs ODonoghue2016 (). The algorithm, implemented in the C package SCS scs (), has the advantage of providing certificates of primal or dual infeasibility.
A second major approach to resolve the aforementioned scalability issues is based on the observation that the large-scale SDPs encountered in applications are often structured and/or sparse boyd1994linear (). Exploiting sparsity in SDPs is an active and challenging area of research andersen2011interior (), with one main difficulty being that the optimal (primal) solution is typically dense even when the problem data are sparse. Nonetheless, if the aggregate sparsity pattern of the data is chordal (or has sparse chordal extensions), Grone’s grone1984positive () and Agler’s theorems agler1988positive () allow one to replace the original, large PSD constraint with a set of PSD constraints on smaller matrices, coupled by additional equality constraints. Having reduced the size of the semidefinite variables, the converted SDP can in some cases be solved more efficiently than the original problem using standard IPMs. These ideas underlie the domain-space and the range-space conversion techniques in fukuda2001exploiting (); kim2011exploiting (), implemented in the MATLAB package SparseCoLO fujisawa2009user ().
The problem with such decomposition techniques, however, is that the addition of equality constraints to an SDP often offsets the benefit of working with smaller semidefinite cones. One possible solution is to exploit the properties of chordal sparsity patterns directly in the IPMs: Fukuda et al. used Grone’s positive definite completion theorem grone1984positive () to develop a primal-dual path-following method fukuda2001exploiting (); Burer proposed a nonsymmetric primal-dual IPM using Cholesky factors of the dual variable and maximum determinant completion of the primal variable burer2003semidefinite (); and Andersen et al. developed fast recursive algorithms to evaluate the function values and derivatives of the barrier functions for SDPs with chordal sparsity andersen2010implementation (). Another attractive option is to solve the sparse SDP using FOMs: Sun et al. proposed a first-order splitting algorithm for partially decomposable conic programs, including SDPs with chordal sparsity sun2014decomposition (); Kalbat & Lavaei applied a first-order operator-splitting method to solve a special class of SDPs with fully decomposable constraints Kalbat2015Fast (); Madani et al. developed a highly-parallelizable first-order algorithm for sparse SDPs with inequality constraints, with applications to optimal power flow problems Madani2015ADMM ().
In this work, we embrace the spirit of ODonoghue2016 (); sun2014decomposition (); Kalbat2015Fast (); Madani2015ADMM () and exploit sparsity in SDPs using a first-order operator-splitting method known as the alternating direction method of multipliers (ADMM). Introduced in the mid-1970s glowinski1975approximation (); gabay1976dual (), ADMM is related to other FOMs such as dual decomposition and the method of multipliers, and it has recently found applications in many areas, including covariance selection, signal processing, resource allocation, and classification; see boyd2011distributed () for a review. More precisely, our contributions are:
Using Grone’s theorem grone1984positive () and Agler’s theorem agler1988positive (), we formulate domain-space and range-space conversion frameworks for primal- and dual-standard-form sparse SDPs with chordal sparsity, respectively. These resemble the conversion methods developed in fukuda2001exploiting (); kim2011exploiting () for IPMs, but are more suitable for the application of FOMs. One major difference with fukuda2001exploiting (); kim2011exploiting () is that we introduce two sets of slack variables, so that the conic and the affine constraints can be separated when using operator-splitting algorithms.
We apply ADMM to solve the domain- and range-space converted SDPs, and show that the resulting iterates of the ADMM algorithms are the same up to scaling. The iterations are cheap: the positive semidefinite (PSD) constraint is enforced via parallel projections onto small PSD cones — a computationally cheaper strategy than that in sun2014decomposition (), while imposing the affine constraints requires solving a linear system with constant coefficient matrix, the factorization/inverse of which can be cached before iterating the algorithm.
We formulate the HSDE of a converted primal-dual pair of sparse SDPs. In contrast to sun2014decomposition (); Kalbat2015Fast (); Madani2015ADMM (), this allows us to compute either primal and dual optimal points, or a certificate of infeasibility. In particular, we extend the algorithm proposed in ODonoghue2016 () to exploit the structure of our HSDE, reducing its computational complexity. The resulting algorithm is more efficient than a direct application of the method of ODonoghue2016 () to either the original primal-dual pair (i.e. before chordal sparsity is taken into account), or the converted problems: in the former case, the chordal decomposition reduces the cost of the conic projections; in the latter case, we speed up the projection onto the affine constraints using a series of block-eliminations.
We present the MATLAB solver CDCS (Cone Decomposition Conic Solver), which implements our ADMM algorithms. CDCS is the first open-source first-order solver that exploits chordal decomposition and can detect infeasible problems. We test our implementation on large-scale sparse problems in SDPLIB borchers1999sdplib (), selected sparse SDPs with nonchordal sparsity pattern andersen2010implementation (), and randomly generated SDPs with block-arrow sparsity patterns sun2014decomposition (). The results demonstrate the efficiency of our algorithms compared to the interior-point solvers SeDuMi sturm1999using () and the first-order solver SCS scs ().
The rest of the paper is organized as follows. Section 2 reviews chordal decomposition and the basic ADMM algorithm. Section 3 introduces our conversion framework for sparse SDPs based on chordal decomposition. We show how to apply the ADMM to exploit domain-space and range-space sparsity in primal and dual SDPs in Section 4. Section 5 discusses the ADMM algorithm for the HSDE of SDPs with chordal sparsity. CDCS and our numerical experiments are presented in Section 6. Section 7 concludes the paper.
2.1 A review of graph theoretic notions
We start by briefly reviewing some key graph theoretic concepts (see godsil2013algebraic (); blair1993introduction () for more details). A graph $\mathcal{G}(\mathcal{V},\mathcal{E})$ is defined by a set of vertices $\mathcal{V} = \{1, 2, \ldots, n\}$ and a set of edges $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$. A graph is called complete if any two nodes are connected by an edge. A subset of vertices $\mathcal{C} \subseteq \mathcal{V}$ such that $(i,j) \in \mathcal{E}$ for any distinct vertices $i, j \in \mathcal{C}$, i.e., such that the subgraph induced by $\mathcal{C}$ is complete, is called a clique. The number of vertices in $\mathcal{C}$ is denoted by $|\mathcal{C}|$. If $\mathcal{C}$ is not a subset of any other clique, then it is referred to as a maximal clique. A cycle of length $k$ in a graph $\mathcal{G}$ is a set of pairwise distinct nodes $\{v_1, v_2, \ldots, v_k\} \subseteq \mathcal{V}$ such that $(v_k, v_1) \in \mathcal{E}$ and $(v_i, v_{i+1}) \in \mathcal{E}$ for $i = 1, \ldots, k-1$. A chord is an edge joining two non-adjacent nodes in a cycle.
An undirected graph is called chordal (or triangulated, or a rigid circuit vandenberghe2014chordal ()) if every cycle of length greater than or equal to four has at least one chord. Chordal graphs include several other classes of graphs, such as acyclic undirected graphs (including trees) and complete graphs. Algorithms such as the maximum cardinality search tarjan1984simple () can test chordality and identify the maximal cliques of a chordal graph efficiently, i.e., in linear time in terms of the number of nodes and edges. Non-chordal graphs can always be chordal extended, i.e., extended to a chordal graph by adding edges to the original graph. Computing the chordal extension with the minimum number of additional edges is an NP-complete problem yannakakis1981computing (), but several heuristics exist to find good chordal extensions efficiently vandenberghe2014chordal ().
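To make these notions concrete, the following self-contained Python sketch (an illustration only; it is not part of CDCS, which is implemented in MATLAB) runs maximum cardinality search on a small adjacency-dictionary graph, applies the Tarjan-Yannakakis test for chordality, and lists the maximal cliques. The graphs and all function names are ours.

```python
def mcs(adj):
    """Maximum cardinality search: repeatedly visit the unvisited vertex
    with the most visited neighbours; returns the visit order."""
    weight = {v: 0 for v in adj}
    order, visited = [], set()
    for _ in range(len(adj)):
        v = max((u for u in adj if u not in visited), key=lambda u: weight[u])
        order.append(v)
        visited.add(v)
        for w in adj[v]:
            if w not in visited:
                weight[w] += 1
    return order

def is_chordal(adj):
    """Tarjan-Yannakakis test: the reverse of an MCS order is a perfect
    elimination ordering if and only if the graph is chordal."""
    pos = {v: i for i, v in enumerate(mcs(adj))}
    for v in adj:
        earlier = [w for w in adj[v] if pos[w] < pos[v]]
        if earlier:
            m = max(earlier, key=lambda w: pos[w])
            if not set(earlier) - {m} <= set(adj[m]):
                return False
    return True

def maximal_cliques(adj):
    """For a chordal graph, every maximal clique has the form {v} plus the
    earlier-ordered neighbours of v, for some vertex v in the MCS order."""
    pos = {v: i for i, v in enumerate(mcs(adj))}
    cands = [frozenset([v] + [w for w in adj[v] if pos[w] < pos[v]]) for v in adj]
    return [c for c in cands if not any(c < d for d in cands)]

# a 4-cycle 1-2-3-4 is not chordal; adding the chord (1, 3) makes it chordal
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
chordal = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
```

On the chordal example the maximal cliques are {1, 2, 3} and {1, 3, 4}, which is the kind of clique decomposition exploited in the rest of the paper.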
2.2 Sparse matrix cones and chordal decomposition
The sparsity pattern of a symmetric matrix $X \in \mathbb{S}^n$ can be represented by an undirected graph $\mathcal{G}(\mathcal{V},\mathcal{E})$, and vice-versa. For example, the graphs in Fig. 2 correspond to the sparsity patterns illustrated in Fig. 3. With a slight abuse of terminology, we refer to the graph $\mathcal{G}$ as the sparsity pattern of $X$. Given a clique $\mathcal{C}_k$ of $\mathcal{G}$, we define a matrix $E_{\mathcal{C}_k} \in \mathbb{R}^{|\mathcal{C}_k| \times n}$ as
$$\left(E_{\mathcal{C}_k}\right)_{ij} := \begin{cases} 1, & \text{if } \mathcal{C}_k(i) = j, \\ 0, & \text{otherwise}, \end{cases}$$
where $\mathcal{C}_k(i)$ is the $i$-th vertex in $\mathcal{C}_k$, sorted in the natural ordering. Given $X \in \mathbb{S}^n$, the matrix $E_{\mathcal{C}_k}$ can be used to select the principal submatrix defined by the clique $\mathcal{C}_k$, i.e., $E_{\mathcal{C}_k} X E_{\mathcal{C}_k}^T \in \mathbb{S}^{|\mathcal{C}_k|}$. In addition, the operation $E_{\mathcal{C}_k}^T Y E_{\mathcal{C}_k}$ creates an $n \times n$ symmetric matrix from a $|\mathcal{C}_k| \times |\mathcal{C}_k|$ matrix $Y$. For example, for a maximal clique of the chordal graph in Fig. 1(b), the corresponding matrix $E_{\mathcal{C}_k}$ extracts the associated principal submatrix of any $X$ with that sparsity pattern.
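As a quick illustration of these two operations, the following NumPy sketch (our own example with a hypothetical clique; the code uses 0-based indices, unlike the 1-based vertex labels in the text) builds a selector matrix and verifies that it extracts and re-embeds a principal submatrix.

```python
import numpy as np

def selector(clique, n):
    """Build E_C with (E_C)_{ij} = 1 if the i-th (sorted) vertex of C is j."""
    E = np.zeros((len(clique), n))
    for i, v in enumerate(sorted(clique)):
        E[i, v] = 1.0
    return E

n = 4
X = np.arange(16.0).reshape(n, n)
X = (X + X.T) / 2                  # make X symmetric
C = [0, 1, 3]                      # a hypothetical clique, 0-based
E = selector(C, n)
sub = E @ X @ E.T                  # selects the principal submatrix of X on C
lifted = E.T @ sub @ E             # embeds it back into an n-by-n matrix
```

The lifted matrix agrees with $X$ on the rows and columns indexed by the clique and is zero elsewhere.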
Given an undirected graph $\mathcal{G}(\mathcal{V},\mathcal{E})$, let $\mathcal{E}^* := \mathcal{E} \cup \{(i,i) : i \in \mathcal{V}\}$ be the set of edges extended to include all self-loops. We define the space of sparse symmetric matrices represented by $\mathcal{G}$ as
$$\mathbb{S}^n(\mathcal{E},0) := \left\{ X \in \mathbb{S}^n : X_{ij} = X_{ji} = 0 \text{ if } (i,j) \notin \mathcal{E}^* \right\},$$
and the cone of sparse PSD matrices as
$$\mathbb{S}^n_+(\mathcal{E},0) := \left\{ X \in \mathbb{S}^n(\mathcal{E},0) : X \succeq 0 \right\}.$$
Moreover, we consider the cone
$$\mathbb{S}^n_+(\mathcal{E},?) := \mathrm{P}_{\mathbb{S}^n(\mathcal{E},0)}\left( \mathbb{S}^n_+ \right),$$
given by the projection of the PSD cone onto the space of sparse matrices $\mathbb{S}^n(\mathcal{E},0)$ with respect to the usual Frobenius matrix norm (this is the norm induced by the usual trace inner product on the space of symmetric matrices). It is not difficult to see that $X \in \mathbb{S}^n_+(\mathcal{E},?)$ if and only if it has a positive semidefinite completion, i.e., if there exists $M \in \mathbb{S}^n_+$ such that $M_{ij} = X_{ij}$ when $(i,j) \in \mathcal{E}^*$.
For any undirected graph $\mathcal{G}(\mathcal{V},\mathcal{E})$, the cones $\mathbb{S}^n_+(\mathcal{E},0)$ and $\mathbb{S}^n_+(\mathcal{E},?)$ are dual to each other with respect to the trace inner product in the space of sparse matrices $\mathbb{S}^n(\mathcal{E},0)$ vandenberghe2014chordal (). In other words,
$$\left( \mathbb{S}^n_+(\mathcal{E},0) \right)^* = \mathbb{S}^n_+(\mathcal{E},?), \qquad \left( \mathbb{S}^n_+(\mathcal{E},?) \right)^* = \mathbb{S}^n_+(\mathcal{E},0).$$
If $\mathcal{G}$ is chordal, then $\mathbb{S}^n_+(\mathcal{E},?)$ and $\mathbb{S}^n_+(\mathcal{E},0)$ can be equivalently decomposed into a set of smaller but coupled convex cones according to the following theorems:
Theorem 2.1 (Grone’s theorem grone1984positive ())
Let $\mathcal{G}(\mathcal{V},\mathcal{E})$ be a chordal graph, and let $\{\mathcal{C}_1, \mathcal{C}_2, \ldots, \mathcal{C}_p\}$ be the set of its maximal cliques. Then, $X \in \mathbb{S}^n_+(\mathcal{E},?)$ if and only if
$$E_{\mathcal{C}_k} X E_{\mathcal{C}_k}^T \in \mathbb{S}^{|\mathcal{C}_k|}_+ \quad \text{for all } k = 1, \ldots, p.$$
Theorem 2.2 (Agler’s theorem agler1988positive ())
Let $\mathcal{G}(\mathcal{V},\mathcal{E})$ be a chordal graph, and let $\{\mathcal{C}_1, \mathcal{C}_2, \ldots, \mathcal{C}_p\}$ be the set of its maximal cliques. Then, $Z \in \mathbb{S}^n_+(\mathcal{E},0)$ if and only if there exist matrices $Z_k \in \mathbb{S}^{|\mathcal{C}_k|}_+$ for $k = 1, \ldots, p$ such that
$$Z = \sum_{k=1}^{p} E_{\mathcal{C}_k}^T Z_k E_{\mathcal{C}_k}.$$
Note that these results can be proven individually, but can also be derived from each other using the duality of the cones $\mathbb{S}^n_+(\mathcal{E},0)$ and $\mathbb{S}^n_+(\mathcal{E},?)$ kim2011exploiting (). In this paper, the terminology chordal (or clique) decomposition of a sparse matrix cone will refer to the application of Theorem 2.1 or Theorem 2.2 to replace a large sparse PSD cone with a set of smaller but coupled PSD cones. Chordal decomposition of sparse matrix cones underpins much of the recent research on sparse SDPs fukuda2001exploiting (); kim2011exploiting (); andersen2010implementation (); sun2014decomposition (); Madani2015ADMM (); vandenberghe2014chordal (), most of which relies on the conversion framework for IPMs proposed in fukuda2001exploiting (); kim2011exploiting ().
To illustrate the concept, consider the chordal graph in Fig. 1(b). According to Grone’s theorem,
Similarly, Agler’s theorem guarantees that (after eliminating some of the variables)
Note that the PSD constraints obtained after the chordal decomposition of $X$ (resp. $Z$) are coupled, since the submatrices corresponding to overlapping cliques share entries of $X$ (resp. $Z$).
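The "if" direction of Agler's theorem is easy to check numerically. The NumPy sketch below (our own example with a hypothetical 4-node pattern whose maximal cliques are {1,2,3} and {1,3,4}, written 0-based in the code) assembles a matrix from small random PSD blocks and confirms that the result is PSD and has zeros outside the pattern.

```python
import numpy as np

def selector(clique, n):
    # builds E_C with (E_C)_{ij} = 1 if the i-th (sorted) vertex of C is j
    E = np.zeros((len(clique), n))
    for i, v in enumerate(sorted(clique)):
        E[i, v] = 1.0
    return E

rng = np.random.default_rng(0)
n, cliques = 4, [[0, 1, 2], [0, 2, 3]]   # cliques {1,2,3} and {1,3,4}, 0-based

Z = np.zeros((n, n))
for C in cliques:
    M = rng.standard_normal((len(C), len(C)))
    Zk = M @ M.T                          # a random PSD block for this clique
    E = selector(C, n)
    Z += E.T @ Zk @ E                     # the Agler-type sum

assert np.linalg.eigvalsh(Z).min() >= -1e-9   # Z is PSD
assert Z[1, 3] == 0 and Z[3, 1] == 0          # vertices 2 and 4 share no clique
```

The zero entry at position (2,4) reflects the fact that each term of the sum is supported on a single clique.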
2.3 The Alternating Direction Method of Multipliers
The computational “engine” employed in this work is the alternating direction method of multipliers (ADMM). ADMM is an operator-splitting method developed in the 1970s, and it is known to be equivalent to other operator-splitting methods such as Douglas-Rachford splitting and Spingarn’s method of partial inverses; see boyd2011distributed () for a review. The ADMM algorithm solves the optimization problem
\begin{align}
\min_{x,\,y} \quad & f(x) + g(y) \nonumber\\
\text{subject to} \quad & Ax + By = c, \tag{3}
\end{align}
where $f$ and $g$ are convex functions, and $A$, $B$, and $c$ are a matrix pair and a vector of compatible dimensions. Given a penalty parameter $\rho > 0$ and a dual multiplier $z$, the ADMM algorithm finds a saddle point of the augmented Lagrangian
$$\mathcal{L}_\rho(x, y, z) := f(x) + g(y) + z^T \left( Ax + By - c \right) + \frac{\rho}{2} \left\| Ax + By - c \right\|^2$$
by minimizing $\mathcal{L}_\rho$ with respect to the primal variables $x$ and $y$ separately, followed by a dual variable update:
\begin{align}
x^{(n+1)} &= \arg\min_{x} \; \mathcal{L}_\rho\big(x, y^{(n)}, z^{(n)}\big), \tag{4a}\\
y^{(n+1)} &= \arg\min_{y} \; \mathcal{L}_\rho\big(x^{(n+1)}, y, z^{(n)}\big), \tag{4b}\\
z^{(n+1)} &= z^{(n)} + \rho \big( Ax^{(n+1)} + By^{(n+1)} - c \big). \tag{4c}
\end{align}
Under very mild conditions, ADMM converges to a solution of (3) (boyd2011distributed, Section 3.2). ADMM is particularly suitable when (4a) and (4b) have closed-form expressions, or can be solved efficiently. Moreover, splitting the minimization over $x$ and $y$ often allows distributed and/or parallel implementations of steps (4a)-(4c).
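As a concrete instance of iterations (4a)-(4c) with cheap subproblems, the following Python/NumPy sketch solves the textbook lasso problem $\min \tfrac{1}{2}\|Ax - b\|^2 + \lambda\|y\|_1$ subject to $x - y = 0$, in the spirit of boyd2011distributed (); it is not taken from this paper. The $x$-update solves a linear system with a fixed coefficient matrix that is inverted once, the $y$-update is elementwise soft-thresholding, and the multiplier update is a gradient ascent step.

```python
import numpy as np

def soft(v, kappa):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def lasso_admm(A, b, lam, rho=1.0, iters=300):
    n = A.shape[1]
    x, y, u = np.zeros(n), np.zeros(n), np.zeros(n)   # u is the scaled multiplier z/rho
    M_inv = np.linalg.inv(A.T @ A + rho * np.eye(n))  # fixed matrix: invert/factorize once
    for _ in range(iters):
        x = M_inv @ (A.T @ b + rho * (y - u))  # step (4a): quadratic subproblem
        y = soft(x + u, lam / rho)             # step (4b): closed-form proximal step
        u = u + x - y                          # step (4c): dual gradient ascent
    return x, y

# sanity check: with A = I, the lasso solution is soft(b, lam)
b = np.array([3.0, -0.5, 1.2, 0.1])
x, y = lasso_admm(np.eye(4), b, lam=1.0)
```

Caching the inverse (or a factorization) of the fixed coefficient matrix outside the loop is the same idea the paper exploits for its affine-projection step.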
3 Chordal decomposition of sparse SDPs
The sparsity pattern of the problem data for the primal-dual pair of standard-form SDPs (1)-(2) can be described using the so-called aggregate sparsity pattern. We say that the pair of SDPs (1)-(2) has aggregate sparsity pattern $\mathcal{G}(\mathcal{V},\mathcal{E})$ if
$$C \in \mathbb{S}^n(\mathcal{E},0) \quad \text{and} \quad A_i \in \mathbb{S}^n(\mathcal{E},0), \quad i = 1, \ldots, m.$$
In other words, the aggregate sparsity pattern $\mathcal{G}$ is the union of the individual sparsity patterns of the data matrices $C, A_1, \ldots, A_m$. Throughout the rest of this paper, we assume that the aggregate sparsity pattern $\mathcal{G}$ is chordal (or that a suitable chordal extension has been found), and that it has $p$ maximal cliques $\mathcal{C}_1, \ldots, \mathcal{C}_p$. In addition, we assume that the matrices $A_1, \ldots, A_m$ are linearly independent.
It is not difficult to see that the aggregate sparsity pattern defines the sparsity pattern of any feasible dual variable in (2), i.e. any dual-feasible $Z$ must satisfy $Z \in \mathbb{S}^n(\mathcal{E},0)$. Similarly, while the primal variable $X$ in (1) is usually dense, the value of the cost function and the equality constraints depend only on the entries $X_{ij}$ with $(i,j) \in \mathcal{E}$, and the remaining entries simply guarantee that $X \succeq 0$. Recalling the definitions of the sparse matrix cones $\mathbb{S}^n_+(\mathcal{E},?)$ and $\mathbb{S}^n_+(\mathcal{E},0)$, we can therefore recast the primal-form SDP (1) as
and the dual-form SDP (2) as
This nonsymmetric formulation was first proposed by Fukuda et al. fukuda2001exploiting (), and was later discussed in andersen2010implementation (); sun2014decomposition (); kim2011exploiting (). Note that (6) and (7) are a primal-dual pair of linear conic problems because the cones $\mathbb{S}^n_+(\mathcal{E},?)$ and $\mathbb{S}^n_+(\mathcal{E},0)$ are dual to each other.
3.1 Domain-space decomposition
As we have seen in Section 2, Grone’s theorem allows us to decompose the sparse matrix cone constraint $X \in \mathbb{S}^n_+(\mathcal{E},?)$ into $p$ standard PSD constraints on the submatrices of $X$ defined by the cliques $\mathcal{C}_1, \ldots, \mathcal{C}_p$. In other words,
$$X \in \mathbb{S}^n_+(\mathcal{E},?) \quad \Longleftrightarrow \quad E_{\mathcal{C}_k} X E_{\mathcal{C}_k}^T \in \mathbb{S}^{|\mathcal{C}_k|}_+, \quad k = 1, \ldots, p.$$
These constraints are implicitly coupled since $E_{\mathcal{C}_k} X E_{\mathcal{C}_k}^T$ and $E_{\mathcal{C}_j} X E_{\mathcal{C}_j}^T$ have overlapping elements whenever $\mathcal{C}_k \cap \mathcal{C}_j \neq \emptyset$. Upon introducing slack variables $X_k$, $k = 1, \ldots, p$, we can rewrite this as
$$X_k = E_{\mathcal{C}_k} X E_{\mathcal{C}_k}^T, \qquad X_k \in \mathbb{S}^{|\mathcal{C}_k|}_+, \quad k = 1, \ldots, p.$$
The primal optimization problem (6) is then equivalent to the SDP
Redundant constraints in (11) can be eliminated using the running intersection property blair1993introduction () of the cliques fukuda2001exploiting (), and the decomposed SDP can be solved efficiently by IPMs in certain cases fukuda2001exploiting (); kim2011exploiting (). However, effectively applying FOMs to (9) after eliminating the redundant constraints is not straightforward. In sun2014decomposition (), an SDP with a quadratic objective had to be solved at each iteration to impose the PSD constraints, requiring an additional iterative solver. Even when this problem is resolved, e.g. by using the algorithm of ODonoghue2016 (), the size of the KKT system enforcing the affine constraints is increased dramatically by the consensus conditions (11), sometimes so much that memory requirements are prohibitive on desktop computing platforms fukuda2001exploiting (). In contrast, we show in Section 4 that if a set of slack variables is introduced in (8) and $X$ is retained in (9), then the PSD constraint can be imposed via projections onto small PSD cones, while the affine constraints require the solution of an $m \times m$ linear system of equations, as if no consensus constraints had been introduced. This makes our conversion framework more suitable for FOMs than that of fukuda2001exploiting (); kim2011exploiting ().
3.2 Range-space decomposition
Moving to the dual SDP (7), Agler’s theorem allows us to replace the conic constraint $Z \in \mathbb{S}^n_+(\mathcal{E},0)$ with the decomposition $Z = \sum_{k=1}^{p} E_{\mathcal{C}_k}^T Z_k E_{\mathcal{C}_k}$ with $Z_k \in \mathbb{S}^{|\mathcal{C}_k|}_+$. We then introduce slack variables $V_k$, $k = 1, \ldots, p$, and rewrite this as
$$Z = \sum_{k=1}^{p} E_{\mathcal{C}_k}^T Z_k E_{\mathcal{C}_k}, \qquad Z_k = V_k, \qquad V_k \in \mathbb{S}^{|\mathcal{C}_k|}_+, \quad k = 1, \ldots, p.$$
Similar comments as in Remark 1 hold, and the slack variables are essential to formulate a decomposition framework suitable for the application of FOMs. The range-space decomposition of (2) is then given by
Although the domain- and range-space decompositions (9) and (12) have been derived individually, they are in fact a primal-dual pair of SDPs. The duality between the original SDPs (1) and (2) is inherited by the decomposed SDPs (9) and (12) by virtue of the duality between Grone’s and Agler’s theorems. This elegant picture is illustrated in Fig. 4.
4 ADMM for domain- and range-space decompositions of sparse SDPs
In this section, we demonstrate how ADMM can be applied to solve the domain-space decomposition (9) and the range-space decomposition (12) efficiently. Furthermore, we show that the resulting domain- and range-space algorithms are equivalent, in the sense that one is just a scaled version of the other. Throughout this section, $\delta_{\mathcal{D}}(x)$ will denote the indicator function of a set $\mathcal{D}$, i.e.
$$\delta_{\mathcal{D}}(x) := \begin{cases} 0, & \text{if } x \in \mathcal{D}, \\ +\infty, & \text{otherwise.} \end{cases}$$
For notational neatness, however, we write $\delta_0$ instead of $\delta_{\{0\}}$ when $\mathcal{D} = \{0\}$.
To ease the exposition further, we consider the usual vectorized forms of (9) and (12). Specifically, we let $\mathrm{vec} : \mathbb{S}^n \to \mathbb{R}^{n^2}$ be the usual operator mapping a matrix to the stack of its columns, and define the vectorized data
$$c := \mathrm{vec}(C), \qquad A := \begin{bmatrix} \mathrm{vec}(A_1) & \cdots & \mathrm{vec}(A_m) \end{bmatrix}^T.$$
Note that the assumption that $A_1, \ldots, A_m$ are linearly independent matrices means that $A$ has full row rank. For all $k = 1, \ldots, p$, we also introduce the vectorized variables
$$x := \mathrm{vec}(X), \qquad x_k := \mathrm{vec}(X_k), \qquad z_k := \mathrm{vec}(Z_k), \qquad v_k := \mathrm{vec}(V_k),$$
and define “entry-selector” matrices $H_k := E_{\mathcal{C}_k} \otimes E_{\mathcal{C}_k}$ for $k = 1, \ldots, p$ that project $x$ onto the subvectors $x_k$, i.e. such that
$$x_k = \mathrm{vec}\!\left( E_{\mathcal{C}_k} X E_{\mathcal{C}_k}^T \right) = H_k x.$$
Note that for each $k$, the rows of $H_k$ are orthonormal, and that the matrix $\sum_{k=1}^{p} H_k^T H_k$ is diagonal. Upon defining the cone
$$\mathcal{S} := \mathbb{S}^{|\mathcal{C}_1|}_+ \times \cdots \times \mathbb{S}^{|\mathcal{C}_p|}_+$$
such that $(X_1, \ldots, X_p) \in \mathcal{S}$ if and only if $X_k \in \mathbb{S}^{|\mathcal{C}_k|}_+$ for all $k = 1, \ldots, p$, we can rewrite (9) as
while (12) becomes
4.1 ADMM for the domain-space decomposition
We start by moving the affine constraint $Ax = b$ and the conic constraints in (13) to the objective using the indicator functions $\delta_0$ and $\delta_{\mathcal{S}}$, respectively, i.e., we write
This problem is in the standard form for the application of ADMM. Given a penalty parameter $\rho > 0$ and a Lagrange multiplier $\lambda_k$ for each constraint $x_k = H_k x$, $k = 1, \ldots, p$, we consider the (modified) augmented Lagrangian
and group the variables as $\mathcal{X} := \{x\}$, $\mathcal{Y} := \{x_1, \ldots, x_p\}$, and $\mathcal{Z} := \{\lambda_1, \ldots, \lambda_p\}$. According to (4), each iteration of the ADMM requires the minimization of the Lagrangian in (16) with respect to the $\mathcal{X}$- and $\mathcal{Y}$-blocks separately, followed by an update of the multipliers $\mathcal{Z}$. At each step, the variables not being optimized over are fixed to their most current values. Note that splitting the primal variables into the two blocks $\mathcal{X}$ and $\mathcal{Y}$ defined above is essential to solving the minimization subproblems (4a) and (4b) inexpensively; more details will be given in Remark 3 after describing the minimization over $\mathcal{Y}$ in Section 4.1.2.
4.1.1 Minimization over $x$
Minimizing the augmented Lagrangian (16) over $x$ is equivalent to the equality-constrained quadratic program
\begin{align}
\min_{x} \quad & \langle c, x \rangle + \frac{\rho}{2} \sum_{k=1}^{p} \left\| x_k^{(n)} - H_k x + \rho^{-1} \lambda_k^{(n)} \right\|^2 \tag{17}\\
\text{subject to} \quad & Ax = b. \nonumber
\end{align}
Letting $y$ be the multiplier for the equality constraint (we scale the multiplier by $\rho$ for convenience), and defining
$$D := \sum_{k=1}^{p} H_k^T H_k,$$
the optimality conditions for (17) can be written as the KKT system
\begin{equation}
\begin{bmatrix} D & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \sum_{k=1}^{p} H_k^T \left( x_k^{(n)} + \rho^{-1} \lambda_k^{(n)} \right) - \rho^{-1} c \\ b \end{bmatrix}. \tag{19}
\end{equation}
Recalling that the product $H_k^T H_k$ is a diagonal matrix for all $k = 1, \ldots, p$, we conclude that so is $D$, and since $A$ has full row rank by assumption, (19) can be solved efficiently, for instance by block elimination. In particular, eliminating $x$ shows that the only matrix to be inverted/factorized is
$$A D^{-1} A^T \in \mathbb{S}^m.$$
Incidentally, we note that the first-order algorithms of wen2010alternating (); ODonoghue2016 () require the factorization of a similar matrix with the same dimensions. Since this matrix is the same at every iteration, its Cholesky factorization (or any other factorization of choice) can be computed and cached before starting the ADMM iterations. For some families of SDPs, such as the SDP relaxation of MaxCut problems and sum-of-squares (SOS) feasibility problems zheng2017Exploiting (), the matrix $A D^{-1} A^T$ is diagonal, so solving (19) is inexpensive even when the SDPs are very large. If factorizing $A D^{-1} A^T$ is too expensive, the linear system (19) can alternatively be solved by an iterative method, such as the conjugate gradient method saad2003iterative ().
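The structure described above can be illustrated with a small NumPy sketch (our own toy example with a hypothetical 4-node pattern and random placeholder data, not the paper's implementation): it verifies that the sum of the products $H_k^T H_k$ is diagonal, and then solves a KKT system of the stated form by block elimination, so that only a small Schur complement needs to be formed and factorized.

```python
import numpy as np

def selector(clique, n):
    """E_C with (E_C)_{ij} = 1 if the i-th (sorted) vertex of C is j."""
    E = np.zeros((len(clique), n))
    for i, v in enumerate(sorted(clique)):
        E[i, v] = 1.0
    return E

# hypothetical 4-node pattern; H_k is the Kronecker product E_k (x) E_k
n, cliques = 4, [[0, 1, 2], [0, 2, 3], [1, 3]]
Hs = [np.kron(selector(C, n), selector(C, n)) for C in cliques]

# the sum of H_k^T H_k is diagonal: each entry counts clique memberships
D = sum(H.T @ H for H in Hs)
assert np.allclose(D, np.diag(np.diag(D)))

# block elimination for [[D, A^T], [A, 0]] [x; y] = [r1; r2]: only the
# m-by-m Schur complement A D^{-1} A^T must be solved against, and since
# it is the same at every iteration its factorization can be cached
rng = np.random.default_rng(0)
m, N = 3, n * n
d = np.diag(D).astype(float)             # diagonal of D (all entries positive here)
A = rng.standard_normal((m, N))          # full row rank with probability 1
S = (A / d) @ A.T                        # Schur complement A D^{-1} A^T, built once
r1, r2 = rng.standard_normal(N), rng.standard_normal(m)
yv = np.linalg.solve(S, A @ (r1 / d) - r2)
xv = (r1 - A.T @ yv) / d
K = np.block([[np.diag(d), A.T], [A, np.zeros((m, m))]])
assert np.allclose(K @ np.concatenate([xv, yv]), np.concatenate([r1, r2]))
```

In a production solver one would cache a Cholesky factorization of the Schur complement rather than calling a dense solve at every iteration; the point here is only that the reduced system is small.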
4.1.2 Minimization over $x_1, \ldots, x_p$
Minimizing the augmented Lagrangian (16) over $x_1, \ldots, x_p$ is equivalent to solving $p$ independent conic problems of the form
$$\min_{x_k} \quad \left\| x_k - H_k x^{(n+1)} + \rho^{-1} \lambda_k^{(n)} \right\|^2 \quad \text{subject to} \quad \mathrm{mat}(x_k) \in \mathbb{S}^{|\mathcal{C}_k|}_+.$$
In terms of the original matrix variables $X_1, \ldots, X_p$, each of these sub-problems amounts to a projection onto a PSD cone. More precisely, if $\mathrm{P}_k$ denotes the projection onto the PSD cone $\mathbb{S}^{|\mathcal{C}_k|}_+$ and $\mathrm{mat}(\cdot) = \mathrm{vec}^{-1}(\cdot)$, we have
$$x_k^{(n+1)} = \mathrm{vec}\!\left[ \mathrm{P}_k\!\left( \mathrm{mat}\!\left( H_k x^{(n+1)} - \rho^{-1} \lambda_k^{(n)} \right) \right) \right].$$
Since the projection $\mathrm{P}_k$ can be computed with an eigenvalue decomposition, and since the size of each cone $\mathbb{S}^{|\mathcal{C}_k|}_+$ is small for typical sparse SDPs (such as SDP relaxations of MaxCut problems), the variables $x_1, \ldots, x_p$ can be updated efficiently. Moreover, the computation can be carried out in parallel. In contrast, the algorithms for generic SDPs developed in wen2010alternating (); ODonoghue2016 (); malick2009regularization () require projections onto the original large PSD cone $\mathbb{S}^n_+$.
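The conic projection itself can be sketched in a few lines (illustrative NumPy; the helper name proj_psd is ours): an eigendecomposition, truncation of the negative eigenvalues, and reassembly. Since each clique submatrix is projected independently, these calls can run in parallel across the cliques.

```python
import numpy as np

def proj_psd(S):
    """Project a symmetric matrix onto the PSD cone via an eigendecomposition."""
    w, V = np.linalg.eigh((S + S.T) / 2)   # symmetrize for numerical safety
    return (V * np.maximum(w, 0.0)) @ V.T  # drop the negative eigenvalues

rng = np.random.default_rng(0)
S = rng.standard_normal((5, 5))
S = (S + S.T) / 2
P = proj_psd(S)
assert np.linalg.eigvalsh(P).min() >= -1e-9   # the result is PSD
assert np.allclose(proj_psd(P), P)            # PSD matrices are fixed points
```

The cost of each projection is dominated by the eigendecomposition, which scales cubically in the clique size, hence the benefit of decomposing one large cone into many small ones.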
As anticipated in Remark 1, retaining the global variable $x$ in the domain-space decomposed SDP to enforce the consensus constraints between the entries of the subvectors (i.e., $x_k = H_k x$, $k = 1, \ldots, p$) is fundamental. In fact, it allowed us to separate the conic constraints from the affine constraints in (13) when applying the splitting strategy of ADMM, making the minimization over $x_1, \ldots, x_p$ easy to compute and parallelizable. In contrast, when $x$ is eliminated as in the conversion method of fukuda2001exploiting (); kim2011exploiting (), the conic constraints and the affine constraints cannot be easily decoupled when applying the first-order splitting method: in sun2014decomposition () a quadratic SDP had to be solved at each iteration, impeding the scalability of the algorithm.
4.1.3 Updating the multipliers
The final step in the $n$-th ADMM iteration is to update the multipliers $\lambda_1, \ldots, \lambda_p$ with the usual gradient ascent rule: for each $k = 1, \ldots, p$,
$$\lambda_k^{(n+1)} = \lambda_k^{(n)} + \rho \left( x_k^{(n+1)} - H_k x^{(n+1)} \right).$$
This computation is cheap and easily parallelized.
4.1.4 Summary & Stopping conditions
The ADMM algorithm is stopped after the $n$-th iteration if the relative primal/dual error measures
are smaller than a specified tolerance $\varepsilon_{\mathrm{tol}}$. The reader is referred to boyd2011distributed () for a detailed discussion of stopping conditions for ADMM algorithms. In conclusion, a primal-form SDP with domain-space decomposition (13) can be solved using the steps summarized in Algorithm 1.
4.2 ADMM for the range-space decomposition
An ADMM algorithm similar to Algorithm 1 can be developed for the range-space decomposition (14) of a dual-standard-form sparse SDP. As in Section 4.1, we start by moving all but the consensus equality constraints $z_k = v_k$, $k = 1, \ldots, p$, to the objective using indicator functions. This leads to
Given a penalty parameter $\rho > 0$ and a Lagrange multiplier $\lambda_k$ for each of the constraints $z_k = v_k$, $k = 1, \ldots, p$, we consider the (modified) augmented Lagrangian
and consider three groups of variables, $\{y, z_1, \ldots, z_p\}$, $\{v_1, \ldots, v_p\}$, and $\{\lambda_1, \ldots, \lambda_p\}$. Similar to Section 4.1, each iteration of the ADMM algorithm for (14) consists of minimizations over the first two groups of variables, and an update of the multipliers $\lambda_1, \ldots, \lambda_p$. Each of these steps admits an inexpensive closed-form solution, as we demonstrate next.
4.2.1 Minimization over $y$ and $z_1, \ldots, z_p$
Minimizing (26) over the block $\{y, z_1, \ldots, z_p\}$ is equivalent to solving the equality-constrained quadratic program
Let $x$ be the multiplier for the equality constraint (again scaled by $\rho$ for convenience). After some algebra, the optimality conditions for (4.2.1) can be written as the KKT system
plus a set of uncoupled equations for the variables $z_1, \ldots, z_p$,