On the regularity and conditioning of low rank semidefinite programs
Low rank matrix recovery problems appear widely in statistics, combinatorics, and imaging. One celebrated method for solving these problems is to formulate and solve a semidefinite program (SDP). It is often known that the exact solution to the SDP with perfect data recovers the solution to the original low rank matrix recovery problem. It is more challenging to show that an approximate solution to the SDP formulated with noisy problem data acceptably solves the original problem; arguments are usually ad hoc for each problem setting, and can be complex.
In this note, we identify a set of conditions that we call regularity that limit the error due to noisy problem data or incomplete convergence. In this sense, regular SDPs are robust: regular SDPs can be (approximately) solved efficiently at scale; and the resulting approximate solutions, even with noisy data, can be trusted. Moreover, we show that regularity holds generically, and also for many structured low rank matrix recovery problems, including the stochastic block model, synchronization, and matrix completion. Formally, we call an SDP regular if it has a surjective constraint map, admits a unique primal and dual solution pair, and satisfies strong duality and strict complementarity.
However, regularity is not a panacea: we show the Burer-Monteiro formulation of the SDP may have spurious second-order critical points, even for a regular SDP with a rank 1 solution.
We consider a semidefinite program (SDP) in the standard form
where denotes the matrix trace inner product. The primal variable is the symmetric positive semidefinite (PSD) matrix . The problem data comprises a symmetric (but possibly indefinite) cost matrix , a right-hand side , and a linear constraint map with rank operating on any by for some fixed symmetric . Denote an arbitrary solution of () as and the optimal value as .
The optimization problem () appears in problems in statistics [SS05], combinatorics [GW95], and imaging [CMP10], among others. Due to the nature of these applications, practical instances of () such as matrix completion [SS05, UT19] and MaxCut [GW95] are often expected to have low rank solutions. It is also notable that any instance of () admits a solution with rank satisfying [Bar95, Pat98].
Formally, we say an SDP is regular if it has a surjective constraint map, admits a unique primal and dual solution pair, and satisfies both strong duality and strict complementarity. (See Section 1.1 for more detail.) These conditions suffice to guarantee many useful properties about the resulting SDP.
Regularity was found by [AHO97] to hold generically: for almost all , and , () is regular so long as a primal and dual solution pair exists. A follow-up work [DIL16, Section 5] strengthens this result: for every surjective , regularity holds for almost all and , again conditioning on the existence of a primal and dual solution pair.
However, realistic applications of semidefinite programming may place structural constraints on , , and : for example, in matrix completion, the cost matrix ; in MaxCut type SDPs, the constraint map and the right-hand side is the vector of all ones. We will show in Sections 2 and 4 that many of these SDPs, including synchronization and the stochastic block model, are still regular. We also show in Section 5 that matrix completion is primal regular: it satisfies all conditions for regularity except (possibly) for dual uniqueness.
Conditioning and regularity
Many authors have shown that instances of the primal SDP () appearing in statistical or signal processing problems [CR09, WdM15, Ban18] admit a unique low rank solution which coincides with (or is close to) the underlying true signal. However, this analysis does not fully solve the original problem: optimization procedures give reliable solutions only when the problem is well-conditioned; otherwise, inaccuracies in the problem data or incomplete convergence can lead to wildly different reconstructions of the underlying signal. Here we consider two different notions of problem conditioning:
Measurement error: Suppose we must obtain the problem data , , and via noisy measurements that result in perturbed problem data , , and . We solve () with the perturbed problem data and obtain a perturbed solution . To ensure that the perturbed solution is meaningful for the original problem, we must ensure the error in the solution is controlled by the size of the perturbation in the data.
We can describe the sensitivity of the solution to measurement error by finding constants such that for all small ,
Optimization error: Most optimization algorithms offer guarantees on the suboptimality of the putative solution they return, but many cannot guarantee bounds on the distance to the solution, . However, the distance to the solution is usually the more important metric for statistical and signal processing applications. Hence it is important to understand how (and when) guarantees in suboptimality translate into guarantees on the distance to the solution.
We may seek to bound the distance to the solution, , in terms of simpler metrics of optimization error: the infeasibility with respect to conic constraints, , and linear constraints, , and the suboptimality, . (Throughout the paper we define for .) We produce an error bound on the solution by finding constants such that for all near ,
Regular SDPs obey useful bounds on these condition numbers. In the literature, it has been found that if the SDP () is regular, then [NO99] and [Stu00]. We note that only requires primal regularity. An upcoming work of ours [DU] shows that under the weaker condition of primal regularity. Estimates of and for regular SDPs based on problem data and solutions are also available in [NO99] and our upcoming work [DU], respectively. When the SDP () is not regular but only feasible, the exponent of can become as large as , which is shown to be tight [Stu00, Example 2]. In such cases, the SDP is very ill-conditioned. Thus if the SDP () is regular or primal regular, neither measurement error nor optimization error impedes signal recovery, as the distance to the solution (which is, or is close to, the true signal) grows at most quadratically in the measurement or optimization error.
Regularity and algorithmic convergence
Regularity also plays an important role in the convergence analysis of algorithms for SDP. For example:
Regular SDPs can be solved efficiently at scale: for example, the storage-optimal algorithm of [DYC19] requires regularity to ensure the limit of the dual iterates produces a meaningful approximation of the primal solution .
Regularity can also improve the convergence rate for many algorithms:
For the exact penalty formulation of the dual SDP [DYC19], subgradient-type methods with constant or diminishing stepsize require iterations to reach an -suboptimal dual solution. But for regular SDP, subgradient methods achieve faster sublinear rates , using the quadratic error bound induced by regularity for the analysis [Stu00, JM17].
Regularity and the Burer-Monteiro method
The Burer-Monteiro (BM) [BM03] approach solves the SDP () by factoring the decision variable, building on earlier work by Homer and Peinado [HP97] that introduced the approach for the MaxCut SDP. The BM approach factors the decision variable , with factor , and solves the following (nonconvex) problem:
When exceeds the rank of any solution to (), (BM) and () have the same solution set.
Usually, (BM) is solved using a Riemannian gradient or trust region method [BAC18], which requires that the feasible set forms a smooth Riemannian manifold. Following [BVB18], we call such an SDP smooth. In this paper we will consider many interesting smooth SDPs, including MaxCut, OrthogonalCut, and an SDP relaxation of a problem optimizing over a product of spheres, as well as statistical problems like synchronization and the stochastic block model. Notice that many interesting large scale SDPs, such as matrix completion [CT10] and phase retrieval [CSV13], may not be smooth.
Since these Riemannian optimization methods are only guaranteed to find second order stationary points, we will say the BM method succeeds for a smooth SDP when all second order stationary points are globally optimal (and fails otherwise). A recent result [BVB18] shows that for smooth SDP (and under a few more technical conditions), for almost all objectives , BM succeeds if .
Does the BM method succeed for every (smooth) regular SDP? Alas, no: we show the Burer-Monteiro approach (BM) can fail when , even if () is regular. This result extends recent counterexamples due to [WW18] by showing uniqueness of the dual solution. Hence storage optimal algorithms for SDP, such as [DYC19], that operate directly on the SDP (without factorization) have advantages over BM.
We formally define regular SDPs in Section 1.1. Section 1.2 introduces the notation used in this paper. In Section 2, we show that every PSD matrix solves a regular SDP and that primal regularity holds for almost all objectives under Slater’s condition. In Section 3, we construct regular SDPs for which the Burer-Monteiro approach fails. In Section 4, we use regularity to show that the SDPs corresponding to the stochastic block model and synchronization can recover the ground truth from noisy data. Notably, we show recovery is possible at higher noise thresholds than those for which the BM approach is known to succeed. Finally, in Section 5, we show that the celebrated matrix completion SDP is primal regular, but not (usually) regular.
Here is the dot product in . The decision variable is the vector . The map is the adjoint of the linear map , which satisfies . Explicitly, for .
We now formally state the conditions that define a regular SDP. The first two conditions, strong duality and linear independence, are standard in the literature.
Definition 1 (Strong Duality).
Notably, strong duality holds under Slater’s condition: existence of feasible primal and dual with .
Linear independence ensures that there are no redundant linear constraints.
Definition 2 (Linear independence).
Regularity also requires strict complementary slackness.
Definition 3 (Strict complementarity).
Linear programs always admit a strictly complementary solution pair whenever solutions exist: there is always some primal optimal and dual optimal so
where nnz is the number of nonzeros [GT56]. In contrast, semidefinite programs may not satisfy strict complementarity.
Finally, regularity requires that both () and () have unique solutions [AHO97, Example 1].
Definition 4 (Regularity).
Definition 5 (Primal regularity).
The dual of a primal regular SDP may admit multiple solutions. Notice every regular SDP is primal regular. Primal regularity is practically important: for example, the matrix completion SDP [CR09], introduced in Section 5, is primal regular but not regular. Primal regular SDPs inherit some (but not all) of the nice properties of regular SDPs.
Regularity under generic problem data
Norms and Eigenvalues
For a matrix , we denote its Frobenius, operator two norm, and nuclear norm (sum of singular values) as , and respectively. The operator norm of a linear operator is defined as . We write the eigenvalues of a symmetric matrix in decreasing order as
We define the singular values similarly.
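As a small illustration of this notation, the sketch below computes the three matrix norms from the singular values and lists the eigenvalues of a symmetric matrix in decreasing order. The function names and the example matrix are hypothetical and are not taken from the paper.

```python
import numpy as np

def matrix_norms(A):
    """Return (Frobenius, operator two-norm, nuclear norm) of A."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values, decreasing order
    return np.sqrt(np.sum(s**2)), s[0], np.sum(s)

def sorted_eigenvalues(X):
    """Eigenvalues of a symmetric matrix X in decreasing order."""
    return np.linalg.eigvalsh(X)[::-1]  # eigvalsh returns ascending order

# Hypothetical example: a symmetric, indefinite 2-by-2 matrix.
A = np.array([[3.0, 0.0], [0.0, -4.0]])
fro, op, nuc = matrix_norms(A)
eigs = sorted_eigenvalues(A)
```

All three norms are functions of the singular values alone, which is why a single SVD suffices.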
We use the Euclidean inner product for vectors: for ,, . We use the trace inner product for matrices: for and or and , .
Transposes and adjoints
For a vector or a matrix , and denote the transpose. The adjoint map of a linear map from is defined as the unique linear map such that for every , .
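The defining identity of the adjoint can be checked numerically. The sketch below assumes the constraint map is represented by a list of symmetric matrices (called `A_mats` here, a hypothetical name), so that the map returns the vector of trace inner products with each matrix and the adjoint returns the corresponding weighted sum of matrices. It also checks linear independence of the constraint matrices, which is equivalent to surjectivity of the map under this representation.

```python
import numpy as np

def constraint_map(A_mats, X):
    """Apply the map X -> (<A_1, X>, ..., <A_m, X>) with the trace inner product."""
    return np.array([np.sum(A_i * X) for A_i in A_mats])

def adjoint_map(A_mats, y):
    """Apply the adjoint y -> sum_i y_i A_i."""
    return sum(y_i * A_i for y_i, A_i in zip(y, A_mats))

def is_surjective(A_mats):
    """The map is surjective iff the vectorized A_i are linearly independent."""
    stacked = np.stack([A_i.ravel() for A_i in A_mats])
    return np.linalg.matrix_rank(stacked) == len(A_mats)

# Hypothetical random instance, used only to exercise the identity.
rng = np.random.default_rng(0)
n, m = 4, 3
A_mats = [(B + B.T) / 2 for B in rng.standard_normal((m, n, n))]
X = rng.standard_normal((n, n))
X = (X + X.T) / 2
y = rng.standard_normal(m)

lhs = constraint_map(A_mats, X) @ y          # inner product in R^m
rhs = np.sum(X * adjoint_map(A_mats, y))     # trace inner product of matrices
surjective = is_surjective(A_mats)
```

The two sides agree up to floating point error for any choice of `X` and `y`.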
2 Regular SDPs are generic
In this section, we first show that any PSD matrix solves a regular SDP. We also demonstrate that for almost all , if SDP () satisfies Slater’s condition and linear independence and has a primal solution, then it is primal regular. We then show that interesting SDPs, including MaxCut, OrthogonalCut, and ProductSDP (introduced in Section 2.3), are regular for almost all . Finally, we demonstrate numerically that the MaxCut SDPs of many graphs are indeed regular.
2.1 Any PSD matrix solves a regular SDP
Given any rank positive semidefinite matrix , we can construct a regular SDP with as its unique solution.
Write the eigenvalue decomposition of as . Here the eigenvalues satisfy , and we define the diagonal matrix and the orthonormal matrix .
We are now ready to construct the SDP and state our first theorem:
For any rank positive semidefinite matrix with eigenvalue decomposition , the SDP
with variable is regular and has as its unique solution.
Let us first write down the dual, with variables , :
We now verify each property required for regularity.
The matrices , are orthogonal and hence they are linearly independent.
Strong duality and strict complementarity. Define with for and for . To verify strong duality and strict complementarity, we claim and are solutions to the primal SDP, Eq. 1, and dual SDP, Eq. 2, respectively. Indeed, it is easy to verify that is primal feasible. Furthermore, by writing the slack matrix we see is dual feasible and has rank . Since the primal and dual objectives match, , we see and are a primal-dual optimal solution pair. Since has rank and , we see strict complementarity holds.
Uniqueness. Suppose that solves the dual problem (2). We will show that , and hence the dual has a unique solution. Using strong duality, we know . Moreover, and are psd. Hence has rank at most . By the definition of , we see (3) The lower right block of the inner matrix above is the identity . Hence we see has rank at least . Thus must have rank exactly . This fact forces the upper left block of in (3) to be . Hence, we must have .
To show the primal solution is unique, introduce the new variable so that . Using this change of variables in (1), we see uniquely solves (1) if and only if uniquely solves (4) (Notice that is optimal for Eq. 4, using the same argument we used to show the optimality of for Eq. 1 above.) Since the optimal value of Eq. 4 is , from the constraints of (4), we see that any feasible of (4) has objective value . To achieve optimality, we must have for . Now use the fact that to see is the unique solution. ∎
2.2 Almost all cost matrices yield a primal regular SDP
We utilize [DL11, Corollary 3.5]: for a convex extended value function , for almost all , the perturbed function admits at most one minimizer and satisfies , the relative interior of .
To exploit this theorem, we set and take to be the function
where is the indicator function of a convex set : if and otherwise. Using [DL11, Corollary 3.5], we see that for almost all , the problem has at most one solution , and
which implies that for some slack matrix satisfying . Here step uses Slater’s condition to apply the sum rule of the subdifferential. Step uses [Ber09, Proposition 1.3.6]: the sum rule for the relative interior. Step uses basic sub-differential calculus. Hence, there is some and such that , , and
Hence is dual optimal and strict complementarity holds. ∎
2.3 MaxCut-type SDP are regular for almost all
In this section, we introduce three classes of SDPs that generalize the SDP relaxation of the MaxCut problem [GW95], with applications in statistical signal recovery, optics, and subproblems of important algorithms. We show in Corollary 1 that they are regular for almost all based on Theorem 2.
We call an SDP a MaxCut-type SDP if it is of the form
Here we do not require the cost matrix to be a negative Laplacian matrix.
For any , we denote by the -th diagonal block of . An OrthogonalCut-type problem has decision variable for some integer and , or , and is of the form
ProductSDP: optimization over a product of spheres
Finally, we introduce (ProductSDP), an SDP relaxation of a quadratic program over a product of spheres. Let be a positive integer and let be a partition of the set : for all , and . A (ProductSDP)-type problem, with decision variable , takes the form
To explain the name of this SDP, suppose for each . The constraint ensures that is on the sphere in . Now stack the variables for as a vector . The SDP (ProductSDP) is a relaxation of the quadratic program
with replaced by . Problems of this form can appear as trust-region subproblems, e.g., [BVB18, Section 5.3].
Having defined these three classes of SDP, we show all of these problems are almost always regular.
We first check dual uniqueness and linear independence, and then verify primal regularity to conclude that these three classes of SDP are regular.
Dual uniqueness and linear independence
First, note linear independence follows directly from the uniqueness of the dual solution. We show dual uniqueness by contradiction: if the dual is not unique, there is some such that and for some is still optimal. Using [WW18, Proposition 9], we know there is no nonzero such that
It is then immediate that the dual is unique by noting for any dual optimal .
The primal solution exists because the feasible region of each class is compact and nonempty. Slater’s condition for these three classes of SDP can be easily verified using a well-chosen diagonal matrix. Hence Theorem 2 asserts these three classes are primal regular for almost all . ∎
2.4 Numerical verification for real-world SDP
In this section, we numerically verify that the MaxCut problems (MaxCut) corresponding to several graphs are regular. In particular, we use the Gset graphs G1 to G20 [Gse]; in the MaxCut relaxation, the cost matrix is the negative graph Laplacian. Each graph has vertices, so the MaxCut SDP (MaxCut) has a decision variable of size .
To verify strict complementarity, we must compute the rank of the primal and dual solution and , and , and see whether .
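The rank test for strict complementarity can be sketched as follows. This is a minimal illustration, assuming a primal solution `X` and a dual slack matrix `Z` are already available as NumPy arrays; the tolerance `tol` is a hypothetical choice and is not the threshold used in the experiments reported here.

```python
import numpy as np

def numerical_rank(M, tol):
    """Number of eigenvalues of the symmetric matrix M that exceed tol."""
    return int(np.sum(np.linalg.eigvalsh(M) > tol))

def strictly_complementary(X, Z, tol=1e-6):
    """Check rank(X) + rank(Z) == n for an n-by-n primal/dual slack pair."""
    n = X.shape[0]
    return numerical_rank(X, tol) + numerical_rank(Z, tol) == n

# Hypothetical pair: X has rank 1, Z has rank 3, and X Z = 0.
x = np.array([1.0, -1.0, 1.0, -1.0])
X = np.outer(x, x)
Z = 4.0 * np.eye(4) - X
ok = strictly_complementary(X, Z)

# A complementary pair that is not strictly complementary: Z = 0.
not_ok = strictly_complementary(X, np.zeros((4, 4)))
```

Because the ranks are estimated by thresholding, the outcome depends on `tol` when the smallest nonzero eigenvalue is tiny, which is the weak sense of strict complementarity noted for some graphs in this section.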
To verify uniqueness of the primal solution, define a matrix whose columns form an orthonormal basis for the null space of . Define the linear operator , , where . According to [AHO97, Theorems 9 and 10], the primal solution is unique if the smallest singular value is nonzero.
To verify uniqueness of the dual solution, define a matrix whose columns form an orthonormal basis for the column space of and whose columns form a basis for the null space of . Define the matrix where the -th column of is
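One way to carry out the primal uniqueness test is sketched below. It rests on an assumption about the operator involved: we take it to be the map that sends a small symmetric matrix S to the constraint map applied to V S V^T, where the columns of V span the numerical null space of the dual slack matrix. The function name, the tolerance, and the use of `np.diag` as the MaxCut constraint map are all illustrative choices rather than the paper's code.

```python
import numpy as np

def primal_uniqueness_sigma_min(Z, constraint_map, tol=1e-6):
    """Smallest singular value of S -> constraint_map(V S V^T) on symmetric S,
    where V is an orthonormal basis of the numerical null space of Z."""
    w, Q = np.linalg.eigh(Z)
    V = Q[:, w <= tol]
    r = V.shape[1]
    if r == 0:
        raise ValueError("Z has no numerical null space at this tolerance")
    cols = []
    for i in range(r):
        for j in range(i, r):
            # Orthonormal basis element of the r-by-r symmetric matrices.
            E = np.zeros((r, r))
            E[i, j] = E[j, i] = 1.0 if i == j else 1.0 / np.sqrt(2.0)
            cols.append(constraint_map(V @ E @ V.T))
    M = np.column_stack(cols)
    if M.shape[0] < M.shape[1]:
        return 0.0  # fewer constraints than unknowns: the map cannot be injective
    return np.linalg.svd(M, compute_uv=False)[-1]

# Hypothetical MaxCut-type instance with constraint map diag(X).
x = np.array([1.0, -1.0, 1.0, -1.0])
Z = 4.0 * np.eye(4) - np.outer(x, x)   # null space spanned by x
sigma_min = primal_uniqueness_sigma_min(Z, np.diag)
```

A strictly positive value means the constraint map is injective on the face of the PSD cone that contains every primal solution, so only one solution can satisfy the linear constraints.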
Numerically, we obtain and using the MOSEK solver [Mos10]. We estimate the rank by the number of eigenvalues larger than , and denote the smallest eigenvalue larger than as and respectively. We compute their condition numbers defined as and . We compute the condition numbers of and defined as and . The results are reported in Table 1. As can be seen, regularity is indeed satisfied for every MaxCut problem from G1 to G20. For graph G11, the condition number is about and its is actually only (not shown here) meaning that strict complementarity holds in a very weak sense.
3 Burer-Monteiro may fail for regular SDP
In this section, we show that the (BM) formulation of () admits second order stationary points that are not globally optimal even for regular SDPs with low rank ( or or ) solutions.
Recall from the introduction the Burer and Monteiro approach (BM approach) to semidefinite programming, which replaces the SDP () by the following nonlinear optimization problem with decision variable :
This problem is in general nonconvex.
Nonlinear optimization solvers such as Riemannian trust regions [BAC18] can guarantee that they find a second order stationary point (SOSP) of such a problem, but cannot guarantee (or even check) that they have found a global solution. When the constraint set is a manifold, as it is for all the examples discussed in the previous section, a putative solution is second order stationary if its Riemannian gradient is and its Riemannian Hessian is positive semidefinite. See Appendix A for further discussion.
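For a MaxCut-type problem, second order stationarity can be checked numerically. The sketch below assumes the factored problem minimizes the trace inner product of the cost matrix with F F^T over matrices F with unit-norm rows, and uses the standard formulas for a product of spheres: the Riemannian gradient is 2 S F and the Hessian quadratic form is 2 times the inner product of U with S U over tangent directions U, where S is the cost matrix minus the diagonal matrix built from diag(C F F^T). The function name and the example are hypothetical.

```python
import numpy as np

def maxcut_bm_sosp_check(C, F):
    """Return (Riemannian gradient norm, smallest Riemannian Hessian eigenvalue)
    for min <C, F F^T> over F with unit-norm rows. Assumes F has p >= 2 columns."""
    n, p = F.shape
    d = np.sum((C @ F) * F, axis=1)        # diag(C F F^T)
    S = C - np.diag(d)
    grad_norm = np.linalg.norm(2.0 * S @ F)
    # For each row, an orthonormal basis of the directions orthogonal to F[i].
    Q = [np.linalg.qr(F[i].reshape(-1, 1), mode="complete")[0][:, 1:].T
         for i in range(n)]
    k = p - 1
    H = np.zeros((n * k, n * k))
    for i in range(n):
        for j in range(n):
            # Block (i, j) of the Hessian in tangent-space coordinates.
            H[i * k:(i + 1) * k, j * k:(j + 1) * k] = 2.0 * S[i, j] * Q[i] @ Q[j].T
    return grad_norm, np.linalg.eigvalsh(H)[0]

# Hypothetical instance: C = -x x^T, whose SDP solution is x x^T, with p = 2.
x = np.array([1.0, -1.0, 1.0, -1.0])
C = -np.outer(x, x)
F = np.column_stack([x, np.zeros(4)])
grad_norm, min_hess_eig = maxcut_bm_sosp_check(C, F)
```

A point is numerically second order stationary when `grad_norm` is near zero and `min_hess_eig` is not significantly negative. Passing this check does not certify global optimality, which is the gap this section studies.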
Hence we can guarantee that the BM approach finds the global optimum if we can prove that all SOSPs are globally optimal. The following definition serves as a useful shorthand as we understand when this condition holds.
Note that as a practical matter, a nonlinear solver for (BM) might produce a globally optimal SOSP even for a problem that admits non-optimal SOSPs.
Recall from the introduction that for almost all , when , any SOSP of (BM) is globally optimal [BVB18]. On the other hand, building on results by [WW18], we will demonstrate a positive measure set of regular SDP of each of the three classes described in Section 2.3 for which BM fails whenever .
3.1 Examples: MaxCut, OrthogonalCut, and ProductSDP
As demonstrated in [WW18, Corollary 1], if
then there is a positive measure set of the cost matrix for which (MaxCut) has a unique rank solution but the BM approach fails.
Are these SDP particularly nasty? On the contrary! Our contribution, stated in the following theorem, is to show that these SDPs are regular. We also generalize these results to (OrthogonalCut) and (ProductSDP).
Fix a positive integer . If
then there is a set of cost matrices with positive measure for which (MaxCut) admits a unique rank solution and is regular, but the BM approach fails.
The proofs of dual uniqueness and linear independence are the same as in the proof of Corollary 1. We next verify the failure of BM and primal regularity.
Failure of BM and primal regularity. Waldspurger and Waters show that there is a positive measure set of cost matrices for which (MaxCut) satisfies: (1) strong duality [WW18, Proposition 4], (2) uniqueness of a primal solution with rank [WW18, Corollary 2], (3) strict complementarity for a dual solution [WW18, Lemma 2, Lemma 9 and ], (4) failure of the BM approach [WW18, Corollary 1]. Together with dual uniqueness and linear independence, these results verify the theorem statement for (MaxCut).
OrthogonalCut and ProductSDP. The proof for the other two SDPs follows exactly the same argument as above, using [WW18, Corollary 2] for (OrthogonalCut) and [WW18, Corollary 3] for (ProductSDP). ∎
4 Noisy SDPs are regular
In Section 2, we saw that many interesting SDPs are regular for almost any cost matrix . In this section, we show that the (very structured) cost matrices that appear in certain statistical problems also yield regular SDPs. In these problems, the objective measures agreement with observations of a ground-truth object, while the constraints restrict the complexity of the solution. Importantly, regularity of these problems guarantees that the solution of the SDP recovers the ground truth.
More precisely, we consider the SDP relaxations of the following statistical problems:
Stochastic Block Model
We show that these SDP relaxations are regular with high probability.
We also demonstrate a strong advantage to solving the original SDP rather than using the BM approach (when applicable): these SDPs provably recover the ground truth under much higher noise than is (provably) allowable using the BM approach.
Consider a binary vector . The synchronization problem is to recover the vector up to a sign from the observations , where is symmetric with iid standard normal upper diagonal entries, and diagonal entries. The value is the noise level. The SDP proposed in the literature with decision variable is
The corresponding Burer-Monteiro formulation with variable is
It is intuitive that the problem is more challenging as the noise level increases. For ( Sync), if the noise level satisfies for some numerical constant , it admits as its unique solution with high probability [Ban18, Proposition 3.6]. But for (BM Sync) with , the best known theoretical results state that the noise level must be less than for some small numerical constant to ensure the BM formulation succeeds, i.e., all second order stationary points satisfy [BBV16]. The gap between and is polynomially large.
where is the adjoint operator of . Note that using the fact that . Using the proof of [Ban18, Proposition 3.6, p. 356] and , we find that with high probability, there is a numerical constant such that
We see and is optimal as . Moreover, strict complementarity is satisfied, as . The linear independence relation and the uniqueness of the dual can be verified in the same way as in the proof of Theorem 3. We summarize our findings as the following theorem.
For the synchronization problem, if the noise level for some numerical constant , then with high probability the SDP ( Sync) is regular with primal solution . Moreover, the dual solution satisfies for some numerical constant .
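The dual certificate argument behind this theorem can be explored numerically. The sketch below makes several assumptions that are not confirmed by the text above: the observation is the rank-one signal plus the noise level times a symmetric Gaussian matrix, the SDP maximizes the trace inner product of the observation with X subject to a unit diagonal (the sign convention may differ from the one used here), and the candidate dual slack matrix is the standard one for that formulation. The names `sync_dual_certificate`, `n`, and `sigma`, and their values, are hypothetical.

```python
import numpy as np

def sync_dual_certificate(C, z):
    """Candidate dual slack S = Diag(diag(C z z^T)) - C for
    max <C, X> subject to diag(X) = 1 and X PSD (assumed formulation)."""
    return np.diag(z * (C @ z)) - C

rng = np.random.default_rng(0)
n, sigma = 50, 0.5                       # hypothetical size and noise level
z = rng.choice([-1.0, 1.0], size=n)      # ground-truth binary vector
G = rng.standard_normal((n, n))
W = np.triu(G, 1)
W = W + W.T                              # symmetric noise, zero diagonal (assumption)
C = np.outer(z, z) + sigma * W           # noisy observation (assumed form)

S = sync_dual_certificate(C, z)
residual = np.linalg.norm(S @ z)         # complementarity: S z should vanish
eigs = np.linalg.eigvalsh(S)             # ascending order
certified = eigs[0] > -1e-8 and eigs[1] > 1e-8
```

When `certified` is true, the slack matrix is positive semidefinite with rank n - 1 and annihilates z, so the rank-one matrix z z^T is optimal and strict complementarity holds for this instance. Raising `sigma` eventually makes the second smallest eigenvalue negative, at which point the certificate fails.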
4.2 Stochastic Block Model
The stochastic block model (SBM) is structurally quite similar to synchronization. The SBM posits that we observe the edges and vertices of a graph with vertices that are split into two clusters according to a binary membership vector . For each pair of vertices with , the undirected edge is formed with probability if vertices and are in the same cluster () and with probability otherwise. The goal is to recover the cluster membership vector . For simplicity, we further assume that is even and that the clusters are balanced: entries of are and are . Let be the adjacency matrix of with diagonal entries set to be . The SDP proposed to recover by [BBV16], with variable , is
where the matrix . The corresponding Burer-Monteiro formulation with variable is
(There are other SDP formulations for SBM which make weaker assumptions; see [Ban18]. However, there are no guarantees for the corresponding Burer-Monteiro relaxations.)
where the error matrix has zero diagonal, expectation , and satisfies that for , ,
and for ,
We may rescale the cost matrix by to form
which has the same form as the observation matrix in Section 4.1.