DSOS and SDSOS Optimization:
More Tractable Alternatives to
Sum of Squares and Semidefinite Optimization^{†}^{†}thanks: Submitted to the editors on June 9, 2017.
Funding: Amir Ali Ahmadi was partially supported by a DARPA Faculty Award, a Sloan Fellowship, an NSF CAREER Award, an AFOSR Young Investigator Program Award, and a Google Research Award.
Abstract
In recent years, optimization theory has been greatly impacted by the advent of sum of squares (SOS) optimization. The reliance of this technique on largescale semidefinite programs however, has limited the scale of problems to which it can be applied. In this paper, we introduce DSOS and SDSOS optimization as more tractable alternatives to sum of squares optimization that rely instead on linear and second order cone programs respectively. These are optimization problems over certain subsets of sum of squares polynomials (or equivalently subsets of positive semidefinite matrices), which can be of interest in general applications of semidefinite programming where scalability is a limitation. We show that some basic theorems from SOS optimization which rely on results from real algebraic geometry are still valid for DSOS and SDSOS optimization. Furthermore, we show with numerical experiments from diverse application areas—polynomial optimization, statistics and machine learning, derivative pricing, and control theory—that with reasonable tradeoffs in accuracy, we can handle problems at scales that are currently far beyond the reach of sum of squares approaches. Finally, we provide a review of recent techniques that bridge the gap between our DSOS/SDSOS approach and the SOS approach at the expense of additional running time. The appendix of the paper introduces an accompanying MATLAB package for DSOS and SDSOS optimization.
Key words. Sum of squares optimization, polynomial optimization, nonnegative polynomials, semidefinite programming, linear programming, second order cone programming.
AMS subject classifications. 65K05, 90C25, 90C22, 90C05, 90C90, 12Y05, 93C85, 90C27
1 Introduction
For which values of the real coefficients is the polynomial
(1) 
nonnegative, i.e., satisfies for all ? The problem of optimizing over nonnegative polynomials—of which our opening question is a toy example—is fundamental to many problems of applied and computational modern mathematics. In such an optimization problem, one would like to impose constraints on the coefficients of an unknown polynomial so as to make it nonnegative, either globally in (as in the above example), or on a certain basic semialgebraic set, i.e., a subset of defined by a finite number of polynomial inequalities. We will demonstrate shortly (see Section LABEL:sec:intro_applications) why optimization problems of this kind are ubiquitous in applications and universal in the study of questions dealing in one way or another with polynomial equations and inequalities.
Closely related to nonnegative polynomials are polynomials that are sums of squares. We say that a polynomial is a sum of squares (sos) if it can be written as for some (finite number of) polynomials . For example, the polynomial in (LABEL:eq:opening.poly.ex) with is sos since it admits the decomposition
The relationship between nonnegative and sum of squares polynomials has been a classical subject of study in real algebraic geometry. For example, a result of Hilbert from 1888 [37] states that all nonnegative bivariate polynomials of degree four are sums of squares. It follows, as a special case, that the sets of coefficients for which the polynomial in (LABEL:eq:opening.poly.ex) is nonnegative or a sum of squares in fact coincide. In the general situation, however, while sum of squares polynomials are clearly always nonnegative, it is not true that the converse always holds. This was shown in the same paper of Hilbert [37], where he gave a nonconstructive proof of existence of nonnegative polynomials that are not sums of squares. Explicit examples of such polynomials appeared many years later, starting with the work of Motzkin [59] in the 1960s. Hilbert’s interest in this line of research is also showcased in his 17th problem, which asks whether every nonnegative polynomial is a sum of squares of rational functions. We refer the interested reader to an outstanding survey paper of Reznick [74], which covers many historical aspects around Hilbert’s 17th problem, including the affirmative solution by Artin [6], as well as several later developments.
The classical questions around nonnegative and sum of squares polynomials have been revisited quite extensively in the past 1015 years in different communities among applied and computational mathematicians. The reason for this renewed interest is twofold: (i) the discovery that many problems of modern practical interest can be cast as optimization problems over nonnegative polynomials, and (ii) the observation that while optimizing over nonnegative polynomials is generally NPhard, optimization over the set of sum of squares polynomials can be done via semidefinite programming (SDP); see Theorem LABEL:thm:sos.sdp in Section LABEL:sec:sos.SDP.background. The latter development, originally explored in the pioneering works of Shor [79], Nesterov [61], Parrilo [65],[66], and Lasserre [44], has led to the creation of sum of squares optimization—a computational framework, with semidefinite programming as its underlying engine, that can tackle many fundamental problems of real algebra and polynomial optimization.
The dependence of sum of squares approaches on semidefinite programming can be viewed as both a strength and a weakness depending on one’s perspective. From a computational complexity viewpoint, semidefinite programs can be solved with arbitrary accuracy in polynomial time using interior point methods (see [85] for a comprehensive survey). As a result, sum of squares techniques offer polynomial time algorithms that approximate a very broad class of NPhard problems of interest. From a more practical viewpoint however, SDPs are among the most expensive convex relaxations to solve. The speed and reliability of the current SDP solvers lag behind those for other more restricted classes of convex programs (such as linear or second order cone programs) by a wide margin. With the added complication that the SDPs generated by sos problems are large (see Section LABEL:sec:sos.SDP.background), scalability has become the single most outstanding challenge for sum of squares optimization in practice.
In this paper, we focus on the latter of the two viewpoints mentioned above and offer alternatives to sum of squares optimization that while more conservative in general, are significantly more scalable. Our hope is that by doing so, we expand the use and applicability of algebraic techniques in optimization to new areas and share its appeal with a broader audience. We call our new computational frameworks, which rely on linear and second order cone programming, DSOS and SDSOS optimization^{1}^{1}1We use and recommend the pronunciation “dsauce” and “sdsauce”.. These are short for diagonallydominantsumofsquares and scaleddiagonallydominantsumofsquares; see Section LABEL:sec:dsos.sdsos for precise definitions. While these tools are primarily designed for sum of squares optimization, they are also applicable to general applications of semidefinite programming where tradeoffs between scalability and performance may be desirable. In the interest of motivating our contributions for a diverse audience, we delay a presentation of these contributions until Section LABEL:sec:dsos.sdsos and start instead with a portfolio of problem areas involving nonnegative polynomials. Any such problem area is one to which sum of squares optimization, as well as its new DSOS and SDSOS counterparts, are directly applicable.
1.1 Why optimize over nonnegative polynomials?
We describe several motivating applications in this section at a high level. These will be revisited later in the paper with concrete computational examples; see Section LABEL:sec:experiments.
Polynomial optimization
A polynomial optimization problem (POP) is an optimization problem of the form
(2) 
where , , and are multivariate polynomials. The special case of problem (LABEL:eq:POP) where the polynomials all have degree one is of course linear programming, which can be solved very efficiently. For higher degrees, POP contains as special cases many important problems in operations research; e.g., the optimal power flow problem in power engineering [39], the computation of Nash equilibria in game theory [43], [67], problems of Euclidean embedding and distance geometry [50], and a host of problems in combinatorial optimization. We observe that if we could optimize over the set of polynomials that are nonnegative on a basic semialgebraic set, then we could solve the POP problem to global optimality. To see this, note that the optimal value of problem (LABEL:eq:POP) is equal to the optimal value of the following problem:
(3) 
Here, we are trying to find the largest constant such that the polynomial is nonnegative on the set ; i.e., the largest lower bound on problem (LABEL:eq:POP).
Combinatorial optimization
Proofs of nonnegativity of polynomials of degree as low as four provide infeasibility certificates for all decision problems in the complexity class NP, including many wellknown combinatorial optimization problems. Consider, e.g., the simpletodescribe, NPcomplete problem of PARTITION [29]: Given a set of integers , decide if they can be split into two sets with equal sums. It is straightforward to see that a PARTITION instance is infeasible if and only if the degreefour polynomial
satisfies , . Indeed, is by construction a sum of squares and hence nonnegative. If it were to have a zero, each square in would have to evaluate to zero; but this can happen if and only if there is a binary vector , which makes , giving a yes answer to the PARTITION instance. Suppose now that for a given instance, and for some , we could prove that is nonnegative. Then we would have proven that our PARTITION instance is infeasible, potentially without trying out all ways of splitting the integers into two sets.
Control systems and robotics
Numerous fundamental problems in nonlinear dynamics and control, such as stability, robustness, collision avoidance, and controller design can be turned into problems about finding special functions—the socalled Lyapunov functions (see, e.g., [41])—that satisfy certain sign conditions. For example, given a differential equation , where , and with the origin as an equilibrium point (i.e., satisfying ), consider the “region of attraction (ROA) problem”: Determine the set of initial states in from which the trajectories flow to the origin. Lyapunov’s stability theorem (see, e.g., [41, Chap. 4]) tells us that if we can find a (Lyapunov) function , which together with its gradient satisfies
(4) 
then the set is part of the region of attraction. If is a polynomial function (a very important case in applications [65]), and if we parameterize as a polynomial function, then the search for the coefficients of satisfying the conditions in (LABEL:eq:poly_Lyap_inequalities) is an optimization problem over the set of nonnegative polynomials. Designing stable equilibrium points with large regions of attraction is a fundamental problem in control engineering and robotics; see, e.g., [40], [54].
Statistical regression with shape constraints
A problem that arises frequently in statistics is that of fitting a function to a set of data points with minimum error, while ensuring that it meets some structural properties, such as nonnegativity, monotonicity, or convexity [32], [34]. Requirements of this type are typically imposed either as regularizers (to avoid overfitting), or more importantly as a result of prior knowledge that the true function to be estimated satisfies the same structural properties. In economics, for example, a regression problem for estimating the utility of consumers from sample measurements would come with a natural concavity requirement on the utility function. When the regression functions are polynomials, many such structural properties lead to polynomial nonnegativity constraints. For example, a polynomial can be constrained to be concave by requiring its Hessian matrix , which is an symmetric matrix with polynomial entries, to be negative semidefinite for all . This condition is easily seen to be equivalent to the polynomial in the variables and being nonnegative.
Probability bounds given moments
The wellknown inequalities of Markov and Chebyshev in probability theory bound the probability that a univariate random variable takes values in a subset of the real line given the first, or the first and second moments of its distribution [10]. Consider a vast generalization of this problem where one is given a finite number of moments of a multivariate multivariate random variable and is interested in bounding the probability of an event , where is a general basic semialgebraic subset of the sample space . A sharp upper bound on this probability can be obtained by solving the following optimization problem for the coefficients of an variate degree polynomial (see [14], [70]):
(5) 
It is easy to see that any feasible solution of problem (LABEL:eq:probability.bound.opt) gives an upper bound on the probability of the event as the polynomial is by construction placed above the indicator function of the set and the expected value of this indicator function is precisely the probability of the event . Note that the constraints in (LABEL:eq:probability.bound.opt) are polynomial nonnegativity conditions.
Other applications
The application areas outlined above are only a few examples on a long list of problems that can be tackled by optimizing over the set of nonnegative polynomials. Other interesting problems on this list include: computation of equilibria in games with continuous strategy spaces [67], distinguishing separable and entangled states in quantum information theory [25], computing bounds on sphere packing problems in geometry [7], software verification [77], robust and stochastic optimization [12], filter design [82], and automated theorem proving [36]. A great reference for many applications in this area, as well as the related theoretical and computational background, is a recent volume edited by Blekherman, Parrilo, and Thomas [15].
Organization of the paper
The remainder of this paper is structured as follows. In Section LABEL:sec:sos.SDP.background, we briefly review the sum of squares approach and its relation to semidefinite programming for the general reader. We also comment on its challenges regarding scalability and prior work that has tackled this issue. The familiar reader can skip to Section LABEL:sec:dsos.sdsos, where our contributions begin. In Section LABEL:subsec:dsos.sdsos.cones, we introduce the cone of dsos and sdsos polynomials and demonstrate that we can optimize over them using linear and second order cone programming respectively. Section LABEL:subsec:rdsos.rsdsos introduces hierarchies of cones based on dsos and sdsos polynomials that can better approximate the cone of nonnegative polynomials. Asymptotic guarantees on these cones based on classical Positivstellensätze are also presented. Section LABEL:sec:experiments presents numerical experiments that use our approach on largescale examples from polynomial optimization (Section LABEL:subsec:experiments.pop), combinatorial optimization (Section LABEL:sec:copositivity), statistics and machine learning (Sections LABEL:subsec:experiments.convex.regression and LABEL:subsec:experiments.sparse.pca), derivative pricing (Section LABEL:subsec:experiments.options.pricing), and control theory and robotics (Section LABEL:sec:controls_apps). Section LABEL:sec:improvements provides a review of recent techniques for bridging the gap between the DSOS/SDSOS approach and the SOS approach at the expense of additional computational costs. Section LABEL:sec:conclusions concludes the paper. The Appendix to the paper introduces iSOS (“inside SOS”), an accompanying MATLAB package for DSOS and SDSOS optimization written using the SPOT toolbox [57].
2 Review of the semidefinite programmingbased approach and computational considerations
As mentioned earlier, sum of squares optimization handles a (global) nonnegativity constraint on a polynomial by replacing it with the stronger requirement that be a sum of squares. The situation where is only constrained to be nonnegative on a certain basic semialgebraic set^{2}^{2}2In this formulation, we have avoided equality constraints for simplicity. Obviously, there is no loss of generality in doing this as an equality constraint can be imposed by the pair of inequality constraints .
can also be handled with the help of appropriate sum of squares multipliers. For example, if we succeed in finding sos polynomials , such that
(6) 
then we have found a certificate of nonnegativity of on the set . Indeed, if we evaluate the above expression at any , nonnegativity of the polynomials imply that . A Positivstellensatz from real algebraic geometry due to Putinar [72] states that if the set satisfies the socalled Archimedean property, a property only slightly stronger than compactness (see, e.g., [47] for a precise definition), then every polynomial positive on has a representation of the type (LABEL:eq:p=s0+sigi), for some sos polynomials of high enough degree (see also [62] for degree bounds). Even with no qualifications whatsoever regarding the set , there are other Positivstellensätze (e.g., due to Stengle [80]) that certify nonnegativity of a polynomial on a basic semialgebraic set using sos polynomials. These certificates are only slightly more complicated than (LABEL:eq:p=s0+sigi) and involve sos multipliers associated with products among polynomials that define [66]. A great reference for the interested reader is the survey paper by Laurent [47].
The computational advantage of a certificate of (global or local) nonnegativity via sum of squares polynomials is that it can be automatically found by semidefinite programming. The following wellknown theorem establishes the link between sos polynomials and SDP. Let us denote the set of symmetric matrices by . Recall that a matrix is positive semidefinite (psd) if , and that semidefinite programming is the problem of optimizing a linear function over psd matrices subject to affine inequalities on their entries [85]. We denote the positive semidefiniteness of a matrix with the standard notation .
Theorem 1 (see, e.g., [65],[66]).
A multivariate polynomial in variables and of degree is a sum of squares if and only if there exists a symmetric matrix (often called the Gram matrix) such that
(7) 
where is the vector of monomials of degree up to :
Searching for a matrix satisfying the positive semidefiniteness constraint, as well as the linear equality constraints coming from (LABEL:eq:p=z'Qz), amounts to solving a semidefinite program. The size of the matrix in this theorem is
which approximately equals . While this number is polynomial in for fixed , it can grow rather quickly even for lowdegree polynomials. For example, a degree polynomial () in variables has 316251 coefficients and its Gram matrix, which would need to be positive semidefinite, contains 879801 decision variables. A semidefinite constraint of this size is quite expensive, and in fact a problem with variables is far beyond the current realm of possibilities in SOS optimization. In the absence of problem structure, sum of squares problems involving degree or polynomials are currently limited, roughly speaking, to a handful or a dozen variables.
There have been many contributions already to improvements in scalability of sum of squares techniques. One approach has been to develop systematic techniques for taking advantage of problem structure, such as sparsity or symmetry of the underlying polynomials, to reduce the size of the SDPs [30], [84], [23], [76]. These techniques have proven to be very useful as problems arising in practice sometimes do come with a lot structure. Another approach which holds promise has been to design customized solvers for special classes of SOS programs and avoid restoring to an offtheshelf interior point solver. Examples in this direction include the works in [11], [63], [86], [49]. There has also been recent work by Lasserre et al. that increases scalability of sum of squares optimization problems at the cost of accuracy of the solutions obtained. This is done by bounding the size of the largest SDP constraint appearing in the sos formulation, and leads to what the authors call the BSOS (bounded SOS) hierarchy [46].
The approach we take in this paper for enhancing scalability is orthogonal to the ones mentioned above (and can potentially later be combined with them). We propose to eschew sum of squares decompositions to begin with. In our view, almost all application areas of this field use sum of squares decompositions not for their own sake, but because they provide a means to polynomial nonnegativity. Hence, this paper is motivated by a natural question: Can we give other sufficient conditions for polynomial nonnegativity that are perhaps stronger than a sum of squares decompostion, but cheaper to work with?
In the next section, we identify some subsets of the cone of nonnegative polynomials that one can optimize over using linear programming (LP) and second order cone programming (SOCP) [51], [5]. Not only are these much more efficiently solvable classes of convex programs, but they are also superior to semidefinite programs in terms of numerical conditioning. In addition, working with these classes of convex programs allows us to take advantage of highperformance LP and SOCP solvers (e.g., CPLEX [20], Mosek [58]) that have been matured over several decades because of industry applications. We remark that while there have been other approaches to produce LP hierarchies for polynomial optimization problems (e.g., based on the KrivineStengle certificates of positivity [42, 80, 45]), these LPs, though theoretically significant, are typically quite weak in practice and often numerically illconditioned [46].
3 DSOS and SDSOS Optimization
We start with some basic definitions. A monomial in variables is a function of the form , where each is a nonnegative integer. The degree of such a monomial is by definition . A polynomial is a finite linear combination of monomials . The degree of a polynomial is the maximum degree of its monomials. A form or a homogeneous polynomial is a polynomial whose monomials all have the same degree. For any polynomial of degree , one can define its homogenized version by introducing a new variable and defining . One can reverse this operation (dehomogenize) by setting the homogenizing variable equal to one: . The properties of being nonnegative and a sum of squares (as defined earlier) are preserved under homogenization and dehomogenization [74]. We say that a form is positive definite if for all . We denote by and the set of nonnegative (a.k.a. positive semidefinite) and sum of squares polynomials in variables and degree respectively. (Note that odddegree polynomials can never be nonnegative.) Both these sets form proper (i.e., convex, closed, pointed, and solid) cones in with the obvious inclusion relationship .
The basic idea behind our approach is to approximate from the inside with new sets that are more tractable for optimization purposes. Towards this goal, one may think of several natural sufficient conditions for a polynomial to be a sum of squares. For example, the following may be natural first candidates:

The set of polynomials that are sums of 4th powers of polynomials:
, 
The set of polynomials that are a sum of a constant number of squares of polynomials; e.g., a sum of three squares:
Even though both of these sets clearly reside inside the set of sos polynomials, they are not any easier to optimize over. In fact, they are much harder: testing whether a polynomial is a sum of 4th powers is NPhard already for quartics [38] (in fact, the cone of 4th powers of linear forms is dual to the cone of nonnegative quartic forms [75]) and optimizing over polynomials that are sums of three squares is intractable (as this task even for quadratics subsumes the NPhard problem of positive semidefinite matrix completion with a rank constraint [48]). These examples illustrate that in general, an inclusion relationship does not imply anything with respect to optimization complexity. Hence, we need to take some care when choosing which subsets of to work with. On the one hand, these subsets have to be “big enough” to be useful in practice; on the other hand, they should be more tractable to optimize over.
3.1 The cone of dsos and sdsos polynomials
We now describe two interesting cones inside that lend themselves to linear and second order cone representations and are hence more tractable for optimization purposes. These cones will also form the building blocks of some more elaborate cones (with improved performance) that we will present in Subsection LABEL:subsec:rdsos.rsdsos and Section LABEL:sec:improvements.
Definition 2.
A polynomial is diagonallydominantsumofsquares (dsos) if it can be written as
(8) 
for some monomials and some nonnegative scalars . We denote the set of polynomials in variables and degree that are dsos by .
Definition 3.
A polynomial is scaleddiagonallydominantsumofsquares (sdsos) if it can be written as
(9) 
for some monomials and some scalars with . We denote the set of polynomials in variables and degree that are sdsos by .
From the definition, the following inclusion relationships should be clear:
In general, all these containment relationships are strict. Figure LABEL:fig:dsos.sdsos.sos shows these sets for a family of bivariate quartics parameterized by two coefficients :^{3}^{3}3It is well known that all psd bivariate forms are sos [74] and hence the outer set exactly characterizes all values of and for which this polynomial is nonnegative.
(10) 
Some elementary remarks about the two definitions above are in order. It is not hard to see that if has degree , the monomials in (LABEL:eq:dsos) or (LABEL:eq:sdsos) never need to have degree higher than . In the special case where is homogeneous of degree (i.e., a form), the monomials can be taken to have degree exactly . We also note that decompositions given in (LABEL:eq:dsos) or (LABEL:eq:sdsos) are not unique; there can even be infinitely many such decompositions but the definition requires just one.
Our terminology in Definitions LABEL:def:dsos and LABEL:def:sdsos comes from a connection (which we will soon establish) to the following classes of symmetric matrices.
Definition 4.
A symmetric matrix is diagonally dominant (dd) if
for all . A symmetric matrix is scaled diagonally dominant (sdd) if there exists a diagonal matrix , with positive diagonal entries, such that is dd.^{4}^{4}4Checking whether a given matrix is sdd can be done by solving a linear program since the definition is equivalent to existence of an elementwise positive vector such that We denote the set of dd and sdd matrices with and respectively.
It follows from Gershgorin’s circle theorem [31] that diagonally dominant matrices are positive semidefinite. This implies that scaled diagonally dominant matrices are also positive semidefinite as the eigenvalues of have the same sign as those of . Hence, if we denote the set of positive semidefinite matrices by , the following inclusion relationships are evident:
These containments are strict for . In Figure LABEL:fig:dd.sdd.psd, we illustrate the set of and for which the matrix
is diagonally dominant, scaled diagonally dominant, and positive semidefinite. Here, is the identity matrix and the symmetric matrices and were generated randomly with iid entries sampled from the standard normal distribution. Our interest in these inner approximations to the set of psd matrices stems from the fact that optimization over them can be done by linear and second order cone programming respectively (see Theorem LABEL:thm:dsos.sdsos.LP.SOCP below, which is more general). For now, let us relate these matrices back to dsos and sdsos polynomials.
Theorem 5.
A polynomial of degree is dsos if and only if it admits a representation as , where is the standard monomial vector of degree and is a dd matrix.
The following lemma by Barker and Carlson provides an extreme ray characterization of the set of diagonally dominant matrices and will be used in the proof of the above theorem.
Lemma 6 (Barker and Carlson [8]).
Let be the set of all nonzero vectors in with at most nonzero components, each equal to . Then, a symmetric matrix is diagonally dominant if and only if it can be written as
for some .
Proof (Proof of Theorem LABEL:thm:dsos.dd).
Suppose first that with diagonally dominant. Lemma LABEL:lem:dd.corners implies that
where and is the set of all nonzero vectors in with at most nonzero components, each equal to . In this sum, the vectors that have only one nonzero entry lead to the first sum in the dsos expansion (LABEL:eq:dsos). The vectors with two s or two s lead to the second sum, and those with one and one lead to the third.
For the reverse direction, suppose is dsos, i.e., has the representation in (LABEL:eq:dsos). We will show that
where each is dd and corresponds to a single term in the expansion (LABEL:eq:dsos). As the sum of dd matrices is dd, this would establish the proof. For each term of the form we can take to be a matrix of all zeros except for a single diagonal entry corresponding to the monomial in , with this entry equaling . For each term of the form , we can take to be a matrix of all zeros except for four entires corresponding to the monomials and in . All these four entries will be set to . Similarly, for each term of the form , we can take to be a matrix of all zeros except for four entires corresponding to the monomials and in . In this submatrix, the diagonal elements will be and the offdiagonal elements will be . Clearly, all the matrices we have constructed are dd.
Theorem 7.
A polynomial of degree is sdsos if and only if it admits a representation as , where is the standard monomial vector of degree and is a sdd matrix.
Proof.
Suppose first that with scaled diagonally dominant. Then, there exists a diagonal matrix , with positive diagonal entries, such that is diagonally dominant and
Now an argument identical to that in the first part of the proof of Theorem LABEL:thm:dsos.dd (after replacing with ) gives the desired sdsos representation in (LABEL:eq:sdsos).
For the reverse direction, suppose is sdsos, i.e., has the representation in (LABEL:eq:sdsos). We show that
where each is sdd and corresponds to a single term in the expansion (LABEL:eq:sdsos). As the sum of sdd matrices is sdd,^{5}^{5}5This claim is not obvious from the definition but is apparent from Lemma LABEL:lem:sdd=sum.2x2 below which implies that is a convex cone. this would establish the proof. Indeed, for each term of the form we can take (once again) to be a matrix of all zeros except for a single diagonal entry corresponding to the monomial in , with this entry equaling .
For each term of the form , we can take to be a matrix of all zeros except for four entires corresponding to the monomials and in . This block will then be . Similarly, for each term of the form , we can take to be a matrix of all zeros except for four entires corresponding to the monomials and in . This block will then be . It remains to show that a rank1 positive semidefinite matrix is sdd.^{6}^{6}6This also implies that any positive semidefinite matrix is sdd. Let be such a matrix. Observe that
is dd and hence is by definition sdd. (Note that if or are zero, then is already dd.)
The following characterizations of sdd matrices will be important for us.
Theorem 8 (see theorems 8 and 9 by Boman et al. [16]).
A symmetric matrix is sdd if and only if it has “factor width” at most 2; i.e., a factorization , where each column of has at most two nonzero entries.
Lemma 9.
A symmetric matrix is sdd if and only if it can be expressed as
(11) 
where each is an matrix with zeros everywhere except for four entries , which make the matrix symmetric and positive semidefinite.
Proof.
This is almost immediate from Theorem LABEL:thm:sdd.factorwidth2: If for some matrix , then where denotes the th column of . Since each column has at most two nonzero entries, say in positions and , each matrix will be exactly of the desired form in the statement of the lemma. Conversely, if can be written as in (LABEL:eq:sdd.Q=sum.2x2), then we can write each as where the vectors and have at most two nonzero entries. If we then construct a matrix which has the collection of the vectors and as columns, we will have
Theorem 10.
For any fixed , optimization over (resp. ) can be done with a linear program (resp. second order cone program) of size polynomial in .
Proof.
We will use the characterizations of dsos/sdsos polynomials given in theorems LABEL:thm:dsos.dd and LABEL:thm:sdsos.sdd. In both cases, the equality can be imposed by a finite set of linear equations in the coefficients of and the entries of (these match the coefficients of with those of ). The constraint that be dd can be imposed, e.g., by a set of linear inequalities
in variables and . This gives a linear program.
The constraint that be sdd can be imposed via Lemma LABEL:lem:sdd=sum.2x2 by a set of equality constraints that enforce equation (LABEL:eq:sdd.Q=sum.2x2). The constraint that each matrix be psd is a “rotated quadratic cone” constraint and can be imposed using SOCP [5, 51]:
In both cases, the final LP and SOCP are of polynomial size because the size of the Gram matrix is , which is polynomial in for fixed .
We have written publiclyavailable code that automates the process of generating LPs (resp. SOCPs) from dsos (resp. sdsos) constraints on a polynomial. More information about this can be found in the appendix.
The ability to replace SDPs with LPs and SOCPs is what results in significant speedups in our numerical experiments (see Section LABEL:sec:experiments). For the special case where the polynomials involved are quadratic forms, we get a systematic way of inner approximating semidefinite programs. A generic SDP that minimizes subject to the constraints and can be replaced by

a diagonally dominant program (DDP); i.e., a problem of the form
(12) s.t. which is an LP, or

a scaled diagonally dominant program (SDDP); i.e., a problem of the form
(13) s.t. which is an SOCP.
DDP and SDDP fit nicely into the framework of conic programming as they are optimization problems over the intersection of an affine subspace with a proper cone (see, e.g., [9, Chap. 2] for a review of conic programming). In sections LABEL:subsec:experiments.options.pricing and LABEL:subsec:experiments.sparse.pca, we show applications of this idea to problems in finance and statistics.
3.2 The cone of rdsos and rsdsos polynomials and asymptotic guarantees
We now present a hierarchy of cones based on the notions of dsos and sdsos polynomials that can be used to better approximate the cone of nonnegative polynomials.
Definition 11.
For an integer , we say that a polynomial is rdsos (resp. rsdsos) if
is dsos (resp. sdsos). We denote the set of polynomials in variables and degree that are rdsos (resp. rsdsos) by (resp. ).
Note that for we recover our dsos/sdsos definitions. Moreover, because the multiplier is nonnegative, we see that for any , the property of being rdsos or rsdsos is a sufficient condition for nonnegativity:
Optimizing over rdsos (resp. rsdsos) polynomials is still an LP (resp. SOCP). The proof of the following theorem is identical to that of Theorem LABEL:thm:dsos.sdsos.LP.SOCP and hence omitted.
Theorem 12.
For any fixed and , optimization over the set (resp. ) can be done with linear programming (resp. second order cone programming) of size polynomial in .
If we revisit the parametric family of bivariate quartics in (LABEL:eq:parametric.bivariate.quartic), the improvements obtained by going one level higher in the hierarchy are illustrated in Figure LABEL:fig:1dsos.1sdsos. Interestingly, for , rdsos and rsdsos tests for nonnegativity can sometimes outperform the sos test. The following two examples illustrate this.
Example 3.1.
Consider the Motzkin polynomial [59], which historically is the first known example of a nonnegative polynomial that is not a sum of squares. One can give an LPbased nonnegativity certificate of this polynomial by showing that . Hence, .
Example 3.2.
Consider the polynomial . Once again this polynomial is nonnegative but not a sum of squares [74]. However, by solving an LP feasibility problem, one can check that , which proves that is nonnegative, and that .
It is natural to ask whether every nonnegative polynomial is rdsos (or rsdsos) for some ? The following theorems deal with this question and provide asymptotic guarantees on rdsos (and hence rsdsos) hierarchies. Their proofs rely on classical Positivstellensätze from real algebraic geometry.
Theorem 13.
Let be an even^{7}^{7}7An even polynomial is a polynomial whose individual variables are raised only to even degrees. positive definite form. Then, there exists an integer for which is rdsos (and hence rsdsos).
Proof.
A wellknown theorem of Pólya [69] states that if a form is positive on the simplex (i.e., the set ), then there exists an integer such that all coefficients of
are positive. Under the assumptions of the theorem, this implies that there is an for which the form and hence the form
have positive coefficients. But this means that is a nonnegative weighted sum of squared monomials and therefore clearly dsos (see (LABEL:eq:dsos)). (In a Gram matrix representation , the argument we just gave shows that we can take to be diagonal.)
We remark that even forms already constitute a very interesting class of polynomials since they include, e.g., all polynomials coming from copositive programming; see Subsection LABEL:sec:copositivity. In fact, any NPcomplete problem can be reduced in polynomial time to the problem of checking whether a degree4 even form is nonnegative [60]. An application of this idea to the independent set problem in combinatorial optimization is presented in Subsection LABEL:sec:copositivity.
The next proposition shows that the evenness assumption cannot be removed from Theorem LABEL:thm:even.forms.rdsos.
Proposition 14.
For any , the quadratic form
(14) 
is positive definite but not rsdsos for any .
Proof.
Positive definiteness is immediate from having . Let us show that the quadratic form is not rsdsos for any . Consider the polynomial
which is a form of degree . Note that the coefficient of in is . Observe also that has the following two monomials: and . Suppose for the sake of contradiction that was sdsos. Then it could be written as
where all monomials and are of degree . So to produce the monomials and in , the right hand side above must include the terms
But this also produces , which cannot be canceled. This contradicts the fact that the coefficient of in is .
To see that for any the polynomial in the statement of the proposition cannot be rsdsos either, one can repeat the exact same argument and observe that the coefficient of would still not match.
We remark that the form in (LABEL:eq:not.rsdsos.poly) can be proved to be positive the improvements presented in Section LABEL:sec:improvements, in particular with a “factor width 3” proof system from Subsection LABEL:subsec:factor.width. Alternatively, one can work with the next theorem, which removes the assumption of evenness from Theorem LABEL:thm:even.forms.rdsos at the cost of doubling the number of variables and the degree.^{8}^{8}8Instead of Theorem LABEL:thm:p.dsos.Polya, an earlier version of this paper was quoting a result by Habicht [33] from [73, p. 1], [71, p. 8]. We thank Claus Scheiderer for informing us that the statement of Habicht [33] in German was being misquoted by us and the above references.
Theorem 15 (see Section 4 of [2]).
An variate form of degree is positive definite if and only if there exists a positive integer that makes the following form dsos:
In [2, Section 4], this theorem is used to construct LP and SOCPbased converging hierarchies of lower bounds for the general polynomial optimization problem (LABEL:eq:POP) when the feasible set is compact.
4 Numerical examples and applications
In this section, we consider several numerical examples that highlight the scalability of the approach presented in this paper. Comparisons to existing methods on examples considered in the literature are provided whenever possible. A software package written using the Systems Polynomial Optimization Toolbox (SPOT) [57] includes a complete implementation of the presented methods and is available online^{9}^{9}9Link to spotless_isos software package: https://github.com/anirudhamajumdar/spotless/tree/spotless_isos. The toolbox features efficient polynomial algebra and allows us to setup the largescale LPs and SOCPs arising from our examples. The Appendix provides a brief introduction to the software package and the associated functionality required for setting up the DSOS/SDSOS programs considered in this paper. The LPs and SOCPs resulting from our programs are solved with the MOSEK solver on a machine with 4 processors and a clock speed of 3.4 GHz and 16GB RAM. Running times for comparisons to SDPbased approaches are presented for the MOSEK SDP solver^{10}^{10}10We note that the MOSEK solver is much faster on our problems than the more commonly used SDP solver SeDuMi [81]..
4.1 Lower bounds on polynomial optimization problems
An important application of sum of squares optimization is to obtain lower bounds on polynomial optimization problems. In this subsection, we consider the particular problem of minimizing a homogeneous polynomial of degree on the unit sphere (i.e., the set ). This wellstudied problem is strongly NPhard even when . Its optimal value is easily seen to be equivalent to the optimal value of the following problem:
(15)  
s.t. 
By replacing the nonnegativity condition with a SOS/DSOS/SDSOS constraint, the resulting problem becomes an SDP/LP/SOCP. Tables LABEL:tab:min_hom_opts and LABEL:tab:min_hom_runtimes compare optimal values and running times respectively on problems of increasing size. We restrict ourselves to quartic forms and vary the number of variables between and . Table LABEL:tab:min_hom_opts compares the optimal values for the DSOS, SDSOS, 1DSOS, 1SDSOS and SOS relaxations of problem (LABEL:eq:min_hom_sphere) on quartic forms whose coefficients are fully dense and drawn independently from the standard normal distribution (we consider a single random instance for each ). We note that the optimal values presented here are representative of numerical experiments we performed for a broader set of random instances (including coefficients drawn from a uniform distribution instead of the normal distribution). Also, in all our experiments, we could verify that the SOS bound actually coincides with the true global optimal value of the problem. This was done by finding a point on the sphere that produced a matching upper bound.
As the tables illustrate, both DSOS and SDSOS scale up to problems of dimension . This is well beyond the size of problems handled by SOS programming, which is unable to deal with problems with larger than due to memory (RAM) constraints. While the price we pay for this increased scalability is suboptimality, this is tolerable in many applications (as we show in other examples in this section).
Table LABEL:tab:min_hom_opts also compares the approach presented in this paper to the lower bound obtained at the root node of the branchandbound procedure in the popular global optimization package BARON [78].^{11}^{11}11We thank Aida Khajavirad for running the BARON experiments for us. This comparison was only done up to dimension 30 since beyond this we ran into difficulties passing the large polynomials to BARON’s user interface.
n = 10  n = 15  n = 20  n = 25  n = 30  n = 40  n = 50  n = 60  n = 70  
DSOS  5.31  10.96  18.012  26.45  36.85  62.30  94.26  133.02  178.23 
SDSOS  5.05  10.43  17.33  25.79  36.04  61.25  93.22  131.64  176.89 
1DSOS  4.96  9.22  15.72  23.58  NA  NA  NA  NA  NA 
1SDSOS  4.21  8.97  15.29  23.14  NA  NA  NA  NA  NA 
SOS  1.92  3.26  3.58  3.71  NA  NA  NA  NA  NA 
BARON  175.41  1079.89  5287.88    28546.1         
Table LABEL:tab:min_hom_runtimes presents running times^{12}^{12}12The BARON instances were implemented on a 64bit Intel Xeon X5650 2.66Ghz processor using the CPLEX LP solver. averaged over random instances for each . There is a significant difference in running times between DSOS/SDSOS and SOS beginning at . While we are unable to run SOS programs beyond , DSOS and SDSOS programs for take only a few minutes to execute. Memory constraints prevent us from executing programs for larger than . The results indicate, however, that it may be possible to run larger programs within a reasonable amount of time on a machine equipped with more RAM.
n = 10  n = 15  n = 20  n = 25  n = 30  n = 40  n = 50  n = 60  n = 70  
DSOS  0.30  0.38  0.74  15.51  7.88  10.68  25.99  58.10  342.76 
SDSOS  0.27  0.53  1.06  8.72  5.65  18.66  47.90  109.54  295.30 
1DSOS  0.92  6.26  37.98  369.08  
1SDSOS  1.53  14.39  82.30  538.5389  
SOS  0.24  5.60  82.22  1068.66  
BARON  0.35  0.62  3.69             
4.2 Copositive programming and combinatorial optimization
A symmetric matrix is copositive if for all (i.e., for all in the nonnegative orthant). The problem of optimizing a linear function over affine sections of the cone of copositive matrices has recently received a lot of attention from the optimization community [27], since it can exactly model several problems of combinatorial and nonconvex optimization [17], [27], [19]. For instance, the size of the largest independent set ^{13}^{13}13An independent set of a graph is a subset of its nodes no two of which are connected. The problem of finding upper bounds on has many applications in scheduling and coding theory [52]. of a graph on nodes is equal to the optimal value of the following copositive program [22]:
s.t.  (16) 
where is the adjacency matrix of the graph, is the identity matrix, and is the matrix of all ones.
It is easy to see that a matrix is copositive if and only if the quartic form , with , is nonnegative. Hence, by requiring this form to be rsos/rsdsos/rdsos, one obtains inner approximations to the cone that can be optimized over using semidefinite, second order cone, and linear programming. The sos version of this idea goes back to the PhD thesis of Parrilo [65].
In this section, we compare the quality of these bounds using another hierarchical LPbased method in the literature [17], [22]. The th level of this LP hierarchy (referred to as the Pólya LP from here onwards) requires that the coefficients of the form be nonnegative. This is motivated by a theorem by Pólya, which states that this constraint will be satisfied for large enough (assuming for all nonzero ). It is not difficult to see that the Pólya LP is always dominated by the LP which requires to be rdsos. This is because the latter requires the underlying Gram matrix to be diagonally dominant, while the former requires it to be diagonal, with nonnegative diagonal entries.
In [22], de Klerk and Pasechnik show that when the Pólya LP is applied to approximate the copositive program in (LABEL:eq:copositive.stable.set), then the precise independent set number of the graph is obtained within steps of the hierarchy. By the previous argument, the same claim immediately holds for the rdsos (and hence rsdsos) hierarchies.
Table LABEL:tab:indep.set.icos revisits Example 5.2 of [17], where upper bounds on the independent set number of the complement of the graph of an icosahedron are computed. (This is a graph on 12 nodes with independent set number equal to 3; see [17] for its adjacency matrix.) As the results illustrate, SOS programming provides an exact upper bound (since the size of the largest stable set is necessarily an integer). The second levels of the rdsos and rsdsos hierarchies also provide an exact upper bound. In contrast, the LP based on Polya’s theorem in [17] gives the upper bound for the th and the st level, and an upper bound of at the second level. In contrast to the Pólya LP, one can show that the th level of the rdsos LP always produces a finite upper bound on the independent set number. In fact, this bound will always be smaller than , where is the degree of node [1, Theorem 5.1] .
Polya LP  rDSOS  rSDSOS  SOS  
6.000  6.000  3.2362  
4.333  4.333  NA  
6.000  3.8049  3.6964  NA 
4.3 Convex regression in statistics
Next, we consider the problem of fitting a function to data subject to a constraint on the function’s convexity. This is an important problem in statistics and has a wide domain of applications including value function approximation in reinforcement learning, options pricing and modeling of circuits for geometric programming based circuit design [34, 35]. Formally, we are given pairs of data points with , and our task is to find a convex function from a family that minimizes an appropriate notion of fitting error (e.g., error):
s.t. 
The convexity constraint here can be imposed by requiring that the function in variables associated with the Hessian matrix of be nonnegative. Restricting ourselves to polynomial functions of bounded degree, we obtain the following optimization problem:
s.t. 
where is again the Hessian of . As before, we can replace the nonnegativity constraint with a dsos/sdsos/sos constraint. For our numerical experiment, we generated random vectors in drawn i.i.d. from the standard normal distribution. The function values were computed as follows:
where was chosen i.i.d. from the standard normal distribution. Tables LABEL:tab:convex_reg_errors and LABEL:tab:convex_reg_runtimes present the fitting errors and running times for the DSOS/SDSOS/SOS programs resulting from restricting the class of functions