Decomposition Methods for Nonlinear Optimization and Data Mining
BRANDON EMMANUEL DUTRA
B.S. (University of California, Davis) 2012
M.S. (University of California, Davis) 2015
Submitted in partial satisfaction of the requirements for the degree of
DOCTOR OF PHILOSOPHY
OFFICE OF GRADUATE STUDIES
UNIVERSITY OF CALIFORNIA
Jesus De Loera
Committee in Charge
© Brandon E. Dutra, 2016. All rights reserved.
- 1.1 Polyhedra and their representation
- 1.2 Generating functions for integration and summation
- 1.3 Handelman’s Theorem and polynomial optimization
- 1.4 Ehrhart Polynomials
- 1.5 MILP Heuristics
- 1.6 Using computational geometry in neuro-immune communication
- 2.1 Integrand: powers of linear forms
- 2.2 Integrand: products of affine functions
- 3 Polynomial Optimization
- 4 Top coefficients of the Ehrhart polynomial for knapsack polytopes
- 4.1 The residue formula for
- 4.2 Using the poset structure of the
- 4.3 Polyhedral reinterpretation of the generating function
- 4.4 Periodicity of coefficients
- 4.5 Summary of the algorithm to compute top coefficients
- 4.6 Experiments
- 5 MILP heuristic for finding feasible solutions
- 6 Application in distance geometry for neuro-immune communication
- A Computer source code
We focus on two central themes in this dissertation. The first one is on decomposing polytopes and polynomials in ways that allow us to perform nonlinear optimization. We start off by explaining important results on decomposing a polytope into special polyhedra. We use these decompositions and develop methods for computing a special class of integrals exactly. Namely, we are interested in computing the exact value of integrals of polynomial functions over convex polyhedra. We present prior work and new extensions of the integration algorithms. Every integration method we present requires that the polynomial has a special form. We explore two special polynomial decomposition algorithms that are useful for integrating polynomial functions. Both polynomial decompositions have strengths and weaknesses, and we experiment with how to practically use them.
After developing practical algorithms and efficient software tools for integrating a polynomial over a polytope, we focus on the problem of maximizing a polynomial function over the continuous domain of a polytope. This maximization problem is NP-hard, but we develop approximation methods that run in polynomial time when the dimension is fixed. Moreover, our algorithm for approximating the maximum of a polynomial over a polytope is related to integrating the polynomial over the polytope. We show how the integration methods can be used for optimization.
We then change topics slightly and consider a problem in combinatorics. Specifically, we seek to compute the function $E(\alpha; t)$ that counts the number of nonnegative integer solutions to the equation $\alpha_1 x_1 + \cdots + \alpha_n x_n = t$, where the $\alpha_i$ are given positive integers. It is known that this function is a quasi-polynomial function of $t$, and computing every term is #P-hard. Instead of computing every term, we compute the top $k$ terms of this function in polynomial time in varying dimension when $k$ is fixed. We review some applications and places where this counting function appears in mathematics. Our new algorithm for computing the largest order terms of $E(\alpha; t)$ is based on the polyhedral decomposition methods we used in integration and optimization. We also use an additional polyhedral decomposition: Barvinok’s fast decomposition of a polyhedral cone into unimodular cones.
The second central topic in this dissertation is on problems in data science. We first consider a heuristic for mixed-integer linear optimization. We show how many practical mixed-integer linear programs have a special substructure containing set partition constraints. We then describe a data structure for finding feasible zero-one integer solutions to systems of set partition constraints.
Finally, we end with an applied project using data science methods in medical research. The focus is on identifying how T-cells and nervous-system cells interact in the spleen during inflammation. To study this problem, we apply topics in data science and computational geometry to clean data and model the problem. We then use clustering algorithms and develop models for identifying when a spleen sample is responding to inflammation. This project’s lifetime surpasses the author’s involvement in it. Nevertheless, we focus on the author’s contributions, and on the future steps.
My advisers: Jesus De Loera and Matthias Köppe. These two people have played a big role during my time at UC Davis. I appreciated working with two experts with diverse backgrounds. They greatly enriched my experience in graduate school. My advisers have also been supportive of my career development. As I went through graduate school, my career and research goals changed a few times, but my advisers have been agile and supportive. Some of the most important career development experiences I had were from my three summer internships, and I’m grateful for their support in getting and participating in these opportunities.
Through undergraduate and graduate school at UC Davis, I have spent about 7 years working on the LattE project. I am very grateful for this enriching experience. I have enjoyed every aspect of working within the intersection of mathematical theory and mathematical software. LattE was my first experience with the software life cycle and with real software development tools like GNU Autotools, version control, unit testing, et cetera. I have grown to love software development through this work, and how it has enriched my graduate experience. I also want to acknowledge the LattE users. I am thankful they find our work interesting, useful, and care enough to tell us how they use it and how to improve it. I hope that future graduate students get the experience of working on this great project.
Professors. I wish to thank David Woodruff for being in my dissertation and qualifying exam committees. I also thank Dan Gusfield and Angela Cheer for also taking part in my qualifying exam committee. I also owe a thank you to Janko Gravner and Ben Morris for their support and letters of recommendation.
Co-authors. I wish to thank Velleda Baldoni (University of Rome Tor Vergata, Rome, Italy), Nicole Berline (École Polytechnique, Palaiseau, France), and Michèle Vergne (The Mathematical Institute of Jussieu, Paris Rive Gauche, Paris, France) for their work in Chapter 4. I also thank my collaborators from the UC Davis School of Veterinary Medicine who contributed to Chapter 6: Colin Reardon and Ingrid Brust-Mascher.
Life. I am also very lucky to have found my wife in graduate school. I am very happy, and I look forward to our life together. She is an amazing person, and my best friend.
Money. I am thankful for the funding I received from my advisers and the mathematics department. I am especially thankful for the funding I received over my first year and summer, which allowed me to focus on passing the preliminary examinations. I also owe a big thank you to my friend Swati Patel who played a monumental role in editing my NSF Graduate Research Fellowship Program application, which resulted in me obtaining the award! The financial support I received over the years greatly reduced stress and made the experience great. Money makes all the difference. I also wish to thank some organizations for their financial support for conferences: American Institute of Mathematics, Institute for Mathematics and Its Applications, and the Rocky Mountain Mathematics Consortium at the University of Wyoming. I was partially supported by NSF award number 0914107, and a significant amount of this dissertation was supported by NSF grant number DGE-1148897.
Internships. I am grateful for three great summer internships: two at SAS Institute (in 2013 and 2014 under Manoj Chari and Yan Xu, respectively), and one at Google, Inc. (in 2015 under Nicolas Mayoraz). All three showed me how diverse the software industry can be.
People. All family members alive and dead. Angel Castro. Lorenzo Medina. Andy Tan. Greg Webb. Anne Carey. Julia Mack. Tom Brounstein. Travis Scrimshaw. Gordon Freeman. Jim Raynor. Sarah Kerrigan. Dan Gusfield. Sean Davis. Mohamed Omar. Yvonne Kemper. Robert Hildebrand. Mark Junod.
I would like to end with a quote that perfectly captures why I like mathematics,
Despite some instances where physical application may not exist, mathematics has historically been the primary tool of the social, life, and physical sciences. It is remarkable that a study, so potentially pure, can be so applicable to everyday life. Albert Einstein questions, “How can it be that mathematics, being after all a product of human thought which is independent of experience, is so admirably appropriate to the objects of reality?” This striking duality gives mathematics both power and charm. [wapner2005pea, p. 171]
Chapter 1 Introduction
The first three chapters of this thesis are focused on the optimization of a polynomial function where the domain is a polytope. That is, we focus on the continuous optimization problem
$$\max f(x) \quad \text{subject to} \quad x \in P,$$
where $P$ is a polytope and $f$ is a polynomial. As we review in Section 1.3.2, exactly computing the maximum of a polynomial over a polytopal domain is hard, and even approximating the maximum is still hard. However, this has not dampened research in this area, and many of the popular methods for approximating the optimum depend on decomposing the polynomial function, approximating the polynomial function with similar functions, or decomposing the domain. References are numerous in the literature [anjos2011handbook, deKlerk2015, lasserre2009momentsBook, Lasserre2000929, Lasserre01globaloptimization, lasserre2002semidefinite, lasserre2011NewLook, marshall2008positive, parrilo2003SDP]. A common characteristic between these methods is their reliance on ideas in real semialgebraic geometry and semidefinite programming. A key contribution of this thesis is another algorithm for approximating the maximum of a polynomial function over $P$. Unlike previous methods, our method is based on combinatorial results. When convenient, to help develop our tools for the continuous optimization problem, we also state analogous results for the discrete optimization problem
$$\max f(x) \quad \text{subject to} \quad x \in P \cap \mathbb{Z}^d.$$
One key step of our method for approximating the polynomial optimization problem requires computing the integral $\int_P f(x)^k \, dx$ where $k$ is some integer power. Chapter 2 is devoted to this step. Then Chapter 3 connects the pieces together and culminates in an efficient algorithm for the continuous polynomial optimization problem. Some of the tools developed, namely the way we apply polyhedral decompositions and generating functions, can also be applied to a different type of problem: computing the Ehrhart polynomial of a knapsack polytope. Chapter 4 addresses this idea.
The remaining chapters cover the second part of this thesis: topics in data science. In particular, Chapter 5 develops a useful heuristic for finding solutions to set partition constraints, which are a common constraint type in linear integer programming. Then Chapter 6 applies tools from distance geometry and cluster analysis to identify disease in spleens.
In this chapter, we review the background material used in all the other chapters. In the figure below, we suggest possible reading orders and identify which chapters build upon topics in other chapters.
1.1. Polyhedra and their representation
Polytopes and polyhedra appear as a central object in this thesis. We state just the basic definitions and results that we need. For a complete review, see [barvinokzurichbook, de2013algebraic, schrijver, zieglerpolybook].
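Let $x_1, \dots, x_k \in \mathbb{R}^d$, then the combination $\sum_{i=1}^k \lambda_i x_i$ with $\lambda_i \in \mathbb{R}$ is called
linear with no restrictions on the $\lambda_i$,
affine if $\sum_{i=1}^k \lambda_i = 1$,
conical if $\lambda_i \geq 0$ for all $i$,
convex if it is affine and conical.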
We can define a polytope as a special kind of convex set.
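A set $C \subseteq \mathbb{R}^d$ is convex if $\lambda x + (1 - \lambda) y \in C$ for all $x, y \in C$ and all $\lambda \in [0, 1]$. This means the line segment between $x$ and $y$ is in $C$.
Let $A \subseteq \mathbb{R}^d$, the convex hull of $A$ is
$$\operatorname{conv}(A) = \left\{ \sum_{i=1}^k \lambda_i x_i : x_i \in A,\ \lambda_i \geq 0,\ \sum_{i=1}^k \lambda_i = 1,\ k \in \mathbb{N} \right\}.$$
Let $V \subset \mathbb{R}^d$ be a finite point set, then a polytope is $P = \operatorname{conv}(V)$.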
Polytopes are the convex hull of finite point sets. But there are other ways to represent a polytope. Instead of looking at convex combinations, we can look at halfspaces:
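Let $a \in \mathbb{R}^d$ and $b \in \mathbb{R}$, then $H = \{x \in \mathbb{R}^d : a^T x \leq b\}$ is a halfspace. A halfspace is “one side” of a linear function.
Let $P = \{x \in \mathbb{R}^d : Ax \leq b\}$ be a finite intersection of halfspaces, then $P$ is called a polyhedron.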
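Take the unit square with vertices $(0,0)$, $(1,0)$, $(0,1)$, and $(1,1)$ in $\mathbb{R}^2$. The square (boundary included) is given by all convex combinations of the vertices. It is also given by all $(x, y) \in \mathbb{R}^2$ such that
$$0 \leq x \leq 1, \qquad 0 \leq y \leq 1,$$
but this can be rewritten as
$$x \leq 1, \quad -x \leq 0, \quad y \leq 1, \quad -y \leq 0,$$
or in matrix form as in $Ax \leq b$:
$$\begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \leq \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}.$$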
The unit square can be described by two different objects: as convex combinations of a point set, and the bounded intersection of finitely many half spaces. By the next theorem, these descriptions are equivalent as every polytope has these two representations.
Theorem 1.1.8 (Finite basis theorem for polytopes, Minkowski-Steinitz-Weyl, see Corollary 7.1c in [schrijver]).
A set is a polytope if and only if it is a bounded polyhedron.
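As a small illustration of the two representations, the following sketch checks that a convex combination of the unit square's vertices also satisfies the inequality description. The names `A`, `b`, `VERTICES`, `in_h_rep`, and `convex_combination` are hypothetical helpers chosen for this example, and the weights are arbitrary.

```python
from fractions import Fraction

# Inequality description of the unit square: each row a of A with entry bi of b
# encodes a^T (x, y) <= bi, i.e. x <= 1, -x <= 0, y <= 1, -y <= 0.
A = [(1, 0), (-1, 0), (0, 1), (0, -1)]
b = [1, 0, 1, 0]

# Vertex description of the same square.
VERTICES = [(0, 0), (1, 0), (1, 1), (0, 1)]


def in_h_rep(point):
    """Return True if the point satisfies every inequality a^T x <= b."""
    return all(a[0] * point[0] + a[1] * point[1] <= bi for a, bi in zip(A, b))


def convex_combination(weights):
    """Return sum_i weights[i] * VERTICES[i] for nonnegative weights summing to 1."""
    assert all(w >= 0 for w in weights) and sum(weights) == 1
    x = sum(w * v[0] for w, v in zip(weights, VERTICES))
    y = sum(w * v[1] for w, v in zip(weights, VERTICES))
    return (x, y)


weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
p = convex_combination(weights)
print(p, in_h_rep(p))                  # a point built from the vertices
print(in_h_rep((Fraction(3, 2), 0)))   # a point outside the square
```

Exact rational arithmetic is used so that membership on the boundary is decided without rounding error.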
Because both polytope representations are very important, there are many ways or algorithms to convert one to the other. Instead of describing convex hull algorithms and others, we will consider them a technology and seek an appropriate software tool when needed. For more details about transferring from one representation to another, see [4ti2, avis2000revised, cddlib-094a, polymake-software].
1.1.1. Polyhedral cones
A special type of unbounded polyhedra that will appear often is a polyhedral cone. Generally, a cone is a set that is closed under taking nonnegative scalar multiplication, and a convex cone is also closed under addition. For example the set is closed under nonnegative scalar multiplication because if then for any , but is not closed under addition. But if , then is and is a convex cone. We will always want cones to be convex, and we will use cone to mean convex cone.
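A finitely generated cone $C$ has the form
$$C = \left\{ \sum_{i=1}^n \lambda_i x_i : \lambda_i \geq 0 \right\}$$
for some finite collection of points $x_1, \dots, x_n \in \mathbb{R}^d$.
A polyhedral cone is a cone of the form $\{x \in \mathbb{R}^d : Ax \leq 0\}$. Therefore, a polyhedral cone is the solution set of a finite system of homogeneous linear inequalities.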
Just as bounded polyhedra and polytopes are the same object, a polyhedral cone and a finitely generated cone are two descriptions of the same object.
Theorem 1.1.11 (Farkas-Minkowski-Weyl, see Corollary 7.1a in [schrijver]).
A convex cone is polyhedral if and only if it is finitely generated.
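Let $K \subseteq \mathbb{R}^d$ be a convex set, then the polar cone is $K^\circ = \{y \in \mathbb{R}^d : y^T x \leq 0 \text{ for all } x \in K\}$.
The polar of a finitely generated cone is easy to compute.
(Polar of a finitely generated cone) Let $K = \operatorname{cone}(\{c_1, \dots, c_n\})$, then the polar cone is the intersection of a finite number of halfspaces: $K^\circ = \{y \in \mathbb{R}^d : c_j^T y \leq 0,\ j = 1, \dots, n\}$. Likewise, if $K$ is given by $\{x \in \mathbb{R}^d : Cx \leq 0\}$, then $K^\circ$ is generated by the rows of $C$.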
1.2. Generating functions for integration and summation
Chapters 2, 3, and 4 make great use of encoding values in generating functions. This section gives a general introduction to how they are used in the later chapters. For a more complete description of the topics in this section, see [barvinokzurichbook, BarviPom].
1.2.1. Working with generating functions: an example
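Let us start with an easy example. Consider the one-dimensional polyhedron in $\mathbb{R}$ given by $P = [0, n]$. We encode the lattice points of $P \cap \mathbb{Z}$ by placing each integer point as the power of a monomial, thereby obtaining the polynomial $S(P; z) := z^0 + z^1 + \cdots + z^n$. The polynomial $S(P; z)$ is called the generating function of $P$. Notice that counting $P \cap \mathbb{Z}$ is equivalent to evaluating $S(P; 1)$.
In terms of computational complexity, listing each monomial in the polynomial $S(P; z)$ results in a polynomial with exponential length in the bit length of $n$. However, we can rewrite the summation with one term:
$$S(P; z) = \frac{1 - z^{n+1}}{1 - z}.$$
Counting the number of points in $P \cap \mathbb{Z}$ is no longer as simple as evaluating at $z = 1$ because this is a singularity. However, this singularity is removable. One could perform polynomial long division, but this would result in an exponentially long polynomial in the bit length of $n$. Another option that yields a polynomial time algorithm would be to apply L’Hospital’s rule:
$$\lim_{z \to 1} \frac{1 - z^{n+1}}{1 - z} = \lim_{z \to 1} \frac{-(n+1) z^n}{-1} = n + 1.$$
Notice that $S(P; z)$ can be written in two ways:
$$\frac{1}{1 - z} - \frac{z^{n+1}}{1 - z} = \frac{1 - z^{n+1}}{1 - z} = \frac{1}{1 - z} + \frac{z^n}{1 - z^{-1}}.$$
The first two rational expressions have a nice description in terms of their series expansion:
$$1 + z + \cdots + z^n = (1 + z + z^2 + \cdots) - (z^{n+1} + z^{n+2} + \cdots).$$
For the second two rational functions, we have to be careful about the domain of convergence when computing the series expansion. Notice that in the series expansion,
$$\frac{1}{1 - z} = \begin{cases} 1 + z + z^2 + \cdots & \text{if } |z| < 1 \\ -z^{-1} - z^{-2} - \cdots & \text{if } |z| > 1 \end{cases} \qquad \frac{z^n}{1 - z^{-1}} = \begin{cases} -z^{n+1} - z^{n+2} - \cdots & \text{if } |z| < 1 \\ z^n + z^{n-1} + \cdots & \text{if } |z| > 1 \end{cases}$$
adding the terms when $|z| < 1$ or $|z| > 1$ results in the desired polynomial: $1 + z + \cdots + z^n$. But we can also get the correct polynomial by adding the series that correspond to different domains of convergence. However, to do this we must now include the series $\sum_{k=-\infty}^{\infty} z^k$, which corresponds to the polyhedron that is the entire real line:
$$1 + z + \cdots + z^n = (1 + z + z^2 + \cdots) + (z^n + z^{n-1} + \cdots) - \sum_{k=-\infty}^{\infty} z^k.$$
Hence by including the series $\sum_{k=-\infty}^{\infty} z^k$, we can perform the series expansion of $S(P; z)$ by computing the series expansion of each term on potentially different domains of convergence.
In the next sections, we will develop a rigorous justification for including the series $\sum_{k=-\infty}^{\infty} z^k$.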
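The one-dimensional example can be checked numerically. The sketch below assumes the polyhedron is the interval $[0, n]$ and compares three expressions for its generating function: the explicit sum of monomials, the single rational function $(1 - z^{n+1})/(1 - z)$, and the sum of two rational functions attached to the endpoints. The function names are hypothetical, and the lattice point count is recovered from the derivative ratio used in L'Hospital's rule.

```python
from fractions import Fraction


def gf_explicit(n, z):
    """Sum of the monomials z^0 + z^1 + ... + z^n (exponentially many terms in bits of n)."""
    return sum(z**k for k in range(n + 1))


def gf_closed(n, z):
    """Single rational function (1 - z^(n+1)) / (1 - z), valid for z != 1."""
    return (1 - z ** (n + 1)) / (1 - z)


def gf_two_cones(n, z):
    """Sum of the rational functions of the two endpoint cones, valid for z not in {0, 1}."""
    return 1 / (1 - z) + z**n / (1 - 1 / z)


def count_by_lhospital(n):
    """Limit of gf_closed as z -> 1: ratio of derivatives -(n+1) z^n and -1 at z = 1."""
    numerator_derivative = -(n + 1)
    denominator_derivative = -1
    return numerator_derivative // denominator_derivative


n = 5
for z in (Fraction(1, 2), Fraction(3)):
    print(z, gf_explicit(n, z), gf_closed(n, z), gf_two_cones(n, z))
print(count_by_lhospital(n))
```

The three expressions agree as rational functions, so they agree at every sample point other than the poles, regardless of which side of $|z| = 1$ the point lies on.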
1.2.2. Indicator functions
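The indicator function, $[A]$, of a set $A \subseteq \mathbb{R}^d$ takes two values: $[A](x) = 1$ if $x \in A$ and $[A](x) = 0$ otherwise.
The set of indicator functions on $\mathbb{R}^d$ spans a vector space with point-wise addition and scalar multiplication. The set also has an algebra structure where $[A] \cdot [B] = [A \cap B]$, and $[A] + [B] = [A \cup B] + [A \cap B]$.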
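The two identities for products and sums of indicator functions can be checked pointwise. The following sketch does so for two hypothetical finite sets of integers; `indicator`, `A`, `B`, and `universe` are names invented for the illustration.

```python
def indicator(s):
    """Return the indicator function of the set s."""
    return lambda x: 1 if x in s else 0


A = set(range(0, 6))   # {0, ..., 5}
B = set(range(3, 9))   # {3, ..., 8}
universe = range(-2, 12)

# Product of indicators is the indicator of the intersection.
product_ok = all(
    indicator(A)(x) * indicator(B)(x) == indicator(A & B)(x) for x in universe
)
# Sum of indicators counts the intersection twice.
sum_ok = all(
    indicator(A)(x) + indicator(B)(x) == indicator(A | B)(x) + indicator(A & B)(x)
    for x in universe
)
print(product_ok, sum_ok)
```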
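Recall the cone of a set $A \subseteq \mathbb{R}^d$ is all conic combinations of the points from $A$:
$$\operatorname{cone}(A) := \left\{ \sum_i \lambda_i x_i : x_i \in A,\ \lambda_i \geq 0 \right\}.$$
Let $P$ be a polyhedron and $x \in P$. Then the tangent cone, $\operatorname{tcone}(P, x)$, of $P$ at $x$ is the polyhedral cone
$$\operatorname{tcone}(P, x) := x + \operatorname{cone}(P - x).$$
Let $P$ be a polyhedron and $x \in P$. Then the cone of feasible directions, $\operatorname{fcone}(P, x)$, of $P$ at $x$ is the polyhedral cone $\operatorname{fcone}(P, x) := \operatorname{cone}(P - x)$.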
Note that if $x$ is a vertex of $P$, and $P$ is given by an inequality description, then the tangent cone at $x$ is the intersection of the inequalities that are tight at $x$. Also, the tangent cone at $x$ includes the affine hull of the smallest face that $x$ is in, so the tangent cone is pointed only if $x$ is a vertex. The difference between a tangent cone and a cone of feasible directions is that the latter is shifted to the origin.
When $F$ is a face of $P$, we will also use the notation $\operatorname{tcone}(P, F)$ to denote $\operatorname{tcone}(P, x)$ where $x$ is any interior point of $F$.
Theorem 1.2.4 ([brianchon1837], [gram1871]).
Let $P$ be a polyhedron, then
$$[P] = \sum_{F} (-1)^{\dim(F)} [\operatorname{tcone}(P, F)]$$
where the sum ranges over all faces $F$ of $P$ including $F = P$ but excluding $F = \emptyset$.
This theorem is saying that if the generating function of a polytope is desired, it is sufficient to just find the generating function for every face of $P$. The next corollary takes this a step further and says it is sufficient to just construct the generating functions associated at each vertex. This is because, as we will see, the generating functions for non-pointed polyhedra can be ignored.
Corollary 1.2.5.
Let $P$ be a polyhedron, then
$$[P] \equiv \sum_{v \in V} [\operatorname{tcone}(P, v)] \pmod{\text{indicator functions of non-pointed polyhedra}}$$
where $V$ is the vertex set of $P$.
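In dimension one the alternating sum over faces can be verified directly. The sketch below assumes the polytope is an interval $[a, b]$: its two vertices contribute the rays $[a, \infty)$ and $(-\infty, b]$ with a plus sign, and the one-dimensional face (the interval itself) contributes the whole real line with a minus sign. The helper names are hypothetical.

```python
import math


def indicator_interval(lo, hi):
    """Indicator function of the closed interval [lo, hi]; endpoints may be infinite."""
    return lambda x: 1 if lo <= x <= hi else 0


def brianchon_gram_1d(a, b, x):
    """Alternating sum of tangent-cone indicators of [a, b] evaluated at x."""
    cone_at_a = indicator_interval(a, math.inf)(x)    # tangent cone at vertex a
    cone_at_b = indicator_interval(-math.inf, b)(x)   # tangent cone at vertex b
    whole_line = 1                                    # tangent cone of the face [a, b]
    return cone_at_a + cone_at_b - whole_line


a, b = 2, 5
values = [(x, brianchon_gram_1d(a, b, x)) for x in range(-1, 9)]
print(values)
```

Points inside the interval are counted twice and corrected once; points outside are counted once and corrected once, giving zero.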
1.2.3. Generating functions of simple cones
In this section, we quickly review the generating function for summation and integration when the polyhedron is a cone.
And there’s still confusion regarding multiplication: To make a vector space, you need addition of two elements and multiplication of an element by a scalar (field element). The multiplication of two indicator functions is NOT a multiplication by a scalar. Instead, multiplication by a scalar is really just scaling a function: Take the indicator function of the nonnegative real numbers: f(x) = 1 if x >= 0; 0 if x < 0. Take a real number, say 7. Then (7 . f)(x) = 7 if x >= 0; 0 if x < 0. This makes the "algebra of polyhedra" a real vector space. But the algebra of polyhedra is also an "algebra". For that you need another multiplication, namely the multiplication of two elements; and that is the multiplication that you describe (actually it’s the bilinear extension of what you describe, because the multiplication needs to be defined not only for two indicator functions, but for two R-linear combinations of indicator functions).
Let $V$ and $W$ be vector spaces. Let $\mathcal{P}(V)$ be the real vector space spanned by the indicator functions of all polyhedra in $V$ where scalar multiplication is with a real number and an indicator function, and the addition operator is addition of indicator functions. When $\mathcal{P}(V)$ is equipped with the additional binary operation from $\mathcal{P}(V) \times \mathcal{P}(V)$ to $\mathcal{P}(V)$ representing multiplication of indicator functions, then $\mathcal{P}(V)$ is called the algebra of polyhedra. A valuation is a linear transformation $T : \mathcal{P}(V) \to W$.
The next proposition serves as a basis for all the summation algorithms we will discuss. Its history can be traced to Lawrence in [lawrence91-2], and Khovanskii and Pukhlikov in [pukhlikov1992riemann]. It is well described as Theorem 13.8 in [barvinokzurichbook].
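Proposition 1.2.7.
There exists a unique valuation $S(\cdot)(\ell)$ which associates to every rational polyhedron $P \subset \mathbb{R}^d$ a meromorphic function in $\ell$ so that the following properties hold
(1) If $\ell$ is such that $e^{\langle \ell, x \rangle}$ is summable over the lattice points of $P$, then
$$S(P)(\ell) = \sum_{x \in P \cap \mathbb{Z}^d} e^{\langle \ell, x \rangle}.$$
(2) For every point $s \in \mathbb{Z}^d$, one has
$$S(s + P)(\ell) = e^{\langle \ell, s \rangle} S(P)(\ell).$$
(3) If $P$ contains a straight line, then $S(P)(\ell) = 0$.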
A consequence of the valuation property is the following fundamental theorem. It follows from the Brion–Lasserre–Lawrence–Varchenko decomposition theory of a polyhedron into the supporting cones at its vertices [barvinokzurichbook, beck-haase-sottile:theorema, Brion88, lasserre-volume1983].
Lemma 1.2.8.
Let $P$ be a polyhedron with set of vertices $V(P)$. For each vertex $s$, let $C_s(P)$ be the cone of feasible directions at vertex $s$. Then
$$S(P)(\ell) = \sum_{s \in V(P)} S(s + C_s(P))(\ell).$$
This last lemma can be identified as the natural result of combining Corollary 1.2.5 and Proposition 1.2.7 part (3). A non-pointed polyhedron is another characterization of a polyhedron that contains a line.
Note that the cone in Lemma 1.2.8 may not be simplicial, but for simplicial cones there are explicit rational function formulas. As we will see in Proposition 1.2.12, one can derive an explicit formula for the rational function in terms of the geometry of the cones.
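Proposition 1.2.9.
For a simplicial full-dimensional pointed cone $C$ generated by rays $u_1, \dots, u_d$ (with vertex $0$) where $u_i \in \mathbb{Z}^d$ and for any point $s$
$$S(s + C)(\ell) = \sum_{a \in (s + \Pi_C) \cap \mathbb{Z}^d} e^{\langle \ell, a \rangle} \prod_{i=1}^d \frac{1}{1 - e^{\langle \ell, u_i \rangle}}$$
where $\Pi_C := \left\{ \sum_{i=1}^d \alpha_i u_i : 0 \leq \alpha_i < 1 \right\}$. This identity holds as a meromorphic function of $\ell$ and pointwise for every $\ell$ such that $\langle \ell, u_i \rangle \neq 0$ for all $u_i$.
The set $\Pi_C$ is often called the half-open fundamental parallelepiped of $C$. It is also common to force each ray $u_i$ to be primitive, meaning that the greatest common divisor of the elements in $u_i$ is one, and this can be accomplished by scaling each ray.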
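To make the role of the half-open fundamental parallelepiped concrete, the sketch below works with a hypothetical two-dimensional cone generated by $u_1 = (1, 0)$ and $u_2 = (1, 3)$ with vertex at the origin. It enumerates the lattice points of the parallelepiped, then compares the rational-function formula for the exponential sum with a truncated brute-force sum over the lattice points of the cone. All function names are invented, and the brute-force routine hard-codes this particular cone.

```python
import math
from fractions import Fraction
from itertools import product


def det2(u, v):
    return u[0] * v[1] - u[1] * v[0]


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def parallelepiped_points(u1, u2):
    """Lattice points alpha1*u1 + alpha2*u2 with 0 <= alpha_i < 1 (Cramer's rule)."""
    d = det2(u1, u2)
    xs = [0, u1[0], u2[0], u1[0] + u2[0]]
    ys = [0, u1[1], u2[1], u1[1] + u2[1]]
    points = []
    for x, y in product(range(min(xs), max(xs) + 1), range(min(ys), max(ys) + 1)):
        alpha1 = Fraction(det2((x, y), u2), d)
        alpha2 = Fraction(det2(u1, (x, y)), d)
        if 0 <= alpha1 < 1 and 0 <= alpha2 < 1:
            points.append((x, y))
    return points


def formula(u1, u2, ell):
    """Rational-function expression for the sum of exp(<ell, a>) over the cone."""
    numerator = sum(math.exp(dot(ell, a)) for a in parallelepiped_points(u1, u2))
    return numerator / ((1 - math.exp(dot(ell, u1))) * (1 - math.exp(dot(ell, u2))))


def brute_force(ell, bound=60):
    """Truncated sum over cone((1,0),(1,3)) = {(x, y): y >= 0, 3x - y >= 0}."""
    total = 0.0
    for y in range(bound + 1):
        for x in range((y + 2) // 3, bound + 1):   # smallest x with 3x >= y
            total += math.exp(ell[0] * x + ell[1] * y)
    return total


u1, u2 = (1, 0), (1, 3)
ell = (-1.0, -1.0)   # negative on both rays, so the series converges
print(parallelepiped_points(u1, u2), abs(det2(u1, u2)))
print(formula(u1, u2, ell), brute_force(ell))
```

The number of lattice points found in the parallelepiped equals $|\det(u_1, u_2)|$, which is why large determinants make direct enumeration expensive.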
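The continuous generating function almost mirrors the discrete case. It can again be attributed to Lawrence, Khovanskii, and Pukhlikov, and appears as Theorem 8.4 in [barvinokzurichbook].
Proposition 1.2.10.
There exists a unique valuation $I(\cdot)(\ell)$ which associates to every polyhedron $P \subset \mathbb{R}^d$ a meromorphic function so that the following properties hold
(1) If $\ell$ is a linear form such that $e^{\langle \ell, x \rangle}$ is integrable over $P$ with the standard Lebesgue measure on $\mathbb{R}^d$, then
$$I(P)(\ell) = \int_P e^{\langle \ell, x \rangle} \, dx.$$
(2) For every point $s \in \mathbb{R}^d$, one has
$$I(s + P)(\ell) = e^{\langle \ell, s \rangle} I(P)(\ell).$$
(3) If $P$ contains a line, then $I(P)(\ell) = 0$.
Lemma 1.2.11.
Let $P$ be a polyhedron with set of vertices $V(P)$. For each vertex $s$, let $C_s(P)$ be the cone of feasible directions at vertex $s$. Then
$$I(P)(\ell) = \sum_{s \in V(P)} I(s + C_s(P))(\ell).$$
Proposition 1.2.12.
For a simplicial full-dimensional pointed cone $C$ generated by rays $u_1, \dots, u_d$ (with vertex $0$) and for any point $s$
$$I(s + C)(\ell) = \operatorname{vol}(\Pi_C) \, e^{\langle \ell, s \rangle} \prod_{i=1}^d \frac{1}{-\langle \ell, u_i \rangle}.$$
These identities hold as a meromorphic function of $\ell$ and pointwise for every $\ell$ such that $\langle \ell, u_i \rangle \neq 0$ for all $u_i$.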
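The vertex-cone formula for the exponential integral can be tried on a small case. The sketch below assumes a hypothetical triangle with vertices $(0,0)$, $(1,0)$, $(0,1)$, whose cones of feasible directions are simplicial with rays of determinant $\pm 1$, and sums $\operatorname{vol}(\Pi_C)\, e^{\langle \ell, s \rangle} \prod_i 1/(-\langle \ell, u_i \rangle)$ over the three vertices. For $\ell = (1, 2)$ the integral of $e^{x + 2y}$ over this triangle has the closed form $(e - 1)^2 / 2$, which is used as a reference value. The names `CONES`, `cone_integral`, and `triangle_integral` are invented for the example.

```python
import math


def det2(u, v):
    return u[0] * v[1] - u[1] * v[0]


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


# Hypothetical triangle conv{(0,0), (1,0), (0,1)}:
# vertex -> rays of its cone of feasible directions.
CONES = {
    (0, 0): [(1, 0), (0, 1)],
    (1, 0): [(-1, 0), (-1, 1)],
    (0, 1): [(0, -1), (1, -1)],
}


def cone_integral(s, rays, ell):
    """vol(Pi_C) * exp(<ell, s>) * prod_i 1 / (-<ell, u_i>) for a simplicial 2D cone."""
    value = abs(det2(rays[0], rays[1])) * math.exp(dot(ell, s))
    for u in rays:
        value /= -dot(ell, u)   # requires <ell, u> != 0
    return value


def triangle_integral(ell):
    """Sum of the vertex-cone contributions."""
    return sum(cone_integral(s, rays, ell) for s, rays in CONES.items())


ell = (1.0, 2.0)
print(triangle_integral(ell), (math.e - 1) ** 2 / 2)
```

Each individual term corresponds to an integral over an unbounded cone that does not converge for this $\ell$, yet the sum of the three meromorphic values reproduces the finite integral.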
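Let $a < b < c$, then it is a well known fact from calculus that
$$\int_a^c e^{\ell x} \, dx = \int_a^b e^{\ell x} \, dx + \int_b^c e^{\ell x} \, dx.$$
However, the domain $[a, c]$ cannot be decomposed in every way. For example
$$\int_a^c e^{\ell x} \, dx \neq \int_a^{\infty} e^{\ell x} \, dx + \int_{-\infty}^{c} e^{\ell x} \, dx - \int_{-\infty}^{\infty} e^{\ell x} \, dx.$$
Notice that not only is the expression on the right hand side of the equation not equal to the left hand side, but there is no value for $\ell$ that makes the three integrals finite. However, results in this section allow us to assign numbers (or meromorphic functions) to the integrals that do not converge!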
We now consider an example in dimension two. Consider the triangle below with coordinates at , , and . This domain can be written as the sum of 7 polyhedra: addition of three tangent cones, subtraction of three halfspaces, and the addition of one copy of $\mathbb{R}^2$.
For example, the point is part of the triangle, so it is counted once. In the decomposition, the point is counted positively four times (once in each tangent cone and once in ), and is counted negatively three times (once in each halfspace), resulting in being counted exactly once. A similar calculation shows that is counted negatively in one of the halfspaces, and positively in , resulting in a total count of zero, meaning is not part of the triangle.
The integral of $e^{\langle \ell, x \rangle}$ over the triangle clearly exists because the function is continuous and the domain is compact. As the triangle can be written as the sum of 7 other polyhedra, we want to integrate over each of the 7 polyhedra. However, the integral of $e^{\langle \ell, x \rangle}$ over some of them does not converge! Instead, we map each domain to a meromorphic function using Propositions 1.2.10 and 1.2.12. Because the map $I$ of Proposition 1.2.10 is a valuation, we can apply it to each domain. The fact that $I(P)(\ell) = 0$ if $P$ contains a line simplifies the calculation to just the three tangent cones: , , and .
Propositions 1.2.10 and 1.2.12 say that the meromorphic function associated with the triangle is equal to the sum of the three meromorphic functions associated at each tangent cone. Moreover, because the integral over the triangle exists, the meromorphic function associated with the triangle gives the integral. For example, evaluating the meromorphic function at results in , which is the integral of over the triangle.
There is one problem. The integral of over the triangle is a holomorphic function, and we have written it as the sum of three meromorphic functions, so this means the poles of the meromorphic functions must cancel in the sum. Consider evaluating at . This would produce division by zero, and so is among the poles. A common approach is to instead evaluate at , and take the limit as . Hence
Notice that for each , is meromorphic in the perturbation parameter $\epsilon$, but is a holomorphic function (as it is the integral of over the triangle). This means that in the Laurent series expansion of , any terms where $\epsilon$ has a negative exponent will cancel out in the sum. Thus the limit can be computed by finding the Laurent series expansion at $\epsilon = 0$ for each and summing the coefficient of $\epsilon^0$ in each Laurent series. Chapter 2 will show that computing the Laurent series is easy in this case.
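The cancellation of poles can be observed numerically. The sketch below is not the Laurent-series method itself; it only evaluates the vertex-cone sum at a slightly perturbed linear form. It assumes a hypothetical triangle with vertices $(0,0)$, $(1,0)$, $(0,1)$ (not the triangle of the example above) and the linear form $\ell = (1, 1)$, which is orthogonal to the edge direction $(-1, 1)$ and is therefore a pole of two of the three terms. The exact integral of $e^{x + y}$ over this triangle equals $1$.

```python
import math


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


# Hypothetical triangle conv{(0,0), (1,0), (0,1)}: vertex -> rays of its
# cone of feasible directions (each pair of rays has determinant +1 or -1).
CONES = {
    (0, 0): [(1, 0), (0, 1)],
    (1, 0): [(-1, 0), (-1, 1)],
    (0, 1): [(0, -1), (1, -1)],
}


def vertex_term(s, rays, ell):
    """exp(<ell, s>) / prod_i(-<ell, u_i>); the parallelepiped volume is 1 here."""
    value = math.exp(dot(ell, s))
    for u in rays:
        value /= -dot(ell, u)
    return value


def perturbed_sum(ell, direction, eps):
    """Evaluate the vertex-cone sum at ell + eps * direction."""
    shifted = (ell[0] + eps * direction[0], ell[1] + eps * direction[1])
    return sum(vertex_term(s, rays, shifted) for s, rays in CONES.items())


ell = (1.0, 1.0)   # <ell, (-1, 1)> = 0, so two terms have a pole at ell
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, perturbed_sum(ell, (1.0, 0.0), eps))
approx = perturbed_sum(ell, (1.0, 0.0), 1e-6)
```

The two singular terms are each of size about $e / \epsilon$ with opposite signs, so very small values of $\epsilon$ lose floating-point accuracy; extracting the constant Laurent coefficient symbolically avoids this.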
This is a common technique, and we will see it used many times in this manuscript.
1.2.4. Generating function for non-simple cones
Lemma 1.2.8 and Proposition 1.2.9 (or Lemma 1.2.11 and Proposition 1.2.12) can be used for computing the summation (or integral) over a polytope only if the polytope is a simple polytope, meaning that for a $d$-dimensional polytope, every vertex of the polytope is adjacent to exactly $d$ edges.
In this section, we review the generating function of $S(P)(\ell)$ and $I(P)(\ell)$ for a general polytope $P$. When $P$ is not simple, the solution is to triangulate its tangent cones.
A triangulation of a cone $C$ is a set $\Gamma$ of simplicial cones $C_i$ of the same dimension as the affine hull of $C$ such that
the union of all the simplicial cones in $\Gamma$ is $C$,
the intersection of any pair of simplicial cones in $\Gamma$ is a common face of both simplicial cones,
and every ray of every simplicial cone is also a ray of $C$.
There are many references and software tools for computing a triangulation of a polytope or polyhedral cone, see [4ti2, avis2000revised, DRStriangbook, cddlib-094a, polymake-software, leeRegularTriangulations, rambau2002topcom].
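Let $C$ be a full-dimensional pointed polyhedral cone, and $\Gamma = \{C_i : i \in I_1\}$ be a triangulation into simplicial cones $C_i$ where $I_1$ is a finite index set. It is true that $C = \bigcup_{i \in I_1} C_i$, but $[C] = \sum_{i \in I_1} [C_i]$ is false as points on the boundary of two adjacent simplicial cones are counted multiple times. The correct approach is to use the inclusion-exclusion formula:
$$[C] = \sum_{\emptyset \neq J \subseteq I_1} (-1)^{|J| - 1} \Big[ \bigcap_{j \in J} C_j \Big].$$
Also, note that this still holds true when $C$ (and the $C_i$) is shifted by a point $s \in \mathbb{R}^d$. When $|J| \geq 2$, $I\big(s + \bigcap_{j \in J} C_j\big)(\ell) = 0$ as $\bigcap_{j \in J} C_j$ is not full-dimensional, and the integration is done with respect to the Lebesgue measure on $\mathbb{R}^d$. This leads us to the next lemma.
Lemma 1.2.14.
For any triangulation $\mathcal{D}_s$ of the feasible cone $C_s$ at each of the vertices $s$ of the polytope $P$ we have
$$I(P)(\ell) = \sum_{s \in V(P)} \sum_{C \in \mathcal{D}_s} I(s + C)(\ell).$$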
Lemma 1.2.14 states that we can triangulate a polytope’s feasible cones and apply the integration formulas on each simplicial cone without worrying about shared boundaries among the cones. Note that there is no restriction on how the triangulation is performed.
More care is needed for the discrete case as $S\big(\bigcap_{j \in J} C_j\big)(\ell) \neq 0$ when $|J| \geq 2$. We want to avoid using the inclusion-exclusion formula as it contains exponentially many terms (in the size of $|I_1|$).
The discrete case has another complication. Looking at Proposition 1.2.9, we see that the sum
$$\sum_{a \in (s + \Pi_C) \cap \mathbb{Z}^d} e^{\langle \ell, a \rangle}$$
has to be enumerated. However, there could be an exponential number of points in $\Pi_C \cap \mathbb{Z}^d$ in terms of the bit length of the simplicial cone $C$.
We will illustrate one method for solving these problems called the Dual Barvinok Algorithm.
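Recall that the polar of a set $A \subseteq \mathbb{R}^d$ is the set $A^\circ = \{y \in \mathbb{R}^d : \langle y, x \rangle \leq 0 \text{ for all } x \in A\}$. Cones enjoy many properties under the polar operation. If $C$ is a finitely generated cone in $\mathbb{R}^d$, then
$C^\circ$ is also a cone,
$(C^\circ)^\circ = C$,
if $C = \{x \in \mathbb{R}^d : A^T x \leq 0\}$, then $C^\circ$ is generated by the columns of $A$.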
The next lemma is core to Brion’s “polarization trick” [Brion88] for dealing with the inclusion-exclusion terms.
Lemma 1.2.15 (Theorem 5.3 in [barvinokzurichbook]).
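Let $\mathcal{C}(\mathbb{R}^d)$ be the vector space spanned by the indicator functions of all closed convex sets in $\mathbb{R}^d$. Then there is a unique linear transformation $\mathcal{D}$ from $\mathcal{C}(\mathbb{R}^d)$ to itself such that $\mathcal{D}([A]) = [A^\circ]$ for all non-empty closed convex sets $A$.
Instead of taking the non-simplicial cone $C$ and triangulating it, we first compute $C^\circ$ and triangulate it to $\Gamma' = \{C_i^\circ : i \in I_2\}$. Then
$$[C^\circ] = \sum_{\emptyset \neq J \subseteq I_2} (-1)^{|J| - 1} \Big[ \bigcap_{j \in J} C_j^\circ \Big].$$
Applying the fact that $(C^\circ)^\circ = C$ we get
$$[C] = \sum_{\emptyset \neq J \subseteq I_2} (-1)^{|J| - 1} \Big[ \Big( \bigcap_{j \in J} C_j^\circ \Big)^\circ \Big].$$
Notice that the polar of a full-dimensional pointed cone is another full-dimensional pointed cone. For each $J$ with $|J| \geq 2$, $\bigcap_{j \in J} C_j^\circ$ is not a full-dimensional cone. The polar of a cone that is not full dimensional is a cone that contains a line. Hence $S\big(\big(\bigcap_{j \in J} C_j^\circ\big)^\circ\big)(\ell) = 0$. By polarizing a cone, triangulating in the dual space, and polarizing back, the boundary terms from the triangulation can be ignored.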
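Next, we address the issue that for a simplicial cone $C$, the set $\Pi_C \cap \mathbb{Z}^d$ contains too many terms for an enumeration to be efficient. The approach then is to decompose $C$ into cones that only have one lattice point in the fundamental parallelepiped. Such cones are called unimodular cones. Barvinok in [bar] first developed such a decomposition and showed that it can be done in polynomial time when the dimension is fixed. We next give an outline of Barvinok’s decomposition algorithm.
Given a pointed simplicial full-dimensional cone $C$, Barvinok’s decomposition method will produce new simplicial cones $C_i$ such that $|\Pi_{C_i} \cap \mathbb{Z}^d| \leq |\Pi_C \cap \mathbb{Z}^d|$ and values $\epsilon_i \in \{1, -1\}$ such that
$$[C] \equiv \sum_{i \in I} \epsilon_i [C_i] \pmod{\text{indicator functions of lower-dimensional cones}}.$$
Let $C$ be generated by the rays $u_1, \dots, u_d$. The algorithm first constructs a vector $w \in \mathbb{Z}^d \setminus \{0\}$ such that
$$w = \alpha_1 u_1 + \cdots + \alpha_d u_d \quad \text{and} \quad |\alpha_i| \leq |\det(U)|^{-1/d} \leq 1,$$
where the columns of $U$ are the $u_i$. This is done with integer programming or using a lattice basis reduction method [latte1]. Let $C_i = \operatorname{cone}(u_1, \dots, u_{i-1}, w, u_{i+1}, \dots, u_d)$ and let $U_i$ be the matrix whose columns are the rays of $C_i$, then it can be shown that $|\det(U_i)| \leq |\det(U)|^{(d-1)/d}$, meaning that these new cones have fewer integer points in the fundamental parallelepiped than the old cone. This process can be recursively repeated until unimodular cones are obtained.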
Theorem 1.2.16 (Barvinok [bar]).
Let $C$ be a simplicial full-dimensional cone generated by rays $u_1, \dots, u_d$. Collect the rays into the columns of a matrix $U \in \mathbb{Z}^{d \times d}$. Then the depth of the recursive decomposition tree is at most
$$\left\lfloor 1 + \frac{\log_2 \log_2 |\det(U)|}{\log_2 \frac{d}{d-1}} \right\rfloor.$$
Because each node in the recursion tree has at most $d$ children, and the depth of the tree is doubly logarithmic in $|\det(U)|$, only polynomially many unimodular cones are constructed.
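One decomposition step can be tried on a tiny two-dimensional cone. The sketch below is only an illustration under stated assumptions: the cone with rays $(1, 0)$ and $(1, 4)$ is a hypothetical example, the short vector $w$ is found by brute force over a small box (the text uses integer programming or lattice basis reduction instead), the condition tested is $|\alpha_i| \leq |\det U|^{-1/d}$ with $d = 2$, and the depth bound is assumed to have the form $\lfloor 1 + \log_2 \log_2 |\det U| / \log_2 (d/(d-1)) \rfloor$. All function names are invented.

```python
import math
from fractions import Fraction
from itertools import product


def det2(u, v):
    return u[0] * v[1] - u[1] * v[0]


def short_vectors(u1, u2, radius):
    """Nonzero integer w = a1*u1 + a2*u2 with |a_i| <= |det U|^(-1/2), by brute force."""
    d = det2(u1, u2)
    big_d = abs(d)
    found = []
    for w in product(range(-radius, radius + 1), repeat=2):
        if w == (0, 0):
            continue
        a1 = Fraction(det2(w, u2), d)   # Cramer's rule
        a2 = Fraction(det2(u1, w), d)
        # |a_i| <= D^(-1/2) is equivalent to a_i^2 * D <= 1, tested exactly.
        if a1 * a1 * big_d <= 1 and a2 * a2 * big_d <= 1:
            found.append(w)
    return found


def child_determinants(u1, u2, w):
    """Absolute determinants of cone(w, u2) and cone(u1, w)."""
    return abs(det2(w, u2)), abs(det2(u1, w))


def depth_bound(det_u, d):
    """Assumed bound on the recursion depth; requires det_u > 1 and d >= 2."""
    return math.floor(1 + math.log2(math.log2(det_u)) / math.log2(d / (d - 1)))


u1, u2 = (1, 0), (1, 4)   # |det U| = 4
candidates = short_vectors(u1, u2, radius=4)
print(candidates)
print(child_determinants(u1, u2, (1, 2)))
print(depth_bound(16, 2))
```

For $w = (1, 2)$ both child cones have determinant $2 = 4^{1/2}$, matching the claimed decrease from $|\det(U)|$ to at most $|\det(U)|^{(d-1)/d}$. Some child cones may be degenerate (determinant zero) and correspond to lower-dimensional cones.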
In [bar], the inclusion-exclusion formula was applied to boundaries between the unimodular cones in the primal space. However, like in triangulation, the decomposition can be applied in the dual space where the lower dimensional cones can be ignored. For the full details of Barvinok’s decomposition algorithm, see [latte1], especially Algorithm 5 therein. This variant of Barvinok’s algorithm has efficient implementations in LattE [latte-1.2] and the library barvinok [barvinok-noversion].
1.2.5. Generating functions for full-dimensional polytopes
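In this section, we explicitly combine the results from the last two sections and write down the polynomial time algorithms for computing the discrete and continuous generating function for a polytope $P$.
Output: the rational generating function for $S(P)(\ell) = \sum_{x \in P \cap \mathbb{Z}^d} e^{\langle \ell, x \rangle}$ in the form
$$S(P)(\ell) = \sum_{i \in I} \epsilon_i \frac{e^{\langle \ell, v_i \rangle}}{\prod_{j=1}^d \left(1 - e^{\langle \ell, u_{ij} \rangle}\right)}$$
where $\epsilon_i \in \{-1, 1\}$, $v_i, u_{ij} \in \mathbb{Z}^d$, and $|I|$ is polynomially bounded in the input size of $P$ when $d$ is fixed. Each $i \in I$ corresponds to a simplicial unimodular cone $v_i + C_i$ where $u_{i1}, \dots, u_{id}$ are the rays of the cone $C_i$.
Compute all vertices $s_j$ and corresponding supporting cones $C_j$ of $P$
Polarize the supporting cones $C_j$ to obtain $C_j^\circ$
Triangulate $C_j^\circ$ into simplicial cones $C_{jk}^\circ$, discarding lower-dimensional cones
Apply Barvinok’s signed decomposition (see [latte1]) to the cones $C_{jk}^\circ$ to obtain cones $C_{jkl}^\circ$, which results in the identity
$$[C_{jk}^\circ] \equiv \sum_{l} \epsilon_{jkl} [C_{jkl}^\circ] \pmod{\text{indicator functions of lower-dimensional cones}}.$$
Stop the recursive decomposition when unimodular cones are obtained. Discard all lower-dimensional cones
Polarize back $C_{jkl}^\circ$ to obtain cones $C_{jkl}$
$v_{jkl}$ is the unique integer point in the fundamental parallelepiped of every resulting cone $s_j + C_{jkl}$
Write down the above formula, with the index $i$ ranging over all triples $(j, k, l)$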
The key part of this variant of Barvinok’s algorithm is that computations with rational generating functions are simplified when non-pointed cones are used. The reason is that the rational generating function of every non-pointed cone is zero. By operating in the dual space, when computing the polar cones, lower-dimensional cones can be safely discarded because this is equivalent to discarding non-pointed cones in the primal space.
Triangulating a non-simplicial cone in the dual space was done to avoid the many lower-dimensional terms that arise from using the inclusion-exclusion formula for the indicator function of the cone. Other ways to get around this exist. In [beck-haase-sottile:theorema, beck-sottile:irrational, koeppe:irrational-barvinok], irrational decompositions were developed; these are decompositions of polyhedra whose proper faces do not contain any lattice points. Counting formulas for lattice points that are constructed with irrational decompositions do not need the inclusion-exclusion principle. The implementation of this idea [latte-macchiato] was the first practically efficient variant of Barvinok’s algorithm that works in the primal space.
For a well-written discussion of other practical algorithms that solve these problems using slightly different decompositions, see [koeppe:irrational-barvinok]. For completeness, we end with the algorithmic description for the continuous generating function.
Output: the rational generating function for in the form
where are the rays of cone .
Compute all vertices and corresponding supporting cones
Triangulate into a collection of simplicial cones using any method
Write down the above
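The steps above can be illustrated numerically. The sketch below assumes the standard Brion-type form of the continuous generating function, in which each simplicial vertex cone with rays u_1, ..., u_d at a vertex s contributes |det(u_1, ..., u_d)| · e^{⟨ℓ,s⟩} / ∏_j (−⟨ℓ,u_j⟩); this form, the choice of the standard triangle, and all names are assumptions made for illustration. The result is compared with the closed form of the integral of e^{x+2y} over the triangle obtained by direct iterated integration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def det2(u, v):
    return u[0] * v[1] - u[1] * v[0]


def brion_exponential_integral(ell, vertex_cones):
    """Integrate e^<ell, x> over a 2D polytope from its simplicial vertex cones.

    `vertex_cones` is a list of (vertex, (ray1, ray2)) pairs.
    Assumes <ell, ray> != 0 for every ray (ell is regular).
    """
    total = 0.0
    for s, (u1, u2) in vertex_cones:
        numerator = abs(det2(u1, u2)) * math.exp(dot(ell, s))
        denominator = (-dot(ell, u1)) * (-dot(ell, u2))
        total += numerator / denominator
    return total


# Vertex cones of the standard triangle conv{(0,0), (1,0), (0,1)}.
STANDARD_TRIANGLE = [
    ((0, 0), ((1, 0), (0, 1))),
    ((1, 0), ((-1, 0), (-1, 1))),
    ((0, 1), ((0, -1), (1, -1))),
]

value = brion_exponential_integral((1.0, 2.0), STANDARD_TRIANGLE)

# Direct iterated integration of e^(x + 2y) over the triangle gives (e - 1)^2 / 2.
expected = 0.5 * (math.e - 1.0) ** 2
assert abs(value - expected) < 1e-9
```

Each vertex term is singular when ℓ is orthogonal to a ray, even though the integral itself is finite there, which is why the sketch requires a regular ℓ.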
Note that in fixed dimension, the above algorithms compute the generating functions in polynomial time. We will repeatedly use the next lemma to multiply series in polynomial time in fixed dimension. The idea is to multiply each factor, one at a time, truncating after total degree .
1.2.6. A power of a linear form
Above, we developed an expression for , and . Later in Chapters 2 and 3, we will compute similar sums and integrals where instead of an exponential function, the summand or integrand is a power of a linear form, or more generally, a product of affine functions. The common trick will be to introduce a new variable and compute or . If the series expansion in about is computed, we get a series in where the coefficient of is or . To compute these series expansions, many polynomials will be multiplied together while deleting monomials whose degree is larger than some value . The next lemma shows that this process is efficient when the number of variables is fixed, and we repeatedly apply it in Chapters 2 and 3.
Lemma 1.2.17 (Lemma 4 in [baldoni-berline-deloera-koeppe-vergne:integration]).
For polynomials in variables, the product can be truncated at total degree by performing elementary rational operations.
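A minimal sketch of the truncated multiplication behind the lemma is given below, assuming a sparse dictionary representation that maps exponent tuples to rational coefficients. The function names and the representation are illustrative choices, not the data structures used in any particular implementation. Factors are multiplied one at a time, and every monomial whose total degree exceeds the bound is dropped immediately; this loses nothing, because multiplying by further polynomials can only raise the degree of a monomial.

```python
from fractions import Fraction


def truncated_multiply(p, q, max_degree):
    """Multiply p and q, dropping monomials of total degree > max_degree.

    Polynomials are dicts {exponent_tuple: coefficient}.
    """
    result = {}
    for exp_p, coeff_p in p.items():
        deg_p = sum(exp_p)
        for exp_q, coeff_q in q.items():
            if deg_p + sum(exp_q) > max_degree:
                continue  # this monomial can never contribute below the bound
            exponent = tuple(a + b for a, b in zip(exp_p, exp_q))
            result[exponent] = result.get(exponent, 0) + coeff_p * coeff_q
    return {e: c for e, c in result.items() if c != 0}


def truncated_product(polys, max_degree, num_vars):
    """Multiply the factors one at a time, truncating after each step."""
    running = {(0,) * num_vars: Fraction(1)}
    for poly in polys:
        running = truncated_multiply(running, poly, max_degree)
    return running


# Example: (1 + x + y)^3 truncated at total degree 2.
factor = {(0, 0): 1, (1, 0): 1, (0, 1): 1}
result = truncated_product([factor] * 3, max_degree=2, num_vars=2)
assert result == {
    (0, 0): 1, (1, 0): 3, (0, 1): 3,
    (2, 0): 3, (1, 1): 6, (0, 2): 3,
}
```

Because the running product never holds more monomials than there are exponent vectors of total degree at most the bound, the cost of each step stays bounded independently of the number of factors already multiplied.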
1.3. Handelman’s Theorem and polynomial optimization
In this section, we comment on the problem
where is a polynomial and is a polytope. Handelman’s Theorem is used in Chapters 2 and 3 as a tool for rewriting a polynomial in a special way. This section introduces Handelman’s theorem along with how it can directly be used for polynomial optimization. Section 1.3.1 briefly illustrates how Handelman’s theorem can be used instead of sum of squares polynomials. Then finally in Section 1.3.2, we review the computational complexity of the polynomial optimization problem.
In Chapter 3, Handelman’s theorem is not used to perform optimization. It is used as a tool to decompose the polynomial into a form that makes integrating more practical. These integrals are then used to produce bounds on the optimal value. With this view, we are using Handelman’s theorem in a novel way.
Theorem 1.3.1 (Handelman [handelman1988]).
Assume that are linear polynomials and that the semialgebraic set
is compact and has a non-empty interior. Then any polynomial strictly positive on can be written as for some nonnegative scalars .
We define the degree of a Handelman decomposition to be , where the maximum is taken over all the exponent vectors of that appear in a decomposition.
Note that this theorem is true when is a polytope , and the polynomials correspond to the rows in the constraint matrix . See [Castle20091285, deKlerk2015, monique2014] for a nice introduction to the Handelman decomposition. The Handelman decomposition is only guaranteed to exist if the polynomial is strictly greater than zero on , and the required degree of the Handelman decomposition can grow as the minimum of the polynomial approaches zero [sankaranarayanan2013lyapunov]. The next three examples are taken from Section 3.1 of [sankaranarayanan2013lyapunov].
Consider the polynomial given by on . Because , Handelman’s theorem does not say that must have a Handelman decomposition. However,
where and , so has a Handelman decomposition.
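To make this kind of statement concrete, the sketch below checks a candidate Handelman decomposition by expanding it and comparing coefficients. The instance is an illustrative assumption chosen here: the polynomial x^2 − 3x + 2 on [−1, 1], with the interval described by g_1 = 1 − x ≥ 0 and g_2 = 1 + x ≥ 0, and the candidate decomposition 1·g_1^2 + 1·g_1. This polynomial vanishes at x = 1, so the theorem does not apply to it, yet the expansion confirms that a decomposition of degree 2 exists. Univariate polynomials are stored as coefficient lists in increasing degree, and all function names are hypothetical.

```python
from fractions import Fraction


def poly_mul(p, q):
    result = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result


def poly_add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def poly_pow(p, k):
    result = [Fraction(1)]
    for _ in range(k):
        result = poly_mul(result, p)
    return result


def trim(p):
    """Remove trailing zero coefficients."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def expand_handelman(coeffs, generators):
    """Expand sum_alpha c_alpha * prod_i g_i^(alpha_i) with all c_alpha >= 0."""
    total = [Fraction(0)]
    for alpha, c in coeffs.items():
        assert c >= 0, "Handelman coefficients must be nonnegative"
        term = [Fraction(c)]
        for g, a in zip(generators, alpha):
            term = poly_mul(term, poly_pow(g, a))
        total = poly_add(total, term)
    return trim(total)


def handelman_degree(coeffs):
    """Largest |alpha| among exponent vectors with a nonzero coefficient."""
    return max(sum(alpha) for alpha, c in coeffs.items() if c != 0)


g1 = [1, -1]                         # 1 - x
g2 = [1, 1]                          # 1 + x
p = [2, -3, 1]                       # x^2 - 3x + 2
candidate = {(2, 0): 1, (1, 0): 1}   # 1 * g1^2 + 1 * g1

assert expand_handelman(candidate, [g1, g2]) == p
assert handelman_degree(candidate) == 2
```

Checking a given decomposition only requires polynomial expansion; finding one of a prescribed degree is a linear feasibility problem in the coefficients, which this sketch does not attempt.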
To apply Handelman’s theorem, we must have that on .
Consider the polynomial given by on . Because , Handelman’s theorem does not say that must have a Handelman decomposition. If had a decomposition, then there would be numbers and integers such that
with being a finite subset of . Evaluating both sides at zero produces the contradiction . Hence does not have a Handelman decomposition on .
For every fixed , must have a Handelman decomposition on . Let be the total degree of a Handelman representation. The next table lists the smallest value for which has a degree decomposition.
There are many open questions relating to Handelman’s theorem. For instance, the answers to the following questions are either not well known or completely unknown.
Given a nonnegative polynomial on a polytope , how large does the Handelman degree have to be?
By adding a positive shift to , how can the Handelman degree change for ?
Fix . Can a large enough shift be added to so that has a degree Handelman decomposition?
How can redundant inequalities in ’s description lower the Handelman degree or reduce the number of Handelman terms?
However, these questions do not prevent us from using Handelman’s theorem as an effective tool for polynomial optimization. We now present a hierarchy of linear relaxations as described in [monique2014] for maximizing a polynomial over a polytope. This is the most traditional way Handelman’s theorem can directly be applied for optimization. Let denote the set of polynomials . For an integer , define the Handelman set of order t as
and the corresponding Handelman bound of order as
Clearly, any polynomial in is nonnegative on and one has the following chain of inclusions:
giving the chain of inequalities: for . When is a polytope with non-empty interior, the asymptotic convergence of the bounds