Generalized Ehrhart polynomials
Let $P$ be a polytope with rational vertices. A classical theorem of Ehrhart states that the number of lattice points in the dilations $P(n) = nP$ is a quasi-polynomial in $n$. We generalize this theorem by allowing the vertices of $P(n)$ to be arbitrary rational functions in $n$. In this case we prove that the number of lattice points in $P(n)$ is a quasi-polynomial for $n$ sufficiently large. Our work was motivated by a conjecture of Ehrhart on the number of solutions to parametrized linear Diophantine equations whose coefficients are polynomials in $n$, and we explain how these two problems are related.
In this article, we relate two problems, one from classical number theory, and one from lattice point enumeration in convex bodies. Motivated by a conjecture of Ehrhart [Ehr] and a result of Xu [Xu], we study linear systems of Diophantine equations with a single parameter. To be more precise, we suppose that the coefficients of our system are given by polynomial functions in a variable $n$, and also that the number of solutions $f(n)$ in nonnegative integers for any given value of $n$ is finite. We are interested in the behavior of the function $f(n)$, and in particular, we prove that $f(n)$ is eventually a quasi-polynomial, i.e., there exists some period $s$ and polynomials $f_i(n)$ for $i = 0, \dots, s-1$ such that for $n \gg 0$, the number of solutions for $n \equiv i \pmod{s}$ is given by $f_i(n)$. The other side of our problem can be stated in a similar fashion: suppose that $P(n) \subset \mathbb{R}^d$ is a convex polytope whose vertices are given by rational functions in $n$. Then the number of integer points inside of $P(n)$, as a function of $n$, enjoys the same properties as that of $f$ as above. We now describe in more detail some examples and the statements of our results.
1.1 Diophantine equations.
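As a warmup to our result, we begin with two examples. The first is a result of Popoviciu. Let $a$ and $b$ be relatively prime positive integers. We wish to find a formula for the number of nonnegative integer solutions $(x, y)$ to the equation $ax + by = n$. For a real number $x$, let $\lfloor x \rfloor$ denote the greatest integer less than or equal to $x$, and define $\{x\} = x - \lfloor x \rfloor$ to be the fractional part of $x$. Then the number of such solutions is given by the formula
$$\frac{n}{ab} - \left\{ \frac{a^{-1} n}{b} \right\} - \left\{ \frac{b^{-1} n}{a} \right\} + 1,$$
where $a^{-1}$ and $b^{-1}$ satisfy $a a^{-1} \equiv 1 \pmod{b}$ and $b b^{-1} \equiv 1 \pmod{a}$. See [BR, Chapter 1] for a proof. In particular, this function is a quasi-polynomial in $n$.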
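Popoviciu's formula, $\frac{n}{ab} - \{a^{-1}n/b\} - \{b^{-1}n/a\} + 1$ with $a a^{-1} \equiv 1 \pmod{b}$ and $b b^{-1} \equiv 1 \pmod{a}$, can be checked numerically against a direct enumeration. The following is a minimal sketch, not part of the original argument; the function names `popoviciu` and `brute_force` are illustrative choices, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def frac_part(q):
    """Fractional part {q} = q - floor(q) of a rational number q."""
    return q - (q.numerator // q.denominator)

def popoviciu(a, b, n):
    """Closed-form count of (x, y) >= 0 with a*x + b*y == n, assuming gcd(a, b) == 1."""
    a_inv = pow(a, -1, b)  # a * a_inv == 1 (mod b); modular inverse needs Python 3.8+
    b_inv = pow(b, -1, a)  # b * b_inv == 1 (mod a)
    return (Fraction(n, a * b)
            - frac_part(Fraction(a_inv * n, b))
            - frac_part(Fraction(b_inv * n, a))
            + 1)

def brute_force(a, b, n):
    """Direct enumeration of the same solutions, used only as a cross-check."""
    return sum(1 for x in range(n // a + 1) if (n - a * x) % b == 0)
```

For instance, with $a = 2$, $b = 3$, $n = 6$ both functions count the two solutions $(0, 2)$ and $(3, 0)$.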
For the second example, which generalizes the first, consider the number of solutions to the matrix equation
where the and are fixed positive integers and for . Write . We assume that , so that there exist integers (not unique) such that
Now define two regions for . Then if is in the positive span of the columns of the matrix in (1.2), there exist Popoviciu-like formulas for the number of solutions of (1.2) which depend only on whether or , and the numbers . See Section 5.1 for the precise statement.
In particular, one can replace the , , and by polynomials in in such a way that for all values of , the condition holds. For a concrete example, consider the system
Then for , we have that
so that for these values of , there exists a quasi-polynomial that counts the number of solutions .
Given these examples, we are ready to state our general theorem. We denote by $\mathrm{QP}_{\gg 0}$ the set of functions $f(n)$ which are eventually quasi-polynomial.
Theorem 1.1.
Let $A(n)$ be an $m \times k$ matrix, and $b(n)$ be a column vector of length $m$, such that their entries are integer coefficient polynomials in $n$. If $f(n)$ denotes the number of nonnegative integer vectors $x$ satisfying $A(n) x = b(n)$ (assuming that these values are finite), then $f \in \mathrm{QP}_{\gg 0}$.
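To make the statement concrete, here is a small sketch for a single equation with polynomial coefficients. The equation $2x + ny = n^2$ is an illustrative choice, not an example from the text. For $n \geq 1$ the variable $y$ ranges over $0, \dots, n$, and $x = (n^2 - ny)/2$ must be an integer, so the count is $n + 1$ for even $n$ and $(n + 1)/2$ for odd $n$: a quasi-polynomial of period 2.

```python
def count_solutions(n):
    """Number of (x, y) in Z_{>=0}^2 with 2*x + n*y == n**2, for n >= 1."""
    # y can be at most n; x is then determined and must be a nonnegative integer.
    return sum(1 for y in range(n + 1) if (n * n - n * y) % 2 == 0)

def predicted(n):
    """Quasi-polynomial of period 2 derived by hand for this example."""
    return n + 1 if n % 2 == 0 else (n + 1) // 2
```

In this particular example the two functions agree for every $n \geq 1$; in general the theorem only guarantees agreement for sufficiently large values of the parameter.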
This theorem generalizes the conjecture [Sta, Exercise 4.12]. See [Ehr, p. 139] for some verified cases of a conjectural multivariable analogue, which we state here. Let be some subset. We say that a function is a multivariate quasi-polynomial if there exists a finite index sublattice such that is a polynomial function on each coset of intersected with .
Conjecture 1.2 (Ehrhart).
Let be an matrix and be a column vector of length , such that all entries are linear functions in with integer coefficients such that for all , the number of nonnegative integer solutions to is finite. Then there exist finitely many polyhedral regions covering such that is a multivariate quasi-polynomial when restricted to each .
1.2 Lattice point enumeration.
We first recall a classical theorem due to Pick. Let $P \subset \mathbb{R}^2$ be a convex polygon with integral vertices. If $A(P)$, $I(P)$, and $B(P)$ denote the area of $P$, the number of integer points in the interior of $P$, and the number of integer points on the boundary of $P$, respectively, then one has the equation
$$A(P) = I(P) + \frac{1}{2} B(P) - 1.$$
Now let us examine what happens with dilates of $P$: define $nP = \{nx \mid x \in P\}$. Then of course $A(nP) = A(P) n^2$ and $B(nP) = n B(P)$ whenever $n$ is a positive integer, so we can write
$$\#(nP \cap \mathbb{Z}^2) = I(nP) + B(nP) = A(P) n^2 + \frac{1}{2} B(P) n + 1,$$
which is a polynomial in $n$. The following theorem of Ehrhart says that this is always the case independent of the dimension, and we can even relax the integral vertex condition to rational vertices:
Theorem 1.3 (Ehrhart).
Let $P \subset \mathbb{R}^d$ be a polytope with rational vertices. Then the function $L_P(n) = \#(nP \cap \mathbb{Z}^d)$ is a quasi-polynomial of degree $\dim P$. Furthermore, if $D$ is an integer such that $DP$ has integral vertices, then $D$ is a period of $L_P(n)$. In particular, if $P$ has integral vertices, then $L_P(n)$ is a polynomial.
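Both cases of Ehrhart's theorem can be observed by direct counting. The sketch below uses two triangles chosen for illustration (they do not appear in the text): the integral triangle with vertices $(0,0), (2,0), (0,3)$, whose dilates contain $3n^2 + 3n + 1$ integer points by Pick's theorem, and the rational triangle with vertices $(0,0), (1/2,0), (0,1/2)$, whose count has period 2.

```python
def count_integral_triangle(n):
    """Integer points of n*P, P = conv((0,0), (2,0), (0,3)): x, y >= 0 and 3x + 2y <= 6n."""
    # For each x, y ranges over 0..floor((6n - 3x) / 2).
    return sum((6 * n - 3 * x) // 2 + 1 for x in range(2 * n + 1))

def count_rational_triangle(n):
    """Integer points of n*Q, Q = conv((0,0), (1/2,0), (0,1/2)): x, y >= 0 and 2x + 2y <= n."""
    return sum(1 for x in range(n + 1) for y in range(n + 1) if 2 * x + 2 * y <= n)

def rational_triangle_quasi_polynomial(n):
    """Hand-derived constituents: (n+2)(n+4)/8 for even n, (n+1)(n+3)/8 for odd n."""
    if n % 2 == 0:
        return (n + 2) * (n + 4) // 8
    return (n + 1) * (n + 3) // 8
```

The first count is a genuine polynomial, while the second needs one polynomial per residue class modulo 2, in line with the period statement of the theorem.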
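The function $L_P(n)$ counting the integer points of $nP$ is called the Ehrhart quasi-polynomial of $P$. One can see this as saying that if the vertices of $P$ are $v_i = (v_{i1}, \dots, v_{id})$, then the vertices of $nP$ are given by the linear functions $v_i(n) = (v_{i1} n, \dots, v_{id} n)$. We generalize this as follows.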
Theorem 1.4.
Given polynomials for and , let be a positive integer such that for all . This is satisfied by sufficiently large, so we can define a rational polytope , where . Then .
We call the function a generalized Ehrhart polynomial.
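A one-dimensional sketch shows why the conclusion only holds for sufficiently large values of the parameter. The polytope here is an assumption made for illustration: the interval $[0, (n^2+1)/(2n)]$, whose right endpoint is a rational function of $n$. Since $(n^2+1)/(2n) = n/2 + 1/(2n)$, the number of integer points is $n/2 + 1$ for even $n$ and $(n+1)/2$ for odd $n \geq 3$, but the odd formula fails at $n = 1$.

```python
def count_interval(n):
    """Integer points in the interval [0, (n**2 + 1) / (2*n)] for an integer n >= 1."""
    return (n * n + 1) // (2 * n) + 1

def eventual_formula(n):
    """Quasi-polynomial of period 2 that matches count_interval for n >= 2."""
    return n // 2 + 1 if n % 2 == 0 else (n + 1) // 2
```

At $n = 1$ the interval is $[0, 1]$ and contains two integer points, whereas the odd constituent predicts one, so the count is eventually, but not everywhere, quasi-polynomial.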
In Section 2, we explain the equivalence of the two problems just mentioned, then prove our main result Theorem 1.1 in Section 3. The proof gives us an algorithm to compute these generalized Ehrhart polynomials, but it could be very complicated in practice. For computational reasons, we introduce the notions of generalized division and generalized gcd for the ring in Section 4. In a sense, these generalize the usual notions of division and gcd for the ring of integers . The methods and algorithms are quite similar, but are more involved due to technical complications. The proofs for the correctness of the algorithms are given in this section. As an application of these tools, in Section 5, we will describe explicit computations of some special generalized Ehrhart polynomials.
We thank Richard Stanley for useful discussions and for reading previous drafts of this article. We also thank an anonymous referee for helping to improve the quality and readability of the paper. Sheng Chen was sponsored by Project 11001064 supported by National Natural Science Foundation of China. Steven Sam was supported by an NSF graduate fellowship and an NDSEG fellowship.
2 Equivalence of the two problems
As we shall see, the two problems, Diophantine equations and lattice point enumeration, are closely intertwined. In this section, we want to show that Theorem 1.1 is equivalent to Theorem 1.4. Before this, let us see the equivalence of Theorem 1.4 with the following result. For notation, if $x$ and $y$ are vectors, then $x \ge y$ if $x_i \ge y_i$ for all $i$.
Theorem 2.1.
For , define a rational polytope , where is an matrix, and is an column vector, both of whose entries are integer coefficient polynomials. Then .
Notice that the difference of Theorem 1.4 and Theorem 2.1 is that one defines a polytope by its vertices and the other by hyperplanes. So we will show their equivalence by presenting a generalized version of the algorithm connecting “vertex description” and “hyperplane description” of a polytope.
The connection is based on the fact that we can compare two rational functions and when is sufficiently large. For example, if and , then for all , we denote this by (“even” being shorthand for “eventually”). Therefore, given a point and a hyperplane, we can test their relative position. To be precise, let be a point where the are rational functions and let be a hyperplane where all the are polynomials of . Then exactly one of the following will be true:
Given this, we can make the following definition. We say that two points and lie (resp., weakly lie) on the same side of if (resp., ).
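The eventual comparison of two rational functions can be sketched as follows. For $f = p/q$ and $g = r/s$, the sign of $f - g = (ps - rq)/(qs)$ for large values of the variable is determined by the leading coefficients of $ps - rq$ and $qs$. The representation of polynomials as coefficient lists (lowest degree first) and the function names are assumptions made for this illustration.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_sub(p, q):
    """Subtract q from p, padding the shorter list with zeros."""
    size = max(len(p), len(q))
    p = list(p) + [0] * (size - len(p))
    q = list(q) + [0] * (size - len(q))
    return [a - b for a, b in zip(p, q)]

def leading_sign(p):
    """Sign of p(n) for all sufficiently large n: the sign of the leading coefficient."""
    for c in reversed(p):
        if c != 0:
            return 1 if c > 0 else -1
    return 0

def eventual_compare(f_num, f_den, g_num, g_den):
    """Return +1, 0, or -1 when f > g, f == g, or f < g for all large n (nonzero denominators)."""
    diff_num = poly_sub(poly_mul(f_num, g_den), poly_mul(g_num, f_den))
    return leading_sign(diff_num) * leading_sign(poly_mul(f_den, g_den))
```

For example, `eventual_compare([0, 0, 1], [1], [5, 100], [1])` compares $n^2$ with $100n + 5$ and returns `1`, even though $n^2 < 100n + 5$ for small $n$.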
Going from the “vertex description” to the “hyperplane description”:
Given all vertices of a polytope , whose coordinates are all rational functions of , we want to get its “hyperplane description” for . Let be a hyperplane defined by a subset of vertices. If all vertices lie weakly on one side of , we will keep it together with , or or indicating the relative position of this hyperplane and the polytope. We can get all the hyperplanes defining the polytope by this procedure.
Going from the “hyperplane description” to the “vertex description”:
Let be a polytope, where is an matrix, and is an column vector, both of whose entries are integer coefficient polynomials. We want to find its vertex description. Let be the linear functionals defined by the rows of . So we can rewrite as
The vertices of can be obtained as follows. For every -subset , if the equations are linearly independent for , and their intersection is nonempty, then it consists of a single point, which we denote by . If for all , then is a vertex of , and all vertices are obtained in this way. We claim that the subsets for which is a vertex remain constant if we take sufficiently large. First, linear independence of the equations can be tested by checking that at least one of the minors of the rows of indexed by does not vanish. Since these minors are all polynomial functions, they can only have finitely many roots unless they are identically zero. Hence taking , we can assume that is either always linearly dependent or always linearly independent. Similarly, the sign of is determined by the sign of a polynomial, and hence is constant for .
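The stabilization of the vertex-defining subsets can be observed on a small planar example. The sketch below is an illustration under assumed data, not taken from the text: the polygon is cut out by $x \ge 0$, $y \ge 0$, $x + ny \le n^2$, and $x \le 6$, and for each value of $n$ we record which pairs of constraints meet in a vertex. Exact rational arithmetic avoids rounding errors.

```python
from fractions import Fraction
from itertools import combinations

def constraints(n):
    """Rows (a, b, c) meaning a*x + b*y >= c; the entries are polynomials in n."""
    return [(1, 0, 0), (0, 1, 0), (-1, -n, -n * n), (-1, 0, -6)]

def vertex_index_sets(n):
    """Pairs of constraint indices whose boundary lines meet in a vertex of the polygon."""
    rows = constraints(n)
    result = set()
    for i, j in combinations(range(len(rows)), 2):
        a1, b1, c1 = rows[i]
        a2, b2, c2 = rows[j]
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel lines: the two equations are linearly dependent
        # Cramer's rule for the intersection point of the two lines.
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        if all(a * x + b * y >= c for a, b, c in rows):
            result.add((i, j))
    return result
```

For $n = 1, 2$ the constraint $x \le 6$ is redundant and the index pairs are $\{(0,1), (0,2), (1,2)\}$; for every $n \ge 3$ they are $\{(0,1), (0,2), (1,3), (2,3)\}$, so the combinatorial type is constant once $n$ is large enough.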
We can easily transform an inequality to an equality by introducing some slack variables and we can also represent an equality by two inequalities and . So the main difference between the two theorems is that Theorem 1.1 is counting nonnegative solutions while Theorem 2.1 is counting all integral solutions. But we can deal with this by adding constraints on each variable.
A more interesting connection between Theorem 1.1 and Theorem 2.1 is worth mentioning here. First consider any fixed integer . Then the entries of and in the linear Diophantine equations of Theorem 1.1 all become integers. For an integer matrix, we can calculate its Smith normal form. Similarly, we can use a generalized Smith normal form for matrices over to get a transformation from Theorem 1.1 to Theorem 2.1.
Then given and , by Theorem 4.6, we can put into generalized Smith normal form: for some matrix
with nonzero entries only on its main diagonal, and unimodular matrices and . Then the equation can be rewritten as . Set and . By the form of , we have a solution if and only if divides for , and for any given solution, the values can be arbitrary. However, since , we need to require that , and any such gives a nonnegative solution to the original problem. Simplifying , where , we get , where , and . Although and have entries in , we can assume that they are polynomials by dealing with each constituent of the quasi-polynomials separately. So we reduce Theorem 1.1 to Theorem 2.1.
Consider the nonnegative integer solutions to
Write . When , the Smith normal form of is
So the equation (2.1) becomes
Set , so that we have
and thus . By the condition , we require
which gives us
a one dimensional polytope. So the number of solutions for is
We can do the case similarly. The Smith normal form of is
and , thus .
The proof of Theorem 4.6 is based on a theory of generalized division and GCD over the ring , which mainly says that for , the functions , , and lie in the ring . One interesting consequence of these results is that every finitely generated ideal in is principal, despite the fact that is not Noetherian. We first developed this theory in order to approach Theorem 1.1, but subsequently found a proof that circumvents its use. Further details can be found in Section 4.
3 Proof of Theorem 1.1
To prove Theorem 1.1, we will use a “writing in base $n$” trick to reduce equations with polynomial coefficients to equations whose coefficients are linear functions. Briefly, the idea is as follows: given a linear Diophantine equation
with polynomial coefficients and , fix an integer , then the coefficients all become integers. Now consider a solution with . Put the values of into the equation, then both sides become an integer. Then we use the fact that any integer has a unique representation in base ( is a fixed number), and compare the coefficient of each power of in both sides of the equation.
Finally, letting $n$ vary, we obtain a uniform expression for both sides in base $n$ when $n$ is sufficiently large. Moreover, the coefficients of each power of $n$ on both sides of the equation are all linear functions of $n$ (Lemma 3.1 and Lemma 3.2). Then by Lemma 3.3, we can reduce these equations with linear function coefficients to the case where we can apply Ehrhart’s theorem (Theorem 1.3) to show that the number of solutions is a quasi-polynomial in $n$. This completes the proof of Theorem 1.1.
Lemma 3.1.
Given with for (i.e., has positive leading coefficient), there is a unique representation of in base :
where is a linear function of such that for , for and . We denote .
Note that may not be equal to . For example, is represented as with , , and .
Proof.
Let with integral coefficients and . If the are all nonnegative, the proposition holds for , otherwise we can prove it by induction on , where is the smallest index such that . Suppose is the minimal number such that , then in the representation of , put for , . By induction, we have a representation for
so we can add it to to get the desired representation for . Since for we have , by induction, we can make sure for . For a lower bound, is sufficient, i.e., for all , the desired unique representation is guaranteed to exist. Note that , since . So this process will stop in finitely many steps. Uniqueness of this representation is clear. ∎
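The base-$n$ representation of a polynomial value can be computed by a borrowing procedure in the spirit of the induction above. The following is a minimal sketch under stated assumptions: the input is an integer polynomial with positive leading coefficient, given as a coefficient list (lowest degree first), and each output digit is a pair `(a, b)` standing for the linear function $an + b$. The names `base_n_digits` and `evaluate` are hypothetical.

```python
def base_n_digits(coeffs):
    """Digits (a, b), meaning a*n + b, of p(n) written in base n for n >> 0."""
    digits = []
    borrow = 0
    for c in coeffs:
        c -= borrow
        if c >= 0:
            digits.append((0, c))  # constant digit, valid once n > c
            borrow = 0
        else:
            digits.append((1, c))  # digit n + c, obtained by borrowing one unit of the next power
            borrow = 1
    if borrow:
        raise ValueError("leading coefficient must be positive")
    # A borrow may cancel the top coefficient, which lowers the degree in base n.
    while len(digits) > 1 and digits[-1] == (0, 0):
        digits.pop()
    return digits

def evaluate(digits, n):
    """Value of the base-n expansion at a specific integer n."""
    return sum((a * n + b) * n**i for i, (a, b) in enumerate(digits))
```

For the polynomial $n^2 - n + 3$, `base_n_digits([3, -1, 1])` returns `[(0, 3), (1, -1)]`, that is, the two digits $3$ and $n - 1$, so the degree in base $n$ is 1 although the polynomial has degree 2.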
Lemma 3.2.
Fix an integer and . Consider the set
where (as a usual polynomial) and (represented in base as in Lemma 3.1), with and . Then is in bijection with a finite union of sets of the form
where and is a linear form of with constant coefficients.
Proof.
Fix a sufficiently large positive integer . By our assumptions, for any point , we can write with . The rest of the lemma is a direct “base ” comparison starting from the lowest power to the highest power in the equation . To get a feel for the proof, we recommend that the reader look at Example 3.5 first. We will use this as a running example to explain the steps of the proof.
First write in base , so that each is a linear function in . Then we know that since we have fixed sufficiently large. Going back to Example 3.5, we have so that , and , so that here “sufficiently large” means . Now expand out the equation to get
The constant term in base of (3.1) gives us the equation
Since we have the bound for all , we can in fact say that the LHS is equal to where is an integer such that is strictly greater than the sum of the negative and strictly less than the sum of the positive (and can also be 0). But note that only depends on the , so if our is sufficiently large, we may assume that . Going back to Example 3.5, we have the equation
so that .
So now we have finitely many cases for the value of to deal with. Fix one. Going back to the equation (3.1), we can substitute our value of and compare linear coefficients to get
Again, since we know that and for all , we can say that the LHS is equal to where is an integer such that is greater than the sum of the negative and and less than the sum of the positive and (and this is independent of because ). Going back to Example 3.5, we get the equation
Now we have finitely many cases of , and again we fix one. We continue on in this way to get . At each point, we only had finitely many choices for the next , so at the end, we only have finitely many sequences . For a given , we see that the must satisfy finitely many equalities of the form where is a linear form in with constant coefficients and . In Example 3.5, for , we would have the equations
along with the inequalities , and we have underlined the places where appears.
Each such gives us a set of the form , and to finish we take the union of these sets over all sequences . ∎
We will need one more preparatory lemma before proving Theorem 1.1.
Lemma 3.3.
If is a polytope defined by inequalities of the form , where and is a linear form of with constant coefficients, then .
Proof.
We first use the fact that the combinatorics of stabilizes for sufficiently large (see Section 2), say that the polytope has vertices . Also, note that the coefficients of the vertices of the polytope are linear functions in with rational coefficients. Let be the least common multiple of the denominators that appear in all of the coordinates. Then by [Bar, Theorem 18.4], the function sending to the number of integer points in is a polynomial if we restrict to a specific congruence class of modulo . We can compose each of these functions with the polynomial to conclude that . ∎
Alternate proof of Lemma 3.3.
We will do induction on dimension first, the case of dimension 0 being trivial. An inequality of the form with will be called an inequality with constant term. Secondly, we will do induction on the number of inequalities with constant term. Theorem 1.3 gives the desired result when there are no inequalities with constant term. In general, let be a polytope defined by plus other relations of the same form, which we will call . Let be the polytope with relations and . If , then for , let be the polytope with relations and . Then
so to show , it suffices to show that and for . Since has one less inequality with constant term, we know that by induction. As for , write and choose such that . Then the equation can be rewritten as
Making substitutions into the relations , each relation still has the form (clear denominators if necessary to make sure all of the coefficients are integral), and we have eliminated the variable .
If , then for , let be the polytope with relations and . Then
and we proceed as before. Hence by induction on dimension. ∎
Proof of Theorem 1.1.
Consider a system of linear Diophantine equations with finitely many nonnegative integer solutions for each , as in Theorem 1.1. Let
be any equation from the system. The equations define a bounded polytope whose vertices are given by rational functions of as discussed in Section 2, where the degrees of the rational functions (difference between the degree of the numerator and the denominator) are independent of . We can express each point in this polytope as a convex combination of the vertices. Therefore, writing each variable in base , we can find some positive integer , which does not depend on , such that the coordinates of each point in the polytope are less than . Using Lemma 3.2, we can reduce each equation from into a new system like , with more variables than but all restrictions are of the form . Applying Lemma 3.3 finishes the proof. ∎
Example 3.4.
We give an example for Lemma 3.3. For a positive integer, let be the polygon defined by the inequalities , and . Then is defined by the inequalities , , and , and is defined by the inequalities , , and . We can rewrite the equality as , and then the other inequalities become and .
We see that is the convex hull of the points , while is the interval . The total number of integer points in and is given by the quasi-polynomial
Its rational generating function is
Example 3.5.
We give an example of Lemma 3.2. Consider nonnegative integer solutions for
For any , is the expression in base . Now consider the left hand side. Writing in base , let , and with . In this case, , but we have further restrictions on the degrees of and in base coming from the coefficients . Then we have
Now we can write the left hand side in base with extra constraints on ’s.
We start with comparing the coefficient of in both sides. We have the following three cases:
So in the language of proof of Lemma 3.2, we have . We next consider term. If satisfies for , , then the equation is reduced to
Now compare the terms. We have five cases for each :
Last, we compare the terms. Note that since we assume , the term won’t carry over to term, so the computation of term only depends on the term . If satisfies the th condition for , the equation then becomes
So for each , we have
Overall, we have the set
is in bijection with the set
such that satisfies the equations in the sets
Here we borrow the notation of matrix multiplication