Threshold functions and Poisson convergence for systems of equations in random sets
We study threshold functions for the existence of solutions to linear systems of equations in random sets and present a unified framework which includes arithmetic progressions, sum-free sets, $B_h[g]$-sets and Hilbert cubes. In particular, we show that there exists a threshold function for the property “$\mathcal{A}^m$ contains a non-trivial solution of $M\cdot\mathbf{x}=\mathbf{0}$” where $\mathcal{A}$ is a random set and each of its elements is chosen independently with the same probability from the interval of integers $\{1,\dots,n\}$. Our study contains a formal definition of trivial solutions for any linear system, extending a previous definition by Ruzsa when dealing with a single equation.
Furthermore, we study the distribution of the number of non-trivial solutions at the threshold scale. We show that it converges to a Poisson distribution whose parameter depends on the volumes of certain convex polytopes arising from the linear system under study as well as the symmetry inherent in the structures, which we formally define and characterize.
The study of the existence or absence of a given configuration in a large combinatorial structure plays a central role in discrete mathematics and particularly in combinatorial number theory.
On one hand, the extremal aspects of this question have provided an active area of research in extremal combinatorics where many different techniques are exploited to obtain results. One major and well known example is Szemerédi’s Theorem  which proves the existence of arbitrarily long arithmetic progressions in sets of positive density, see also [23, 21]. However, most of these results rely on smart ad hoc arguments that strongly depend on the specific structures under consideration. For some general results that have been obtained in the extremal case see Ruzsa [39, 40] as well as Shapira .
On the other hand, the common behaviour, i.e. what to expect for most sets, has been less studied. In this scenario one is interested in obtaining results regarding the existence (or absence) of solutions to systems of linear equations in random sets. This paper aims to provide answers to this question, giving a clear picture for a wide variety of structures that have been studied in the extremal case.
The model for random sets we consider in this work is known as the binomial model and is analogous to the model $G(n,p)$ in random graphs, which is defined with respect to the parameter $n$ and the probability $p$ (possibly depending on $n$). We consider random sets $\mathcal{A} \subseteq \{1,\dots,n\}$ where every element is chosen to be in $\mathcal{A}$ independently with probability $p$. In this context, we say that a certain property holds “for almost every set” or that it holds “asymptotically almost surely” if the probability that a random set in this model does not satisfy such property tends to zero as $n$ tends to infinity.
Let us note that in this model any specific set $A \subseteq \{1,\dots,n\}$ will appear with probability
\[ p^{|A|}(1-p)^{n-|A|}, \]
thus all sets with the same size are equiprobable. Also, the expected size for a random set in this model is
\[ \mathbb{E}(|\mathcal{A}|) = np. \]
It follows that with high probability the number of elements in a random set in the described model will be close to $np$.
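The binomial model can be illustrated with a short simulation. This is a minimal sketch; the helper name `random_subset` and the chosen values of `n` and `p` are illustrative assumptions, not part of the paper.

```python
import random

def random_subset(n, p, rng):
    """Sample a subset of {1, ..., n}, keeping each element independently with probability p."""
    return {x for x in range(1, n + 1) if rng.random() < p}

# Illustrative parameters (assumptions): the average size over many samples
# should be close to the expected size n * p.
rng = random.Random(0)
n, p = 10_000, 0.05
sizes = [len(random_subset(n, p, rng)) for _ in range(200)]
mean_size = sum(sizes) / len(sizes)
print(mean_size, n * p)
```

The sample sizes concentrate around `n * p`, in line with the expected size computed above.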
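Let $\mathcal{P}$ be a combinatorial property and $\mathcal{A}$ a binomial random set in $\{1,\dots,n\}$ with parameter $p = p(n)$. In this context, we say that $p_0 = p_0(n)$ is a threshold for the property $\mathcal{P}$ if
$p = o(p_0)$ implies $\lim_{n\to\infty} \mathbb{P}(\mathcal{A} \text{ satisfies } \mathcal{P}) = 0$ (the 0-statement) and
$p = \omega(p_0)$ implies $\lim_{n\to\infty} \mathbb{P}(\mathcal{A} \text{ satisfies } \mathcal{P}) = 1$ (the 1-statement).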
Roughly speaking: when $p$ is “above the threshold” almost all sets satisfy the property, and when it is “below the threshold” almost no set satisfies it. Observe that thresholds are only defined up to constant factors. Therefore, we are interested in the order of magnitude of this phase transition.
Consider the homogeneous linear system of equations in variables (with )
which will be identified with its corresponding matrix . We will only consider matrices satisfying three natural conditions:
positivity (the system must contain at least one solution with all positive entries),
irredundancy (for each there exists a solution such that ),
non-degeneracy (the matrix has full rank).
From now on we call any system of equations given by a matrix satisfying these conditions admissible. Roughly speaking, an admissible system of equations must have positive solutions without repeated coordinates and cannot be reduced to another one with a smaller number of equations or variables.
For a given set , we say that contains a solution to if there exist elements such that is a solution to the system.
The problem we address in this paper is the following one: let be a random set sampled from . For an admissible matrix we study how the random variable
behaves with respect to and deduce the existence of a threshold function for the combinatorial property “ contains a non-trivial solution of ”. The existence of such a function is assured by the fact that monotone properties of random sets always have threshold functions . Note that a crucial point in the analysis of the solutions is the definition of trivial solutions. It might be clear what a trivial solution looks like for -AP or a -set, but in the general setting this concept is less obvious and will be explored in Section 5. This will be the main difference compared to the study of small subgraphs in the classical model.
The exponent of the threshold function will have to be maximized over all induced submatrices of the system . A full explanation will follow in Section 6, but in order to state our results we need to introduce a definition here. For any set of column indices let denote the matrix obtained from by keeping only the columns indexed by , where is the empty matrix. We write the rank of a matrix as and define for all . Here we let . This allows us to define the following parameter.
For any admissible matrix define
Maximizing a parameter over all possible induced submatrices is reminiscent of similar results obtained while studying analogous problems in graph theory . The intuition behind it will be explored later with some examples. Also note the similarity between this parameter and the one developed by Rödl and Ruciński, who restricted themselves to density regular matrices when looking at random sparse versions of Rado’s theorem in [35, Definition 1.1]. Under the previous assumptions and using these definitions, we obtain the following result:
For some , let define an admissible system. Then, the probability is a threshold function for the property: “ contains a non-trivial solution of ”.
In other words, whenever the size of is we can assure that asymptotically almost surely there are no solutions, other than trivial ones, of the linear system with . The main contribution in the study will come from proper solutions, i.e. those solutions with pairwise different coordinates, since roughly speaking solutions with repeated coordinates only start appearing for larger values of .
We also study the behaviour of the limiting probability at the threshold. With this purpose in mind, observe that any system of equations together with the restrictions define a non-empty, convex and rational polytope of dimension which we denote by . We show that if and only if our system is strictly balanced (see the corresponding definition in Section 6) and for some constant , the limiting distribution converges to a Poisson distribution and hence the probability of having a solution tends to with an exponential decay in . Furthermore, the parameter only depends on the volume of as well as the inherent symmetry of the system (see Definition 4.1). More precisely,
For some , let define an admissible system, for some , . Then for every non-negative integer
if and only if is strictly balanced. Here is the polytope associated to the system and is a computable constant depending on .
Observe that the previous result implies that for the number of solutions is approximately Poisson distributed with parameter , and, in particular
Note also that for the statement of Theorem 1.2 it was not of importance whether one regards the solutions as subsets of or as vectors in . However, for the behaviour at the threshold, one needs to be more careful since the constants that appear when counting solutions play a role. This is not an issue when dealing with the usual systems considered in the literature, such as AP, Sidon and -sets (see below). However, since we are developing our theory for any admissible system, we have to take greater care of this issue since one can construct specific examples where neither approach aligns with one’s intuition. Our approach will therefore be somewhere in between considering solutions as vectors or as subsets. We start out considering them as vectors and then take care of a symmetry factor which depends only on and that can be unilaterally applied to all solutions. Note that for the already mentioned common systems there is no difference between this approach and considering them as subsets. More details are given in Section 4, including examples that illustrate why this approach is prudent.
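As a heuristic sanity check (a first-moment computation added for illustration, not taken from the proofs), consider $3$-term arithmetic progressions. The symbols $X$ (the number of $3$-APs contained in the random set), $c$ and the scaling $p = c\,n^{-2/3}$ are assumptions of this sketch. There are $n^2/4 + O(n)$ sets $\{a, a+d, a+2d\} \subseteq \{1,\dots,n\}$ with $d \geq 1$, and each is contained in the random set with probability $p^3$, so:

```latex
\[
\mathbb{E}(X)
  = \Bigl(\tfrac{n^{2}}{4} + O(n)\Bigr)\, p^{3}
  = \Bigl(\tfrac{n^{2}}{4} + O(n)\Bigr)\, c^{3} n^{-2}
  = \tfrac{c^{3}}{4} + o(1).
\]
```

If $X$ is asymptotically Poisson with this mean, the probability of containing no $3$-AP would tend to $e^{-c^{3}/4}$, which decays exponentially in $c^{3}$.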
The computation of the constant appearing in Theorem 1.3 is an algorithmically involved problem when dealing with general systems. One could compute this volume by means of triangulations of the polytope , but the problem is in general (for dimension greater than ) NP-complete . We provide computations for some concrete systems in Section 9.
This work will include a precise analysis of interesting combinatorial families that have been studied in the literature from many different points of view and which fit into the presented scheme. Let us state some of these common configurations:
A set of integers is an arithmetic progression of length (or shortly, a -AP) if it can be written in the form for some and .
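The definition can be checked directly on small sets. The following is a minimal sketch, assuming that a progression is required to have common difference at least 1 (so that single repeated elements do not count); the function name `contains_k_ap` is hypothetical.

```python
def contains_k_ap(s, k):
    """Return True if s contains a, a+d, ..., a+(k-1)d for some d >= 1 (assumes k >= 2)."""
    s = set(s)
    if not s:
        return False
    hi = max(s)
    for a in s:
        # Largest difference for which the last term a + (k-1)d can still be <= max(s).
        max_d = (hi - a) // (k - 1)
        for d in range(1, max_d + 1):
            if all(a + i * d in s for i in range(1, k)):
                return True
    return False

print(contains_k_ap({1, 3, 5}, 3))                       # True: 1, 3, 5
print(contains_k_ap({1, 2, 4, 5, 10, 11, 13, 14}, 3))    # False
```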
Sidon and $B_h[g]$-sets
A set of integers $A$ is called a Sidon set (or $B_2$-set) if every integer has at most one representation as a sum of two elements of $A$, up to permutations of the summands involved. One can generalize this concept in several ways; for example a set of non-negative integers $A$ is a $B_h[g]$-set if every integer has at most $g$ representations as a sum of $h$ elements of $A$, modulo permutations of the summands involved.
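The representation-counting definition translates into a brute-force test for small sets. This is a sketch under the assumption that summands may repeat and that representations are counted up to order; the parameter names `h` and `g` (number of summands and maximal number of representations) and the function name are illustrative.

```python
from collections import Counter
from itertools import combinations_with_replacement

def is_bhg_set(s, h, g):
    """Check that every integer has at most g representations as a sum of h elements of s.

    Representations are unordered multisets of h (possibly repeated) elements.
    """
    counts = Counter(sum(c) for c in combinations_with_replacement(sorted(set(s)), h))
    return all(v <= g for v in counts.values())

print(is_bhg_set({1, 2, 5, 11}, 2, 1))  # True: a Sidon set
print(is_bhg_set({1, 2, 3, 4}, 2, 1))   # False: 1 + 4 = 2 + 3
print(is_bhg_set({1, 2, 3, 4}, 2, 2))   # True: no sum has more than two representations
```

With `h = 2` and `g = 1` this is the Sidon property.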
Another possible generalization of Sidon sets are so-called Hilbert cubes: a set of integers is a Hilbert cube of dimension (or -cube) if there exist positive and distinct integers satisfying
Clearly a set is Sidon if it does not contain any -cube. As shown by Sándor  almost all sets in with size contain a -cube. Our results extend those in , proving in particular Conjecture 3.1 in the same work.
Barycentric and sum-free sets
A set contains a -barycentric set if there exist elements such that
that is is the average of . Clearly if we have a -AP and trivial solutions are given by . Finally, a set of integers is a -sum-free set if for every pair the sum is not an element of . The case is also known as sum-free equation or Schur equation.
The existence of such structures can be expressed using systems of equations of the type for admissible matrices . A set avoids a -AP if the homogeneous system defined by
does not have a non-trivial solution . In this case all trivial solutions (see Definition 5.1) are given by and correspond to the case . A set is a Sidon set if there are no solutions of the linear system , except for the trivial ones, which have the form either , for . Similarly, a set is if there are no solutions in of the linear system defined by
A set avoids -Hilbert cubes if it does not contain solutions to
and in general for a -Hilbert cube we will have a system of rank in variables. Lastly, a set is -sum-free if there are no solutions of the linear system , (when the problem corresponds with a 3-AP).
It is clear from the definition of , and the -sum-free family that all matrices have maximum rank, that is , and , respectively and are in fact admissible matrices. The application of the previous theorems and the computations in Section 9 are summarized in Table 1.
Nevertheless, let us remark that the general approach presented in this paper allows one to study a lot more than just these few linear structures. Therefore, some generalizations had to be made that proved to be very delicate when no longer considering these well-known examples. We have already mentioned the issue of defining trivial solutions as well as the symmetry between solutions. We have included the computation of the symmetry constant in the previous table (see Section 4 for a proper definition). We will also need to count the number of solutions to the system in and, as we will see in Sections 4 and 6, one must be very careful when doing so for a general system.
In Section 2 we compare the results obtained from Theorem 1.2 to what is currently known in the extremal case for certain structures. Section 3 contains a brief overview of the tools needed for the proofs later on in this paper. In Section 4 we will apply Ehrhart’s Theory to obtain a useful lemma and a simple corollary that allow us to count the number of solutions to a system of linear equations up to some symmetry. Section 5 contains a formal definition of trivial solutions with examples to motivate it. In Section 6 we introduce the concept of induced submatrices and an important proposition regarding non-trivial solutions. It also contains a formal definition of strictly balanced systems which is a necessary prerequisite for Theorem 1.3. A proof of Theorem 1.2 using the Second Moment Method can be found in Section 7 and in Section 8 we study the local behavior of the threshold which results in a proof of Theorem 1.3 via an application of Brun’s Sieve. The analysis of associated to certain combinatorial families is carried out in Section 9. Finally, in Section 10 we discuss related problems and generalizations.
2. State of the art in the extremal case
In the presented approach, we seek to give a picture of the qualitative behavior of a random set. However, one might wonder how far the common situation is from the extremal cases. The problem of estimating the size of maximal sets avoiding the structures introduced above has been intensively studied. In this direction one can find several results which give upper bounds for sets avoiding specific structures or, in the opposite direction, explicit constructions of large sets with this property. In both cases one requires ad hoc arguments that strongly depend on each specific structure.
For sets avoiding -AP’s we must go back to Szemerédi’s Theorem, that states that no set with positive density can avoid -AP’s for any . In particular, for non-trivial bounds were first obtained by Roth  and then refined by several authors, see [26, 7]. Nowadays, the best upper bound is established by Bloom . On the other hand, Behrend  constructed a set avoiding -AP’s of large size; this construction was slightly improved by Elkin  (see also ). More precisely, we have
for some constant .
Concerning the general -AP problem, analogous bounds have been obtained: the upper bounds come from the pioneering work of Gowers  and, more recently, dense constructions that lead to lower bounds for this problem were established by O’Bryant . These results can be summarized as follows
for a certain constant only depending on .
We show that almost all sets with size contain -AP’s. Observe that, for the gap between the usual situation and the extremal one is very large: most sets with size contain -AP’s but there are examples of (almost) linear size avoiding this structure. Nevertheless, as grows to infinity, this quantity approaches and the gap between the exponents tends to .
In the direction of the present article, Warnke has also studied the upper tail of the number of -arithmetic progressions and Schur triples in random subsets, establishing exponential bounds .
Sidon and $B_h[g]$-sets
The study of Sidon sets dates back to Erdős. In  Erdős and Turán obtained an upper bound for the size of a maximal Sidon set in (see [30, 11] for further improvements of this result). In fact, there are algebraic constructions of Sidon sets that, combined with Erdős-Turán result, prove
In the direction of the present article, the case was studied in detail by Godbole, Janson, Locantore and Rapoport in . They show that almost no set with size is . The proof is based on a tailor-made analysis of the particular shape of the equations defining sets.
Clearly, for (that is Sidon), the gap between the exponents in the usual situation, namely , and the extremal one, say , is very big. Let us also mention that in  Kohayakawa, Lee, Rödl and Samotij study the number of Sidon sets and the maximum size of Sidon sets contained in a sparse random set of integers. In particular, in Section 5 they analyze, by means of the Kim-Vu polynomial concentration inequality , the number of solutions to the Sidon equation (when the probability lies above the threshold). These results could be deduced using the presented framework. Concerning the general case, it is known that the cardinality of a maximum -set in is , but the main difficulty is to obtain a precise constant for the problem [13, 12]. As we show in Theorem 1.2 almost all sets in of size are . Once again, if we fix and let grow to infinity both situations approach each other.
Hilbert originally proved that any finite coloring of the positive integers contains a monochromatic -cube. The density version of this result is known as Szemerédi’s Cube Lemma and it is a key point in his proof of Roth’s Theorem. Gunderson and Rödl  obtained, by counting arguments, that for sufficiently large , any set with size contains a -cube. On the other side, by means of probabilistic arguments, one can construct a set of size avoiding -cubes. For the particular case , very recently Cilleruelo and Tesoro  have obtained an algebraic construction of a set of size avoiding -cubes.
As in the previous cases, when grows the existing gap between the exponents in our result and the ones in the upper and lower bounds tends to .
-sum free sets
The question of maximizing the cardinality of a set of integers in avoiding belongs to the folklore: one cannot select more than integers satisfying this condition for and this is optimal. The case coincides with the exclusion of -AP’s. Concerning , the problem was solved by Chung and Goldwasser  getting the same estimates as for . For , and sufficiently large , Chung and Goldwasser  discovered -sum-free sets of linear size in (and density tending to as increases); in fact Baltz, Hegarty, Knape, Larsson and Schoen  showed that this construction is optimal. Therefore, for this family it is known that the maximal size of a -sum-free set is linear in but Theorem 1.2 asserts that almost all sets of size contain at least one solution to , for every . Observe that in this family, the parameter does not play a role in the position of the threshold.
In this section we recall the Second Moment Method, Janson’s inequality and Brun’s Sieve – in the context of the Probabilistic Method – as well as basic notions in Ehrhart’s Theory for counting lattice points in convex polytopes.
3.1. The Second Moment Method.
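The Second Moment Method is used in the version given by Corollary 4.3.4. of Alon and Spencer : let $X = \sum_i X_i$ be a sum of indicator random variables, where $X_i$ corresponds to some event $A_i$. For convenience we suppress the dependence on $n$ which defines our notation. We write $i \sim j$ if $i \neq j$ and the events $A_i$ and $A_j$ are not independent. Define
\[ \Delta = \sum_{i \sim j} \mathbb{P}(A_i \wedge A_j). \]
If $\mathbb{E}(X) \to \infty$ and $\Delta = o(\mathbb{E}(X)^2)$ (as $n \to \infty$), then $X \sim \mathbb{E}(X)$ asymptotically almost surely. In particular, under these assumptions, $X > 0$ with probability tending to $1$.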
3.2. Brun’s Sieve
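The traditional approach to the Poisson Paradigm is used in the version given by Theorem 8.3.1. of Alon, Spencer : let again $X = \sum_i X_i$ be a sum of indicator random variables associated to some events $A_i$. Let
\[ S^{(r)} = \sum \mathbb{P}(A_{i_1} \wedge \dots \wedge A_{i_r}), \]
where the sum is taken over all subsets $\{i_1,\dots,i_r\}$ of $r$ elements. The Inclusion-Exclusion Principle gives us
\[ \mathbb{P}(X = 0) = 1 - S^{(1)} + S^{(2)} - \dots + (-1)^r S^{(r)} + \dots \]
Theorem 3.1 (Brun’s Sieve).
Suppose there is a constant $\mu$ so that $\mathbb{E}(X) = S^{(1)} \to \mu$, and such that for every fixed $r$
\[ \mathbb{E}\binom{X}{r} = S^{(r)} \to \frac{\mu^r}{r!}. \]
Then, for every non-negative integer $t$
\[ \mathbb{P}(X = t) \to \frac{\mu^t}{t!}\, e^{-\mu}. \]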
3.3. Lattice points in dilates of polytopes – Ehrhart’s Theory.
A convex polytope is the convex hull of a finite set of points (which are always bounded), or a bounded intersection of a finite set of half-spaces, and is said to be rational (resp. integral) if its vertices (i.e. its corner points) are points with rational (resp. integral) coordinates. Every rational polytope has a matrix representation of the form
for certain non-negative integers . Note that the inequalities can be easily turned into equalities through the use of slack variables. The (relative) dimension of a polytope is the dimension of the affine space . Note that this dimension is not necessarily , but a smaller non-negative integer. For a given polytope , let be the volume of in this affine space and the th-dilate of the polytope.
Ehrhart’s Theorem  (see also ) gives a precise description of the number of integer points in the th-dilate of a rational polytope in this context: the quantity is given by a pseudopolynomial in of degree (recall that a pseudopolynomial is a function where the functions are periodic). More precisely, we have the following theorem:
Theorem 3.2 (Ehrhart’s Theorem).
Let be a -dimensional convex polytope.
If is an integral polytope, then is a polynomial in of degree .
If is a rational polytope, then is a pseudopolynomial in of degree . Additionally, its period divides the least common multiple of the denominators of the coordinates of the vertices of .
In addition to Theorem 3.2, one can easily show that the leading coefficient in both cases is equal to . As a trivial corollary, for a rational polytope of dimension embedded in , we have
Let us mention that the full version of Ehrhart’s Theorem will be used in Subsection 9.3 in order to study volumes in families. However, the weaker version stated in Equation (5) will be used in the rest of the proofs.
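A small numerical illustration of Ehrhart’s Theorem, added as a sketch: the integral triangle with vertices $(0,0)$, $(1,0)$, $(0,1)$ is an example chosen here, not one from the paper. Its $t$-th dilate contains $(t+1)(t+2)/2$ lattice points, a polynomial in $t$ of degree $2$ whose leading coefficient $1/2$ is the area of the triangle.

```python
def lattice_points_in_dilated_triangle(t):
    """Count integer points (x, y) with x >= 0, y >= 0 and x + y <= t."""
    return sum(1 for x in range(t + 1) for y in range(t + 1 - x))

for t in (1, 2, 10, 100):
    count = lattice_points_in_dilated_triangle(t)
    # Compare the brute-force count with the Ehrhart polynomial (t + 1)(t + 2) / 2.
    assert count == (t + 1) * (t + 2) // 2
    # The ratio count / t^2 approaches the area 1/2 as t grows.
    print(t, count, count / t**2)
```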
4. Counting proper solutions up to symmetry
Consider the system defined by and recall that a solution to it is called proper if its coordinates are pairwise different. Note that the number of proper solutions in to the given system has the obvious upper bound . In fact, this bound trivially holds even for inhomogeneous systems, that is for any and we have
One can also give a simple constructive proof of the existence of some constant such that is a lower bound for large enough, see for example . These two bounds are in fact sufficient to prove the statements in Theorem 1.2.
However, the statement in Theorem 1.3 requires the exact asymptotic value of the ratio of the number of solutions to . This is where we apply Ehrhart’s Theory and in particular Equation (5) in order to count the number of proper solutions to the system and obtain the fundamental Lemma 4.3. Solutions that have repeated coordinates will simply be considered as proper solutions to some reduced system, as we will introduce in the next section.
Let us start by considering that two proper solutions which are counted as separate by Ehrhart’s Theory can be essentially the same when considering symmetry. As an easy example for this consider that -APs are given by for which and both are proper solutions that one might consider as essentially identical. However the situation is not quite as simple as grouping solutions together if they are identical up to permutation. Consider for example the system given by
for which both and are again proper solutions. In this case however, one should consider them to be essentially different because the permutation did not just occur between coordinates with identical coefficients and hence cannot be applied to all solutions. The semicolons in the vector representations delineating the coordinates with factor from those with factor were added to emphasize that fact. In order to deal with this distinction we denote by the symmetric group on elements and introduce the following definition:
For any matrix its symmetry constant is defined as
where is the vector obtained by permuting the coordinates of x according to .
This definition based on the solution space might be immediately applicable for systems such as the ones given in the introduction, where the solutions are intuitively clear. However if one is given a more complex and less structured system, the following simple characterization of the symmetry constant based solely on the matrix might be easier to apply:
For any matrix and permutation let denote the matrix obtained by permuting the columns of according to . We have
where denotes equality up to linear transformations of the rows.
The inequality is trivial. In order to show equality, note that for every permutation from the set
we have and since both kernels have dimension we have equality. Now the kernel of a matrix is the orthogonal of the span of its rows and therefore the rows of can be obtained by linear transformation of the rows in . ∎
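The matrix characterization can be evaluated by brute force for small systems. The sketch below assumes that the symmetry constant counts the column permutations whose permuted matrix has the same row space as the original one; the function names and the two example matrices (the $3$-AP equation $x_1 - 2x_2 + x_3 = 0$ and the Sidon equation $x_1 + x_2 - x_3 - x_4 = 0$) are illustrative choices.

```python
from fractions import Fraction
from itertools import permutations

def rank(rows):
    """Rank of an integer matrix via exact Gaussian elimination over the rationals."""
    mat = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    n_cols = len(mat[0]) if mat else 0
    for col in range(n_cols):
        pivot = next((i for i in range(rk, len(mat)) if mat[i][col] != 0), None)
        if pivot is None:
            continue
        mat[rk], mat[pivot] = mat[pivot], mat[rk]
        for i in range(len(mat)):
            if i != rk and mat[i][col] != 0:
                factor = mat[i][col] / mat[rk][col]
                mat[i] = [a - factor * b for a, b in zip(mat[i], mat[rk])]
        rk += 1
    return rk

def symmetry_constant(matrix):
    """Count column permutations that leave the row space of the matrix unchanged."""
    m = len(matrix[0])
    base_rank = rank(matrix)
    count = 0
    for sigma in permutations(range(m)):
        permuted = [[row[sigma[j]] for j in range(m)] for row in matrix]
        # Row spaces coincide exactly when stacking both matrices does not raise the rank.
        if rank(matrix + permuted) == base_rank:
            count += 1
    return count

print(symmetry_constant([[1, -2, 1]]))      # 2
print(symmetry_constant([[1, 1, -1, -1]]))  # 8
```

The enumeration over all permutations is only practical for a small number of variables.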
In the following we will consider two solutions to be essentially different if and only if
and otherwise x, y are treated as the same solution, even if they might differ as vectors. Observe that the definition and its characterization capture the previous examples as intended, i.e. they consider and to be the same -AP while distinguishing between the two solutions given for the second example. For the common systems mentioned in the introduction this approach of considering solutions as vectors in and dividing by results in counting subsets in that are solutions. For example (7), however, there is a small but significant difference between this and our approach.
Now in order to apply Equation (5) note that for any admissible matrix the system with the additional constraints defines a rational polytope of dimension (by assumption the system has the maximum possible rank and the polytope is not empty by the positivity assumption). It is just the intersection of the -dimensional solution space and the -dimensional unit hypercube. Using this we can now formulate the following lemma which will be applied in the forthcoming sections and simplify the discussion:
Let , an admissible matrix and the rational polytope defined by . Then the number of different proper solutions of is of the form .
The number of lattice points in is precisely the number of (not necessarily proper) vector solutions to with the added condition that . As the intersection of the -dimensional solution space and the -dimensional unit hypercube, the polytope also has dimension . By Equation (5) the number of lattice points in the dilate is simply . Noting that we do not distinguish between solutions that are identical up to certain permutations as specified in Definition 4.1 introduces the factor .
We therefore have to consider the set of solutions of with some repeated coordinates and show that they have a negligible contribution to the total number of solutions. These solutions belong to the intersection of with a subspace defined by repetitions of coordinates. Since by assumption our system is irredundant, contains at least one solution with no repeated coordinates; this implies that there is no subspace defined by the repetition of coordinates containing . Therefore, the polytope resulting from the intersection has dimension strictly smaller than .
It follows again by Equation (5) that the number of solutions with certain repeated coordinates is . Finally, the number of possible constellations of repeated coordinates is bounded by the number of partitions of , so the total number of solutions with repeated coordinates is and the lemma follows. ∎
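The counting argument can be checked numerically for $3$-term arithmetic progressions. This is an illustrative sketch: the equation $x_1 - 2x_2 + x_3 = 0$ and the symmetry factor $2$ (swapping $x_1$ and $x_3$) are the example chosen here, and the function name is hypothetical. Ordered proper solutions in $\{1,\dots,n\}^3$ number about $n^2/2$, so up to symmetry one expects about $n^2/4$.

```python
def proper_three_ap_solutions(n):
    """Count ordered triples (x1, x2, x3) in {1..n}^3, pairwise distinct, with x1 - 2*x2 + x3 = 0."""
    count = 0
    for x1 in range(1, n + 1):
        for x3 in range(1, n + 1):
            # x2 = (x1 + x3) / 2 must be an integer; it then lies in {1..n}
            # and differs from x1 and x3 exactly when x1 != x3.
            if x1 != x3 and (x1 + x3) % 2 == 0:
                count += 1
    return count

n = 100
ordered = proper_three_ap_solutions(n)
up_to_symmetry = ordered // 2   # identify (x1, x2, x3) with (x3, x2, x1)
print(ordered, up_to_symmetry, up_to_symmetry / n**2)  # 4900 2450 0.245
```

Solutions with repeated coordinates (here only $x_1 = x_2 = x_3$, of which there are $n$) are of lower order, as in the proof above.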
5. Trivial solutions
The key point of this section is to correctly define what a trivial solution is. Observe that in some of the examples discussed before it was very clear what trivial solutions look like. For example, trivial solutions to -AP’s are given by , which any non-empty set would contain. In order to study the threshold we must avoid these kinds of degenerate cases and understand what triviality means in the general setting.
For an admissible matrix , consider the system and associate to each variable its corresponding index . Let be a set partition of into blocks. Observe that defines a new system of equations , , obtained after taking the original system and combining the variables of each block of , i.e. summing up all columns related to the same block. We say that this new system of equations is associated to and derived from .
A system associated to a partition encodes certain solutions of the original system with repeated coordinates. For a given solution x of the system , we denote by the corresponding set partition of the indices . In particular, if x is a proper solution, then and therefore .
Observe that not every possible partition will come from a solution x and not every system associated to a partition will have proper solutions. For example, if one considers the equation it is clear that the related partition (that is ) necessarily implies , and thus the associated system will no longer be admissible (since it is neither irredundant nor non-degenerate). This observation is crucial in order to define what a trivial solution will be.
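The construction of the system associated to a partition amounts to summing columns within each block. The sketch below uses the Sidon equation $x_1 + x_2 - x_3 - x_4 = 0$ as an illustrative example; the function name and the 0-based indexing of the blocks are assumptions of this sketch.

```python
def system_for_partition(matrix, partition):
    """Sum the columns of `matrix` within each block of `partition`.

    `partition` is a list of blocks, each a list of 0-based column indices.
    """
    return [[sum(row[j] for j in block) for block in partition] for row in matrix]

sidon = [[1, 1, -1, -1]]

# Identifying x1 = x2 gives 2y - x3 - x4 = 0, which still has rank 1.
print(system_for_partition(sidon, [[0, 1], [2], [3]]))  # [[2, -1, -1]]

# Identifying x1 = x3 produces a zero column: the first variable becomes free
# and the remaining equation forces x2 = x4.
print(system_for_partition(sidon, [[0, 2], [1], [3]]))  # [[0, 1, -1]]

# Identifying x1 = x3 and x2 = x4 gives the zero system of rank 0.
print(system_for_partition(sidon, [[0, 2], [1, 3]]))    # [[0, 0]]
```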
We say that x is a trivial solution of if
We denote the set of all partitions stemming from some non-trivial solution by
Note that by definition is admissible for all . Roughly speaking, our definition requires that the systems associated to our non-trivial solutions do not lose in complexity compared to the original system. Otherwise those solutions might in fact start appearing for smaller values of . We already observed that trivial -APs consisting of a single element would occur in any non-empty set.
Observe first that Definition 5.1 generalizes the notion of trivial solutions in the case introduced by Ruzsa in . Let us remark that previous generalizations have been made, like the one given by Shapira in  for density regular systems of equations. Indeed, his definition of trivial solution is more restrictive: in his context x is a trivial solution only if the system associated to has zero rank (that is for every and every ). As we will discuss later, this includes redundant solutions in some examples, but this does not affect his argument since he is interested in lower bounds for large sets avoiding solutions.
Let us discuss some examples to motivate our definition, i.e. to show that we are in fact no longer dealing with the same arithmetic structures when considering systems associated to trivial solutions. In Sidon sets, which are defined by the equation , we can derive systems like (namely in the original system) that give rise to non-trivial solutions since the rank of the associated system is still . However, as said before, if one considers the partition then the associated system has rank and thus all solutions of this kind are trivial, which is consistent with the classical definition.
Next, let us discuss what trivial solutions look like for -sets. Recall that a set is no longer a -set if there exist (essentially different) representations of the same element as sums of elements of . That is, there are elements , satisfying
and all representations are pairwise different, so none of them are obtained after permuting the elements of another representation. Let us focus on sets to illustrate what situations can occur. Here, we must avoid solutions to
and we are excluding situations like for example , with associated partition , since
But we should not exclude for example solutions , with partition , which is still a valid solution since
As we have seen in the Sidon case, different representations cannot have elements in common but the same representation can have repeated elements. If , we can also consider representations that have some elements in common but not all at once. As we observed before, the definition of trivial solutions in  situates these two examples at the same level and clearly they should be considered different for our purposes.
Considering this definition of (non-)trivial solutions and the previous Lemma 4.3, one can already observe that the main contribution in our analysis of the threshold will come from proper solutions. The number of such solutions is, nevertheless, easier to count than the number of solutions with repeated coordinates (as we are dealing with general systems). The main difficulty will be to prove that the contribution of non-trivial solutions with repeated coordinates is negligible with respect to the total number of non-trivial solutions.
6. Induced submatrices and related definitions
We start this section by motivating the need for a definition of induced submatrices (not to be confused with the matrices associated to some partition, discussed in the previous section). Extending the results obtained for simple systems like that of Sidon sets , one might expect the exponent of the threshold function of a given admissible system to be determined by the quotient of the number of variables of over the degrees of freedom in the system, which we will call the average degree of . However this exponent might not necessarily hold, as demonstrated by the example
We note that this system is of rank and has variables. Following the previous intuition, one might expect the exponent of the threshold function to be . The first row however implies that a solution to this system also fulfills the Sidon property () for which the stronger exponent is known to hold. It follows that we should also consider the underlying information coming from so-called induced submatrices in order to find an exponent that maximizes the average degree (see Definition 6.1).
In the general context, the induced submatrices are less clearly presented than in example (8). Intuitively, for any selection of rows one would try to set as many columns of these rows to zero as possible through Gaussian elimination. Equivalently, one could fix some columns and maximize the number of rows whose entries in the columns can be set to zero through elimination, so that one can disregard the corresponding variables.
For a proper definition, let be the columns of . We have already introduced as the matrix with columns for so that . Denote by the rows of and by the rows of . Assume without loss of generality that the first rows of are linearly independent. It follows that there exist such that