On the number of matroids
We consider the problem of determining $m_n$, the number of matroids on $n$ elements. The best known lower bound on $m_n$ is due to Knuth (1974), who showed that $\log\log m_n$ is at least $n - \frac{3}{2}\log n - O(1)$. On the other hand, Piff (1973) showed that $\log\log m_n \le n - \log n + \log\log n + O(1)$, and it has been conjectured since that the right answer is perhaps closer to Knuth’s bound.
We show that this is indeed the case, and prove an upper bound on $\log\log m_n$ that is within an additive $1+o(1)$ term of Knuth’s lower bound. Our proof uses structural properties of non-bases in a matroid, together with properties of stable sets in the Johnson graph, to give a compressed representation of matroids.
Matroids, introduced by Whitney in his seminal paper [Whitney1935], are fundamental combinatorial objects and have been extensively studied due to their very close connection to combinatorial optimization, see e.g. [SchrijverBookB], and their ability to abstract core notions from areas such as graph theory and linear algebra [Kung1996, OxleyBook].
There are several ways to define a matroid. Perhaps the most natural one is using the notion of independence. A matroid $M$ is a pair $(E, \mathcal{I})$, where $E$ is the ground set of elements, and $\mathcal{I}$ is a nonempty collection of subsets of $E$, called the independent sets, with the following properties:
Subset property: $A \in \mathcal{I}$ implies $A' \in \mathcal{I}$ for all $A' \subseteq A$, and
Exchange property: If $A, B \in \mathcal{I}$ with $|A| < |B|$, then there exists an element $x$ in $B \setminus A$, such that $A \cup \{x\} \in \mathcal{I}$.
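The two independence axioms can be checked by brute force on a small ground set. The following is a minimal sketch; the function names `is_matroid` and `uniform_matroid` are hypothetical and chosen for illustration only, and the check is exponential in the size of the ground set.

```python
from itertools import combinations

def is_matroid(ground, independent):
    """Brute-force check of the independence axioms for a small set system."""
    ground = set(ground)
    fam = {frozenset(s) for s in independent}
    if not fam or any(not a <= ground for a in fam):
        return False  # the collection must be nonempty and live on the ground set
    # Subset property: every proper subset of an independent set is independent.
    for a in fam:
        for k in range(len(a)):
            if any(frozenset(sub) not in fam for sub in combinations(a, k)):
                return False
    # Exchange property: a smaller independent set can be extended
    # by some element of a larger independent set.
    for a in fam:
        for b in fam:
            if len(a) < len(b) and not any(a | {x} in fam for x in b - a):
                return False
    return True

def uniform_matroid(n, r):
    """Independent sets of the uniform matroid: all subsets of size at most r."""
    return [set(s) for k in range(r + 1) for s in combinations(range(n), k)]
```

For instance, `is_matroid(range(4), uniform_matroid(4, 2))` holds, while the family consisting of the empty set, `{0}` and `{1, 2}` fails the subset property.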
A basic question is: how many distinct matroids are there on a ground set of $n$ elements? We denote this number by $m_n$. Clearly, there are $2^n$ subsets of $E$ and hence at most $2^{2^n}$ ways to choose $\mathcal{I}$, which gives the trivial upper bound $\log\log m_n \le n$. Here, and throughout the paper, $\log$ denotes the logarithm to the base 2.
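This bound is easily improved to $\log\log m_n \le n - \frac{1}{2}\log n + O(1)$ by focussing on matroids of a fixed rank. In a matroid, the maximal independent sets are called bases, and by the exchange property all bases of a matroid have the same cardinality. This common cardinality is the rank of the matroid. Let $m_{n,r}$ be the number of matroids of rank $r$ on $n$ elements. As $m_n = \sum_{r=0}^{n} m_{n,r}$, it must hold that $m_n \le (n+1)\, m_{n,r}$ for some $r$. By the subset property, any matroid of rank $r$ is completely determined by specifying its bases. As there are at most $\binom{n}{r} \le \binom{n}{\lfloor n/2 \rfloor}$ candidate bases, and $\binom{n}{\lfloor n/2 \rfloor} = O(2^n/\sqrt{n})$, this gives $m_{n,r} \le 2^{\binom{n}{\lfloor n/2 \rfloor}}$ and thus
$$\log\log m_n \le n - \tfrac{1}{2}\log n + O(1).$$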
In 1973, Piff [Piff1973] improved this bound further to $\log\log m_n \le n - \log n + \log\log n + O(1)$, by observing that a matroid is also completely determined by the closures of its circuits, and using a counting argument to show that there are “only” $O(2^n/n)$ such closures (we describe Piff’s proof in Section 2.5). This is the best upper bound known to date.
In the other direction, the best known lower bound is due to Knuth [Knuth1974] from 1974, who showed that $\log\log m_n \ge n - \frac{3}{2}\log n - O(1)$. Knuth’s bound is based on an elegant construction of matroids whose non-bases satisfy a particular property (for a matroid of rank $r$, a non-basis is an $r$-subset of the ground set that is dependent). Specifically, he constructs a large family of so-called sparse paving matroids. These are matroids of rank $r$, where any two non-bases intersect in at most $r-2$ elements (i.e. their incidence vectors have Hamming distance $4$ or more). Such sets of non-bases are precisely the stable sets in the so-called Johnson graph $J(n,r)$. This is the graph with vertex set $\{X \subseteq [n] : |X| = r\}$, in which two vertices are adjacent if and only if their intersection contains $r-1$ elements. (We use stable set as a synonym for what is often called independent set in graph theory, i.e. a set of vertices, no two of which are adjacent. As independent set has a different meaning in matroid theory, this serves to avoid confusion.)
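Knuth’s bound follows by taking a collection of at least $\frac{1}{n}\binom{n}{\lfloor n/2 \rfloor}$ such non-bases (equivalently, a stable set in the graph $J(n, \lfloor n/2 \rfloor)$ of this size, of which an explicit description is provided in Section 2.4) and considering the family of size at least $2^{\frac{1}{n}\binom{n}{\lfloor n/2 \rfloor}}$ of sparse paving matroids obtained by taking each possible subset of this collection. Thus $m_n \ge s_n \ge 2^{\frac{1}{n}\binom{n}{\lfloor n/2 \rfloor}}$, where $s_n$ is the number of sparse paving matroids on $n$ elements. This gives the lower bound
$$\log\log m_n \ge \log\log s_n \ge n - \tfrac{3}{2}\log n + \tfrac{1}{2}\log\tfrac{2}{\pi} - o(1). \tag{1}$$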
We explain Knuth’s bound in more detail in Section 2.4.
Historically, the interest in paving matroids seems to be a response to the publication of the catalog of matroids on at most 8 elements by Blackburn, Crapo, and Higgs [BlackburnCrapoHiggs1973] in the early 1970’s. With reference to such numerical evidence, Crapo and Rota consider it probable that paving matroids “would actually predominate in any asymptotic enumeration of geometries” [CrapoRotaBook, p. 3.17]. In his book “Matroid Theory”, Welsh also notes that paving matroids predominate among the small matroids, and puts the question whether this pattern extends to matroids in general as an exercise [WelshBook, p. 41]. An earlier lower bound on the number of matroids due to Piff and Welsh [PiffWelsh1971] was also based on a bound on the number of (sparse) paving matroids. Mayhew and Royle recently confirmed that the predominance of sparse paving matroids extends to the matroids on 9 elements [MayhewRoyle2008].
In recent years, (sparse) paving matroids have received attention in relation to a wide variety of matroid topics [Jerrum2006, GeelenHumphries2006, MerinoNoble2010, Bonin2011]. These authors all suggest that the class of sparse paving matroids is probably a very substantial subset of all matroids, pointing out Knuth’s argument for the lower bound.
Mayhew, Newman, Welsh and Whittle [MayhewNewmanWelshWhittle2011] present a very nice collection of conjectures on the asymptotic behavior of matroids. In particular, they conjecture that asymptotically almost every matroid is sparse paving:
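Conjecture 1 (Mayhew, Newman, Welsh and Whittle [MayhewNewmanWelshWhittle2011]). $\lim_{n \to \infty} s_n / m_n = 1$, where $s_n$ and $m_n$ denote the number of sparse paving matroids and the number of matroids on $n$ elements, respectively.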
If true, this would imply:
Note that this is in fact a much weaker statement as is a very “forgiving” function, e.g. if or even if , then , while still .
1.1. Our results
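Our main result is a substantial strengthening of the upper bound on $m_n$.
Theorem 3. The number of matroids on $n$ elements satisfies
$$\log\log m_n \le n - \tfrac{3}{2}\log n + \tfrac{1}{2}\log\tfrac{2}{\pi} + 1 + o(1).$$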
Thus, this result comes quite close to Conjecture 2, except for the additive term. In particular, it implies that the number of matroids is indeed much closer to Knuth’s lower bound, and perhaps also lends support to the conjecture that most matroids are indeed sparse paving.
1.2. Our Techniques
The proof of Theorem 3 is based on a combination of the following:
A technique for proving refined upper bounds on the total number of stable sets in a graph.
Defining a notion of local cover of a matroid, which serves as a short certificate to identify the bases in the neighborhood of an $r$-set. Combining the local covers for a carefully chosen set of $r$-sets then serves as a compressed representation of any matroid.
To see the connection to the total number of stable sets, note that any upper bound on $m_n$ is also an upper bound on $s_n$. As $s_n = \sum_{r} s_{n,r}$, where $s_{n,r}$ denotes the number of sparse paving matroids of rank $r$, and $s_{n,r}$ is precisely the total number of stable sets in the Johnson graph $J(n,r)$, any method to upper bound $m_n$ must also bound the number of such sets.
We first give an overview of each of these two ideas, and then describe how they are combined to prove Theorem 3. These ideas are already useful by themselves to improve the currently known bounds on $m_n$ and $s_n$. In Section 3 we show how local covers can be used in a very simple way to obtain the following bound.
While this bound is weaker than the one in Theorem 3, it already improves Piff’s upper bound substantially, and matches Knuth’s lower bound up to the additive term.
Similarly, we show in a later section how the refined counting technique for stable sets implies the following bound on the number of sparse paving matroids.
Previously, the best known upper bound on seems to be [MayhewWelsh2010] (we sketch an argument below).
Finally, we prove Theorem 3 in a later section.
1.2.1. Upper-bounding $m_n$ via local covers:
Recall that $m_{n,r}$ denotes the number of matroids of rank $r$ on $n$ elements. As $m_n = \sum_{r=0}^{n} m_{n,r}$, it suffices to bound each $m_{n,r}$ separately. For a matroid of rank $r$, let us call a collection of flats a flat cover if it completely describes the matroid by certifying for each $r$-set whether it is a basis or not.
A related notion is that of a local cover: a collection of flats that allows us to identify the bases in the neighborhood of some fixed $r$-set. Our main observation is that given any matroid, for every $r$-set, one can associate to it a local cover consisting of at most $r$ flats. This implies that if we pick any dominating set $D$ in the Johnson graph $J(n,r)$ and list all the local covers for the vertices in $D$, then this gives a valid flat cover consisting of at most $r|D|$ flats. Together with standard arguments about the existence of small dominating sets in any regular graph, this implies that each matroid can be described by a “small” flat cover, which gives the bound in Theorem 5.
1.2.2. Upper-bounding $s_n$ via stable sets:
As it suffices to bound each of these terms separately. For a graph , let denote the number of stable sets in , and recall that . While it is hard to obtain any reasonable estimate of for general graphs, it was shown in [MayhewWelsh2010] that
One may argue this as follows. Let be a -regular graph and let denote the smallest eigenvalue of its adjacency matrix. Then the size of a maximum stable set of is at most by Hoffman’s bound (see e.g. Theorem 3.5.2 of [BrouwerHaemers2012], or the corresponding corollary later in this paper). Let us denote . This implies that
For the graph it is known that is at most , which implies that the maximum stable set has size at most where . Note that this bound is quite good and is within a factor of the size of the explicit stable set used in Knuth’s lower bound. Applying (3) to then gives , which implies the bound (2). We note that the proof of (2) in [MayhewWelsh2010] is similar, except that there the same bound on the maximal size of a stable set of was shown by a combinatorial argument.
It turns out however that counting all the subsets in (3) is rather wasteful and that this bound can be improved. In particular, we show:
Let be a -regular graph on vertices with smallest eigenvalue , with . Then where and .
For the graph , we find that and hence this gives the stronger bound . As was our bound on the size of the maximum stable set, this bound on roughly implies that most stable sets occur as subsets of a few large stable sets of size . Using standard bounds on the binomial coefficients, this directly implies Theorem 6.
Our proof of Theorem 7 is based on a procedure for encoding stable sets that is originally due to Kleitman and Winston [KleitmanWinston1982]. We remain very close to the description of the procedure as given in Alon, Balogh, Morris and Samotij [ABMS2012], to which we also refer for detailed references on the earlier uses of the procedure. Compared to [ABMS2012], we give a somewhat improved analysis (specifically, Lemma 16) to obtain a sufficient bound in the parameter range that is of interest to us.
1.2.3. The improved upper bound on $m_n$:
To obtain the bound in Theorem 3, we combine the two ideas above. The main observation is that given a matroid $M$, if $X$ is a dependent $r$-set (i.e. a non-basis) in $M$, then $X$ has a local cover consisting of at most $2$ flats (as opposed to up to $r$ flats if $X$ were an arbitrary $r$-set). Thus if we could construct a flat cover using few such local covers, then we would obtain a much smaller description of a matroid. To this end, we generalize the procedure of Alon et al. [ABMS2012] for encoding stable sets so that it encodes flat covers of the kind described above using few bits. This gives the improved bound on $m_{n,r}$ and hence on $m_n$.
Finally, we remark that the additive $1$ gap in our upper bound on $\log\log m_n$ arises only because of the factor $2$ gap between the known upper and lower bounds on the size of the maximum stable set in the graphs $J(n,r)$ for $r$ close to $n/2$. It is likely that reducing this gap could lead to improved bounds for $m_n$. In the concluding section, we elaborate on this issue a bit further.
As mentioned previously, a matroid is specified by , where the sets in the collection satisfy the independence axioms. The elements of are independent, the remaining elements of are dependent. The set is the ground set, and we say that is a matroid on . There are various set systems and functions defined on that each allow one to distinguish between dependent and independent sets, such as the set of bases, the rank function, the circuits, the closure operator, etc. We define these notions and state some of their basic properties here, but for a detailed account of their interrelations and for proofs we refer to Oxley [OxleyBook].
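A basis of $M$ is an inclusionwise maximal independent set of $M$. It follows from the independence axioms that each basis has the same cardinality. In this paper, we will present matroids as $M = (E, \mathcal{B})$, where $\mathcal{B}$ is the set of bases of $M$. The following is an alternate characterization of matroids in terms of the basis axioms, which we shall need later. A set $\mathcal{B} \subseteq 2^E$ is the set of bases of a matroid on $E$ if and only if $\mathcal{B} \neq \emptyset$ and $\mathcal{B}$ satisfies the basis exchange axiom
$$\text{for all } B, B' \in \mathcal{B} \text{ and } e \in B \setminus B', \text{ there exists an } f \in B' \setminus B \text{ such that } B - e + f \in \mathcal{B}. \tag{4}$$
Here, we write $B - e := B \setminus \{e\}$ and $B + f := B \cup \{f\}$.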
The rank of a set is , i.e. the cardinality of any maximal independent set in . The rank function is submodular:
We write . Then is the common cardinality of all bases, the rank of . We say that an -set is a non-basis if . Clearly, a matroid of rank with set of bases is also uniquely defined by its set of non-bases, .
A circuit of is an inclusionwise minimal dependent set of . We denote the set of circuits of by . By definition, each dependent set contains some circuit. We will use that if is an -set with , then it contains a unique circuit .
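In $M$, the closure of a set $X \subseteq E$ is the set $\mathrm{cl}_M(X) := \{e \in E : r_M(X + e) = r_M(X)\}$. We will often use that $r_M(\mathrm{cl}_M(X)) = r_M(X)$ for any set $X$, which follows easily from induction and the submodularity of the rank function. A set $F \subseteq E$ is called a flat of $M$ if $\mathrm{cl}_M(F) = F$, and $\mathcal{F}(M)$ denotes the set of all flats of $M$. As $\mathrm{cl}_M(\mathrm{cl}_M(X)) = \mathrm{cl}_M(X)$ for any set $X$, every closure is a flat.
The following simple property of flats will be crucially used in our construction of flat covers: A set $X$ is dependent if and only if there exists a flat $F$ such that $|X \cap F| > r_M(F)$. In other words, $F$ acts as witness that $X$ contains a dependency when restricted to $F$.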
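The witness property can be verified exhaustively on a small matroid given by its bases. The sketch below is illustrative only: the helper names (`rank`, `closure`, `flats`, `has_witness`) are hypothetical, the rank is computed as the largest intersection with a basis, and a witness is taken to be a flat $F$ with $|X \cap F|$ exceeding the rank of $F$.

```python
from itertools import combinations

def rank(bases, x):
    """Rank of x: the largest intersection of x with a basis."""
    x = frozenset(x)
    return max(len(x & b) for b in bases)

def closure(ground, bases, x):
    """All elements whose addition to x does not increase the rank."""
    x = frozenset(x)
    rx = rank(bases, x)
    return frozenset(e for e in ground if rank(bases, x | {e}) == rx)

def flats(ground, bases):
    """All sets that equal their own closure (exponential enumeration)."""
    subsets = (frozenset(s) for k in range(len(ground) + 1)
               for s in combinations(ground, k))
    return {f for f in subsets if closure(ground, bases, f) == f}

def has_witness(ground, bases, x):
    """True if some flat F satisfies |x & F| > rank(F)."""
    x = frozenset(x)
    return any(len(x & f) > rank(bases, f) for f in flats(ground, bases))
```

On the rank-2 matroid on four elements whose only non-basis is `{0, 1}`, a set has a witness exactly when its rank is smaller than its cardinality.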
The dual of is the matroid whose bases are . The bases, circuits, rank, and closure of sets in are called the cobases, cocircuits, corank, and coclosure of sets in , and we write , , , etc.
The rank and corank functions of are related by
Also, we put
A matroid is paving if for each circuit of (or equivalently if there is no dependent set of size ), and sparse if is paving. is said to be sparse paving if it is both sparse and paving. We write
2.2. Bounds on binomial coefficients
We will frequently use the following standard bounds.
We will also use the following bound on the sum of binomial coefficients for :
This follows by upper bounding , and summing up the resulting geometric series. We are particularly interested in the case , which yields
2.3. The Johnson graph
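If $E$ is a finite set and $0 \le r \le |E|$, then we write
$$\binom{E}{r} := \{X \subseteq E : |X| = r\}$$
for the collection of $r$-subsets of $E$. We say that $X, Y \in \binom{E}{r}$ are adjacent (notation: $X \sim Y$) if they have Hamming distance $2$ (or equivalently: $|X \cap Y| = r - 1$).
The Johnson graph $J(E,r)$ is defined as the graph with vertex set $\binom{E}{r}$, in which two vertices $X$ and $Y$ are adjacent if and only if $X \sim Y$.
We abbreviate $J(n,r) := J([n], r)$, where $[n] := \{1, \ldots, n\}$.
For any $r$-set $X$, we write $N(X)$ for the neighborhood of $X$ in $J(E,r)$.
Obviously, $|N(X)| = r(n-r)$ for any $r$-set $X$, where $n = |E|$.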
The following lemma points out the connection between the Johnson graph and sparse paving matroids. It was essentially shown by Piff and Welsh [PiffWelsh1971] in proving an earlier lower bound on .
For $0 < r < n$, sparse paving matroids of rank $r$ on $[n]$ correspond one-to-one to stable sets in $J(n,r)$.
Let . We first show that the non-bases of a rank- sparse paving matroid on form a stable set in . Suppose that there are non-bases such that , then we would have
so that either or . In the former case, is a dependent set of size , which contradicts that is paving. In the latter case, it follows from (5) that
so that is a dependent set of of size , which contradicts that is paving.
Next, suppose that is a stable set in . We will show that forms a valid collection of bases for some matroid on .
First, it cannot be that as this would imply that and hence that has no edges. So the only way may fail to be a family of bases is if it fails the basis exchange axiom (4). That is, there are distinct and an such that for all . Now, it must be the case that , for otherwise it would hold that for the only . So, let be distinct elements of , and consider and . Since the base exchange axiom fails, it follows that both . On the other hand, , i.e. , contradicting independence of . ∎
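The correspondence can be checked exhaustively for very small parameters. The sketch below enumerates the stable sets of the Johnson graph and tests whether the complementary family of $r$-sets satisfies the basis exchange axiom. All function names are hypothetical, ground elements are numbered from 0, and the enumeration is exponential in the number of $r$-sets.

```python
from itertools import combinations

def johnson_vertices(n, r):
    """All r-subsets of {0, ..., n-1}."""
    return [frozenset(c) for c in combinations(range(n), r)]

def adjacent(x, y):
    """Two r-sets are adjacent if they share exactly r - 1 elements."""
    return len(x & y) == len(x) - 1

def is_stable(vertex_set):
    """No two members of the set are adjacent."""
    return not any(adjacent(x, y) for x, y in combinations(vertex_set, 2))

def satisfies_basis_exchange(bases):
    """Nonempty family in which every e in B - B' can be exchanged for some f in B' - B."""
    bases = set(bases)
    if not bases:
        return False
    for b1 in bases:
        for b2 in bases:
            for e in b1 - b2:
                if not any((b1 - {e}) | {f} in bases for f in b2 - b1):
                    return False
    return True

def count_stable_sets_giving_matroids(n, r):
    """Return (number of stable sets, number whose complement is a basis family)."""
    verts = johnson_vertices(n, r)
    stable = matroids = 0
    for k in range(len(verts) + 1):
        for u in combinations(verts, k):
            if is_stable(u):
                stable += 1
                if satisfies_basis_exchange(set(verts) - set(u)):
                    matroids += 1
    return stable, matroids
```

For $n = 4$ and $r = 2$ the Johnson graph is the octahedron, which has 10 stable sets, and each of them yields a valid family of bases.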
2.4. Knuth’s lower bound
In [Knuth1974], Knuth argues that if has a stable set of size , then has at least stable sets, as each subset of is itself stable. Knuth constructed a stable set of size , but Theorem 1 in [GrahamSloane1980] shows the existence of a stable set of size at least .
We sketch the construction in [GrahamSloane1980]. Identifying the vertices of $J(n,r)$ with their incidence vectors, we view them as vectors $x \in \{0,1\}^n$ with exactly $r$ 1’s. It is easily verified that the functional $c$, defined by
$$c(x) := \sum_{i=1}^{n} i\, x_i \pmod{n},$$
gives a valid $n$-vertex-coloring of $J(n,r)$. As there are $n$ color classes, at least one of them should contain at least $\binom{n}{r}/n$ vertices.
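The coloring argument is easy to test numerically. The following sketch colors each $r$-subset by the sum of its elements modulo $n$ and checks that every color class is stable. As an assumption made here for convenience, ground elements are numbered $0, \ldots, n-1$ rather than $1, \ldots, n$, which shifts every color by the same constant and so leaves the classes unchanged. The function names are hypothetical.

```python
from itertools import combinations
from math import comb

def color(subset, n):
    """Color of an r-subset: the sum of its elements modulo n."""
    return sum(subset) % n

def color_classes(n, r):
    """Partition the r-subsets of {0, ..., n-1} by color."""
    classes = [[] for _ in range(n)]
    for c in combinations(range(n), r):
        classes[color(c, n)].append(frozenset(c))
    return classes

def is_stable_class(cls):
    """Any two members share at most r - 2 elements, so no two are adjacent."""
    return all(len(x & y) <= len(x) - 2 for x, y in combinations(cls, 2))

def largest_class(n, r):
    """A largest color class, which by pigeonhole has at least C(n, r) / n members."""
    return max(color_classes(n, r), key=len)
```

Two adjacent vertices differ by exchanging one element $a$ for another element $b$, so their colors differ by $b - a \not\equiv 0 \pmod{n}$; this is why each class is stable.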
Picking such a stable set gives , and in particular by (7). Therefore,
2.5. Piff’s upper bound
To prove his upper bound on , Piff [Piff1973] uses that any matroid is characterized by the set of all closures of circuits and their ranks, i.e. by the collection
This completely defines as a set is dependent in if and only if for some circuit of . He then uses the following counting argument to bound the size of .
If , then .
Fix an . Let be a circuit such that . Then for each , we have , i.e. there are sets that map to . It follows that
Summing these upper bounds over all , we get
It follows that the number of matroids on a set of elements is at most the number of subsets with . Thus, together by (9)
By (6), this gives and hence ∎
3. A weaker upper bound on the number of matroids
In this section, we introduce the notion of flat covers and local covers and use them to show that each matroid in has a concise description. Using this, we then bound .
Definition 10 (Flat cover).
Let $M$ be a matroid with $n$ elements, of rank $r$. For a set $X \subseteq E$, we say that a flat $F$ covers $X$ if $|F \cap X| > r_M(F)$. We say that a set of flats $\mathcal{Z}$ is a flat cover of $M$ if each non-basis $X$ is covered by some $F \in \mathcal{Z}$.
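Note that if $\mathcal{Z}$ covers $M$, then $M$ is characterized by $E$, $r$ and the collection
$$\{(F, r_M(F)) : F \in \mathcal{Z}\},$$
since by definition of a cover, we have $\mathcal{B} = \{X \in \binom{E}{r} : |X \cap F| \le r_M(F) \text{ for all } F \in \mathcal{Z}\}$.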
Definition 11 (Local cover).
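For an $r$-set $X$, we say that a collection of flats $\mathcal{Z}_X$ is a local cover at $X$ if $\mathcal{Z}_X$ covers all the non-bases $Y \in N(X) \cup \{X\}$.
Let $M$ be a matroid of rank $r$ on $n$ elements. For each $r$-set $X$, there is a local cover $\mathcal{Z}_X$ such that $|\mathcal{Z}_X| \le r$.
Let $X$ be a fixed $r$-set. Take $\mathcal{Z}_X := \{\mathrm{cl}_M(X - e) : e \in X\}$. Then clearly $|\mathcal{Z}_X| \le r$. We consider a $Y \in N(X) \cup \{X\}$.
If $Y = X$ and $X$ is dependent, then $X \subseteq \mathrm{cl}_M(X - e)$ for some $e \in X$. Then $F := \mathrm{cl}_M(X - e)$ covers $X$, as
$$|X \cap F| = r > r - 1 \ge r_M(X - e) = r_M(F).$$
If $Y \in N(X)$, then $Y = X - e + f$ for some $e \in X$ and $f \in E \setminus X$. If $F := \mathrm{cl}_M(X - e)$ covers $Y$, we are done. Otherwise,
$$r - 1 = |X - e| \le |Y \cap F| \le r_M(F) = r_M(X - e) \le r - 1,$$
so that equality holds throughout, and in particular $r_M(X - e) = r - 1$ and $f \notin F$. It follows that $r_M(Y) = r$, so that $Y$ is a basis and it is not required to cover $Y$. ∎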
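The local cover property can be checked by brute force on a small example. The sketch below assumes, as an illustration, that the local cover at an $r$-set $X$ is formed by the closures of the sets $X - e$ for $e \in X$, and that a flat $F$ covers $Y$ when $|F \cap Y|$ exceeds the rank of $F$. The function names are hypothetical, and `bases` is expected to be a set of frozensets.

```python
from itertools import combinations

def rank(bases, x):
    """Rank of x: the largest intersection of x with a basis."""
    x = frozenset(x)
    return max(len(x & b) for b in bases)

def closure(ground, bases, x):
    """All elements whose addition to x does not increase the rank."""
    x = frozenset(x)
    rx = rank(bases, x)
    return frozenset(e for e in ground if rank(bases, x | {e}) == rx)

def local_cover(ground, bases, x):
    """Assumed construction: the closures of x - e, one for each e in x."""
    x = frozenset(x)
    return {closure(ground, bases, x - {e}) for e in x}

def covers(bases, flat, y):
    """A flat covers y if it contains more elements of y than its rank."""
    return len(flat & y) > rank(bases, flat)

def is_local_cover(ground, bases, x):
    """Check that every non-basis equal or adjacent to x is covered."""
    x = frozenset(x)
    r = len(x)
    cover = local_cover(ground, bases, x)
    nearby = (frozenset(y) for y in combinations(ground, r)
              if len(x & frozenset(y)) >= r - 1)
    return all(any(covers(bases, f, y) for f in cover)
               for y in nearby if y not in bases)
```

The test accompanying this sketch runs the check for every 3-set of the sparse paving matroid on six elements whose non-bases are `{0, 1, 2}` and `{3, 4, 5}`.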
If $G = (V, E)$ is a graph, then a set $D \subseteq V$ is dominating if $D \cup N(D) = V$, where $N(D)$ denotes the set of neighbors of vertices in $D$. The point of introducing local covers is that one can construct a small flat cover from a collection of local covers at the vertices in some small dominating set, as every non-basis in the matroid will be covered by this collection. By standard probabilistic arguments (see Theorem 1.2.2 of [AlonSpencerBook]), one has:
$J(n,r)$ has a dominating set of cardinality at most $\frac{\ln(r(n-r)+1)+1}{r(n-r)+1}\binom{n}{r}$.
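To get a feeling for the size of dominating sets in Johnson graphs, one can run a simple greedy heuristic. This is a swapped-in constructive technique, not the probabilistic argument cited above, and the sketch makes no claim about the size of the set it returns. The function names are hypothetical and ground elements are numbered from 0.

```python
from itertools import combinations

def johnson_graph(n, r):
    """Adjacency sets of the Johnson graph on the r-subsets of {0, ..., n-1}."""
    verts = [frozenset(c) for c in combinations(range(n), r)]
    nbrs = {v: set() for v in verts}
    for x, y in combinations(verts, 2):
        if len(x & y) == r - 1:
            nbrs[x].add(y)
            nbrs[y].add(x)
    return nbrs

def greedy_dominating_set(nbrs):
    """Repeatedly pick a vertex that dominates the most undominated vertices."""
    undominated = set(nbrs)
    dom = []
    while undominated:
        # Closed neighborhood of u restricted to vertices not yet dominated.
        v = max(nbrs, key=lambda u: len((nbrs[u] | {u}) & undominated))
        dom.append(v)
        undominated -= nbrs[v] | {v}
    return dom
```

Every iteration dominates at least one new vertex, so the loop terminates, and the returned set $D$ satisfies $D \cup N(D) = V$ by construction.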
Let . Then has a flat cover with .
Denote the upper bound in Corollary 14 by . As each matroid in is uniquely determined by the set where is a cover of size bounded by , the number of matroids in is bounded by the number of subsets of a set of size of cardinality at most .
Remark: The difference between this upper bound and the lower bound of Knuth is . We note that this approach cannot give a gap of . In particular, even an upper bound on the cardinality of a dominating set of the Johnson graph that is closer to (which is clearly the best possible) could improve this gap to at best. We cannot expect to do better: indeed, if we consider the above proof applied to bound the number of sparse paving matroids, it is inherently as wasteful as using the trivial counting bound (3) (as a minimal cover of a sparse paving matroid just lists the non-bases). We proceed by describing a better technique for bounding the number of sparse paving matroids.
4. A procedure for encoding vertex sets
4.1. The procedure
We now describe a procedure given by Alon, Balogh, Morris and Samotij [ABMS2012], for which they refer to Kleitman and Winston [KleitmanWinston1982] as the original source. They use the procedure to encode a stable set as a pair , such that and the number of possibilities for both and can be controlled. We will use it for that purpose in this section as well, but to prepare for other uses in this paper we generalize the procedure so that it takes a general vertex set and produces a pair , satisfying
We stress that the encoding is not one-to-one, and several sets may produce the same pair . We will later describe why such a pair is useful.
Throughout this section, is a -regular graph on vertices, with , and the smallest eigenvalue of its adjacency matrix is . We denote . For a subset , let denote the subgraph of induced by . Let denote the number of edges with both end points in , i.e. the number of edges in . Let us assume there is some fixed linear ordering of the vertices of (say according to their indices ). By the canonical ordering of , we refer to the following procedure to order the set linearly. Let be the vertex with maximum degree in ; if there are multiple such , take the one that is smallest with respect to . Call the first vertex in the canonical ordering, and apply the procedure iteratively to .
The procedure to produce (see Figure 1) maintains two disjoint sets of vertices: for selected and for available. Initially, no vertices are selected () and all vertices are available (). During the procedure, the set will expand and the set will shrink, until . Throughout we will maintain (12) as an invariant.
The following is a simple but subtle and crucial observation from [ABMS2012].
Upon the termination of the algorithm, the set is completely determined by (irrespective of the set ).
This follows as at any step in the algorithm, the vertices chosen thus far in completely determine the remaining vertices and their ordering. In particular, given one can recover as follows: Initialize and . Repeat the following steps until (the resulting set when the algorithm terminates will be ). (i) Consider the canonical ordering of , and let be the first vertex in this ordering. (ii) If , discard from and from and go to step (i). Otherwise, discard from and go to step (i).
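The encoding and the recovery step can be sketched as follows. This is a reconstruction under stated assumptions, since the figure describing the procedure is not reproduced here: the loop is assumed to stop once the number of available vertices drops to a given `threshold`, a selected vertex is assumed to be removed from the available set together with its neighbors, and a vertex outside the input set is removed alone. The names `canonical_first`, `encode` and `decode` are hypothetical.

```python
def canonical_first(nbrs, order, avail):
    """First vertex in the canonical ordering of the graph induced on avail:
    maximum induced degree, ties broken by the fixed linear order."""
    return max(avail, key=lambda v: (len(nbrs[v] & avail), -order[v]))

def encode(nbrs, order, k, threshold):
    """Produce (selected, avail) for a vertex set k (assumed stopping rule)."""
    selected, avail = set(), set(nbrs)
    while len(avail) > threshold:
        v = canonical_first(nbrs, order, avail)
        if v in k:
            selected.add(v)
            avail -= nbrs[v] | {v}   # drop v together with its neighbors
        else:
            avail.discard(v)         # drop v only
    return selected, avail

def decode(nbrs, order, selected, threshold):
    """Recover the available set from the selected set alone."""
    avail = set(nbrs)
    while len(avail) > threshold:
        v = canonical_first(nbrs, order, avail)
        if v in selected:
            avail -= nbrs[v] | {v}
        else:
            avail.discard(v)
    return avail
```

Since the selected set is contained in the input set, testing membership in the selected set during decoding retraces exactly the choices made during encoding. For a stable input set, every vertex of the input set that was not selected remains available at termination.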
4.2. Application to counting stable sets
Later we will show:
The number of vertices selected into is at most .
Let us first see how this implies the following upper bound on , the number of stable sets in .
Let be a -regular graph on vertices with smallest eigenvalue , with . Then where and .
Let be any stable set. Running the procedure yields a pair with , such that (i) is completely determined by (by Lemma 15) and (ii) . Now, since is a stable set and , we have . Together with (ii) above, this implies that . Thus, and hence is completely determined by and .
As is completely determined by , for a fixed , there are at most possibilities for . Moreover, as , the number of ways of choosing is at most . ∎
We now prove Lemma 16. We first need the following lemma that was proved by Alon and Chung in [AlonChung1988], and earlier by Haemers (Theorem 2.1.4 (i) of [Haemers1980]). We use the version of the lemma stated in [ABMS2012].
For all , we have