Scaling Limits of Random Pólya Trees
Pólya trees are rooted trees considered up to symmetry. We establish the convergence of large uniform random Pólya trees with arbitrary degree restrictions to Aldous’ Continuum Random Tree with respect to the Gromov-Hausdorff metric. Our proof is short and elementary, and it shows that the global shape of a random Pólya tree is essentially dictated by a large Galton-Watson tree that it contains. We also derive sub-Gaussian tail bounds for both the height and the width, which are optimal up to constant factors in the exponent.
1. Introduction and main results
Any connected graph $G$ with vertex set $V$ can be associated in a natural way with a metric space $(V, d_G)$, where $d_G(u, v)$ is defined as the length of a shortest path that contains $u$ and $v$ in $G$. In this paper we consider the setting where $G$ is a random tree with $n$ vertices, and we study, as $n \to \infty$, several properties of the associated random metric space.
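As a concrete illustration of the graph metric described above, the following minimal Python sketch computes shortest-path distances from one vertex by breadth-first search. The adjacency-list representation and the names `graph_distances` and `example_tree` are illustrative assumptions and not notation from the paper.

```python
from collections import deque


def graph_distances(adj, source):
    """Return a dict mapping every reachable vertex to its graph distance from `source`.

    `adj` is an adjacency list (dict: vertex -> list of neighbours); this
    representation is an assumption made for the example.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first visit gives the shortest distance in BFS
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Hypothetical example: the path 0 - 1 - 2 with an extra leaf 3 attached to 1.
example_tree = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
```

Calling `graph_distances(example_tree, v)` for every vertex `v` yields the full distance matrix of the finite metric space associated with the tree.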
The most prominent and well-studied case that fits in our setting is when is a critical Galton-Watson random tree with vertices, where the offspring distribution has a finite non-zero variance. In the series [3, 4, 5] of seminal papers Aldous proved that the metric spaces associated to those trees admit a common and universal limit, the so-called Continuum Random Tree (CRT). Since then, the CRT has been shown to be the limit of various families of random combinatorial structures, in particular other distributions on trees, see e.g. Haas and Miermont  and references therein, planar maps, see e.g. Albenque and Marckert , Bettinelli , Caraceni , Curien, Haas and Kortchemski , Janson and Stefansson , Stufler , and certain families of graphs, see Panagiotou, Stufler, and Weller .
Here we study the class of Pólya trees, which are rooted trees (that is, there is a distinguished vertex called the root) considered up to symmetry, equipped with the uniform distribution. They are named after George Pólya, who developed a framework based on generating functions in order to study their properties . The study of these objects, especially in random settings, poses significant difficulties: the presence of non-trivial symmetries makes it difficult to derive an explicit and handy description of the probability space at hand. To wit, random Pólya trees do not fit into well-studied models of random trees such as Galton-Watson trees, a fact that was widely believed and which was established rigorously by Drmota and Gittenberger .
The main contribution of this paper is a simple and short proof that establishes the scaling limit of random Pólya trees, with the additional benefit that it allows us to consider arbitrary degree restrictions. That is, we may restrict the outdegrees of the vertices to an arbitrary set (always including, of course, 0 and an integer $k \geq 2$, so that the trees are finite and non-trivial). Our proof also reveals a striking new structural property that is of independent interest and that rectifies the common perception of random unlabelled trees. As already mentioned, it is known that random Pólya trees do not admit a simple probabilistic description. However, we argue that this barely fails to be the case, namely that a random Pólya tree “consists” in a well-defined sense of a large Galton-Watson tree having a random size to which small forests are attached; in other words, the global structure of a large random Pólya tree is similar to the structure of the Galton-Watson tree it contains.
Theorem 1.1. Let $\Omega$ be an arbitrary set of nonnegative integers containing zero and at least one integer greater than or equal to two. Let $\mathsf{T}_n$ denote the uniform random Pólya tree with $n$ vertices and vertex outdegrees in $\Omega$. Then there exists a constant $c_\Omega > 0$ such that the metric space $(\mathsf{T}_n, c_\Omega n^{-1/2} d_{\mathsf{T}_n})$ converges towards the continuum random tree $(\mathcal{T}_e, d_{\mathcal{T}_e})$ in the Gromov-Hausdorff sense as $n$ tends to infinity.
In the theorem we use the normalization of Le Gall  and let denote the continuum random tree constructed from Brownian excursion, see Section 2 for the appropriate definitions. We also obtain explicit expressions for the scaling constant in our proof, see (5.8) and the subsequent equations.
Random Pólya trees were studied in several papers prior to this work. In particular, since the construction of the CRT in the early 90’s it was a long-standing conjecture [4, p. 55] that this model of random trees (without any degree restrictions) admits the same scaling limit. The convergence of binary Pólya trees, that is, when the vertex outdegrees are restricted to the set $\{0, 2\}$, was established by Marckert and Miermont  using an appropriate trimming procedure. Later, in  the conjecture was proven by using different techniques; actually, a far more general result on the scaling limit of random trees satisfying a certain Markov branching property was shown. Among other results, the method in  also allows one to study Pólya trees with some degree restrictions, where the vertex outdegrees have to be constrained in a set of the form or for . However, the question about the convergence of Pólya trees with arbitrary degree restrictions was open, and we answer it in this work with a simple argument.
Our next result is concerned with two extremal parameters. The height of a rooted tree is defined as the maximal distance of a vertex from the root, and the width is the maximal number of vertices at any fixed distance from the root.
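To make the two parameters concrete, here is a small Python sketch that computes the height and the width of a rooted tree given as a map from each vertex to its list of children. The data layout and the names `height_and_width` and `example_children` are assumptions made for this example only.

```python
from collections import Counter


def height_and_width(children, root=0):
    """Return (height, width) of the rooted tree described by `children`.

    height: maximal distance of a vertex from the root.
    width:  maximal number of vertices at any fixed distance from the root.
    """
    depth = {root: 0}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in children.get(u, []):
            depth[v] = depth[u] + 1
            stack.append(v)
    level_sizes = Counter(depth.values())  # number of vertices per generation
    return max(depth.values()), max(level_sizes.values())


# Hypothetical example: the root 0 has children 1 and 2; vertex 1 has three children.
example_children = {0: [1, 2], 1: [3, 4, 5], 2: []}
```

In this example the deepest vertices sit at distance two from the root, and the largest generation consists of the three children of vertex 1.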
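Theorem 1.2. Let $\Omega$ be an arbitrary set of nonnegative integers containing zero and at least one integer greater than or equal to two. Let $\mathsf{T}_n$ denote the uniform random Pólya tree with $n$ vertices and vertex outdegrees in $\Omega$. Then there are constants $C, c > 0$ such that
$$\mathbb{P}(\mathrm{H}(\mathsf{T}_n) \geq x) \leq C \exp(-c x^2 / n) \quad \text{and} \quad \mathbb{P}(\mathrm{W}(\mathsf{T}_n) \geq x) \leq C \exp(-c x^2 / n)$$
for all $n$ and $x \geq 0$.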
Similar bounds were obtained by Addario-Berry, Devroye and Janson  for critical Galton-Watson trees with finite nonzero variance, conditioned to be large. Our proofs show that these bounds are (up to the choice of ) best possible. As a direct consequence of our results we obtain for the distribution of H that
for all . The distribution of the height is known and given by
where is a Brownian excursion of duration one, and
see [4, Ch. 3.1]. Its moments are also known and given by
This follows from standard results for Brownian excursion by Chung , or from results of Rényi and Szekeres [30, Eq. (4.5)], who calculated the moments of the limit distribution of the height of a class of trees that converges towards the CRT; see also .
Our method relies on generating random Pólya trees using the framework of Boltzmann samplers [16, 8]. Our main insight is that this allows us to show that with high probability, that is, with probability tending to one as , the shape of the Pólya tree is given by a subtree with small subtrees that contain vertices attached to each vertex. As a metric space, we argue that is distributed like a critical Galton-Watson tree, whose offspring distribution even has finite exponential moments, conditioned on having a randomly drawn size concentrating around times a constant. In particular, by using Aldous’s fundamental result , see also [24, 25], we obtain that converges to a multiple of the CRT; moreover, the Gromov-Hausdorff distance of and converges in probability to zero, yielding the desired result. To prove Theorem 1.2 we then use tail-bounds for in order to obtain the corresponding bounds for the height of .
The paper is structured as follows. In the next section we recall Aldous’ theorem regarding the convergence of Galton-Watson trees, and we introduce some notation that will be used throughout the paper. Since this paper is targeted to a probabilistic audience, we introduce in Section 3 all required combinatorial preliminaries, tailored to our specific aims. In Section 4 we give a formal definition of Pólya trees and derive the sampling algorithm that will be the basis of our analysis. Finally, in Section 5, which is the main novel contribution of this paper, we present the proofs of our main theorems.
2. Aldous’ fundamental theorem
2.1. Gromov-Hausdorff convergence
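The exposition here is based on [9, Ch. 7] and . A pointed metric space is a metric space together with a distinguished element that is often referred to as its root. A correspondence between two pointed metric spaces $(X, d_X, x_0)$ and $(Y, d_Y, y_0)$ is a subset $R \subseteq X \times Y$ such that $(x_0, y_0) \in R$, and for each $x \in X$ there is a point $y \in Y$ with $(x, y) \in R$, and for each $y \in Y$ there is a point $x \in X$ with $(x, y) \in R$. The distortion of the correspondence is given by
$$\mathrm{dis}(R) = \sup\{ |d_X(x_1, x_2) - d_Y(y_1, y_2)| : (x_1, y_1), (x_2, y_2) \in R \}.$$
If $X$ and $Y$ are compact, the Gromov-Hausdorff distance between $(X, d_X, x_0)$ and $(Y, d_Y, y_0)$ is defined by
$$d_{\mathrm{GH}}(X, Y) = \frac{1}{2} \inf_R \mathrm{dis}(R),$$
with $R$ ranging over all correspondences between the two pointed spaces.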
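For finite spaces the distortion of a given correspondence can be evaluated directly. The sketch below assumes the standard definition from [9, Ch. 7], namely that the distortion is the supremum of $|d_X(x_1, x_2) - d_Y(y_1, y_2)|$ over all pairs $(x_1, y_1), (x_2, y_2)$ in the correspondence, and that the Gromov-Hausdorff distance is half the infimum of the distortion over all correspondences. The function and variable names are hypothetical.

```python
def distortion(correspondence, d_x, d_y):
    """Distortion of a finite correspondence between (X, d_x) and (Y, d_y).

    `correspondence` is an iterable of pairs (x, y); `d_x` and `d_y` are
    functions returning distances. Half of the returned value is an upper
    bound for the Gromov-Hausdorff distance of the two spaces.
    """
    pairs = list(correspondence)
    return max(
        abs(d_x(x1, x2) - d_y(y1, y2))
        for (x1, y1) in pairs
        for (x2, y2) in pairs
    )


def line_metric(a, b):
    """Distance between two points on the real line."""
    return abs(a - b)


# Hypothetical example: X = {0, 1, 2} and Y = {0, 2}, both rooted at 0.
# The pair of roots (0, 0) belongs to the correspondence, as required.
correspondence = [(0, 0), (1, 0), (2, 2)]
```

Here the middle point of $X$ is matched with the root of $Y$, which costs a distortion of one, so the two pointed spaces are at Gromov-Hausdorff distance at most one half.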
Two pointed metric spaces are isometric, if there exists a distance preserving bijection between the two that also preserves the roots. The Gromov-Hausdorff distance does not change if one of the spaces is replaced by another isometric copy. Moreover, two pointed spaces have Gromov-Hausdorff distance zero, if and only if they are isometric, and the Gromov-Hausdorff distance satisfies the axioms of a premetric on the collection of compact pointed metric spaces, see [9, Thm. 7.3.30] and [25, Thm. 3.5]. We may thus view as a metric on the collection of all isometry classes of compact pointed metric spaces.
2.2. The continuum random tree
The continuum random tree (CRT) is a random metric space that is encoded by the Brownian excursion of duration one. We briefly introduce it following [25, 22]. Given an arbitrary continuous function $f: [0, 1] \to [0, \infty)$ satisfying $f(0) = f(1) = 0$ we may define a premetric $d_f$ on the interval $[0, 1]$ given by
$$d_f(u, v) = f(u) + f(v) - 2 \inf_{u \leq s \leq v} f(s)$$
for $u \leq v$. Let $(\mathcal{T}_f, d_{\mathcal{T}_f})$ denote the corresponding quotient space obtained by identifying points that have distance zero. We consider this space as being rooted at the equivalence class of $0$. The random pointed metric space $(\mathcal{T}_e, d_{\mathcal{T}_e})$ coded by the Brownian excursion $e$ of duration one is called the Brownian continuum random tree (CRT).
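The coding of a tree by a function can be tried out on a discrete example. The sketch below assumes the standard formula $d_f(u, v) = f(u) + f(v) - 2 \min_{u \leq s \leq v} f(s)$ and applies it to the contour (depth-first height) sequence of a small tree. The list `contour` and the function name are illustrative assumptions.

```python
def excursion_pseudo_distance(f, s, t):
    """Pseudo-distance between times s and t induced by the height sequence f.

    Times at distance zero correspond to the same vertex of the coded tree.
    """
    lo, hi = min(s, t), max(s, t)
    return f[s] + f[t] - 2 * min(f[lo:hi + 1])


# Hypothetical contour sequence of a tree whose root has two children; the
# first child has two children of its own. Each step moves along one edge.
contour = [0, 1, 2, 1, 2, 1, 0, 1, 0]
```

Times 1 and 3 both visit the first child of the root and are therefore at distance zero, whereas times 2 and 4 visit two distinct siblings at distance two.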
2.3. Plane trees and Aldous’ theorem
The Ulam-Harris tree is defined as an infinite rooted tree with vertex set consisting of finite sequences of natural numbers. The empty string is its root, and the offspring of any vertex is given by the concatenations . In particular, the labelling of the vertices induces a linear order on each offspring set. A plane tree is defined as a subtree of the Ulam-Harris tree that contains the root. Any plane tree is a pointed metric space with respect to the graph-metric and the root vertex . Hence random plane trees may be considered as random elements of the metric space .
Let $\xi$ be a random variable with support on $\mathbb{N}_0$. Then, a $\xi$-Galton-Watson tree is the family tree of a Galton-Watson branching process with offspring distribution $\xi$, interpreted as a (possibly infinite) plane tree. We call $\xi$ critical if $\mathbb{E}[\xi] = 1$. The following invariance principle giving a scaling limit for certain random plane trees is due to Aldous  and there exist various extensions, see for example [14, 15, 18].
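Theorem 2.1. Let $\mathcal{T}_n$ be a critical $\xi$-Galton-Watson tree conditioned on having $n$ vertices, with the offspring distribution $\xi$ having finite non-zero variance $\sigma^2$. As $n$ tends to infinity, $\mathcal{T}_n$ with edges rescaled to length $\frac{\sigma}{2\sqrt{n}}$ converges in distribution to the CRT, that is
$$\left(\mathcal{T}_n, \frac{\sigma}{2\sqrt{n}} d_{\mathcal{T}_n}\right) \xrightarrow{(d)} (\mathcal{T}_e, d_{\mathcal{T}_e})$$
in the metric space of isometry classes of compact pointed metric spaces equipped with the Gromov-Hausdorff distance.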
In the following we use a more compact notation, writing and when referring to and .
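Conditioned Galton-Watson trees of moderate size can be simulated by plain rejection sampling. The following Python sketch is one simple way to do this and makes no claim of efficiency. The tree is returned as its sequence of outdegrees in depth-first order; the function names and the dictionary encoding of the offspring distribution are assumptions of this example.

```python
import random


def sample_gw_outdegrees(offspring, rng, max_size):
    """Draw a Galton-Watson tree as a list of outdegrees in depth-first order.

    `offspring` maps each possible number of children to its probability.
    Returns None as soon as the tree would exceed `max_size` vertices.
    """
    values, weights = zip(*offspring.items())
    outdegrees = []
    pending = 1  # vertices already created whose offspring is not yet drawn
    while pending > 0:
        if len(outdegrees) >= max_size:
            return None
        k = rng.choices(values, weights)[0]
        outdegrees.append(k)
        pending += k - 1
    return outdegrees


def sample_conditioned_gw(offspring, n, rng):
    """Rejection sampling: repeat until the tree has exactly n vertices."""
    while True:
        tree = sample_gw_outdegrees(offspring, rng, max_size=n)
        if tree is not None and len(tree) == n:
            return tree


# Hypothetical critical offspring law: zero or two children with probability 1/2 each.
critical_binary = {0: 0.5, 2: 0.5}
```

For this offspring law only odd sizes occur, so `sample_conditioned_gw(critical_binary, 11, random.Random(0))` terminates, while an even target size would loop forever.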
2.4. Tail-bounds for the height and width
In [1, Thm. 1.2] the following tail-bounds were obtained.
Let be a critical -Galton-Watson tree conditioned on having vertices, with the offspring distribution having finite non-zero variance . Then there are constants such that for all and
3. Combinatorial preliminaries
We recall relevant notions and tools from combinatorics. In particular, we discuss constructions of combinatorial classes following Joyal  and Flajolet and Sedgewick , and give a brief account of Boltzmann samplers following Flajolet, Fusy, and Pivoteau . These will be our main tools for studying the class of Pólya trees.
3.1. Combinatorial classes and generating series
A combinatorial class is a set together with a size-function . We require that for any the subset of all -sized elements is finite. The ordinary generating series of a class is defined as the formal power series
with denoting the number of elements of the set . We set .
As an example of a combinatorial class, consider
of all permutations with denoting the symmetric group of order . Its ordinary generating series is given by
Recall that any permutation $\sigma$ may be written in an essentially unique way as a product of disjoint cycles (corresponding to the orbits of the permutation). In the following we let $\sigma_i$ denote the number of cycles of length $i$, that is, with exactly $i$ elements, in this factorization. Here we count fixpoints as $1$-cycles.
3.3. Operations on classes
3.3.1. Product classes
Given two classes and we may form the product class as the set-theoretic product
with the size-function given by for any and . It is a straightforward consequence, see also Chapter I.1. in , that
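The product rule for generating series says that the counting sequence of a product class is the convolution of the counting sequences of its factors. The short Python sketch below checks this on two tiny hypothetical classes of words, with size given by word length; all names are assumptions made for the example.

```python
from collections import Counter
from itertools import product


def convolve(a, b):
    """Convolution of two counting sequences given as Counters (size -> count)."""
    c = Counter()
    for i, a_i in a.items():
        for j, b_j in b.items():
            c[i + j] += a_i * b_j
    return c


# Two hypothetical finite classes; the size of a word is its length.
class_a = ["a", "bb"]
class_b = ["c", "d", "eee"]

counts_a = Counter(len(w) for w in class_a)
counts_b = Counter(len(w) for w in class_b)

# Counting sequence of the product class, obtained by listing all pairs.
pair_counts = Counter(len(x) + len(y) for x, y in product(class_a, class_b))
```

Listing all six pairs and convolving the two counting sequences give the same result, which is the coefficientwise content of the product formula for ordinary generating series.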
We may also form the class of all multisets of elements of , that is, sets of the form
with , being pairwise distinct and . Here the sum denotes the number of elements of the multiset. The size-function for multisets is given by
For any subset we may also form the class by restricting to multisets whose number of elements lies in .
In order to express the ordinary generating series for the class of multisets in , we require the concept of cycle index sums. Recall that for any permutation we let denote the number of cycles of length of .
For any subset define the cycle index sum
For example, when a short calculation, see below, shows that
Indeed, for any permutation the series is an element of the space of all sequences in with finite support. Conversely, to any element correspond only permutations of order and their number is given by . Hence
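The counting step in this calculation can be checked by brute force. The sketch below assumes the classical formula that the number of permutations of $n$ elements with $m_i$ cycles of length $i$ equals $n! / \prod_i (m_i! \, i^{m_i})$, and compares it with an enumeration of the symmetric group of order four. The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def cycle_type(perm):
    """Return (m_1, ..., m_n), where m_i is the number of cycles of length i.

    `perm` is a tuple with perm[j] the image of j.
    """
    n = len(perm)
    seen = [False] * n
    counts = [0] * n
    for start in range(n):
        if seen[start]:
            continue
        length, j = 0, start
        while not seen[j]:  # walk along the cycle containing `start`
            seen[j] = True
            j = perm[j]
            length += 1
        counts[length - 1] += 1
    return tuple(counts)


def count_with_cycle_type(m):
    """Number of permutations with cycle type m, via n! / prod(m_i! * i**m_i)."""
    n = sum(i * m_i for i, m_i in enumerate(m, start=1))
    denominator = 1
    for i, m_i in enumerate(m, start=1):
        denominator *= factorial(m_i) * i ** m_i
    return factorial(n) // denominator


# Brute-force census of cycle types in the symmetric group of order four.
brute_force = Counter(cycle_type(p) for p in permutations(range(4)))
```

For instance, the six transpositions of four elements have cycle type `(2, 1, 0, 0)`, and the formula returns the same count.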
We may now express the ordinary generating series for a multiset of objects. This result is implicit in Harary and Palmer  as an application of Pólya’s Enumeration Theorem; see also Joyal [21, Prop. 9] for a formulation in a more general setting.
The ordinary generating series of a multiset class is given by
3.4. Boltzmann samplers
Given a nonempty combinatorial class and a parameter with , we may consider the corresponding Boltzmann distribution on that assigns probability weight to any element . A Boltzmann sampler is a stochastic process that generates elements from according to the Boltzmann distribution with parameter . There are various rules according to which we may construct such samplers.
Let and be nonempty combinatorial classes and such that are finite. Then a Boltzmann sampler for the product is given as follows.
Draw an element using a Boltzmann sampler .
Draw independently an element using a Boltzmann sampler .
Return the pair .
See, for example, Section 2 of  for the (simple) justification.
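The product rule can be verified exactly on finite classes. In the sketch below, which restricts to finite classes purely for illustration, the Boltzmann weight of an object of size $k$ at parameter $x$ is $x^k$ divided by the generating series evaluated at $x$, and the product sampler draws the two components independently. All names are assumptions of this example.

```python
import random
from fractions import Fraction
from itertools import product


def boltzmann_weights(objects, size, x):
    """Exact Boltzmann probabilities x**size(obj) / A(x) for a finite class."""
    total = sum(x ** size(obj) for obj in objects)
    return {obj: x ** size(obj) / total for obj in objects}


def sample(weights, rng):
    """Draw one object according to the given probability weights."""
    objects = list(weights)
    return rng.choices(objects, [float(w) for w in weights.values()])[0]


def sample_product(weights_a, weights_b, rng):
    """Boltzmann sampler for the product class: two independent draws."""
    return (sample(weights_a, rng), sample(weights_b, rng))


x = Fraction(1, 2)
class_a = ["a", "bb"]          # hypothetical classes, size = word length
class_b = ["c", "d", "eee"]
weights_a = boltzmann_weights(class_a, len, x)
weights_b = boltzmann_weights(class_b, len, x)

pairs = list(product(class_a, class_b))
weights_pairs = boltzmann_weights(pairs, lambda p: len(p[0]) + len(p[1]), x)
```

The product of the two marginal weights coincides with the Boltzmann weight of the pair in the product class, and objects of equal size always receive equal weight.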
Let be a nonempty combinatorial class and a subset. Then for any parameter with a Boltzmann sampler is given as follows.
Draw a permutation from such that for each and
For each cycle of let denote its length. Draw a random graph using a Boltzmann sampler .
Return the multiset of -objects that contains any precisely times, with the index ranging over all cycles of .
4. Random Pólya trees
4.1. Combinatorial decomposition of Pólya trees
Let be a subset containing and at least one integer . Let denote the combinatorial class of Pólya trees with vertex outdegrees in . Any Pólya tree is uniquely determined by the multiset
of smaller Pólya trees obtained by removing the root vertex of . The tree has vertex outdegrees in if and only if the number of elements of the multiset lies in , and if each of its elements belongs to . Thus, letting denote the combinatorial class consisting of a single object with size , the map
is a size-preserving bijection. Using Proposition 3.2, this yields the equation
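The decomposition can be turned into a procedure that counts Pólya trees with restricted outdegrees. The sketch below assumes that the resulting equation has the standard form $T(z) = z \sum_{k \in \Omega} Z_{S_k}(T(z), T(z^2), \ldots)$, where $\Omega$ is our name for the set of allowed outdegrees, and it uses the classical recurrence $k Z_{S_k} = \sum_{i=1}^{k} s_i Z_{S_{k-i}}$ for the cycle index of the symmetric group. Truncated power series are iterated until the first $N$ coefficients are stable. All function names are hypothetical.

```python
from fractions import Fraction


def mul(a, b, N):
    """Product of two power series truncated after degree N."""
    c = [Fraction(0)] * (N + 1)
    for i, a_i in enumerate(a):
        if a_i == 0:
            continue
        for j, b_j in enumerate(b):
            if i + j > N:
                break
            c[i + j] += a_i * b_j
    return c


def polya_tree_counts(omega, N):
    """Return [t_0, ..., t_N], t_n = number of Polya trees with n vertices
    and all outdegrees in `omega` (which is assumed to contain 0)."""
    ks = sorted(k for k in omega if k <= N - 1)  # a root of degree k needs k + 1 vertices
    kmax = max(ks)
    T = [Fraction(0)] * (N + 1)
    for _ in range(N):  # each round fixes at least one more coefficient
        # p[i] is the truncated series T(z**i)
        p = [None] + [
            [T[m // i] if m % i == 0 else Fraction(0) for m in range(N + 1)]
            for i in range(1, kmax + 1)
        ]
        # h[k] is the cycle index of S_k evaluated at (T(z), T(z**2), ...)
        h = [[Fraction(1)] + [Fraction(0)] * N]
        for k in range(1, kmax + 1):
            acc = [Fraction(0)] * (N + 1)
            for i in range(1, k + 1):
                acc = [u + v for u, v in zip(acc, mul(p[i], h[k - i], N))]
            h.append([u / k for u in acc])
        total = [sum(h[k][m] for k in ks) for m in range(N + 1)]
        T = [Fraction(0)] + total[:N]  # multiplication by z shifts the coefficients
    return [int(t) for t in T]
```

Without degree restrictions this reproduces the well-known counting sequence of unlabelled rooted trees, and for outdegrees restricted to zero and two it gives the Wedderburn-Etherington numbers at odd sizes.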
4.2. Enumerative properties
In this section we collect basic analytic facts regarding Pólya trees, which are frequently used in the proofs of the main theorems. The following result is obtained by applying a general enumeration theorem due to Bell, Burris and Yeats [6, Thm. 75]. Special cases, such as trees with less general vertex-degree restrictions, are classical combinatorial results, see e.g. [17, Thm. VII.4] but also Pólya  and Otter . We provide an explicit proof for the reader’s convenience, but do not claim novelty of this result. Although it does not seem to be explicitly stated in this generality in the literature, it is implicit in the work  and the present proof summarizes the corresponding arguments.
Let denote the radius of convergence of the ordinary generating function . Then the following holds.
i) We have that and .
ii) For some , the function satisfies
iii) For some constant , the number of Pólya trees with vertices and outdegrees in is given by
We start with the proof of i). The series is dominated coefficientwise by the ordinary generating series of all Pólya trees and it is known that is analytic at the origin (see e.g. [17, Prop. VII.5] and [29, 27]). Hence . As formal power series we have by (4.2) that . The coefficients of all involved series are nonnegative, hence we may lift this identity of formal power series to an identity of real numbers. By assumption, and there is an integer such that . Thus, for all
with denoting the symmetric group of order and denoting the number of cycles of length of . In particular, by considering the summand for , we have that
Since this implies that the limit is finite and hence is finite.
Moreover, considering the summand in (4.3) for a cycle of length yields that
This implies that because otherwise . If , then (4.3) would imply that . Applying yields
which is clearly impossible. Hence our premise cannot hold and thus . We proceed with showing ii). We have that . The series is dominated coefficient-wise by
Since it follows that there is an such that This establishes ii). To see the last claim, by a general enumeration result given in [6, Thm. 28] it follows that
4.3. A Boltzmann sampler for random Pólya trees
Let denote a subset containing and at least one integer . Recall that we let denote the combinatorial class of Pólya trees with vertex outdegrees in .
The size-preserving bijection in (4.1) between the classes and , where each tree corresponds to the multiset of trees pendant from its root, allows us to construct a Boltzmann sampler for Pólya trees. The Boltzmann distribution is a measure on Pólya trees with an arbitrary number of vertices. However, any tree with vertices has the same probability, i.e., the distribution conditioned on the event that the generated tree has vertices is uniform. This will allow us to reduce the study of properties of a random Pólya tree with exactly vertices to the study of .
The following recursive procedure terminates almost surely and draws a random Pólya tree with outdegrees in according to the Boltzmann distribution with parameter , i.e. any object with vertices gets drawn with probability .
1. Start with a root vertex .
2. Let be a random permutation drawn from the union of permutation groups with distribution given by
for each and . Here denotes the number of cycles of length of the permutation . In particular, is the number of fixpoints of .
3. If , then return the tree consisting of the root only and stop. Otherwise, for each cycle of let denote its length and draw a Pólya tree by an independent recursive call to the sampler . Make identical copies of the tree and connect their roots to the vertex by adding edges. Return the resulting tree and stop.
A Boltzmann-sampler for is also explicitly described in [8, Fig. 14, (1)]. (Note that the exposition given there contains a typo, as it corresponds to attaching only one copy of each tree in Step .)
We would like to justify the above procedure by applying the rules in Section 3.4 for obtaining samplers for products and multisets. Indeed, the product rule states that a sampler may be obtained by taking a root-vertex (which corresponds to calling ), calling the multiset sampler , and constructing a tree by connecting with the root-vertices of the obtained trees. The rule for multiset classes yields a procedure for that involves calls to for several . If we interpret these calls as independent copies of Boltzmann distributed random variables, then the rules stated in Section 3.4 guarantee that the resulting random Pólya tree follows a Boltzmann distribution with parameter . However, this procedure is not “explicit”, as we do not specify how to obtain these copies. Hence, instead, we interpret the independent calls as recursive calls to our constructed procedure, i.e. each call corresponds to again taking a root vertex and choosing (independently) multisets from , which again may cause further recursive calls. This is similar to a branching process. Of course, we need to justify that this recursive procedure terminates almost surely and samples according to a Boltzmann distribution with parameter . This justification is given in [16, Thm. 4.2] in a more general context for classes that may be recursively specified as in (4.1) using operations such as products and multiset classes.
4.4. Deviation Inequalities
We will make use of the following moderate deviation inequality for one-dimensional random walks found in most textbooks on the subject.
Let be a family of independent copies of a real-valued random variable with . Let . Suppose that there is a such that for . Then there is a such that for every there is a number such that for all and
5. Proof of the main theorem
In the following will always denote a set of nonnegative integers containing zero and at least one integer greater than or equal to two. Moreover, will always denote a natural number that satisfies and is large enough such that rooted trees with vertices and outdegrees in exist.
Proof of Theorem 1.1.
Suppose that we modify Step 1 to “Start with a root vertex . If the argument of the sampler is (as opposed to for some ), then mark this vertex with the color blue.” Then the resulting tree is still Boltzmann-distributed, but comes with a colored subtree which we denote by .
Note that is distributed like a Galton-Watson tree without the ordering on the offspring sets. By construction, the offspring distribution of is given by the number of fixpoints of the random permutation drawn in Step 2. Thus, the probability generating function of is
Moreover, for any blue vertex we may consider the forest of the trees dangling from that correspond to cycles of the permutation with length at least two. Let denote a random variable that is distributed like the number of vertices in . Then the probability generating function of is
Using Proposition 4.1 it follows that the generating functions and have radius of convergence strictly larger than one. Hence and have finite exponential moments. In particular, there are constants such that for any
Moreover, as we argue below, has average value
This can be shown as follows. Recall that the ordinary generating series satisfies the identity with the series given by
In particular, we have that with . Suppose that . Then by the implicit function theorem the function has an analytic continuation in a neighbourhood of . But this contradicts Pringsheim’s theorem [17, Thm. IV.6], which states that the series must have a singularity at the point since all its coefficients are nonnegative real numbers. Hence we have which is equivalent to .
With all these facts at hand we proceed with the proof of the theorem. Slightly abusing notation, we let denote the colored random tree drawn by conditioning the (modified) sampler on having exactly vertices. That is, if we ignore the colors, is drawn uniformly among all Pólya trees of size with outdegrees in . Moreover, let denote the colored subtree of , and for any vertex of let denote the corresponding forest that consists of non-blue vertices. We will argue that with high probability there is a constant such that for all . Indeed, note that by Proposition 4.1,
i.e. the probability is (only) polynomially small. Thus, for any , if we denote by independent random variables that are distributed like
Using (5.3) and setting we get that for an appropriate choice of . Thus, by the union bound
We are going to argue that the number of vertices in concentrates around a constant multiple of . More precisely, we are going to show that for any exponent we have with high probability that
To this end, consider the corresponding complementary event in the unconditioned setting
If this occurs, then we clearly also have that
By applying the union bound, the latter probability is at most
Since the random variable has finite exponential moments, we may apply the deviation inequality in Lemma 4.3 in order to bound this by o(1). Hence, (5.6) holds with probability tending to as becomes large. We are now going to prove that
with denoting the variance of the random variable . This implies that
Note that , as is not constant. Moreover,
where . Note that this expression is well-defined, since .
In order to show (5.8), let denote a bounded, Lipschitz-continuous function defined on the space of isometry classes of compact metric spaces. Note that the tree conditioned on having vertices is distributed like the tree conditioned on having vertices. In particular, it is identically distributed to a -Galton-Watson tree conditioned on having vertices, which we denote by . Since (5.6) holds with high probability it follows that
Let denote the diameter of , i.e., the number of vertices on a longest path in . Since was assumed to be Lipschitz-continuous it follows that
for a sequence with as becomes large. Moreover, the average rescaled diameter converges to a multiple of the expected diameter of the CRT as tends to infinity, see e.g. . In particular, it is a bounded sequence. Since
as , it follows that
as becomes large. This completes the proof. ∎
Proof of Theorem 1.2.
We are going to use the notation of the previous proof. Let be given. Without loss of generality, we may assume throughout that . If the height of the tree satisfies , then or for at least one vertex . We are going to bound the probability for each of these events separately. By the tail bounds for conditioned Galton-Watson trees in Theorem 2.2 there exist constants such that for all and we have that
Moreover, conditioned on having size is distributed like conditioned on having size . Hence
By (5.4) it holds that
As we assumed that , it follows by (5.4) that there are constants with