# Scaling Limits of Random Pólya Trees

Konstantinos Panagiotou Institute of Mathematics, Ludwig-Maximilians-Universität, Theresienstr. 39, 80333 Munich, Germany  and  Benedikt Stufler Institute of Mathematics, Ludwig-Maximilians-Universität, Theresienstr. 39, 80333 Munich, Germany
July 19, 2019
###### Abstract.

Pólya trees are rooted trees considered up to symmetry. We establish the convergence of large uniform random Pólya trees with arbitrary degree restrictions to Aldous’ Continuum Random Tree with respect to the Gromov-Hausdorff metric. Our proof is short and elementary, and it shows that the global shape of a random Pólya tree is essentially dictated by a large Galton-Watson tree that it contains. We also derive sub-Gaussian tail bounds for both the height and the width, which are optimal up to constant factors in the exponent.

## 1. Introduction and main results

Any connected graph $G$ with vertex set $V$ can be associated in a natural way with a metric space $(V, d_G)$, where $d_G(u,v)$ is defined as the length of a shortest path in $G$ that contains $u$ and $v$. In this paper we consider the setting where $G$ is a random tree with $n$ vertices, and we study, as $n \to \infty$, several properties of the associated random metric space.

The most prominent and well-studied case that fits in our setting is when $G$ is a critical Galton-Watson random tree with $n$ vertices, where the offspring distribution has a finite non-zero variance. In the series [3, 4, 5] of seminal papers Aldous proved that the metric spaces associated to those trees admit a common and universal limit, the so-called Continuum Random Tree (CRT). Since then, the CRT has been shown to be the limit of various families of random combinatorial structures, in particular other distributions on trees, see e.g. Haas and Miermont [18] and references therein, planar maps, see e.g. Albenque and Marckert [2], Bettinelli [7], Caraceni [10], Curien, Haas and Kortchemski [12], Janson and Stefansson [20], Stufler [31], and certain families of graphs, see Panagiotou, Stufler, and Weller [28].

Here we study the class of Pólya trees, which are rooted trees (that is, there is a distinguished vertex called the root) considered up to symmetry, equipped with the uniform distribution. They are named after George Pólya, who developed a framework based on generating functions in order to study their properties [29]. The study of these objects, especially in random settings, poses significant difficulties: the presence of non-trivial symmetries makes it difficult to derive an explicit and handy description of the probability space at hand. To wit, random Pólya trees do not fit into well-studied models of random trees such as Galton-Watson trees, a fact that was widely believed and which was established rigorously by Drmota and Gittenberger [13].

The main contribution of this paper is a simple and short proof that establishes the scaling limit of random Pólya trees, with the additional benefit that it allows us to consider arbitrary degree restrictions. That is, we may restrict the outdegrees of the vertices to an arbitrary set (always including, of course, 0 and an integer greater than or equal to two, so that the trees are finite and non-trivial). Our proof also reveals a novel and striking structural property that is of independent interest and that rectifies the common perception of random unlabelled trees. As already mentioned, it is known that random Pólya trees do not admit a simple probabilistic description. However, we argue that this barely fails to be the case, namely that a random Pólya tree "consists" in a well-defined sense of a large Galton-Watson tree having a random size to which small forests are attached; in other words, the global structure of a large random Pólya tree is similar to the structure of the Galton-Watson tree it contains.

###### Theorem 1.1.

Let $\Omega$ be an arbitrary set of nonnegative integers containing zero and at least one integer greater than or equal to two. Let $\mathsf{A}_n$ denote the uniform random Pólya tree with $n$ vertices and vertex outdegrees in $\Omega$. Then there exists a constant $c_\Omega > 0$ such that the metric space $(\mathsf{A}_n, c_\Omega n^{-1/2} d_{\mathsf{A}_n})$ converges towards the continuum random tree $\mathcal{T}_e$ in the Gromov-Hausdorff sense as $n$ tends to infinity.

In the theorem we use the normalization of Le Gall [23] and let $\mathcal{T}_e$ denote the continuum random tree constructed from Brownian excursion, see Section 2 for the appropriate definitions. We also obtain explicit expressions for the scaling constant $c_\Omega$ in our proof, see (5.8) and the subsequent equations.

Random Pólya trees were studied in several papers prior to this work. In particular, since the construction of the CRT in the early 90's it was a long-standing conjecture [4, p. 55] that this model of random trees (without any degree restrictions) also admits the same scaling limit. The convergence of binary Pólya trees, that is, when the vertex outdegrees are restricted to the set $\{0, 2\}$, was established by Marckert and Miermont [26] using an appropriate trimming procedure. Later, in [18] the conjecture was proven by using different techniques; actually, a far more general result on the scaling limit of random trees satisfying a certain Markov branching property was shown. Among other results, the method in [18] also allows one to study Pólya trees with some degree restrictions, where the vertex outdegrees have to be constrained in a set of the form or for . However, the question about the convergence of Pólya trees with arbitrary degree restrictions was open, and we answer it in this work with a simple argument.

Our next result is concerned with two extremal parameters. The height of a rooted tree is defined as the maximal distance of a vertex from the root, and the width is the maximal number of vertices at any fixed distance from the root.

###### Theorem 1.2.

Let $\Omega$ be an arbitrary set of nonnegative integers containing zero and at least one integer greater than or equal to two. Let $\mathsf{A}_n$ denote the uniform random Pólya tree with $n$ vertices and vertex outdegrees in $\Omega$. Then there are constants $C, c > 0$ such that

$$\mathbb{P}(\mathrm{H}(\mathsf{A}_n) \ge x) \le C \exp(-c x^2/n), \qquad \mathbb{P}(\mathrm{W}(\mathsf{A}_n) \ge x) \le C \exp(-c x^2/n)$$

for all $n$ and $x \ge 0$.

Similar bounds were obtained by Addario-Berry, Devroye and Janson [1] for critical Galton-Watson trees with finite nonzero variance, conditioned to be large. Our proofs show that these bounds are (up to the choice of $C$ and $c$) best possible. As a direct consequence of our results we obtain for the distribution of the height $\mathrm{H}$ that

$$c_\Omega n^{-1/2} \mathrm{H}(\mathsf{A}_n) \xrightarrow{(d)} \mathrm{H}(\mathcal{T}_e) \qquad \text{and} \qquad \mathbb{E}[\mathrm{H}(\mathsf{A}_n)^p] \sim c_\Omega^{-p} n^{p/2} \, \mathbb{E}[\mathrm{H}(\mathcal{T}_e)^p]$$

for all . The distribution of the height is known and given by

$$\mathrm{H}(\mathcal{T}_e) \overset{(d)}{=} \sup_{0 \le t \le 1} e(t), \tag{1.1}$$

where $e = (e(t))_{0 \le t \le 1}$ is a Brownian excursion of duration one, and

$$\mathbb{P}(\mathrm{H}(\mathcal{T}_e) > x) = 2 \sum_{k \ge 1} (4k^2x^2 - 1) \exp(-2k^2x^2), \tag{1.2}$$

see [4, Ch. 3.1]. Its moments are also known and given by

$$\mathbb{E}[\mathrm{H}(\mathcal{T}_e)] = \sqrt{\pi/2}, \qquad \mathbb{E}[\mathrm{H}(\mathcal{T}_e)^p] = 2^{-p/2} p(p-1) \Gamma(p/2) \zeta(p) \quad \text{for } p \ge 2.$$

This follows from standard results for Brownian excursion by Chung [11], or from results of Rényi and Szekeres [30, Eq. (4.5)], who calculated the moments of the limit distribution of the height of a class of trees that converges towards the CRT; see also [28].

### Methods

Our method relies on generating random Pólya trees using the framework of Boltzmann samplers [16, 8]. Our main insight is that this allows us to show that with high probability, that is, with probability tending to one as $n \to \infty$, the shape of the Pólya tree $\mathsf{A}_n$ is given by a subtree $\mathsf{T}_n$ with small subtrees that contain $O(\log n)$ vertices attached to each vertex. As a metric space, we argue that $\mathsf{T}_n$ is distributed like a critical Galton-Watson tree, whose offspring distribution even has finite exponential moments, conditioned on having a randomly drawn size concentrating around $n$ times a constant. In particular, by using Aldous's fundamental result [5], see also [24, 25], we obtain that the rescaled tree $\mathsf{T}_n/\sqrt{n}$ converges to a multiple of the CRT; moreover, the Gromov-Hausdorff distance of $\mathsf{A}_n/\sqrt{n}$ and $\mathsf{T}_n/\sqrt{n}$ converges in probability to zero, yielding the desired result. To prove Theorem 1.2 we then use tail-bounds for $\mathsf{T}_n$ in order to obtain the corresponding bounds for the height of $\mathsf{A}_n$.

### Outline

The paper is structured as follows. In the next section we recall Aldous’ theorem regarding the convergence of Galton-Watson trees, and we introduce some notation that will be used throughout the paper. Since this paper is targeted to a probabilistic audience, we introduce in Section 3 all required combinatorial preliminaries, tailored to our specific aims. In Section 4 we give a formal definition of Pólya trees and derive the sampling algorithm that will be the basis of our analysis. Finally, in Section 5, which is the main novel contribution of this paper, we present the proofs of our main theorems.

## 2. Aldous’ fundamental theorem

### 2.1. Gromov-Hausdorff convergence

The exposition here is based on [9, Ch. 7] and [25]. A pointed metric space is a metric space $(X, d_X)$ together with a distinguished element $x_0 \in X$ that is often referred to as its root. A correspondence between two pointed metric spaces $X^\bullet = (X, d_X, x_0)$ and $Y^\bullet = (Y, d_Y, y_0)$ is a subset $R \subset X \times Y$ such that $(x_0, y_0) \in R$, and for each $x \in X$ there is a point $y \in Y$ with $(x, y) \in R$, and for each $y \in Y$ there is a point $x \in X$ with $(x, y) \in R$. The distortion of the correspondence is given by

$$\operatorname{dis}(R) = \sup_{(x_1, y_1), (x_2, y_2) \in R} |d_X(x_1, x_2) - d_Y(y_1, y_2)|.$$

If $X^\bullet$ and $Y^\bullet$ are compact, the Gromov-Hausdorff distance between $X^\bullet$ and $Y^\bullet$ is defined by

$$d_{\mathrm{GH}}(X^\bullet, Y^\bullet) = \frac{1}{2} \inf_R \operatorname{dis}(R) \in [0, \infty[,$$

with the index $R$ ranging over all correspondences between $X^\bullet$ and $Y^\bullet$, see [9, Thm. 7.3.25] and [25, Prop. 3.6].

Two pointed metric spaces are isometric if there exists a distance preserving bijection between the two that also preserves the roots. The Gromov-Hausdorff distance does not change if one of the spaces is replaced by another isometric copy. Moreover, two pointed spaces have Gromov-Hausdorff distance zero if and only if they are isometric, and the Gromov-Hausdorff distance satisfies the axioms of a premetric on the collection of compact pointed metric spaces, see [9, Thm. 7.3.30] and [25, Thm. 3.5]. We may thus view $d_{\mathrm{GH}}$ as a metric on the collection of all isometry classes of compact pointed metric spaces.
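To make the definition concrete, the following sketch evaluates the pointed Gromov-Hausdorff distance of two very small finite metric spaces by enumerating every correspondence. It is an illustration only: the function names and the representation of a space as a distance matrix with the root given by an index are assumptions of this example, and the exhaustive search is exponential in the number of point pairs.

```python
def distortion(R, dX, dY):
    """Largest mismatch |d_X(x1,x2) - d_Y(y1,y2)| over pairs of related points."""
    return max(abs(dX[x1][x2] - dY[y1][y2]) for x1, y1 in R for x2, y2 in R)


def gromov_hausdorff(dX, dY, x0=0, y0=0):
    """Pointed Gromov-Hausdorff distance by exhaustive search (tiny spaces only).

    dX, dY are distance matrices; x0, y0 are the indices of the roots.
    """
    nX, nY = len(dX), len(dY)
    pairs = [(x, y) for x in range(nX) for y in range(nY)]
    best = float("inf")
    for mask in range(1, 1 << len(pairs)):
        R = [p for i, p in enumerate(pairs) if (mask >> i) & 1]
        # A correspondence must relate the roots and cover both spaces.
        if (x0, y0) not in R:
            continue
        if {x for x, _ in R} != set(range(nX)) or {y for _, y in R} != set(range(nY)):
            continue
        best = min(best, distortion(R, dX, dY))
    return best / 2


# Graph metric of a path with three vertices 0 - 1 - 2.
path3 = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
same_root = gromov_hausdorff(path3, path3)                  # isometric pointed spaces
other_root = gromov_hausdorff(path3, path3, x0=0, y0=1)     # same space, different roots
```

Rooting the same path at an end vertex and at its middle vertex gives a positive distance, which illustrates that the root is part of the data of a pointed space.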

### 2.2. The continuum random tree

The continuum random tree (CRT) is a random metric space that is encoded by the Brownian excursion of duration one. We briefly introduce it following [25, 22]. Given an arbitrary continuous function $f: [0,1] \to [0, \infty[$ satisfying $f(0) = f(1) = 0$ we may define a premetric $d$ on the interval $[0,1]$ given by

$$d(u,v) = f(u) + f(v) - 2 \inf_{u \le s \le v} f(s)$$

for $u \le v$. Let $(\mathcal{T}_f, d_{\mathcal{T}_f})$ denote the corresponding quotient space obtained by identifying points that have distance zero. We consider this space as being rooted at the equivalence class $\bar{0}$ of $0$. The random pointed metric space $(\mathcal{T}_e, d_{\mathcal{T}_e}, \bar{0})$ coded by the Brownian excursion $e$ of duration one is called the Brownian continuum random tree (CRT).
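The same formula can be tried out on a discrete excursion. The sketch below assumes that `f` is the contour function of a small finite tree sampled at integer times (a hypothetical example, not taken from the paper) and shows how times with distance zero are glued to a single vertex.

```python
def tree_distance(f, u, v):
    """d(u, v) = f(u) + f(v) - 2 * min of f over [u, v], for a discrete excursion f."""
    if u > v:
        u, v = v, u
    return f[u] + f[v] - 2 * min(f[u:v + 1])


# Contour function of the tree: root - a, with a having two children b and c.
# Times 1, 3 and 5 all visit the vertex a; times 2 and 4 visit b and c.
contour = [0, 1, 2, 1, 2, 1, 0]

d_same_vertex = tree_distance(contour, 1, 5)   # two visits of a
d_siblings = tree_distance(contour, 2, 4)      # b to c, via a
d_root_leaf = tree_distance(contour, 0, 2)     # root to b
```

Times 1, 3 and 5 have pairwise distance zero and therefore form one point of the quotient space, while the two leaves sit at distance two from each other.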

### 2.3. Plane trees and Aldous’ theorem

The Ulam-Harris tree is defined as an infinite rooted tree with vertex set $\bigcup_{n \ge 0} \mathbb{N}^n$ consisting of finite sequences of natural numbers. The empty string $\emptyset$ is its root, and the offspring of any vertex $v$ is given by the concatenations $v1, v2, v3, \ldots$. In particular, the labelling of the vertices induces a linear order on each offspring set. A plane tree is defined as a subtree of the Ulam-Harris tree that contains the root. Any plane tree $T$ is a pointed metric space with respect to the graph-metric $d_T$ and the root vertex $\emptyset$. Hence random plane trees may be considered as random elements of the metric space of isometry classes of compact pointed metric spaces, equipped with the Gromov-Hausdorff distance.

Let $\xi$ be a random variable with support on $\mathbb{N}_0$. Then, a $\xi$-Galton-Watson tree $\mathsf{T}$ is the family tree of a Galton-Watson branching process with offspring distribution $\xi$, interpreted as a (possibly infinite) plane tree. We call $\mathsf{T}$ critical if $\mathbb{E}[\xi] = 1$. The following invariance principle giving a scaling limit for certain random plane trees is due to Aldous [5] and there exist various extensions, see for example [14, 15, 18].

###### Theorem 2.1.

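Let $\mathsf{T}_n$ be a critical $\xi$-Galton-Watson tree conditioned on having $n$ vertices, with the offspring distribution $\xi$ having finite non-zero variance $\sigma^2$. As $n$ tends to infinity, $\mathsf{T}_n$ with edges rescaled to length $\frac{\sigma}{2\sqrt{n}}$ converges in distribution to the CRT, that is

$$\Big(\mathsf{T}_n, \frac{\sigma}{2\sqrt{n}} d_{\mathsf{T}_n}, \emptyset\Big) \xrightarrow{(d)} (\mathcal{T}_e, d_{\mathcal{T}_e}, \bar{0})$$

in the metric space of isometry classes of compact pointed metric spaces, equipped with the Gromov-Hausdorff distance.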

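In the following we use a more compact notation, writing $\frac{\sigma}{2\sqrt{n}} \mathsf{T}_n$ and $\mathcal{T}_e$ when referring to $(\mathsf{T}_n, \frac{\sigma}{2\sqrt{n}} d_{\mathsf{T}_n}, \emptyset)$ and $(\mathcal{T}_e, d_{\mathcal{T}_e}, \bar{0})$.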
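The objects in Theorem 2.1 are easy to simulate for small sizes. The following sketch is only an illustration under explicit assumptions: the offspring law is chosen to be geometric on $\mathbb{N}_0$ with parameter $1/2$ (mean $1$, variance $\sigma^2 = 2$), the tree is stored as its list of outdegrees in depth-first order, and the conditioning is done by plain rejection, which is only practical for moderate $n$.

```python
import random


def sample_offspring(rng):
    """Geometric offspring on {0, 1, 2, ...}: P(k) = 2^-(k+1), mean 1, variance 2."""
    k = 0
    while rng.random() < 0.5:
        k += 1
    return k


def sample_gw_degrees(rng, max_size):
    """Outdegrees of a Galton-Watson tree in depth-first order, or None if it exceeds max_size."""
    degrees, open_slots = [], 1  # open_slots = vertices still waiting for their offspring
    while open_slots > 0:
        if len(degrees) >= max_size:
            return None
        k = sample_offspring(rng)
        degrees.append(k)
        open_slots += k - 1
    return degrees


def conditioned_gw(n, rng):
    """Rejection sampling of the tree conditioned on having exactly n vertices."""
    while True:
        degrees = sample_gw_degrees(rng, n)
        if degrees is not None and len(degrees) == n:
            return degrees


def height(degrees):
    """Maximal distance from the root, computed from the depth-first outdegree sequence."""
    stack, h = [], 0  # stack[i] = unfinished child subtrees of the ancestor at depth i
    for k in degrees:
        h = max(h, len(stack))
        if k > 0:
            stack.append(k)
            continue
        while stack:  # a leaf finishes; propagate completion upwards
            stack[-1] -= 1
            if stack[-1] > 0:
                break
            stack.pop()
    return h


rng = random.Random(0)
n = 30
tree = conditioned_gw(n, rng)
rescaled_height = (2 ** 0.5) / (2 * n ** 0.5) * height(tree)  # sigma / (2 sqrt(n)) * H(T_n)
```

Averaging `rescaled_height` over many samples and increasing `n` gives a rough empirical view of the convergence of the rescaled height.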

### 2.4. Tail-bounds for the height and width

In [1, Thm. 1.2] the following tail-bounds were obtained.

###### Theorem 2.2.

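Let $\mathsf{T}_n$ be a critical $\xi$-Galton-Watson tree conditioned on having $n$ vertices, with the offspring distribution $\xi$ having finite non-zero variance $\sigma^2$. Then there are constants $C, c > 0$ such that for all $n$ and $x \ge 0$

$$\mathbb{P}(\mathrm{H}(\mathsf{T}_n) \ge x) \le C \exp(-c x^2/n), \qquad \mathbb{P}(\mathrm{W}(\mathsf{T}_n) \ge x) \le C \exp(-c x^2/n).$$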

## 3. Combinatorial preliminaries

We recall relevant notions and tools from combinatorics. In particular, we discuss constructions of combinatorial classes following Joyal [21] and Flajolet and Sedgewick [17], and give a brief account of Boltzmann samplers following Flajolet, Fusy, and Pivoteau [16]. These will be our main tools for studying the class of Pólya trees.

### 3.1. Combinatorial classes and generating series

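A combinatorial class $\mathcal{C}$ is a set together with a size-function $|\cdot|: \mathcal{C} \to \mathbb{N}_0$. We require that for any $n \in \mathbb{N}_0$ the subset $\mathcal{C}_n \subset \mathcal{C}$ of all $n$-sized elements is finite. The ordinary generating series of a class $\mathcal{C}$ is defined as the formal power series

$$\mathcal{C}(z) = \sum_{n \in \mathbb{N}_0} |\mathcal{C}_n| z^n,$$

with $|\mathcal{C}_n|$ denoting the number of elements of the set $\mathcal{C}_n$. We set $[z^n] \mathcal{C}(z) = |\mathcal{C}_n|$.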

### 3.2. Permutations

As an example of a combinatorial class, consider the class

$$\mathcal{S} = \bigcup_{n \in \mathbb{N}_0} \mathcal{S}_n$$

of all permutations with $\mathcal{S}_n$ denoting the symmetric group of order $n$. Its ordinary generating series is given by

$$\mathcal{S}(z) = \sum_{n \in \mathbb{N}_0} n! \, z^n.$$

Recall that any permutation $\sigma$ may be written in an essentially unique way as a product of disjoint cycles (corresponding to the orbits of the permutation). In the following we are going to let $\sigma_i$ denote the number of cycles of length $i$, that is, with exactly $i$ elements, in this factorization. Here we count fixpoints as $1$-cycles.

### 3.3. Operations on classes

#### 3.3.1. Product classes

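Given two classes $\mathcal{C}$ and $\mathcal{D}$ we may form the product class $\mathcal{C} \cdot \mathcal{D}$ as the set-theoretic product

$$\mathcal{C} \cdot \mathcal{D} = \mathcal{C} \times \mathcal{D}$$

with the size-function given by $|(C, D)| = |C| + |D|$ for any $C \in \mathcal{C}$ and $D \in \mathcal{D}$. It is a straightforward consequence, see also Chapter I.1 in [17], that

$$(\mathcal{C} \cdot \mathcal{D})(z) = \mathcal{C}(z) \mathcal{D}(z).$$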

#### 3.3.2. Multisets

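We may also form the class $\mathrm{MSET}(\mathcal{C})$ of all multisets of elements of $\mathcal{C}$, that is, sets of the form

$$\{(C_1, n_1), \ldots, (C_k, n_k)\}$$

with $k \ge 0$, the objects $C_1, \ldots, C_k \in \mathcal{C}$ being pairwise distinct and $n_1, \ldots, n_k \ge 1$. Here the sum $n_1 + \cdots + n_k$ denotes the number of elements of the multiset. The size-function for multisets is given by

$$|\{(C_1, n_1), \ldots, (C_k, n_k)\}| = \sum_{i=1}^k |C_i| \, n_i.$$

For any subset $\Omega \subset \mathbb{N}_0$ we may also form the class $\mathrm{MSET}_\Omega(\mathcal{C})$ by restricting to multisets whose number of elements lies in $\Omega$.

In order to express the ordinary generating series for the class of multisets in $\mathcal{C}$, we require the concept of cycle index sums. Recall that for any permutation $\sigma$ we let $\sigma_i$ denote the number of cycles of length $i$ of $\sigma$.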

###### Definition 3.1.

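For any subset $\Omega \subset \mathbb{N}_0$ define the cycle index sum

$$Z_\Omega(s_1, s_2, \ldots) = \sum_{k \in \Omega} \frac{1}{k!} \sum_{\sigma \in \mathcal{S}_k} s_1^{\sigma_1} \cdots s_k^{\sigma_k}.$$

For example, when $\Omega = \mathbb{N}_0$ a short calculation, see below, shows that

$$Z_{\mathbb{N}_0}(s_1, s_2, \ldots) = \exp\Big(\sum_{i \ge 1} s_i / i\Big).$$

Indeed, for any permutation $\sigma$ the sequence $(\sigma_i)_{i \ge 1}$ is an element of the space $\mathbb{N}_0^{(\mathbb{N})}$ of all sequences in $\mathbb{N}_0$ with finite support. Conversely, to any element $m = (m_i)_{i \ge 1} \in \mathbb{N}_0^{(\mathbb{N})}$ correspond only permutations of order $n = \sum_{i \ge 1} i \, m_i$ and their number is given by $n! / \prod_{i \ge 1} (m_i! \, i^{m_i})$. Hence

$$Z_{\mathbb{N}_0} = \sum_{m \in \mathbb{N}_0^{(\mathbb{N})}} \prod_{i \ge 1} \frac{s_i^{m_i}}{m_i! \, i^{m_i}} = \prod_{i \ge 1} \sum_{m_i \ge 0} \frac{s_i^{m_i}}{m_i! \, i^{m_i}} = \prod_{i \ge 1} \exp(s_i / i) = \exp\Big(\sum_{i \ge 1} s_i / i\Big).$$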
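For a single value of $k$, the cycle index sum $Z_{\{k\}}$ can be evaluated directly from Definition 3.1 by listing all permutations. The sketch below does this with exact rational arithmetic; the function names and the convention of passing the values of $s_1, s_2, \ldots$ as a dictionary are assumptions of this example.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def cycle_type(perm):
    """Map cycle length -> number of cycles of that length for a permutation of range(k)."""
    seen, counts = set(), {}
    for start in range(len(perm)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        counts[length] = counts.get(length, 0) + 1
    return counts


def cycle_index(k, s):
    """Z_{k}(s_1, s_2, ...) evaluated at the numbers s[1], s[2], ..., by brute force."""
    total = Fraction(0)
    for perm in permutations(range(k)):
        term = Fraction(1)
        for length, mult in cycle_type(perm).items():
            term *= Fraction(s[length]) ** mult
        total += term
    return total / factorial(k)


values = {1: 2, 2: 3, 3: 5}
z2 = cycle_index(2, values)  # (s1^2 + s2) / 2
z3 = cycle_index(3, values)  # (s1^3 + 3 s1 s2 + 2 s3) / 6
```

For $k = 2$ and $k = 3$ the output can be compared by hand with the familiar polynomials $(s_1^2 + s_2)/2$ and $(s_1^3 + 3 s_1 s_2 + 2 s_3)/6$.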

We may now express the ordinary generating series for a multiset of objects. This result is implicit in Harary and Palmer [19] as an application of Pólya’s Enumeration Theorem; see also Joyal [21, Prop. 9] for a formulation in a more general setting.

###### Proposition 3.2.

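The ordinary generating series of a multiset class $\mathrm{MSET}_\Omega(\mathcal{C})$ is given by

$$\mathrm{MSET}_\Omega(\mathcal{C})(z) = Z_\Omega(\mathcal{C}(z), \mathcal{C}(z^2), \mathcal{C}(z^3), \ldots).$$

In particular,

$$\mathrm{MSET}(\mathcal{C})(z) = \exp\Big(\sum_{k \ge 1} \mathcal{C}(z^k)/k\Big).$$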

### 3.4. Boltzmann samplers

Given a nonempty combinatorial class $\mathcal{C}$ and a parameter $x > 0$ with $0 < \mathcal{C}(x) < \infty$, we may consider the corresponding Boltzmann distribution on $\mathcal{C}$ that assigns probability weight $x^{|C|} / \mathcal{C}(x)$ to any element $C \in \mathcal{C}$. A Boltzmann sampler $\Gamma \mathcal{C}(x)$ is a stochastic process that generates elements from $\mathcal{C}$ according to the Boltzmann distribution with parameter $x$. There are various rules according to which we may construct such samplers.

#### 3.4.1. Product

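Let $\mathcal{C}$ and $\mathcal{D}$ be nonempty combinatorial classes and $x > 0$ such that $\mathcal{C}(x)$ and $\mathcal{D}(x)$ are finite. Then a Boltzmann sampler $\Gamma(\mathcal{C} \cdot \mathcal{D})(x)$ for the product is given as follows.

1. Draw an element $C \in \mathcal{C}$ using a Boltzmann sampler $\Gamma \mathcal{C}(x)$.

2. Draw independently an element $D \in \mathcal{D}$ using a Boltzmann sampler $\Gamma \mathcal{D}(x)$.

3. Return the pair $(C, D)$.

See, for example, Section 2 of [16] for the (simple) justification.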

#### 3.4.2. Multiset

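Let $\mathcal{C}$ be a nonempty combinatorial class and $\Omega \subset \mathbb{N}_0$ a subset. Then for any parameter $x > 0$ with $0 < \mathrm{MSET}_\Omega(\mathcal{C})(x) < \infty$ a Boltzmann sampler $\Gamma \mathrm{MSET}_\Omega(\mathcal{C})(x)$ is given as follows.

1. Draw a permutation $\sigma$ from $\bigcup_{k \in \Omega} \mathcal{S}_k$ such that for each $k \in \Omega$ and $\nu \in \mathcal{S}_k$

$$\mathbb{P}(\sigma = \nu) = \frac{1}{\mathrm{MSET}_\Omega(\mathcal{C})(x)} \frac{1}{k!} \, \mathcal{C}(x)^{\nu_1} \mathcal{C}(x^2)^{\nu_2} \cdots \mathcal{C}(x^k)^{\nu_k}.$$

2. For each cycle $\tau$ of $\sigma$ let $|\tau|$ denote its length. Draw a random object $C_\tau \in \mathcal{C}$ using a Boltzmann sampler $\Gamma \mathcal{C}(x^{|\tau|})$.

3. Return the multiset of $\mathcal{C}$-objects that contains any $C_\tau$ precisely $|\tau|$ times, with the index $\tau$ ranging over all cycles of $\sigma$.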

For a proof see [8, Prop. 38] and [16, Thm. 4.2].

## 4. Random Pólya trees

### 4.1. Combinatorial decomposition of Pólya trees

Let $\Omega \subset \mathbb{N}_0$ be a subset containing $0$ and at least one integer $k \ge 2$. Let $\mathcal{A}_\Omega$ denote the combinatorial class of Pólya trees with vertex outdegrees in $\Omega$. Any Pólya tree $A$ is uniquely determined by the multiset

$$M(A) = \{(A_1, n_1), \ldots, (A_k, n_k)\}$$

of smaller Pólya trees obtained by removing the root vertex of $A$. The tree $A$ has vertex outdegrees in $\Omega$ if and only if the number of elements of the multiset lies in $\Omega$, and if each of its elements belongs to $\mathcal{A}_\Omega$. Thus, letting $\mathcal{X}$ denote the combinatorial class consisting of a single object $o$ with size $1$, the map

$$\mathcal{A}_\Omega \to \mathcal{X} \cdot \mathrm{MSET}_\Omega(\mathcal{A}_\Omega), \qquad A \mapsto (o, M(A)) \tag{4.1}$$

is a size-preserving bijection. Using Proposition 3.2, this yields the equation

$$\mathcal{A}_\Omega(z) = z \, Z_\Omega(\mathcal{A}_\Omega(z), \mathcal{A}_\Omega(z^2), \ldots). \tag{4.2}$$
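The decomposition (4.1) translates directly into a counting procedure: a tree with $n$ vertices is a root together with a multiset of smaller trees whose sizes add up to $n - 1$ and whose number of elements lies in $\Omega$. The sketch below is one possible implementation, not the method of the paper; the function name and the table `M` (indexed by total size and number of elements) are assumptions of this example, and `omega` is assumed to be a finite set of allowed outdegrees.

```python
from math import comb


def count_polya_trees(N, omega):
    """Number of Polya trees with 1, ..., N vertices and outdegrees in omega."""
    a = [0] * (N + 1)
    # M[m][k] = number of multisets of already counted trees
    #           with total size m and exactly k elements.
    M = [[0] * (N + 1) for _ in range(N + 1)]
    M[0][0] = 1
    for n in range(1, N + 1):
        # Root plus a multiset of total size n - 1 with an allowed number of elements.
        a[n] = sum(M[n - 1][k] for k in omega if k <= N)
        if a[n] == 0:
            continue
        # Extend the table by the a[n] trees of size n: choosing j of them
        # with repetition gives comb(a[n] + j - 1, j) possibilities.
        new = [[0] * (N + 1) for _ in range(N + 1)]
        for m in range(N + 1):
            for k in range(N + 1):
                if M[m][k] == 0:
                    continue
                j = 0
                while m + j * n <= N and k + j <= N:
                    new[m + j * n][k + j] += comb(a[n] + j - 1, j) * M[m][k]
                    j += 1
        M = new
    return a[1:]


all_trees = count_polya_trees(8, set(range(8)))   # no effective degree restriction
binary_trees = count_polya_trees(11, {0, 2})      # outdegrees 0 or 2 only
```

Without an effective restriction the output starts with the classical numbers of unlabelled rooted trees, and for $\Omega = \{0, 2\}$ only odd sizes occur, in line with the congruence condition $n \equiv 1 \bmod \gcd(\Omega)$ used later in the text.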

### 4.2. Enumerative properties

In this section we collect basic analytic facts regarding Pólya trees, which are frequently used in the proofs of the main theorems. The following result is obtained by applying a general enumeration theorem due to Bell, Burris and Yeats [6, Thm. 75]. Special cases such as trees with less general vertex-degree restrictions are classical combinatorial results, see e.g. [17, Thm. VII.4] but also Pólya [29] and Otter [27]. We do provide an explicit proof for the reader's convenience, but do not claim novelty of this result. Although it does not seem to be explicitly stated in this generality in the literature, it is implicit in the work [6] and the present proof summarizes the corresponding arguments.

###### Proposition 4.1.

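Let $\rho_\Omega$ denote the radius of convergence of the ordinary generating function $\mathcal{A}_\Omega(z)$. Then the following holds.

1. We have that $0 < \rho_\Omega < 1$ and $\mathcal{A}_\Omega(\rho_\Omega) < \infty$.

2. For some $\epsilon > 0$, the function $E(z,w) = z Z_\Omega(w, \mathcal{A}_\Omega(z^2), \mathcal{A}_\Omega(z^3), \ldots)$ satisfies

$$E(\rho_\Omega + \epsilon, \mathcal{A}_\Omega(\rho_\Omega) + \epsilon) < \infty.$$

3. For some constant $d_\Omega > 0$, the number of Pólya trees with $n$ vertices and outdegrees in $\Omega$ is given by

$$[z^n] \mathcal{A}_\Omega(z) \sim d_\Omega \, n^{-3/2} \rho_\Omega^{-n}$$

as $n \equiv 1 \bmod \gcd(\Omega)$ tends to infinity.

###### Proof.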

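We start with the proof of i). The series $\mathcal{A}_\Omega(z)$ is dominated coefficientwise by the ordinary generating series $\mathcal{A}_{\mathbb{N}_0}(z)$ of all Pólya trees and it is known that $\mathcal{A}_{\mathbb{N}_0}(z)$ is analytic at the origin (see e.g. [17, Prop. VII.5] and [29, 27]). Hence $\rho_\Omega > 0$. As formal power series we have by (4.2) that $\mathcal{A}_\Omega(z) = z Z_\Omega(\mathcal{A}_\Omega(z), \mathcal{A}_\Omega(z^2), \ldots)$. The coefficients of all involved series are nonnegative, hence we may lift this identity of formal power series to an identity of real numbers. By assumption, $0 \in \Omega$ and there is an integer $\ell \ge 2$ such that $\ell \in \Omega$. Thus, for all $x > 0$

$$\mathcal{A}_\Omega(x) \ge x \Big(1 + \frac{1}{\ell!} \sum_{\sigma \in \mathcal{S}_\ell} \mathcal{A}_\Omega(x)^{\sigma_1} \mathcal{A}_\Omega(x^2)^{\sigma_2} \cdots \mathcal{A}_\Omega(x^\ell)^{\sigma_\ell}\Big) \tag{4.3}$$

with $\mathcal{S}_\ell$ denoting the symmetric group of order $\ell$ and $\sigma_i$ denoting the number of cycles of length $i$ of $\sigma$. In particular, by considering the summand for $\sigma = \mathrm{id}$, we have that

$$\mathcal{A}_\Omega(x) \ge x \, \mathcal{A}_\Omega(x)^\ell / \ell!.$$

Since $\ell \ge 2$ this implies that the limit $\lim_{x \uparrow \rho_\Omega} \mathcal{A}_\Omega(x)$ is finite and hence $\mathcal{A}_\Omega(\rho_\Omega)$ is finite.

Moreover, considering the summand in (4.3) for a cycle of length $\ell$ yields that

$$\infty > \mathcal{A}_\Omega(\rho_\Omega) \ge \rho_\Omega \, \mathcal{A}_\Omega(\rho_\Omega^\ell) / \ell!.$$

This implies that $\rho_\Omega \le 1$ because otherwise $\mathcal{A}_\Omega(\rho_\Omega^\ell) = \infty$. If $\rho_\Omega = 1$, then (4.3) would imply that $\mathcal{A}_\Omega(1) \ge 1 + \frac{1}{\ell!} \sum_{\sigma \in \mathcal{S}_\ell} \mathcal{A}_\Omega(1)^{\sigma_1 + \cdots + \sigma_\ell}$. Applying $\mathcal{A}_\Omega(1) \ge 1$ yields

$$\mathcal{A}_\Omega(1) \ge 1 + \mathcal{A}_\Omega(1),$$

which is clearly impossible. Hence our premise cannot hold and thus $\rho_\Omega < 1$.

We proceed with showing ii). We have that $E(z,w) = z Z_\Omega(w, \mathcal{A}_\Omega(z^2), \mathcal{A}_\Omega(z^3), \ldots)$. The series $E(z,w)$ is dominated coefficient-wise by

$$z \exp\Big(w + \sum_{i \ge 2} \mathcal{A}_\Omega(z^i)/i\Big).$$

Since $\rho_\Omega < 1$ it follows that there is an $\epsilon > 0$ such that $E(\rho_\Omega + \epsilon, \mathcal{A}_\Omega(\rho_\Omega) + \epsilon) < \infty$. This establishes ii).

To see the last claim, by a general enumeration result given in [6, Thm. 28] it follows that

$$[z^m] \mathcal{A}_\Omega(z) \sim \gcd(\Omega) \sqrt{\frac{\rho_\Omega E_z(\rho_\Omega, \mathcal{A}_\Omega(\rho_\Omega))}{2 \pi E_{ww}(\rho_\Omega, \mathcal{A}_\Omega(\rho_\Omega))}} \; \rho_\Omega^{-m} m^{-3/2}, \qquad m \equiv 1 \bmod \gcd(\Omega).$$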

### 4.3. A Boltzmann sampler for random Pólya trees

Let $\Omega \subset \mathbb{N}_0$ denote a subset containing $0$ and at least one integer $k \ge 2$. Recall that we let $\mathcal{A}_\Omega$ denote the combinatorial class of Pólya trees with vertex outdegrees in $\Omega$.

The size-preserving bijection in (4.1) between the classes $\mathcal{A}_\Omega$ and $\mathcal{X} \cdot \mathrm{MSET}_\Omega(\mathcal{A}_\Omega)$, where each tree corresponds to the multiset of trees pendant from its root, allows us to construct a Boltzmann sampler $\Gamma \mathcal{A}_\Omega(x)$ for Pólya trees. The Boltzmann distribution is a measure on Pólya trees with an arbitrary number of vertices. However, any tree with $n$ vertices has the same probability, i.e., the distribution conditioned on the event that the generated tree has $n$ vertices is uniform. This will allow us to reduce the study of properties of a random Pólya tree with exactly $n$ vertices to the study of $\Gamma \mathcal{A}_\Omega(x)$.

###### Lemma 4.2.

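The following recursive procedure $\Gamma \mathcal{A}_\Omega(x)$ terminates almost surely and draws a random Pólya tree with outdegrees in $\Omega$ according to the Boltzmann distribution with parameter $0 < x \le \rho_\Omega$, i.e. any object with $n$ vertices gets drawn with probability $x^n / \mathcal{A}_\Omega(x)$.

1. Start with a root vertex $v$.

2. Let $\sigma(v)$ be a random permutation drawn from the union of permutation groups $\bigcup_{k \in \Omega} \mathcal{S}_k$ with distribution given by

$$\mathbb{P}(\sigma(v) = \nu) = \frac{x}{\mathcal{A}_\Omega(x)} \frac{1}{k!} \, \mathcal{A}_\Omega(x)^{\nu_1} \mathcal{A}_\Omega(x^2)^{\nu_2} \cdots \mathcal{A}_\Omega(x^k)^{\nu_k}$$

for each $k \in \Omega$ and $\nu \in \mathcal{S}_k$. Here $\nu_i$ denotes the number of cycles of length $i$ of the permutation $\nu$. In particular, $\nu_1$ is the number of fixpoints of $\nu$.

3. If $\sigma(v) \in \mathcal{S}_0$, then return the tree consisting of the root only and stop. Otherwise, for each cycle $\tau$ of $\sigma(v)$ let $|\tau|$ denote its length and draw a Pólya tree $A_\tau$ by an independent recursive call to the sampler $\Gamma \mathcal{A}_\Omega(x^{|\tau|})$. Make $|\tau|$ identical copies of the tree $A_\tau$ and connect their roots to the vertex $v$ by adding edges. Return the resulting tree and stop.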

A Boltzmann-sampler for is also explicitly described in [8, Fig. 14, (1)]. (Note that the exposition given there contains a typo, as it corresponds to attaching only one copy of each tree in Step .)

We would like to justify the above procedure by applying the rules in Section 3.4 for obtaining samplers for products and multisets. Indeed, the product rule states that a sampler $\Gamma \mathcal{A}_\Omega(x)$ may be obtained by taking a root-vertex $v$ (which corresponds to calling $\Gamma \mathcal{X}(x)$), calling the multiset sampler $\Gamma \mathrm{MSET}_\Omega(\mathcal{A}_\Omega)(x)$, and constructing a tree by connecting $v$ with the root-vertices of the obtained trees. The rule for multiset classes yields a procedure for $\Gamma \mathrm{MSET}_\Omega(\mathcal{A}_\Omega)(x)$ that involves calls to $\Gamma \mathcal{A}_\Omega(x^i)$ for several $i \ge 1$. If we interpret these calls as independent copies of Boltzmann distributed random variables, then the rules stated in Section 3.4 guarantee that the resulting random Pólya tree follows a Boltzmann distribution with parameter $x$. However, this procedure is not "explicit", as we do not specify how to obtain these copies. Hence, instead, we interpret the independent calls as recursive calls to our constructed procedure, i.e. each call corresponds to again taking a root vertex and choosing (independently) multisets from $\mathrm{MSET}_\Omega(\mathcal{A}_\Omega)$, which again may cause further recursive calls. This is similar to a branching process. Of course, we need to justify that this recursive procedure terminates almost surely and samples according to a Boltzmann distribution with parameter $x$. This justification is given in [16, Thm. 4.2] in a more general context for classes that may be recursively specified as in (4.1) using operations such as products and multiset classes.
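The recursive procedure of Lemma 4.2 can be sketched in a few lines for one concrete choice of $\Omega$. The example below assumes $\Omega = \{0, 2\}$, so that only the empty permutation, the identity of $\mathcal{S}_2$ and the transposition occur, and it assumes a parameter $x = 0.5$ below the radius of convergence so that trees stay small. The helper `A`, which evaluates $\mathcal{A}_\Omega$ numerically from (4.2), and the encoding of a tree as a nested tuple of its subtrees are choices made for this illustration only.

```python
import math
import random


def A(y):
    """A_Omega(y) for Omega = {0, 2}, from A(y) = y * (1 + (A(y)^2 + A(y^2)) / 2)."""
    if y < 1e-9:
        return y  # A(y) = y + O(y^3)
    rest = 2.0 + A(y * y)
    disc = 1.0 - y * y * rest
    # Smaller root of the quadratic in A(y), written in a numerically stable form.
    return y * rest / (1.0 + math.sqrt(disc))


def sample_tree(x, rng):
    """Boltzmann sampler with parameter x; a tree is the tuple of its root's subtrees."""
    if x < 1e-9:
        return ()  # a single vertex has probability 1 - O(x^2)
    a, a2 = A(x), A(x * x)
    # Unnormalised weights: 1 (no children), a^2/2 (identity), a2/2 (transposition).
    u = rng.random() * a / x
    if u < 1.0:
        return ()
    if u < 1.0 + a * a / 2:
        return (sample_tree(x, rng), sample_tree(x, rng))  # two fixpoints, two calls
    twin = sample_tree(x * x, rng)  # one cycle of length two, one call with x^2
    return (twin, twin)             # attach two identical copies


def size(tree):
    return 1 + sum(size(child) for child in tree)


rng = random.Random(1)
samples = [sample_tree(0.5, rng) for _ in range(300)]
sizes = [size(t) for t in samples]
```

The transposition branch is where the symmetry of Pólya trees enters: a single recursive call with parameter $x^2$ produces a tree that is attached twice.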

### 4.4. Deviation Inequalities

We will make use of the following moderate deviation inequality for one-dimensional random walks found in most textbooks on the subject.

###### Lemma 4.3.

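Let $(X_i)_{i \in \mathbb{N}}$ be a family of independent copies of a real-valued random variable $X$ with $\mathbb{E}[X] = 0$. Let $S_n = X_1 + \cdots + X_n$. Suppose that there is a $\delta > 0$ such that $\mathbb{E}[e^{\theta X}] < \infty$ for $|\theta| < \delta$. Then there is a $c > 0$ such that for every $1/2 < p < 1$ there is a number $N$ such that for all $n \ge N$ and $0 \le \epsilon \le 1$

$$\mathbb{P}(|S_n / n^p| \ge \epsilon) \le 2 \exp(-c \, \epsilon^2 n^{2p-1}).$$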

## 5. Proof of the main theorem

In the following $\Omega$ will always denote a set of nonnegative integers containing zero and at least one integer greater than or equal to two. Moreover, $n$ will always denote a natural number that satisfies $n \equiv 1 \bmod \gcd(\Omega)$ and is large enough such that rooted trees with $n$ vertices and outdegrees in $\Omega$ exist.

###### Proof of Theorem 1.1.

We begin the proof with a couple of auxiliary observations about the sampler $\Gamma \mathcal{A}_\Omega(x)$ from Lemma 4.2. Let us fix $x = \rho_\Omega$ throughout. We may do so, since by Proposition 4.1 we have that $\rho_\Omega > 0$ and $\mathcal{A}_\Omega(\rho_\Omega) < \infty$.

Suppose that we modify Step 1 to "Start with a root vertex $v$. If the argument of the sampler is $\rho_\Omega$ (as opposed to $\rho_\Omega^i$ for some $i \ge 2$), then mark this vertex with the color blue." Then the resulting tree is still Boltzmann-distributed, but comes with a colored subtree which we denote by $\mathsf{T}$.

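Note that $\mathsf{T}$ is distributed like a Galton-Watson tree without the ordering on the offspring sets. By construction, the offspring distribution $\xi$ of $\mathsf{T}$ is given by the number of fixpoints of the random permutation drawn in Step 2. Thus, the probability generating function of $\xi$ is

$$\mathbb{E}[z^\xi] = \frac{\rho_\Omega}{\mathcal{A}_\Omega(\rho_\Omega)} Z_\Omega(z \mathcal{A}_\Omega(\rho_\Omega), \mathcal{A}_\Omega(\rho_\Omega^2), \mathcal{A}_\Omega(\rho_\Omega^3), \ldots). \tag{5.1}$$

Moreover, for any blue vertex $v$ we may consider the forest $F(v)$ of the trees dangling from $v$ that correspond to cycles of the permutation $\sigma(v)$ with length at least two. Let $\zeta$ denote a random variable that is distributed like the number of vertices $|F(v)|$ in $F(v)$. Then the probability generating function of $\zeta$ is

$$\mathbb{E}[z^\zeta] = \frac{\rho_\Omega}{\mathcal{A}_\Omega(\rho_\Omega)} Z_\Omega(\mathcal{A}_\Omega(\rho_\Omega), \mathcal{A}_\Omega((z \rho_\Omega)^2), \mathcal{A}_\Omega((z \rho_\Omega)^3), \ldots). \tag{5.2}$$

Using Proposition 4.1 it follows that the generating functions $\mathbb{E}[z^\xi]$ and $\mathbb{E}[z^\zeta]$ have radius of convergence strictly larger than one. Hence $\xi$ and $\zeta$ have finite exponential moments. In particular, there are constants $c, c' > 0$ such that for any $s \ge 0$

$$\mathbb{P}(\xi \ge s) \le c e^{-c' s} \qquad \text{and} \qquad \mathbb{P}(\zeta \ge s) \le c e^{-c' s}. \tag{5.3}$$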

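Moreover, as we argue below, $\xi$ has average value

$$\mathbb{E}[\xi] = \Big(\frac{\partial}{\partial s_1} Z_\Omega\Big)(\mathcal{A}_\Omega(\rho_\Omega), \mathcal{A}_\Omega(\rho_\Omega^2), \ldots) \, \rho_\Omega = 1.$$

This can be shown as follows. Recall that the ordinary generating series satisfies the identity $\mathcal{A}_\Omega(z) = E(z, \mathcal{A}_\Omega(z))$ with the series $E(z,w)$ given by

$$E(z,w) = z Z_\Omega(w, \mathcal{A}_\Omega(z^2), \mathcal{A}_\Omega(z^3), \ldots).$$

In particular, we have that $G(z, \mathcal{A}_\Omega(z)) = 0$ with $G(z,w) = E(z,w) - w$. Suppose that $\frac{\partial G}{\partial w}(\rho_\Omega, \mathcal{A}_\Omega(\rho_\Omega)) \ne 0$. Then by the implicit function theorem the function $\mathcal{A}_\Omega(z)$ has an analytic continuation in a neighbourhood of $\rho_\Omega$. But this contradicts Pringsheim's theorem [17, Thm. IV.6], which states that the series $\mathcal{A}_\Omega(z)$ must have a singularity at the point $\rho_\Omega$ since all its coefficients are nonnegative real numbers. Hence we have $\frac{\partial E}{\partial w}(\rho_\Omega, \mathcal{A}_\Omega(\rho_\Omega)) = 1$, which is equivalent to $\mathbb{E}[\xi] = 1$.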

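With all these facts at hand we proceed with the proof of the theorem. Slightly abusing notation, we let $\mathsf{A}_n$ denote the colored random tree drawn by conditioning the (modified) sampler $\Gamma \mathcal{A}_\Omega(\rho_\Omega)$ on having exactly $n$ vertices. That is, if we ignore the colors, $\mathsf{A}_n$ is drawn uniformly among all Pólya trees of size $n$ with outdegrees in $\Omega$. Moreover, let $\mathsf{T}_n$ denote the colored subtree of $\mathsf{A}_n$, and for any vertex $v$ of $\mathsf{T}_n$ let $F_n(v)$ denote the corresponding forest that consists of non-blue vertices. We will argue that with high probability there is a constant $C > 0$ such that $|F_n(v)| \le C \log n$ for all $v \in \mathsf{T}_n$. Indeed, note that by Proposition 4.1,

$$\mathbb{P}(|\Gamma \mathcal{A}_\Omega(\rho_\Omega)| = n) = \frac{\rho_\Omega^n}{\mathcal{A}_\Omega(\rho_\Omega)} [z^n] \mathcal{A}_\Omega(z) = \Theta(n^{-3/2}), \tag{5.4}$$

i.e. the probability is (only) polynomially small. Thus, for any $s \ge 0$, if we denote by $\zeta_1, \ldots, \zeta_n$ independent random variables that are distributed like $\zeta$, then

$$\mathbb{P}(\exists v \in \mathsf{T}_n : |F_n(v)| \ge s) = \mathbb{P}(\exists v \in \mathsf{T} : |F(v)| \ge s \mid |\Gamma \mathcal{A}_\Omega(\rho_\Omega)| = n) \le O(n^{3/2}) \, \mathbb{P}(\exists 1 \le i \le n : \zeta_i \ge s).$$

Using (5.3) and setting $s = C \log n$ we get that $\mathbb{P}(\zeta \ge s) = o(n^{-5/2})$ for an appropriate choice of $C$. Thus, by the union bound

$$\mathbb{P}(\forall v \in \mathsf{T}_n : |F_n(v)| \le C \log n) = 1 - o(1). \tag{5.5}$$

The typical shape of $\mathsf{A}_n$ thus consists of a colored tree with small forests attached to each of its vertices, compare with Figure 1. In particular, we have that the Gromov-Hausdorff distance between the rescaled trees $\mathsf{A}_n / \sqrt{n}$ and $\mathsf{T}_n / \sqrt{n}$ converges in probability to zero. We are going to show that there is a constant $c_\Omega > 0$ such that $c_\Omega \mathsf{T}_n / \sqrt{n}$ converges weakly towards the Brownian continuum random tree $\mathcal{T}_e$. This immediately implies that $c_\Omega \mathsf{A}_n / \sqrt{n} \xrightarrow{(d)} \mathcal{T}_e$ and we are done.

Figure 1. The typical shape of the random Pólya tree $\mathsf{A}_n$ with $n$ vertices.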

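We are going to argue that the number of vertices in $\mathsf{T}_n$ concentrates around a constant multiple of $n$. More precisely, we are going to show that for any exponent $0 < s < 1/2$ we have with high probability that

$$|\mathsf{T}_n| \in (1 \pm n^{-s}) \frac{n}{1 + \mathbb{E}[\zeta]}. \tag{5.6}$$

To this end, consider the corresponding complementary event in the unconditioned setting

$$|\mathsf{T}| \notin (1 \pm n^{-s}) \frac{|\Gamma \mathcal{A}_\Omega(\rho_\Omega)|}{1 + \mathbb{E}[\zeta]}.$$

If this occurs, then we clearly also have that

$$\sum_{v \in \mathsf{T}} (1 + |F(v)|) = |\Gamma \mathcal{A}_\Omega(\rho_\Omega)| \notin (1 \pm \Theta(n^{-s})) (1 + \mathbb{E}[\zeta]) |\mathsf{T}|.$$

Let $\mathcal{E}$ denote the corresponding event. From (5.5) we know that with high probability $|F_n(v)| \le C \log n$ for all vertices $v$ of $\mathsf{T}_n$. Hence, with high probability, say, $|\mathsf{T}_n| \ge n / \log^2 n$. Using again (5.4)

$$\mathbb{P}(\mathcal{E} \mid |\Gamma \mathcal{A}_\Omega(\rho_\Omega)| = n) = O(n^{3/2}) \, \mathbb{P}(n / \log^2 n \le |\mathsf{T}| \le n, \; \mathcal{E}) + o(1).$$

By applying the union bound, the latter probability is at most

$$\sum_{n / \log^2 n \le \ell \le n} \mathbb{P}\Big(\sum_{i=1}^{\ell} (1 + \zeta_i) \notin (1 \pm \Theta(n^{-s})) (1 + \mathbb{E}[\zeta]) \ell\Big).$$

Since the random variable $\zeta$ has finite exponential moments, we may apply the deviation inequality in Lemma 4.3 in order to bound this by $o(n^{-3/2})$, which compensates for the factor $O(n^{3/2})$. Hence, (5.6) holds with probability tending to $1$ as $n$ becomes large. We are now going to prove that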

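$$\frac{\sqrt{1 + \mathbb{E}[\zeta]} \, \sigma}{2 \sqrt{n}} \mathsf{T}_n \xrightarrow{(d)} \mathcal{T}_e \tag{5.7}$$

with $\sigma^2$ denoting the variance of the random variable $\xi$. This implies that

$$c_\Omega \mathsf{A}_n / \sqrt{n} \xrightarrow{(d)} \mathcal{T}_e \qquad \text{with} \qquad c_\Omega = \frac{\sqrt{1 + \mathbb{E}[\zeta]} \, \sigma}{2} \tag{5.8}$$

and we are done. Note that $\sigma^2$ and $\mathbb{E}[\zeta]$ may be computed explicitly from the expression of the probability generating functions in (5.1) and (5.2). We obtain that $\sigma^2$ is given by

$$\begin{aligned} \sigma^2 &= \Big(\frac{\partial^2}{\partial z^2} \mathbb{E}[z^\xi] + \frac{\partial}{\partial z} \mathbb{E}[z^\xi] - \Big(\frac{\partial}{\partial z} \mathbb{E}[z^\xi]\Big)^2\Big)(1) \\ &= \rho_\Omega \mathcal{A}_\Omega(\rho_\Omega) \frac{\partial^2 Z_\Omega}{\partial s_1^2}(\mathcal{A}_\Omega(\rho_\Omega), \mathcal{A}_\Omega(\rho_\Omega^2), \ldots) + \rho_\Omega \frac{\partial Z_\Omega}{\partial s_1}(\mathcal{A}_\Omega(\rho_\Omega), \mathcal{A}_\Omega(\rho_\Omega^2), \ldots) \\ &\quad - \rho_\Omega^2 \Big(\frac{\partial Z_\Omega}{\partial s_1}(\mathcal{A}_\Omega(\rho_\Omega), \mathcal{A}_\Omega(\rho_\Omega^2), \ldots)\Big)^2. \end{aligned}$$

Note that $\sigma^2 > 0$, as $\xi$ is not constant. Moreover,

$$\mathbb{E}[\zeta] = \Big(\frac{\partial}{\partial z} \mathbb{E}[z^\zeta]\Big)(1) = \frac{\rho_\Omega}{\mathcal{A}_\Omega(\rho_\Omega)} \sum_{i \ge 2} \Big(\frac{\partial}{\partial s_i} Z_\Omega\Big)(\mathcal{A}_\Omega(\rho_\Omega), \mathcal{A}_\Omega(\rho_\Omega^2), \ldots) \, i \rho_\Omega^i \mathcal{A}'_\Omega(\rho_\Omega^i),$$

where $\mathcal{A}'_\Omega$ denotes the derivative of $\mathcal{A}_\Omega$. Note that this expression is well-defined, since $\rho_\Omega^i < \rho_\Omega$ for all $i \ge 2$.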
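As a worked instance of these formulas, the sketch below evaluates $\rho_\Omega$, $\sigma^2$, $\mathbb{E}[\zeta]$ and $c_\Omega$ numerically for the binary case $\Omega = \{0, 2\}$, where $Z_\Omega(s_1, s_2, \ldots) = 1 + (s_1^2 + s_2)/2$. Everything here is an assumption of the illustration rather than part of the paper: the helper names, the bisection bracket $[0.5, 0.7]$, and the use of the condition $\partial E / \partial w = 1$, which for this $\Omega$ reads $\rho_\Omega \mathcal{A}_\Omega(\rho_\Omega) = 1$, to locate $\rho_\Omega$.

```python
import math


def A(y):
    """A_Omega(y) for Omega = {0, 2} and 0 <= y < rho, from equation (4.2)."""
    if y < 1e-9:
        return y
    rest = 2.0 + A(y * y)
    return y * rest / (1.0 + math.sqrt(1.0 - y * y * rest))


def dA(y):
    """Derivative A'(y), obtained by differentiating (4.2): A'(1 - yA) = A/y + y^2 A'(y^2)."""
    if y < 1e-9:
        return 1.0
    a = A(y)
    return (a / y + y * y * dA(y * y)) / (1.0 - y * a)


def find_rho(lo=0.5, hi=0.7, steps=80):
    """Bisection for the point where E = w and dE/dw = 1 hold, i.e. y^2 (2 + A(y^2)) = 1."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if 1.0 - mid * mid * (2.0 + A(mid * mid)) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


rho = find_rho()
tau = 1.0 / rho                      # A(rho), since E[xi] = rho * A(rho) = 1
sigma2 = rho * tau * 1.0             # rho * A(rho) * d^2 Z / d s_1^2, remaining terms cancel
mean_zeta = (rho / tau) * 0.5 * 2 * rho ** 2 * dA(rho ** 2)   # only the i = 2 summand
c_omega = math.sqrt(1.0 + mean_zeta) * math.sqrt(sigma2) / 2
```

In this special case $\sigma^2 = 1$, and $\rho_\Omega^2$ should agree with the radius of convergence of the generating series of binary rooted trees counted by leaves.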

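In order to show (5.8), let $f$ denote a bounded, Lipschitz-continuous function defined on the space of isometry classes of compact metric spaces. Note that the tree $\mathsf{T}_n$ conditioned on having $\ell$ vertices is distributed like the tree $\mathsf{T}$ conditioned on having $\ell$ vertices. In particular, it is identically distributed to a $\xi$-Galton-Watson tree conditioned on having $\ell$ vertices, which we denote by $\mathsf{T}^\xi_\ell$. Since (5.6) holds with high probability it follows that

$$\mathbb{E}[f(c_\Omega \mathsf{T}_n / \sqrt{n})] = o(1) + \sum_{\ell \in (1 \pm n^{-s}) \frac{n}{1 + \mathbb{E}[\zeta]}} \mathbb{E}[f(c_\Omega \mathsf{T}^\xi_\ell / \sqrt{n})] \, \mathbb{P}(|\mathsf{T}_n| = \ell).$$

Let $\mathrm{D}(\mathsf{T}^\xi_\ell)$ denote the diameter of $\mathsf{T}^\xi_\ell$, i.e., the number of vertices on a longest path in $\mathsf{T}^\xi_\ell$. Since $f$ was assumed to be Lipschitz-continuous it follows that

$$\Big| \mathbb{E}[f(c_\Omega \mathsf{T}^\xi_\ell / \sqrt{n})] - \mathbb{E}\Big[f\Big(\frac{\sigma}{2 \sqrt{\ell}} \mathsf{T}^\xi_\ell\Big)\Big] \Big| \le a_{n,\ell} \, \mathbb{E}[\mathrm{D}(\mathsf{T}^\xi_\ell) / \sqrt{\ell}]$$

for a sequence $a_{n,\ell}$ with $\sup_\ell a_{n,\ell} \to 0$ as $n$ becomes large, where the supremum is taken over the range of $\ell$ in the sum above. Moreover, the average rescaled diameter $\mathbb{E}[\mathrm{D}(\mathsf{T}^\xi_\ell) / \sqrt{\ell}]$ converges to a multiple of the expected diameter of the CRT $\mathcal{T}_e$ as $\ell$ tends to infinity, see e.g. [1]. In particular, it is a bounded sequence. Since

$$\mathbb{E}\Big[f\Big(\frac{\sigma}{2 \sqrt{\ell}} \mathsf{T}^\xi_\ell\Big)\Big] \to \mathbb{E}[f(\mathcal{T}_e)]$$

as $\ell \to \infty$, it follows that

$$\mathbb{E}[f(c_\Omega \mathsf{T}_n / \sqrt{n})] \to \mathbb{E}[f(\mathcal{T}_e)]$$

as $n$ becomes large. This completes the proof. ∎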

###### Proof of Theorem 1.2.

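We are going to use the notation of the previous proof. Let $x \ge 0$ be given. Without loss of generality, we may assume throughout that $x \ge \sqrt{n}$. If the height of the tree satisfies $\mathrm{H}(\mathsf{A}_n) \ge x$, then $\mathrm{H}(\mathsf{T}_n) \ge x/2$ or $|F_n(v)| \ge x/2$ for at least one vertex $v \in \mathsf{T}_n$. We are going to bound the probability for each of these events separately. By the tail bounds for conditioned Galton-Watson trees in Theorem 2.2 there exist constants $C_1, c_1 > 0$ such that for all $\ell$ and $y \ge 0$ we have that

$$\mathbb{P}(\mathrm{H}(\mathsf{T}) \ge y \mid |\mathsf{T}| = \ell) \le C_1 \exp(-c_1 y^2 / \ell).$$

Moreover, $\mathsf{T}_n$ conditioned on having size $\ell$ is distributed like $\mathsf{T}$ conditioned on having size $\ell$. Hence

$$\mathbb{P}(\mathrm{H}(\mathsf{T}_n) \ge x/2) = \sum_{\ell=1}^{n} \mathbb{P}(|\mathsf{T}_n| = \ell) \, \mathbb{P}(\mathrm{H}(\mathsf{T}) \ge x/2 \mid |\mathsf{T}| = \ell) \le C_1 \exp(-c_1 x^2 / (4n)). \tag{5.9}$$

By (5.4) it holds that

$$\mathbb{P}\Big(\max_{v \in \mathsf{T}_n} |F_n(v)| \ge x/2\Big) \le O(n^{3/2}) \, \mathbb{P}\Big(\max_{v \in \mathsf{T}} |F(v)| \ge x/2, \; |\Gamma \mathcal{A}_\Omega(\rho_\Omega)| = n\Big) \le O(n^{5/2}) \, \mathbb{P}(\zeta \ge x/2).$$

As we assumed that $x \ge \sqrt{n}$, it follows by (5.3) that there are constants $c, c' > 0$ with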

 n5/2P(ζ≥x/2)≤cexp((5/2)logn−c′x/2)≤cexp(−c