# Inversions in split trees and conditional Galton–Watson trees

Xing Shi Cai, Cecilia Holmgren, Svante Janson, Tony Johansson
Department of Mathematics, Uppsala University, Sweden
{xingshi.cai, cecilia.holmgren, svante.janson, tony.johansson}@math.uu.se
This work was partially supported by two grants from the Knut and Alice Wallenberg Foundation and a grant from the Swedish Research Council.
Fiona Skerman
Heilbronn Institute, Bristol University, UK
f.skerman@bristol.ac.uk
###### Abstract

We study $I(T)$, the number of inversions in a tree $T$ with its vertices labeled uniformly at random, which is a generalization of inversions in permutations. We first show that the cumulants of $I(T)$ have explicit formulas involving the $k$-total common ancestors of $T$ (an extension of the total path length). Then we consider $X_n$, the normalized version of $I(T_n)$, for a sequence of trees $T_n$. For fixed $T_n$'s, we prove a sufficient condition for $X_n$ to converge in distribution. As an application, we identify the limit of $X_n$ for complete $b$-ary trees. For $T_n$ being split trees [15], we show that $X_n$ converges to the unique solution of a distributional equation. Finally, when $T_n$'s are conditional Galton–Watson trees, we show that $X_n$ converges to a random variable defined in terms of Brownian excursions. By exploiting the connection between inversions and the total path length, we are able to give results that are stronger and much broader compared to previous work by Panholzer and Seitz [45].

## 1 Introduction

### 1.1 Inversions in a fixed tree

Let $\sigma_1,\dots,\sigma_n$ be a permutation of $\{1,\dots,n\}$. If $i<j$ and $\sigma_i>\sigma_j$, then the pair $(\sigma_i,\sigma_j)$ is called an inversion. The concept of inversions was introduced by Cramer [13] in 1750 due to its connection with solving linear equations. More recently, the study of inversions has been motivated by its applications in the analysis of sorting algorithms, see, e.g., [36, Section 5.1]. Many authors, including Feller [20, p. 256], Sachkov [51, p. 29], and Bender [6], have shown that the number of inversions in uniform random permutations satisfies a central limit theorem. More recently, Margolius [41] and Louchard and Prodinger [38] studied permutations containing a fixed number of inversions.

The concept of inversions can be generalized as follows. Consider an unlabeled rooted tree $T$ on node set $V$. Let $\rho$ denote the root. Write $u<v$ if $u$ is a proper ancestor of $v$, i.e., the unique path from $\rho$ to $v$ passes through $u$ and $u\neq v$. Write $u\le v$ if $u$ is an ancestor of $v$, i.e., either $u<v$ or $u=v$. Given a bijection $\lambda:V\to\{1,\dots,|V|\}$ (a node labeling), define the number of inversions

$$I(T,\lambda)\overset{\mathrm{def}}{=}\sum_{u<v}\mathbf{1}_{\lambda(u)>\lambda(v)}.$$

Note that if $T$ is a path, then $I(T,\lambda)$ is nothing but the number of inversions in a permutation. Our main object of study is the random variable $I(T)$, defined by $I(T)=I(T,\lambda)$ where $\lambda$ is chosen uniformly at random from the set of bijections from $V$ to $\{1,\dots,|V|\}$.
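As a minimal sketch of the definition above, the following Python function counts inversions for a given labeling. The parent-list representation of the tree and the name `count_inversions` are illustrative assumptions, not notation from the paper.

```python
def count_inversions(parent, label):
    """Count pairs (u, v) with u a proper ancestor of v and label[u] > label[v].

    parent[v] is the parent of node v; the root has parent None.
    (This representation is an assumption made for the example.)
    """
    total = 0
    for v in range(len(parent)):
        u = parent[v]
        while u is not None:          # walk up through all proper ancestors of v
            if label[u] > label[v]:
                total += 1
            u = parent[u]
    return total


# Path 0 - 1 - 2 rooted at 0: tree inversions are permutation inversions.
path = [None, 0, 1]
print(count_inversions(path, [3, 1, 2]))   # 2

# Star with root 0 and leaves 1, 2: only root-leaf pairs are compared.
star = [None, 0, 0]
print(count_inversions(star, [2, 3, 1]))   # 1
```

On a path every pair of nodes is comparable, so the count reduces to the usual inversion count of the label sequence; on a star only the pairs involving the root contribute.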

The enumeration of trees with a fixed number of inversions has been studied by Mallows and Riordan [40] and Gessel et al. [24] using the so-called inversion polynomial. While analyzing linear probing hashing, Flajolet et al. [22] noticed that the number of inversions in Cayley trees with uniform random labeling converges to an Airy distribution. Panholzer and Seitz [45] showed that this is true for conditional Galton–Watson trees, which encompass the case of Cayley trees.

For a node $v$, let $z_v$ denote the size of the subtree rooted at $v$. The following representation of $I(T)$, proved in Section 2, is the basis of most of our results:

###### Lemma 1.

Let $T$ be a fixed tree. Then

$$I(T)\overset{d}{=}\sum_{v\in V}Z_v, \tag{1.1}$$

where $\{Z_v\}_{v\in V}$ are independent random variables, and $Z_v\sim\operatorname{Unif}\{0,1,\dots,z_v-1\}$.
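Lemma 1 can be checked exhaustively on a small tree. The sketch below enumerates all labelings of a five-node tree and compares the resulting distribution of the inversion count with the distribution of a sum of independent uniform variables on $\{0,\dots,z_v-1\}$. The tree, the parent-list encoding (with `parent[v] < v`), and all function names are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import permutations


def subtree_sizes(parent):
    """Subtree sizes z_v, assuming nodes are numbered so that parent[v] < v."""
    z = [1] * len(parent)
    for v in range(len(parent) - 1, 0, -1):   # children before parents
        z[parent[v]] += z[v]
    return z


def inversion_counts(parent):
    """Counter mapping each inversion count to the number of labelings attaining it."""
    n = len(parent)
    counts = Counter()
    for label in permutations(range(n)):
        inv = 0
        for v in range(1, n):
            u = parent[v]
            while u is not None:
                inv += label[u] > label[v]
                u = parent[u]
        counts[inv] += 1
    return counts


def uniform_sum_counts(sizes):
    """Counter for the sum of independent Unif{0..z-1}, unnormalized (total = prod of sizes)."""
    dist = Counter({0: 1})
    for z in sizes:
        new = Counter()
        for s, c in dist.items():
            for j in range(z):
                new[s + j] += c
        dist = new
    return dist


tree = [None, 0, 0, 1, 1]                 # root 0 with children 1, 2; node 1 has children 3, 4
lhs = inversion_counts(tree)              # out of 5! labelings
rhs = uniform_sum_counts(subtree_sizes(tree))
n_fact, prod = sum(lhs.values()), sum(rhs.values())
# Equality of the two normalized distributions, checked with integer arithmetic.
print(all(lhs[k] * prod == rhs[k] * n_fact for k in set(lhs) | set(rhs)))
```

The comparison cross-multiplies by the two normalizing constants so that no floating-point arithmetic is involved.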

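We will generally be concerned with the centralized number of inversions, i.e., $I(T)-\mathbb{E}[I(T)]$. For any $u<v$ we have $\mathbb{P}\{\lambda(u)>\lambda(v)\}=1/2$. Let $h(v)$ denote the depth of $v$, i.e., the distance from $v$ to the root $\rho$. It immediately follows that

$$\mathbb{E}[I(T)]=\sum_{u<v}\mathbb{E}\bigl[\mathbf{1}_{\lambda(u)>\lambda(v)}\bigr]=\frac12\Upsilon(T), \tag{1.2}$$

where $\Upsilon(T)\overset{\mathrm{def}}{=}\sum_{v}h(v)$ is called the total path length (or internal path length) of $T$.

Let $\varkappa_k=\varkappa_k(X)$ denote the $k$-th cumulant of a random variable $X$ (provided it exists); thus $\varkappa_1(X)=\mathbb{E}[X]$ and $\varkappa_2(X)=\operatorname{Var}(X)$ (see [26, Theorem 4.6.4]). We now define $\Upsilon_k(T)$, the $k$-total common ancestors of $T$, which allows us to generalize (1.2) to higher cumulants of $I(T)$. For $k$ nodes $v_1,\dots,v_k$ (not necessarily distinct), let $c(v_1,\dots,v_k)$ be the number of ancestors that they share, i.e.,

$$c(v_1,\dots,v_k)\overset{\mathrm{def}}{=}\bigl|\{u\in V: u\le v_1,\ u\le v_2,\ \dots,\ u\le v_k\}\bigr|.$$

We define

$$\Upsilon_k(T)\overset{\mathrm{def}}{=}\sum_{v_1,\dots,v_k}c(v_1,\dots,v_k), \tag{1.3}$$

where the sum is over all ordered $k$-tuples of nodes in the tree. For a single node $v$, $c(v)=h(v)+1$, since $v$ itself is counted in $c(v)$. So $\Upsilon(T)=\Upsilon_1(T)-|V|$; i.e., we recover the usual notion of total path length.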
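The definition (1.3) can be evaluated by brute force on small trees. The following sketch does exactly that; the parent-list encoding and the helper names are assumptions made for the example, and the approach is only practical for tiny trees since it enumerates all $|V|^k$ tuples.

```python
from itertools import product


def ancestors(parent, v):
    """Set of all u with u <= v in the ancestor order (v itself included)."""
    anc = set()
    while v is not None:
        anc.add(v)
        v = parent[v]
    return anc


def upsilon_k(parent, k):
    """k-total common ancestors: sum of c(v_1, ..., v_k) over ordered k-tuples."""
    n = len(parent)
    anc = [ancestors(parent, v) for v in range(n)]
    total = 0
    for tup in product(range(n), repeat=k):
        common = set.intersection(*(anc[v] for v in tup))
        total += len(common)
    return total


def total_path_length(parent):
    """Sum of depths h(v) = c(v) - 1 over all nodes."""
    return sum(len(ancestors(parent, v)) - 1 for v in range(len(parent)))


tree = [None, 0, 0, 1, 1]     # root 0 with children 1, 2; node 1 has children 3, 4
print(upsilon_k(tree, 1), total_path_length(tree) + len(tree))   # 11 11
print(upsilon_k(tree, 2))                                        # 37
```

The first printed line illustrates the identity $\Upsilon(T)=\Upsilon_1(T)-|V|$ on this example.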

###### Theorem 1.

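Let $T$ be a fixed tree. Let $\varkappa_k=\varkappa_k(I(T))$ be the $k$-th cumulant of $I(T)$. Then

$$\mathbb{E}[I(T)]=\varkappa_1(I(T))=\frac12\Upsilon(T)=\frac12\bigl(\Upsilon_1(T)-|V|\bigr), \tag{1.4}$$

$$\operatorname{Var}(I(T))=\varkappa_2(I(T))=\frac1{12}\bigl(\Upsilon_2(T)-|V|\bigr), \tag{1.5}$$

and, more generally, for $k\ge1$,

$$\varkappa_{2k+1}(I(T))=0,\qquad \varkappa_{2k}(I(T))=\frac{B_{2k}}{2k}\bigl(\Upsilon_{2k}(T)-|V|\bigr), \tag{1.6}$$

where $B_k$ denotes the $k$-th Bernoulli number. Moreover, $I(T)$ has the moment generating function

$$\mathbb{E}\bigl[e^{tI(T)}\bigr]=\prod_{v\in V}\frac{e^{z_vt}-1}{z_v(e^t-1)}, \tag{1.7}$$

and for the centralized variable we have the estimate

$$\mathbb{E}\bigl[e^{t(I(T)-\mathbb{E}[I(T)])}\bigr]\le\exp\Bigl(\frac18t^2\sum_{v\in T}(z_v-1)^2\Bigr)\le\exp\Bigl(\frac18t^2\sum_{v\in T}z_v^2\Bigr)=\exp\Bigl(\frac18t^2\Upsilon_2(T)\Bigr),\qquad t\in\mathbb{R}. \tag{1.8}$$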
###### Remark 1.

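Recalling that $B_1=-\frac12$ and $B_{2k+1}=0$ for $k\ge1$, (1.4)–(1.6) can also be written as

$$\varkappa_k(I(T))=\frac{B_k}{k}(-1)^k\bigl(\Upsilon_k(T)-|V|\bigr),\qquad k\ge1.$$
###### Remark 2.

Higher moments and central moments can be calculated from the cumulants by standard formulas [52]. (Note that all odd central moments vanish by symmetry.) For example, recalling $B_4=-\frac1{30}$, Theorem 1 implies that

$$\mathbb{E}\bigl[(I(T)-\mathbb{E}[I(T)])^4\bigr]=3\varkappa_2(I(T))^2+\varkappa_4(I(T))=\frac1{48}\bigl(\Upsilon_2(T)-|V|\bigr)^2-\frac1{120}\bigl(\Upsilon_4(T)-|V|\bigr). \tag{1.9}$$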
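The formulas (1.4), (1.5) and (1.9) can be verified exactly on a small example. The sketch below enumerates all labelings of a five-node tree with exact rational arithmetic and compares the mean, variance and fourth central moment against the closed forms. It computes $\Upsilon_k(T)$ as $\sum_v z_v^k$, which is the identity of Lemma 3 in Section 2; the tree and the function names are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import permutations


def subtree_sizes(parent):
    """Subtree sizes z_v, assuming nodes are numbered so that parent[v] < v."""
    z = [1] * len(parent)
    for v in range(len(parent) - 1, 0, -1):
        z[parent[v]] += z[v]
    return z


def upsilon(parent, k):
    """Upsilon_k(T) computed as the sum of z_v**k (Lemma 3)."""
    return sum(z ** k for z in subtree_sizes(parent))


def central_moments_by_enumeration(parent):
    """Exact mean, variance and fourth central moment of I(T) over all labelings."""
    n = len(parent)
    values = []
    for label in permutations(range(n)):
        inv = 0
        for v in range(1, n):
            u = parent[v]
            while u is not None:
                inv += label[u] > label[v]
                u = parent[u]
        values.append(inv)
    mean = Fraction(sum(values), len(values))
    m2 = sum((x - mean) ** 2 for x in values) / len(values)
    m4 = sum((x - mean) ** 4 for x in values) / len(values)
    return mean, m2, m4


tree = [None, 0, 0, 1, 1]
n = len(tree)
mean, var, m4 = central_moments_by_enumeration(tree)
print(mean == Fraction(upsilon(tree, 1) - n, 2))                    # (1.4)
print(var == Fraction(upsilon(tree, 2) - n, 12))                    # (1.5)
print(m4 == Fraction((upsilon(tree, 2) - n) ** 2, 48)
      - Fraction(upsilon(tree, 4) - n, 120))                        # (1.9)
```

For this tree the subtree sizes are $5,3,1,1,1$, so the mean is $3$ and the variance is $8/3$.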

### 1.2 Inversions in sequences of trees

The total path length has been studied for random trees like split trees [8] and conditional Galton–Watson trees [3, Corollary 9]. This leads us to focus on the deviation

$$X_n=\frac{I(T_n)-\mathbb{E}[I(T_n)]}{s(n)},$$

under some appropriate scaling $s(n)$, for a sequence of (random or fixed) trees $T_n$, where $T_n$ has size $n$.

#### Fixed trees

###### Theorem 2.

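Let $T_n$ be a sequence of fixed trees on $n$ nodes. Let

$$X_n=\frac{I(T_n)-\mathbb{E}[I(T_n)]}{\sqrt{\Upsilon_2(T_n)}}.$$

Assume that for all $k\ge1$,

$$\frac{\Upsilon_{2k}(T_n)}{\Upsilon_2(T_n)^k}\to\zeta_{2k}, \tag{1.10}$$

for some sequence $(\zeta_{2k})_{k\ge1}$. Then there exists a unique distribution $X$ with

$$\varkappa_{2k-1}(X)=0,\qquad \varkappa_{2k}(X)=\frac{B_{2k}}{2k}\zeta_{2k},\qquad k\ge1, \tag{1.11}$$

such that $X_n\overset{d}{\longrightarrow}X$ and, moreover, $\mathbb{E}[e^{tX_n}]\to\mathbb{E}[e^{tX}]<\infty$ for every $t\in\mathbb{R}$.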

###### Remark 3.

By Theorem 1, Thus, it is natural to consider , where we use .

###### Remark 4.

The functions and are called moment generating functions of and respectively. The convergence in a neighborhood of implies that and is uniformly integrable for all ; thus for all and for all integers . See, e.g., [26, Theorem 5.9.5].

As simple examples, we consider two extreme cases.

###### Example 1.

When $T_n=P_n$ is a path of $n$ nodes, we have for fixed $k\ge1$

$$\Upsilon_k(P_n)\sim\frac{1}{k+1}n^{k+1}.$$

Thus $\zeta_{2k}=0$ for $k\ge2$. So by Theorem 2, $X_n$ converges to a normal distribution, and we recover the central limit law for inversions in permutations. Also, the vertices have subtree sizes $1,\dots,n$ and so we also recover from Theorem 1 the moment generating function $\prod_{j=1}^{n}\frac{e^{jt}-1}{j(e^t-1)}$ [51, 41].
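A quick numerical illustration of this example, under the assumption (Lemma 3 in Section 2) that $\Upsilon_k$ equals the sum of the $k$-th powers of the subtree sizes: for a path these sizes are $1,\dots,n$, so the ratio $\Upsilon_4(P_n)/\Upsilon_2(P_n)^2$ in condition (1.10) can be computed exactly and observed to decay. The function name `upsilon_path` is hypothetical.

```python
from fractions import Fraction


def upsilon_path(n, k):
    """Upsilon_k of a path on n nodes: the subtree sizes are 1, ..., n."""
    return sum(j ** k for j in range(1, n + 1))


for n in (10, 100, 1000):
    # Ratio appearing in (1.10) with k = 2; it should tend to zeta_4 = 0.
    ratio = Fraction(upsilon_path(n, 4), upsilon_path(n, 2) ** 2)
    # Check of the asymptotics Upsilon_2(P_n) ~ n^3 / 3.
    scaled = 3 * upsilon_path(n, 2) / n ** 3
    print(n, float(ratio), scaled)
```

The ratio behaves like a constant times $1/n$, consistent with the exponents $2k+1<3k$ for $k\ge2$.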

###### Example 2.

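Let $T_n=S_{n-1}$, a star with $n-1$ leaves, and denote the root by $o$. We have $z_o=n$ and $z_v=1$ for $v\neq o$. Hence, by Lemma 1, or directly, $I(T_n)\sim\operatorname{Unif}\{0,\dots,n-1\}$, and consequently

$$\bigl(I(T_n)-\mathbb{E}[I(T_n)]\bigr)/n\overset{d}{\longrightarrow}\operatorname{Unif}\bigl[-\tfrac12,\tfrac12\bigr]. \tag{1.12}$$

This follows also by Theorem 2, since $\Upsilon_k(T_n)\sim n^k$ for $k\ge2$ (e.g., by Lemma 3 below).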

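It is straightforward to compute the $k$-total common ancestors for $b$-ary trees. Thus our next result follows immediately from Theorem 2.

###### Theorem 3.

Let $b\ge2$ and let $T_n$ be the complete $b$-ary tree of height $m$ with $n=(b^{m+1}-1)/(b-1)$ nodes. Let

$$X_n=\frac{I(T_n)-\mathbb{E}[I(T_n)]}{n},\qquad\text{and}\qquad X=\sum_{d\ge0}\sum_{j=1}^{b^d}\frac{U_{d,j}}{b^d},$$

where $(U_{d,j})_{d\ge0,\,j\ge1}$ are independent $\operatorname{Unif}[-\frac12,\frac12]$. Then $X_n\overset{d}{\longrightarrow}X$ and $\mathbb{E}[e^{tX_n}]\to\mathbb{E}[e^{tX}]<\infty$, for every $t\in\mathbb{R}$. Moreover $X$ is the unique random variable with

$$\varkappa_{2k-1}(X)=0,\qquad \varkappa_{2k}(X)=\frac{B_{2k}}{2k}\cdot\frac{b^{2k-1}}{b^{2k-1}-1},\qquad k\ge1. \tag{1.13}$$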
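As a sanity check of (1.13) for $k=1$, the following sketch computes $\operatorname{Var}(X_n)=(\Upsilon_2(T_n)-n)/(12n^2)$ exactly for the complete $b$-ary tree and compares it with the limit $\frac{B_2}{2}\cdot\frac{b}{b-1}=\frac{b}{12(b-1)}$. It assumes the identity $\Upsilon_2(T)=\sum_v z_v^2$ (Lemma 3) and uses the fact that a node at depth $d$ has subtree size $(b^{m-d+1}-1)/(b-1)$; the function name is hypothetical.

```python
from fractions import Fraction


def variance_of_Xn(b, m):
    """Var(X_n) for the complete b-ary tree of height m, via (1.5) and sum of z_v^2."""
    n = (b ** (m + 1) - 1) // (b - 1)
    upsilon2 = 0
    for d in range(m + 1):
        size = (b ** (m - d + 1) - 1) // (b - 1)   # subtree size of a node at depth d
        upsilon2 += b ** d * size ** 2             # b^d nodes at depth d
    return Fraction(upsilon2 - n, 12 * n ** 2)


for b in (2, 3):
    limit = Fraction(b, 12 * (b - 1))              # (1.13) with k = 1 and B_2 = 1/6
    print(b, float(variance_of_Xn(b, 15)), float(limit))
```

Already at height 15 the exact variance agrees with the limiting value to several decimal places.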

#### Random trees

We move on to random trees. We consider generating a random tree $T_n$ and, conditioning on $T_n$, labeling its nodes uniformly at random. The relation (1.2) is maintained for random trees:

$$\mathbb{E}[I(T_n)]=\mathbb{E}\bigl[\mathbb{E}[I(T_n)\mid T_n]\bigr]=\frac12\mathbb{E}[\Upsilon(T_n)]. \tag{1.14}$$

The deviation of $I(T_n)$ from its mean can be taken to mean two different things. Consider, for some scaling function $s(n)$,

$$X_n=\frac{I(T_n)-\mathbb{E}[I(T_n)]}{s(n)},\qquad Y_n=\frac{I(T_n)-\mathbb{E}[I(T_n)\mid T_n]}{s(n)}=\frac{I(T_n)-\frac12\Upsilon(T_n)}{s(n)}. \tag{1.15}$$

Then $X_n$ and $Y_n$ each measure the deviation of $I(T_n)$, unconditionally and conditionally. They are related by the identity

$$X_n=Y_n+W_n/2, \tag{1.16}$$

where

$$W_n=\frac{\Upsilon(T_n)-\mathbb{E}[\Upsilon(T_n)]}{s(n)}. \tag{1.17}$$

In the case of fixed trees $W_n=0$ and $X_n=Y_n$, but for random trees we consider the sequences separately.

We consider two classes of random trees — split trees and conditional Galton–Watson trees.

#### Split trees

The first class of random trees which we study are split trees. They were introduced by Devroye [15] to encompass many families of trees that are frequently used in algorithm analysis, e.g., binary search trees [27], -ary search trees [46], quad trees [21], median-of- trees [53], fringe-balanced trees [14], digital search trees [11] and random simplex trees [15, Example 5].

A split tree can be constructed as follows. Consider a rooted infinite $b$-ary tree where each node is a bucket of finite capacity $s$. We place $n$ balls at the root, and the balls individually trickle down the tree in a random fashion until no bucket is above capacity. Each node draws a split vector $\mathcal{V}=(V_1,\dots,V_b)$ from a common distribution, where $V_i$ describes the probability that a ball passing through the node continues to the $i$th child. The trickle-down procedure is defined precisely in Section 4. Any node $u$ such that the subtree rooted at $u$ contains no balls is then removed, and we consider the resulting tree $T_n$.

In the context of split trees we differentiate between $I(T_n)$ (the number of inversions on nodes), and $\hat{I}(T_n)$ (the number of inversions on balls). In the former case, the nodes (buckets) are given labels, while in the latter the individual balls are given labels. For balls $\beta_1,\beta_2$, write $\beta_1<\beta_2$ if the node containing $\beta_1$ is a proper ancestor of the node containing $\beta_2$; if $\beta_1,\beta_2$ are contained in the same node we do not compare their labels. Define

$$\hat{I}(T_n)=\sum_{\beta_1<\beta_2}\mathbf{1}_{\lambda(\beta_1)>\lambda(\beta_2)}.$$

Similarly define $\hat{\Upsilon}(T_n)$ as the total path length on balls, i.e., the sum of the depths of all balls. And let

$$\hat{X}_n=\frac{\hat{I}(T_n)-\mathbb{E}[\hat{I}(T_n)]}{n},\qquad \hat{Y}_n=\frac{\hat{I}(T_n)-s_0\hat{\Upsilon}(T_n)/2}{n},\qquad \hat{W}_n=\frac{\hat{\Upsilon}(T_n)-\mathbb{E}[\hat{\Upsilon}(T_n)]}{n}. \tag{1.18}$$

Here $s_0$ is a fixed integer denoting the number of balls in any internal node, and we have $\mathbb{E}[\hat{I}(T_n)\mid T_n]=s_0\hat{\Upsilon}(T_n)/2$ (formally justified in Section 4). The following theorem gives the limiting distributions of the random vector $(\hat{X}_n,\hat{Y}_n,\hat{W}_n)$. In Section 4.4 we state a similar result for $(X_n,Y_n,W_n)$ under stronger assumptions. Note that the concepts are identical for any class of split trees where each node holds exactly one ball, such as binary search trees, quad trees, digital search trees and random simplex trees.

Let denote the Mallows metric, also called the minimal metric (defined in Section 4). Let be the set of probability measures on with zero mean and finite second moment.

###### Theorem 4.

Let be a split tree and let be a split vector. Define

$$\mu=-\sum_{i=1}^{b}\mathbb{E}[V_i\ln V_i],\qquad\text{and}\qquad D(\mathcal{V})=\frac{1}{\mu}\sum_{i=1}^{b}V_i\ln V_i.$$

Assume that and . Let be the unique solution in for the system of fixed-point equations

 (1.19)

Here , , are independent, with for , and for . Then the sequence defined in (1.18) converges to in and in moment generating function within a neighborhood of the origin.

The proof of Theorem 4 uses the contraction method, introduced by Rösler [48] for finding the total path length of binary search trees. The technique has been applied to -dimensional quad trees by Neininger and Rüschendorf [43] and to split trees in general by Broutin and Holmgren [8]. The contraction method also has many other applications in the analysis of recursive algorithms, see, e.g., [49, 50, 44].

###### Remark 5.

We assume that , for otherwise we trivially have and Theorem 4 reduces to Theorem 2.1 in [8].

###### Remark 6.

In a recent paper, Janson [33] showed that preferential attachment trees and random recursive trees can be viewed as split trees with infinite-dimensional split vectors. Thus we conjecture that the contraction method should also be applicable for these models and give results similar to Theorem 4.

###### Remark 7.

Assume that the constant split vector is used and each node holds exactly one ball (a special case of digital search trees, see [14, Example 7]). Then and (1.19) has the unique solution , where has the limiting distribution for inversions in complete -ary trees (see Theorem 3). This is as expected, as the shape of a split tree with these parameters is likely to be very similar to a complete -ary tree.

#### Conditional Galton–Watson trees

Finally, we consider conditional Galton–Watson trees (or equivalently, simply generated trees), which were introduced by Bienaymé [7] and Watson and Galton [54] to model the evolution of populations. A Galton–Watson tree starts with a root node. Then recursively, each node in the tree is given a random number of child nodes. The numbers of children are drawn independently from the same distribution called the offspring distribution.

A conditional Galton–Watson tree $T_n$ is a Galton–Watson tree conditioned on having $n$ nodes. It generalizes many uniform random tree models, e.g., Cayley trees, Catalan trees, binary trees, $d$-ary trees, and Motzkin trees. For a comprehensive survey, see Janson [31]. For recent developments, see [32, 37, 16, 9].
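The construction just described can be sketched in a few lines of Python. The example below grows a Galton–Watson tree breadth-first and obtains a conditional Galton–Watson tree by plain rejection sampling, which is a simple but inefficient choice made here for illustration and not a method taken from the paper. The offspring law used, $\mathbb{P}(\xi=k)=2^{-(k+1)}$, is an assumption for the example; it has mean 1, and conditioning on size $n$ then yields a uniformly random ordered (plane) tree. All function names are hypothetical.

```python
import random


def galton_watson(offspring, max_size, rng):
    """Grow a Galton-Watson tree breadth-first.

    Returns the parent list (parent[0] is None), or None as soon as the
    tree exceeds max_size nodes.
    """
    parent = [None]
    i = 0
    while i < len(parent):
        for _ in range(offspring(rng)):
            parent.append(i)
            if len(parent) > max_size:
                return None
        i += 1
    return parent


def conditional_gw(offspring, n, rng):
    """Rejection sampling: repeat until the tree has exactly n nodes."""
    while True:
        tree = galton_watson(offspring, n, rng)
        if tree is not None and len(tree) == n:
            return tree


def geometric_half(rng):
    """Offspring count with P(k) = 2^-(k+1), k >= 0 (mean 1, i.e. critical)."""
    k = 0
    while rng.random() < 0.5:
        k += 1
    return k


tree = conditional_gw(geometric_half, 10, random.Random(0))
print(len(tree), tree)
```

Because nodes are created in breadth-first order, the returned list satisfies `parent[v] < v` for every non-root node, which is convenient for computing subtree sizes in a single backward pass.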

In a series of three seminal papers, Aldous showed that $T_n$ converges under re-scaling to a continuum random tree, which is a tree-like object constructed from a Brownian excursion [2, 3, 4]. Therefore, many asymptotic properties of conditional Galton–Watson trees, such as the height and the total path length, can be derived from properties of Brownian excursions [3]. Our analysis of inversions follows a similar route. In particular, we relate $I(T_n)$ to the Brownian snake studied by, e.g., Janson and Marckert [35].

In the context of Galton–Watson trees, Aldous [3, Corollary 9] showed that $n^{-3/2}\Upsilon(T_n)$ converges to an Airy distribution. We will see that the standard deviation of $I(T_n)-\frac12\Upsilon(T_n)$ is of order $n^{5/4}\ll n^{3/2}$, which by the decomposition (1.16) implies that $n^{-3/2}I(T_n)$ converges to the same Airy distribution, recovering one of the main results of Panholzer and Seitz [45, Theorem 5.3]. Our contribution for conditional Galton–Watson trees is a detailed analysis of $Y_n$ under the scaling function $s(n)=n^{5/4}$.

Let $e(s)$, $s\in[0,1]$, be the random path of a standard Brownian excursion, and define $C(s,t)=C(t,s)\overset{\mathrm{def}}{=}2\min_{s\le u\le t}e(u)$ for $0\le s\le t\le1$.

We define a random variable, see [30],

$$\eta\overset{\mathrm{def}}{=}\int_{[0,1]^2}C(s,t)\,ds\,dt=4\int_{0\le s\le t\le1}\min_{s\le u\le t}e(u)\,ds\,dt. \tag{1.20}$$
###### Theorem 5.

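Suppose $T_n$ is a conditional Galton–Watson tree with offspring distribution $\xi$ such that $\mathbb{E}[\xi]=1$, $\operatorname{Var}(\xi)=\sigma^2\in(0,\infty)$, and $\mathbb{E}[e^{\alpha\xi}]<\infty$ for some $\alpha>0$, and define

$$Y_n=\frac{I(T_n)-\frac12\Upsilon(T_n)}{n^{5/4}}.$$

Then we have

$$Y_n\overset{d}{\longrightarrow}Y\overset{\mathrm{def}}{=}\frac{1}{\sqrt{12\sigma}}\sqrt{\eta}\,\mathcal{N}, \tag{1.21}$$

where $\mathcal{N}$ is a standard normal random variable, independent from the random variable $\eta$ defined in (1.20). Moreover, $\mathbb{E}[e^{tY_n}]\to\mathbb{E}[e^{tY}]<\infty$ for all fixed $t\in\mathbb{R}$.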

The moments of and are known [34], see Section 5.

The rest of the paper is organized as follows. In Section 2, we prove Lemma 1 and Theorem 1. The results for fixed trees (Theorems 2, 3) are presented in Section 3. Split trees and conditional Galton–Watson trees are considered in Sections 4 and 5 respectively. Sections 4 and 5 are essentially self-contained, and the interested reader may skip ahead.

## 2 A fixed tree

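In this section we study a fixed, non-random tree $T$. We begin by proving Lemma 1, which shows that $I(T)$ is a sum of independent uniform random variables.

###### Proof of Lemma 1.

We define $Z_u=\sum_{v:v>u}\mathbf{1}_{\lambda(u)>\lambda(v)}$ and note that

$$I(T)\overset{\mathrm{def}}{=}\sum_{u<v}\mathbf{1}_{\lambda(u)>\lambda(v)}=\sum_{u\in V}\Bigl(\sum_{v:v>u}\mathbf{1}_{\lambda(u)>\lambda(v)}\Bigr)=\sum_{u\in V}Z_u, \tag{2.1}$$

showing (1.1). Let $T_u$ denote the subtree rooted at $u$. It is clear that conditioned on the set $\lambda(T_u)$, $\lambda$ restricted to $T_u$ is a uniformly random labeling of $T_u$ into $\lambda(T_u)$. Recall that $z_u$ denotes the size of $T_u$. If the elements of $\lambda(T_u)$ are $\ell_1<\dots<\ell_{z_u}$ and if $\lambda(u)=\ell_i$, then $Z_u=i-1$. As $\lambda(u)$ is uniformly distributed on $\lambda(T_u)$, $Z_u$ is uniformly distributed on $\{0,\dots,z_u-1\}$.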

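We prove independence of the $Z_v$ by induction on $|V|$. The base case $|V|=1$ is trivial. Let $T_1,\dots,T_d$ be the subtrees rooted at the children of the root $\rho$, and condition on the sets $\lambda(T_1),\dots,\lambda(T_d)$. Given these sets, $\lambda$ restricted to $T_i$ is a uniformly random labeling of $T_i$ using the given labels $\lambda(T_i)$, and these labelings are independent for different $i$. Hence, conditioning on $\lambda(T_1),\dots,\lambda(T_d)$, the families $(Z_v)_{v\in T_i}$ are independent, and each is distributed as the corresponding family for the tree $T_i$.

Consequently, by induction, still conditioned on $\lambda(T_1),\dots,\lambda(T_d)$, the variables $(Z_v)_{v\neq\rho}$ are independent, with $Z_v\sim\operatorname{Unif}\{0,\dots,z_v-1\}$. Furthermore, $Z_\rho=\lambda(\rho)-1$, and $\lambda(\rho)$ is determined by $\lambda(T_1),\dots,\lambda(T_d)$ (as the only label not in $\bigcup_i\lambda(T_i)$). Hence the family $(Z_v)_{v\neq\rho}$ of independent random variables is also independent of $Z_\rho$, and thus $(Z_v)_{v\in V}$ are independent. This completes the induction, and thus the proof. ∎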

Our first use of the representation in Lemma 1 is to prove Theorem 1, which gives both a formula for the moment generating function and explicit formulas for the cumulants of $I(T)$ for a fixed $T$. The proof begins with a simple lemma giving the cumulants and the moment generating function of $Z_v$ in Lemma 1, from which Theorem 1 will follow immediately.

Recall that the Bernoulli numbers $B_k$ can be defined by their generating function

$$\sum_{k=0}^{\infty}B_k\frac{x^k}{k!}=\frac{x}{e^x-1} \tag{2.2}$$

(convergent for $|x|<2\pi$), see, e.g., [17, (24.2.1)]. Recall also $B_0=1$, $B_1=-\frac12$, and $B_2=\frac16$, and that $B_{2k+1}=0$ for $k\ge1$.

###### Lemma 2.

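Let $N\ge1$, and let $Z_N$ be uniformly distributed on $\{0,1,\dots,N-1\}$. Then $\mathbb{E}[Z_N]=(N-1)/2$, and, more generally,

$$\varkappa_k(Z_N)=\frac{B_k}{k}\bigl(N^k-1\bigr),\qquad k\ge2, \tag{2.3}$$

where $B_k$ is the $k$-th Bernoulli number. The moment generating function of $Z_N$ is

$$\mathbb{E}\bigl[e^{tZ_N}\bigr]=\frac{e^{Nt}-1}{N(e^t-1)}. \tag{2.4}$$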
###### Proof.

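This is presumably well-known, but we include a proof for completeness. The moment generating function of $Z_N$ is

$$\mathbb{E}\bigl[e^{tZ_N}\bigr]=\frac1N\sum_{j=0}^{N-1}e^{jt}=\frac{e^{Nt}-1}{N(e^t-1)}, \tag{2.5}$$

verifying (2.4). The function $(e^t-1)/t$ is analytic and non-zero in the disc $|t|<2\pi$, and thus has there a well-defined analytic logarithm

$$f(t):=\log\frac{e^t-1}{t}, \tag{2.6}$$

with $f(0)=0$. By (2.5) and (2.6), the cumulant generating function of $Z_N$ can be written as

$$\log\mathbb{E}\bigl[e^{tZ_N}\bigr]=f(Nt)-f(t). \tag{2.7}$$

Differentiating (2.6) yields (for $0<|t|<2\pi$)

$$f'(t)=\frac{d}{dt}\bigl(\log(e^t-1)-\log t\bigr)=\frac{e^t}{e^t-1}-\frac1t=\frac{1}{e^t-1}+1-\frac1t, \tag{2.8}$$

and thus, using (2.2),

$$tf'(t)=\frac{t}{e^t-1}+t-1=\sum_{k=0}^{\infty}B_k\frac{t^k}{k!}-1+t=\sum_{k=2}^{\infty}B_k\frac{t^k}{k!}+\frac12t. \tag{2.9}$$

Consequently,

$$f(t)=\sum_{k=2}^{\infty}\frac{B_k}{k}\frac{t^k}{k!}+\frac12t, \tag{2.10}$$

and thus by (2.7)

$$\log\mathbb{E}\bigl[e^{tZ_N}\bigr]=\sum_{k=2}^{\infty}\frac{B_k}{k}\bigl(N^k-1\bigr)\frac{t^k}{k!}+\frac{N-1}{2}t. \tag{2.11}$$

The results on cumulants follow. (Of course, $\varkappa_1(Z_N)=\mathbb{E}[Z_N]$ is more simply calculated directly.) ∎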
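The cumulant formula (2.3) is easy to confirm for $k=2$ and $k=4$ with exact arithmetic. The sketch below uses the standard relations $\varkappa_2=\mu_2$ and $\varkappa_4=\mu_4-3\mu_2^2$ between cumulants and central moments, together with the values $B_2=\frac16$ and $B_4=-\frac1{30}$; the function name is an assumption for the example.

```python
from fractions import Fraction

B = {2: Fraction(1, 6), 4: Fraction(-1, 30)}   # Bernoulli numbers B_2 and B_4


def cumulants_2_4(N):
    """Second and fourth cumulants of the uniform distribution on {0, ..., N-1}."""
    values = range(N)
    mean = Fraction(sum(values), N)
    mu2 = sum((x - mean) ** 2 for x in values) / N
    mu4 = sum((x - mean) ** 4 for x in values) / N
    return mu2, mu4 - 3 * mu2 ** 2


for N in (1, 2, 5, 10):
    k2, k4 = cumulants_2_4(N)
    # Compare with (2.3): kappa_k = B_k / k * (N^k - 1).
    print(N, k2 == B[2] / 2 * (N ** 2 - 1), k4 == B[4] / 4 * (N ** 4 - 1))
```

For instance, $N=5$ gives $\varkappa_2=2$ and $\varkappa_4=-\frac{26}{5}$, matching $\frac{1}{12}(5^2-1)$ and $-\frac{1}{120}(5^4-1)$.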

###### Remark 8.

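Similarly, using (2.10), or by (2.3) and a limiting argument, if $U\sim\operatorname{Unif}[0,1]$ or $U\sim\operatorname{Unif}[-\frac12,\frac12]$, then $\varkappa_k(U)=B_k/k$, $k\ge2$.

Recall that in the introduction, we defined

$$c(v_1,\dots,v_k)\overset{\mathrm{def}}{=}\bigl|\{u:u\le v_1,\dots,u\le v_k\}\bigr|,$$

i.e., $c(v_1,\dots,v_k)$ is the number of common ancestors of $v_1,\dots,v_k$.

###### Lemma 3.

Let $z_v$ denote the number of vertices in the subtree rooted at $v$. Then for $k\ge1$,

$$\sum_{v}z_v^k=\Upsilon_k(T)\overset{\mathrm{def}}{=}\sum_{v_1,\dots,v_k}c(v_1,\dots,v_k).$$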
###### Proof.

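It is easily seen that

$$\sum_{u}z_u=\sum_{u,v}\mathbf{1}_{\{u\le v\}}=\sum_{v}c(v). \tag{2.12}$$

Similarly,

$$\sum_{u}z_u^2=\sum_{u,v,w}\mathbf{1}_{\{u\le v,\,u\le w\}}=\sum_{v,w}c(v,w). \tag{2.13}$$

More generally,

$$\sum_{u}z_u^k=\sum_{u}\prod_{i=1}^{k}\Bigl(\sum_{v_i}\mathbf{1}_{\{u\le v_i\}}\Bigr)=\sum_{v_1,\dots,v_k}c(v_1,\dots,v_k).$$ ∎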
###### Remark 9.

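Observe that all common ancestors of the vertices $v_1,\dots,v_k$ must lie on a path, stretching from the last common ancestor to the root. Define a related parameter $\Upsilon'_k(T)$ to be the sum over all $k$-tuples of the length of this path (rather than the number of vertices in the path). We call this the $k$-common path length. Now $\Upsilon'_1(T)=\Upsilon(T)$, and $\Upsilon'_2(T)$ has appeared in various contexts, see for example [30] (where it is denoted ). Let $v_1\wedge v_2$ denote the last common ancestor of the vertices $v_1$ and $v_2$. It is easy to see that, with $n=|V|$,

$$\Upsilon'_k(T)\overset{\mathrm{def}}{=}\sum_{v_1,\dots,v_k}h(v_1\wedge\dots\wedge v_k)=\sum_{v_1,\dots,v_k}\bigl(c(v_1,\dots,v_k)-1\bigr)=\Upsilon_k(T)-n^k,$$

and by Lemma 3, $\Upsilon_k(T)=\sum_vz_v^k$, so $\Upsilon'_k(T)=\sum_vz_v^k-n^k$.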

###### Remark 10.

Let be a star with leaves and root . Then is the number of embeddings such that for each . Similarly the -common path-length is the number of such embeddings such that for each .

###### Proof of Theorem 1.

Since cumulants are additive for sums of independent random variables, an immediate consequence of Lemmas 1 and 2 is that

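$$\varkappa_k(I(T))=\frac{B_k}{k}\sum_{v\in V}\bigl(z_v^k-1\bigr)=\frac{B_k}{k}\bigl(\Upsilon_k(T)-|V|\bigr),\qquad k\ge2, \tag{2.14}$$

where the last equality follows from Lemma 3. The fact that $\varkappa_1(I(T))=\mathbb{E}[I(T)]=\frac12\Upsilon(T)$ was noted already in (1.2).

Similarly, (1.7) follows from Lemma 1 and (2.5).

For the estimate (1.8), note first, e.g. by Taylor expansions, that $\cosh x\le e^{x^2/2}$ for every real $x$. It follows that if $U$ is any symmetric random variable with $|U|\le a$, then

$$\mathbb{E}\bigl[e^{tU}\bigr]=\mathbb{E}[\cosh(tU)]\le e^{a^2t^2/2}. \tag{2.15}$$

(See [28, (4.16)] for a more general result.) Lemma 1 thus implies, applying (2.15) to each $Z_v-\mathbb{E}[Z_v]$, which is symmetric and bounded by $a=(z_v-1)/2$,

$$\mathbb{E}\bigl[e^{t(I(T)-\mathbb{E}[I(T)])}\bigr]=\prod_{v\in V}\mathbb{E}\bigl[e^{t(Z_v-\mathbb{E}[Z_v])}\bigr]\le\prod_{v\in V}\exp\Bigl(\frac18t^2(z_v-1)^2\Bigr), \tag{2.16}$$

which yields (1.8), using also Lemma 3. ∎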

## 3 A sequence of fixed trees

In this section, we study

$$X_n=\frac{I(T_n)-\mathbb{E}[I(T_n)]}{s(n)},$$

where $T_n$ is a sequence of fixed trees and $s(n)$ is an appropriate normalization factor. We start by proving Theorem 2, a sufficient condition for $X_n$ to converge in distribution when $s(n)=\sqrt{\Upsilon_2(T_n)}$.

###### Proof of Theorem 2.

First . For , note that shifting a random variable does not change its -th cumulant. Also note that . Therefore, it follows from Theorem 1 that

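$$\varkappa_k(X_n)=\frac{\varkappa_k(I(T_n))}{(\Upsilon_2(T_n)-n)^{k/2}}=\frac{B_k}{k}\frac{\Upsilon_k(T_n)-n}{(\Upsilon_2(T_n)-n)^{k/2}}\sim\frac{B_k}{k}\frac{\Upsilon_k(T_n)}{\Upsilon_2(T_n)^{k/2}},\qquad k\ge2.$$

Recall that all odd Bernoulli numbers except $B_1$ are zero. Thus letting $\zeta_k=0$ for all odd $k$, the assumption that $\Upsilon_{2k}(T_n)/\Upsilon_2(T_n)^k\to\zeta_{2k}$ for all $k\ge1$ implies that

$$\varkappa_k(X_n)\to\frac{B_k}{k}\zeta_k,\qquad k\ge1.$$

Since every moment can be expressed as a polynomial in cumulants, it follows that every moment of $X_n$ converges. Thus to show that there exists an $X$ such that $X_n\overset{d}{\longrightarrow}X$, it suffices to show that the moment generating function $\mathbb{E}[e^{tX_n}]$ stays bounded for all small fixed $t$; we shall show that this holds for all real $t$. In fact, using Lemma 3,

$$\sum_{v}(z_v-1)^2\le\sum_{v}\bigl(z_v^2-1\bigr)=\Upsilon_2(T_n)-n\le\Upsilon_2(T_n). \tag{3.1}$$

Hence, (1.8) yields

$$\mathbb{E}\bigl[e^{tX_n}\bigr]\le\exp\Bigl(\frac18\bigl(t/\sqrt{\Upsilon_2(T_n)}\bigr)^2\sum_{v}(z_v-1)^2\Bigr)\le\exp\Bigl(\frac18t^2\Bigr),\qquad t\in\mathbb{R}. \tag{3.2}$$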

This and the moment convergence imply the claims in the theorem. ∎

### 3.1 The complete b-ary tree

We prove Theorem 3, which asserts that for complete -ary trees the limiting variable of is the unique for which for even and zero for odd . Fix . In the complete -ary tree of height , each node at depth has subtree size . Hence Lemma 1 implies that where

 Zd,j∼Unif{−am,d−12,−am,d−22,…,am,d−22,am,d−12}

are independent random variables. Let be independent . Approximating and noticing that , intuitively we should have for large ,

 Xn=m∑d=0bd∑j=1am,dn⋅Zd,jam,d≈∑d≥0bd∑j=1