# Search trees: Metric aspects and strong limit theorems

###### Abstract

We consider random binary trees that appear as the output of certain standard algorithms for sorting and searching if the input is random. We introduce the subtree size metric on search trees and show that the resulting metric spaces converge with probability 1. This is then used to obtain almost sure convergence for various tree functionals, together with representations of the respective limit random variables as functions of the limit tree.

DOI: 10.1214/13-AAP948. The Annals of Applied Probability, Volume 24, Issue 3 (2014), pages 1269–1297.

Metric search trees

Rudolf Grübel (rgrubel@stochastik.uni-hannover.de)

AMS subject classifications: Primary 60B99; secondary 60J10, 68Q25, 05C05.

Keywords: Doob–Martin compactification, metric trees, path length, silhouette, subtree size metric, vector-valued martingales, Wiener index.


## 1 Introduction

A sequential algorithm transforms an input sequence into an output sequence where, for all , depends on and only. Typically, the output variables are elements of some combinatorial family , each has a size parameter and is an element of the set of objects of size . In the probabilistic analysis of such algorithms, one starts with a stochastic model for the input sequence and is interested in certain aspects of the output sequence. The standard input model assumes that the ’s are the values of a sequence of independent and identically distributed random variables. For random input of this type, the output sequence then is the path of a Markov chain that is adapted to the family in the sense that

(1) |

Clearly, is highly transient—no state can be visited twice.

The special case we are interested in, and which we will use to demonstrate an approach that is generally applicable in the situation described above, is that of binary search trees and two standard algorithms, known by their acronyms BST (binary search tree) and DST (digital search tree). These are discussed in detail in the many excellent texts in this area, for example in Knuth3 (), Mahmoud1 () and Drmota09 (). Various functionals of the search trees, such as the height Dev86 (), the path length RegnierQS (), RoeQS (), the node depth profile J-H2001 (), CDJ2001 (), CR2004 (), CKMR2005 (), FHN2006 (), DJN2008 (), the subtree size profile Fu2008 (), DeGr3 (), the Wiener index Nein02 () and the silhouette GrSilh () have been studied, with methods spanning the wide range from generatingfunctionology to martingale methods to contraction arguments on metric spaces of probability distributions (neither of these lists is complete). Many of the results are asymptotic in nature, where the convergence obtained as may refer to the distributions or to the random variables themselves. As far as strong limit theorems are concerned, a significant step toward a unifying approach was made in the recent paper EGW (), where methods from discrete potential theory were used to obtain limit results on the level of the combinatorial structures themselves: In a suitable extension of the state space , the random variables converge almost surely as , and the limit generates the tail -field of the Markov chain. The results in EGW () cover a wide variety of structures; search trees are a special case. It should also be mentioned here that the use of boundary theory has a venerable tradition in connection with random walks; see KaimVersh () and Woess1 ().

Our aims in the present paper are the following. First, we use the algorithmic background for a direct proof of the convergence of the BST variables , as , to a limit object , and we obtain a representation of in terms of the input sequence . Second, we introduce the subtree size metric on finite binary trees. This leads to a reinterpretation of the above convergence in terms of metric trees. We also introduce a family of weighted variants of this metric, with parameter , and then identify the critical value with the property that the metric trees converge for and do not converge if . The value turns out to also be the threshold for compactness of the limit tree. Third, we use convergence at the tree level to (re)obtain strong limit theorems for three tree functionals—the path length, the Wiener index and a metric version of the silhouette.

These topics are treated in the next three sections, where each has its own introductory remarks.

## 2 Binary search trees

We first introduce some notation, mostly specific to binary trees, then discuss the two search algorithms and the associated Markov chains and finally recall the results from EGW () related to these structures, including an alternative proof of the main limit theorem.

### 2.1 Some notation

We write for the distribution of a random variable and , , for the various versions of the conditional distribution of given (the value of) a random variable or a -field . Further, is the one-point mass at , is the indicator function of the set [so that ], denotes the binomial distribution with parameters and , is the beta distribution with parameters and is the uniform distribution on the unit interval. We also write for the uniform distribution on a finite set .

With let

be the set of 0–1 sequences of length , , the set of all finite 0–1 sequences and the set of all infinite 0–1 sequences, respectively. The set has , the “empty sequence,” as its only element, and is the length of , that is, if . For each node we use

to denote its left and right direct descendant (child) and its direct ancestor (parent). We write for , if and for , that is, if is a prefix of ; the extension to is obvious. The prefix order is a partial order only, but there exists a unique minimum to any two nodes , their last common ancestor; again, this can be extended to elements of . Another ordering on can be obtained via the function ,

(2) |

This will be useful in various proofs, and also in connection with illustrations.
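To make the two order structures concrete: the last common ancestor of two nodes is simply their longest common 0–1 prefix, and one possible explicit choice for the embedding in (2) sends a node to the midpoint of the dyadic subinterval of (0, 1) that its routing instruction selects, so that the induced linear order is the in-order traversal. The following sketch illustrates both; the function names and the specific form of `phi` are our own illustrative assumptions, not the paper's notation.

```python
def lca(u, v):
    """Last common ancestor of two nodes: their longest common 0-1 prefix."""
    out = []
    for a, b in zip(u, v):
        if a != b:
            break
        out.append(a)
    return tuple(out)

def phi(u):
    """One concrete order embedding of nodes into (0, 1): a node is sent to
    the midpoint of the dyadic interval that its routing instruction selects,
    so left descendants land to the left of a node and right descendants to
    its right (an assumed explicit form, for illustration only)."""
    x, half = 0.5, 0.5   # the root node (empty sequence) maps to 1/2
    for bit in u:
        half /= 2
        x += half if bit == 1 else -half
    return x
```

With this choice, `phi` is strictly increasing along the in-order traversal of the complete binary tree, which is the property used for the visualizations in Section 3.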

By a binary tree we mean a subset of the set of nodes that is prefix stable in the sense that and implies that . Informally, we regard the components of as a routing instruction leading to the vertex , where 0 means a move to the left, 1 a move to the right and the empty sequence is the root node. The edges of the tree are the pairs , . A node is external to a tree if it is not one of its elements, but its direct ancestor is; we write for the set of external nodes of . Finally,

(3) |

is the size of the subtree of rooted at (or the number of descendants of in , including ).

Let denote the (countable) set of finite binary trees, those of size (number of nodes) . The single element of is , the tree that consists of the root node only.
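These definitions translate directly into code if a binary tree is represented as a prefix-stable set of 0–1 tuples; the helper names below are ours, chosen for illustration.

```python
def is_prefix_stable(tree):
    """A binary tree is a set of nodes (0-1 tuples) that contains every
    prefix of each of its elements; in particular the root, the empty
    tuple, belongs to any nonempty tree."""
    return all(u[:k] in tree for u in tree for k in range(len(u)))

def subtree_size(tree, u):
    """Size of the subtree of the tree rooted at u: the number of
    descendants of u within the tree, including u itself."""
    return sum(1 for v in tree if v[:len(u)] == u)
```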

### 2.2 Search algorithms and Markov chains

Let be a sequence of pairwise distinct real numbers. The BST (binary search tree) algorithm stores these sequentially into labeled binary trees , , with and . For we have and . Given , we construct as follows: Starting at the root node we compare the next input value to the value attached to the node under consideration, and move to if and to otherwise, until an “empty” node (necessarily an external node of ) is found. Then and , for all .
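The insertion step can be sketched as follows, assuming the usual convention that smaller values are routed to the left; the representation of trees as sets of 0–1 tuples and the function names are illustrative choices of ours.

```python
def bst_insert(tree, labels, x):
    """One BST insertion: compare x with the labels along a root-to-leaf
    path, moving left on smaller and right on larger, until an external
    (empty) node is found; that node then joins the tree with label x."""
    u = ()
    while u in tree:
        u = u + ((0,) if x < labels[u] else (1,))
    tree.add(u)
    labels[u] = x

def bst(values):
    """Binary search tree built from a sequence of pairwise distinct reals."""
    tree, labels = set(), {}
    for x in values:
        bst_insert(tree, labels, x)
    return tree, labels
```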

Now let be a sequence of independent random variables with for all , and let be the random binary tree associated with the first of these. By construction, the label functions are monotone with respect to the -order of the tree nodes, that is, with as in (2),

(4) |

In particular, if we number the external nodes of from the left to the right, then the number of the node that receives is the rank of this value among , hence uniformly distributed on . This shows that the (deterministic) BST algorithm, when applied to the (random) input , results in a Markov chain with state space , start at and transition probabilities

(5) |

In words: We obtain by choosing one of the external nodes of uniformly at random and joining it to the tree. We refer to this construction as the BST chain.
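This transition mechanism can be simulated directly, without reference to the input values, by attaching a uniformly chosen external node at each step; a minimal sketch (names ours):

```python
import random

def external_nodes(tree):
    """Nodes not in the tree whose parent is in the tree; by convention
    the root is the only external node of the empty tree."""
    if not tree:
        return [()]
    return [u + (b,) for u in tree for b in (0, 1) if u + (b,) not in tree]

def bst_chain_step(tree, rng=random):
    """One transition of the BST chain: join a uniformly chosen external
    node to the tree -- equivalent in distribution to inserting the next
    i.i.d. uniform input value via bst_insert."""
    tree.add(rng.choice(external_nodes(tree)))
```

Note that a tree with n nodes has exactly n + 1 external nodes, which is what makes the uniform choice in (5) well defined.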

For the DST (digital search tree) algorithm, the input values are infinite 0–1 sequences, that is, elements of . Given we again obtain a sequence of labeled binary trees, but now we use the components , , of the next input value as a routing instruction through , moving to from an occupied node if and to otherwise. As in the BST case we assume that the ’s are the values of a sequence of independent and identically distributed random variables , where the distribution of the ’s is now a probability measure on the measurable space , with the -field generated by the projections on the sequence elements, , . This -field is also generated by the sets

(6) |

It is easy to check that the intersection of two such sets is either empty or again of this form. This implies that is completely specified by its values , , and the DST analogue of (5) then is

(7) |

By the DST chain with driving distribution we mean a Markov chain with state space , start at and transition mechanism given by (7).
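A corresponding sketch of the DST insertion step; for illustration the infinite 0–1 input sequences are truncated to finitely many bits (enough for the trees built here), and the names are again ours.

```python
def dst_insert(tree, seq):
    """One DST insertion: use the successive bits of the input sequence as
    a routing instruction from the root, moving past occupied nodes, until
    an unoccupied node is reached; that node then joins the tree."""
    u = ()
    while u in tree:
        u = u + (seq[len(u)],)   # the next unread bit routes the move
    tree.add(u)

def dst(sequences):
    """Digital search tree built from a list of 0-1 sequences."""
    tree = set()
    for s in sequences:
        dst_insert(tree, s)
    return tree
```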

### 2.3 Doob–Martin compactification

We refer the reader to Doob’s seminal paper Doob1959 () and to the recent textbook Woess2 () for the main results of, background on and further references for the boundary theory for transient Markov chains. For the BST chain the Doob–Martin compactification has recently been obtained in EGW (): It can be described as the closure of the embedding of into the compact space , endowed with pointwise convergence, that is given by the standardized subtree size functional

with as defined in (3). Further, the elements of the boundary may be represented by probability measures on , with convergence of a sequence in meaning that

and for all if we have a sequence of elements of instead.

The general theory implies that converges almost surely to a limit with values in ; EGW () also contains a description of . The proof given there does not make use of the algorithmic background, but takes the transition mechanism (5) as its starting point. We now show that this background leads to a direct proof of , and to a representation of in terms of the input sequence.

We need some more notation. On we define a metric by

(8) |

On itself this gives the discrete topology, and the completion of with respect to leads to , a compact and separable metric space. This is also the ends compactification if we regard as the complete rooted binary tree. We extend the ’s to by

Because of

these sets are open and closed. Further,

hence is a -system that generates . Together these facts imply that weak convergence of probability measures to a probability measure on is equivalent to

(9) |

In view of

and convergence in the Doob–Martin topology is therefore equivalent to the weak convergence of probability measures on the metric space if we represent finite subsets of by the uniform distribution on .

Moreover, any sequence of probability measures on is tight, as is compact, and therefore has a limit point by Prohorov’s theorem Bill68 (), page 37. If is a convergent sequence for each , then there is only one such limit point, which means that converges weakly to some probability measure and that (9) holds. Finally, let

(10) |

be the time that the node becomes an element of the BST sequence. It is easy to see that the ’s are finite with probability 1.

###### Theorem 1

Let be the sequence of binary trees generated by the BST algorithm with input a sequence of independent and identically distributed random variables with .

(a) With probability 1 the sequence converges weakly to a random probability measure on as .

(b) The random variables

are independent, and for all .

Let , , and , , be as in part (b) of the theorem. The order property (4) of the labeled binary search trees implies that for a node with label , , the relation is equivalent to . Hence, by the law of large numbers,

with probability 1 for every . In view of

the one-point sets with elements from are open in the topology on . As assigns at most the value to such a set, it follows with the portmanteau theorem Bill68 (), page 11, that any limit point of this sequence is concentrated on . Parts (a) and (b) of the theorem now follow with the above general remarks on weak convergence of probability measures on .

For the proof of (c) we use the following well-known fact: The conditional distribution of , given and given that the value lands in an interval of the augmented order statistics, is the uniform distribution on , which implies that is the distribution of the normalized distance to the left endpoint of . For different -values these relative insertion positions are independent, hence , , are independent and uniformly distributed on the unit interval.

We note the following consequence of the representation in part (c) of the theorem: For a fixed let

with for be the path that connects to the root node. We then have

(11) | |||

(12) |

Note that the factors , , are independent and that they all have distribution .

Theorem 1 confirms the view expressed in Woess2 (), pages 191 and 218, that in specific cases embeddings (or boundaries) can generally be obtained directly on using the then available additional structure; here this turns out to be the algorithmic representation of the Markov chain. However, there are two additional benefits of the general theory: First, because of the space–time property (1) the limit generates the tail -field

associated with the sequence . This may serve as a starting point for the unification of strong limit theorems for functionals , of the discrete structures: If converges to in a “reasonable” space, then the limit , which is -measurable, must be a function of ; see, for example, Kall (), Lemma 1.13. The second general result is extremely useful in the context of the calculations that arise in specific applications of the theory: The conditional distribution of the chain given the value of is again a Markov chain, where the new transition probabilities can be obtained from the limit value and the old transition probabilities by a procedure that is known as Doob’s -transform. In the present situation it turns out that the conditional distribution of the BST chain, given , is the same as that of the DST chain driven by . We refer the reader to EGW () for details; the last statement appears there only for a specific , but the generalization to an arbitrary probability measure in the boundary is straightforward. Roughly, the embedded jump chains at the individual nodes are Pólya urns; for these the boundary has been obtained in BK1964 (), and from the general construction of the Doob–Martin boundary it is clear that the outcome is unaffected by the step from a Markov chain to its embedded jump chain. We collect some consequences in the following proposition, where

(13) |

are the elements of the natural filtration of the BST chain.

###### Proposition 2

With the notation and assumptions as in Theorem 1,

(14) |

and, for all ,

(15) |

Further, the variables are conditionally independent given .

## 3 Metric aspects

All trees in this paper are subgraphs of the complete binary tree, which has as its set of nodes and as its set of edges; in particular, our trees are specified by their node sets . In a tree metric the distance of any two nodes is the sum of the distances between successive nodes on the unique path from to , which means that such a metric is given by its values , , . For example, the metric in Section 2.3 has , and the canonical tree distance is given by . For our trees the prefix order further leads to

(16) |

Metric trees may also be interpreted as graphs with edge weight, where the edge receives the weight .

Our aim in this section is to rephrase the convergence of the BST sequence as a convergence of metric trees, and to show that this view leads to convergence with respect to stronger topologies. The situation here is much simpler than for Aldous’s continuum random tree where the Gromov–Hausdorff convergence of equivalence classes of metric trees is used; see EvansSF () and the references given there. In fact, the search trees considered here have node sets that grow monotonically to the full , so we may define convergence of a sequence of metric binary trees to to mean that

(17) |

which of course is equivalent to for all , . Note that and are both local metrics in the sense that does not depend on the tree as long as .

Motivated by the view in Section 2.3 of finite and infinite binary trees as probability measures on , we now introduce the (relative) subtree size metric, which assigns to the distance of and , that is,

if , and

for the complete tree and a probability measure on , where we assume that for all . Again, there is an algorithmic motivation: In terms of the BST mechanism, the weight of an edge is the (relative) number of times this edge has been traversed in the construction of the tree. These metrics depend on their tree in a global manner.

With this terminology in place we may now rephrase the convergence in Theorem 1 as the convergence in the sense of (17) of the finite metric trees to the infinite metric tree , almost surely and as .

By construction the Doob–Martin compactification is the weakest topology that allows for a continuous extension of the functions , . For the analysis of tree functionals stronger modes of convergence turn out to be useful; for example, do we have uniform convergence in (17)? Also, subtree sizes decrease along paths leading away from the root node, so we may consider a weight factor for the distance of a node to its parent that depends on the depth of the node: For all , we define the weighted subtree size metric with weight parameter by

in the finite and infinite case, respectively. Of course, with the subtree size metric reappears.
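To make the weighted metric concrete, the following sketch computes the distance from a node to the root in a finite tree. The precise normalization of the edge weights, taken here to be the depth-dependent factor times the relative subtree size of the lower endpoint, is an assumption made for illustration, since the paper's displayed definition is not reproduced above.

```python
def weighted_dist_to_root(tree, u, alpha):
    """Distance from node u to the root, summing assumed edge weights
    alpha**depth(v) * t(tree, v) / len(tree) for each edge joining a
    node v on the path to its parent (illustrative normalization)."""
    n = len(tree)
    d = 0.0
    v = u
    while v != ():
        t_v = sum(1 for w in tree if w[:len(v)] == v)  # subtree size at v
        d += alpha ** len(v) * t_v / n
        v = v[:-1]                                     # move to the parent
    return d
```

With `alpha = 1.0` this reduces to the (relative) subtree size metric; for `alpha < 1` deep edges are geometrically damped, which is the mechanism behind the compactness threshold in Theorem 3.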

###### Theorem 3

Let be the smaller of the two roots of the equation , . Let , and be as in Theorem 1.

(a) For , the metric space is compact with probability 1.

(b) For , the metric space has infinite diameter with probability 1.

(c) For , the metric spaces converge uniformly to as in the sense of

(18) |

(d) For , and with for ,

We embed the metric trees into the linear space of all functions via

probability measures on become elements of by identifying with the function . In particular, we now write instead of . For let be the set of all with

Clearly, this gives a family of nested separable Banach spaces, with

We now show that, with the above identification,

(19) | |||||

(20) |

and that, for and as ,

(21) |

Clearly, (19) implies that with probability 1 if .

The basis for our proof of (19) and (20) is the connection of BST trees to branching random walks, a connection that has previously been used by several authors, especially for the analysis of the height of search trees; see the survey DevGelbesBuch () and the references given there. Let , , be a numbering of the nodes from such that

with as defined in (2). The key observation is that the variables

are the positions of the members of the th generation in a branching random walk with offspring distribution and with

for the point process of the positions of the children relative to their parent. Biggins Biggins () obtained several general results for such processes that we now specialize to the present offspring distribution and point process of relative positions. Let

and

(22) |

Note that

(23) |

and that, by definition of ,

(24) |

Finally, let be the number of particles in generation that are located to the left of .

Now suppose that . Let and . We adapt the upper bound argument in Biggins () to our present needs: For all and , with ,

By (24), . Choosing the optimal , which with (23) is easily seen to be greater than , leads to

with a finite constant that does not depend on . Hence

which in turn implies (19) by monotone convergence.

Suppose now that , so that by (24) for . By Biggins (), Theorem 2,

with probability 1. In particular, and again with probability 1,

Clearly, this implies (20).

For the proof of (21) we first consider the random variables , , for some fixed . We wish to relate these to , with as in (13). For this, we use the representation of in terms of given in Section 2.3, together with Proposition 2. We may assume that .

The representation (2.3), the conditional independence of the -variables given , and the well-known formula for the first moment of beta distributions together lead to

In view of

the product telescopes to

(25) |

We now introduce

Then is a vector-valued martingale. For we have by part (a) of the theorem that with probability 1 and that , hence almost surely and in mean in by Proposition V-2-6 in NeveuMart ().

In our present representation of trees as functions on we have

which implies that for all . As pointwise with probability 1 by Theorem 1 we can now use a suitable version of the dominated convergence theorem, such as that given in Kall (), Theorem 1.21, to obtain that converges to in as , again almost surely and in mean.

It remains to show that the tree statements in the theorem follow from the linear space statements (19), (20) and (21).

For (a) we prove that the limiting metric space is totally bounded. From (19) and the definition of the norm we obtain for any given a such that

which by the definition of the weighted subtree size metric means that all nodes with have a distance from their predecessor at level that is less than . As there are only finitely many nodes of level less than this shows that the whole of may be covered by a finite number of -balls. Of course, this argument is meant to be applied to each element of a suitable set of probability 1 separately.

We note that the convergence of metric trees considered in Theorem 3 implies the convergence with respect to the Gromov–Hausdorff distance of the corresponding equivalence classes of metric trees; see Burago (), Section 7.3.3.

The subtree size metric also leads to a visualization of search trees: We use the function defined in (2) to map nodes to points in the unit interval, and above the -coordinate we draw a line parallel to the -axis from to . In order to obtain a visually more pleasing result we may add lines that run parallel to the -axis, connecting nodes with the same parent. In Figure 1 we have carried this out for the trees arising from two separate input sequences for the BST algorithm, with the data obtained from alternating blocks of length 10 of digits in the decimal expansion of . The upper part refers to the odd and the lower to the even numbered blocks. In both cases we have given the trees for and , and with . Vertically, the trees are from the same distribution; moving horizontally to the right, we have almost sure convergence.
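The construction of such a picture can be sketched as follows: each node contributes one vertical segment, drawn at its in-order x-coordinate, rising from its parent's distance to the root up to its own distance in the subtree size metric. This is our reading of the construction, with an assumed explicit form of the embedding from (2).

```python
def tree_segments(tree):
    """Vertical segments (x, y_low, y_high) visualizing a finite tree under
    the subtree size metric: node u is drawn at its in-order coordinate
    phi(u), from its parent's distance-to-root up to its own."""
    n = len(tree)

    def phi(u):  # midpoint of the dyadic interval selected by u (assumed form)
        x, half = 0.5, 0.5
        for bit in u:
            half /= 2
            x += half if bit else -half
        return x

    def dist(u):  # distance to the root: sum of relative subtree sizes
        return sum(sum(1 for w in tree if w[:k] == u[:k]) / n
                   for k in range(1, len(u) + 1))

    return [(phi(u), dist(u[:-1]) if u else 0.0, dist(u))
            for u in sorted(tree, key=len)]
```

Feeding the resulting triples to any line-plotting routine reproduces the kind of picture described above; the optional horizontal connectors between siblings are omitted here.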

## 4 Tree functionals

In this section we show how the above results can be used in connection with the asymptotic analysis of tree functionals. Here is the recipe: We start with a functional of the trees, with (deterministic) functions on that have values in some separable Banach space . We suspect that converges almost surely to some limit variable as . We know that if this is the case, then for some defined on (as always, almost surely). We do not know what is, but if we manage to rewrite the ’s in terms of subtree sizes, then Theorem 1 may lead to an educated guess. On that basis we next consider , assuming that . This gives an -valued martingale. By the associated convergence theorem we then have that converges to almost surely and in mean. Finally, a simple inspection of may reveal that is asymptotically negligible—indeed, if converges to , then must tend to 0.

In the first three subsections we work out the details of the above strategy for path lengths, for a tree index and for an infinite dimensional tree functional. The final subsection is a collection of remarks on other functionals and related tree structures, indicating further applications of the method, but also its limitations. The potential-theoretic approach can provide additional insight; for example, we will relate a martingale introduced in connection with tree profiles to Doob’s -transform.

Throughout this section we abbreviate to .

### 4.1 Path length

The first tree functional we consider is the internal path length,

(26) |

which may be rewritten as

(27) |
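The identity behind this rewriting is the standard observation that a node of depth k lies in exactly k proper subtrees, so summing depths over all nodes is the same as summing subtree sizes over all non-root nodes. A sketch verifying that the two representations agree (function names are ours):

```python
def path_length(tree):
    """Internal path length: the sum of the depths of all nodes."""
    return sum(len(u) for u in tree)

def path_length_via_subtree_sizes(tree):
    """The same quantity rewritten as a sum of subtree sizes over the
    non-root nodes: a node of depth k is counted once by each of the
    k proper subtrees containing it."""
    return sum(sum(1 for w in tree if w[:len(u)] == u)
               for u in tree if u != ())
```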

Let

$$H_n := \sum_{k=1}^{n} \frac{1}{k}, \qquad n \ge 1,$$

be the harmonic numbers. It is well known that

$$H_n = \log n + \gamma + O(n^{-1}) \qquad \text{as } n \to \infty,$$

where $\gamma$ is Euler's constant. We need two auxiliary statements; we omit the (easy) proofs.

###### Lemma 4

For all ,

For a random variable