Balanced vertices in rooted trees

Balanced vertices in labeled rooted trees

Miklós Bóna Department of Mathematics, University of Florida, Little Hall, PO Box , Gainesville, FL, (USA) bona@ufl.edu
July 14, 2019
Abstract.

In a rooted tree, we call a vertex balanced if it is at equal distance from all its descendant leaves. We count balanced vertices in three different tree varieties. For decreasing binary trees, we can prove that the probability that a vertex chosen uniformly at random from the set of all trees of a given size is balanced is monotone decreasing.

Introduction

Various parameters of many models of random rooted trees are fairly well understood if they relate to a near-root part of the tree or to global tree structure. The first group includes the numbers of vertices at given distances from the root, the immediate progeny sizes for vertices near the top, and so on. See [5] for a comprehensive treatment of these results.

Not surprisingly, the technical details of fringe analysis become quite complex as soon as the focus shifts to layers of vertices further away from the leaves. So while there are explicit results on the (limiting) fraction of vertices at a fixed, small distance from the leaves, the asymptotic behavior of this fraction, as a function of the distance, has remained an open problem. Recently, Boris Pittel and the present author have studied this family of questions in [2]. Most work in fringe analysis has focused on decreasing binary trees, which are also called binary search trees. We will explain the reason for that. However, in this paper, we will discuss questions that we can successfully investigate for other tree varieties as well.

We call a vertex $v$ of a rooted tree balanced if all descending paths from $v$ to a leaf have the same length. In other words, if $\ell$ is any leaf that is a descendant of $v$, then the unique path from $v$ to $\ell$ consists of $k$ edges, where $k$ does not depend on the choice of $\ell$. The number $k$ is called the rank of $v$.

1. Decreasing binary trees

A decreasing binary tree on vertex set $[n] = \{1, 2, \ldots, n\}$ is a binary plane tree in which every vertex has a smaller label than its parent. This means that the root must have label $n$, every vertex has at most two children, and that every child $v$ is either a left child or a right child of its parent, even if $v$ is the only child of its parent.

Decreasing binary trees on vertex set $[n]$ are in bijection with permutations of $[n]$. In order to see this, let $p = p_1 p_2 \cdots p_n$ be a permutation. The decreasing binary tree of $p$, which we denote by $T(p)$, is defined as follows. The root of $T(p)$ is a vertex labeled $n$, the largest entry of $p$. If $a$ is the largest entry of $p$ on the left of $n$, and $b$ is the largest entry of $p$ on the right of $n$, then the root will have two children, the left one will be labeled $a$, and the right one will be labeled $b$. If $n$ is the first (resp. last) entry of $p$, then the root will have only one child, and that is a right (resp. left) child, and it will necessarily be labeled $n-1$, as $n-1$ must be the largest of all remaining elements. Define the rest of $T(p)$ recursively, by taking $T(p')$ and $T(p'')$, where $p'$ and $p''$ are the substrings of $p$ on the two sides of $n$, and affixing them to $a$ and $b$. See Figure 1 for an illustration.

Figure 1. The tree for .

Note that in the tree shown in Figure 1, all vertices are balanced, except 8, 9, and 6. Also note that in any decreasing binary tree, a vertex $v$ can be balanced only if all its descendants are balanced.
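The recursive construction of the decreasing binary tree of a permutation, and the definition of a balanced vertex, can be sketched in a few lines of Python. This is only an illustration: the function names `build_tree` and `ranks` are hypothetical, and the sample permutation 328794615 is an assumption chosen for the example (it is not taken from Figure 1, although its unbalanced vertices also happen to be 6, 8, and 9).

```python
from typing import Dict, List, Optional, Tuple

Children = Dict[int, Tuple[Optional[int], Optional[int]]]


def build_tree(p: List[int]) -> Children:
    """Return {label: (left_child, right_child)} for the decreasing binary tree of p."""
    children: Children = {}

    def attach(segment: List[int]) -> Optional[int]:
        if not segment:
            return None
        root = max(segment)          # the largest entry becomes the root
        i = segment.index(root)
        # entries left of the maximum form the left subtree, the rest the right one
        children[root] = (attach(segment[:i]), attach(segment[i + 1:]))
        return root

    attach(p)
    return children


def ranks(children: Children) -> Dict[int, Optional[int]]:
    """Map each vertex to its rank if it is balanced, and to None otherwise."""
    result: Dict[int, Optional[int]] = {}

    def visit(v: int) -> Optional[int]:
        kid_ranks = [visit(c) for c in children[v] if c is not None]
        if not kid_ranks:
            result[v] = 0            # a leaf is balanced of rank 0
        elif None in kid_ranks or len(set(kid_ranks)) != 1:
            result[v] = None         # children unbalanced or of different ranks
        else:
            result[v] = kid_ranks[0] + 1
        return result[v]

    visit(max(children))             # the root carries the largest label
    return result


tree = build_tree([3, 2, 8, 7, 9, 4, 6, 1, 5])
print(sorted(v for v, k in ranks(tree).items() if k is None))
```

The sketch stores each vertex with an explicit (left, right) pair, so an only child keeps its direction, as the definition of decreasing binary trees requires.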

1.1. Balanced vertices of a fixed rank

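Recall that a vertex $v$ is balanced if all descending paths from $v$ to a leaf have the same length. That length is the rank of $v$. Let $a_{n,k}$ be the total number of all balanced vertices of rank $k$ in all decreasing binary trees of size $n$. Let $A_k(x) = \sum_{n\ge 0} a_{n,k}\frac{x^n}{n!}$, and let $b_{n,k}$ be the total number of such trees on $n$ vertices whose root is balanced of rank $k$, with exponential generating function $B_k(x) = \sum_{n\ge 0} b_{n,k}\frac{x^n}{n!}$.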

Proposition 1.1.

The differential equation

$$A_k'(x) = \frac{2}{1-x}\,A_k(x) + B_k'(x) \tag{1}$$

holds, with the initial condition $A_k(0) = 0$.

Proof.

Note that $a_{n,k}$ is the number of ordered pairs $(T, v)$, where $v$ is a balanced vertex of rank $k$ in a decreasing binary tree $T$ on vertex set $[n]$; in other words, $a_{n,k}$ is the number of decreasing binary trees on $[n]$ with a balanced vertex of rank $k$ marked. If the marked vertex is not the root of $T$, then removing the root of $T$, we get, on the one hand, a structure on $[n-1]$ that is counted by $A_k'(x)$, and, on the other hand, a decreasing binary tree and a decreasing binary tree with a balanced vertex of rank $k$ marked. By the Product formula of exponential generating functions, these pairs of trees are counted by the generating function $2A_k(x)\cdot\frac{1}{1-x}$. The factor 2 is needed since the order of the obtained two trees matters, and $\frac{1}{1-x}$ is just the generating function of the sequence of factorials, hence of the sequence enumerating decreasing binary trees. Finally, if the marked vertex is the root of the tree, then removing it we just get a structure enumerated by $B_k'(x)$. ∎

The special case of $k=0$ counts leaves, which are of rank 0, and are trivially balanced. Indeed, we have $B_0(x) = x$, so $B_0'(x) = 1$. Therefore (1) reduces to

$$A_0'(x) = \frac{2}{1-x}\,A_0(x) + 1$$

with $A_0(0) = 0$. The solution of this differential equation is indeed

$$A_0(x) = \frac{1}{3(1-x)^2} + \frac{x-1}{3},$$

which is the generating function for the number of all leaves of all decreasing binary trees on $n$ vertices.

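If $k=1$, then $B_1(x) = x^2 + \frac{x^3}{3}$, since both trees on two vertices, and both trees on three vertices in which the root has two children, have a root that is balanced of rank 1. So (1) reduces to

$$A_1'(x) = \frac{2}{1-x}\,A_1(x) + 2x + x^2$$

with $A_1(0) = 0$. The solution of this initial value problem is

$$A_1(x) = \frac{5x^2 - 5x^3 + x^5}{5(1-x)^2}.$$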
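The two smallest ranks can be checked by brute force. The sketch below enumerates all permutations of $[n]$, walks the corresponding decreasing binary trees, and tallies balanced vertices by rank. The closed forms it is compared against, $(n+1)!/3$ leaves for $n\ge 2$ and $(n+1)!/5$ balanced vertices of rank 1 for $n\ge 5$, are our own reading of the coefficients of the solutions of the two initial value problems above, so they should be treated as an assumption that the enumeration is testing. The function name is hypothetical.

```python
from itertools import permutations
from math import factorial
from typing import Dict, Optional, Tuple


def count_balanced_by_rank(n: int) -> Dict[int, int]:
    """Total number of balanced vertices of each rank over all decreasing binary trees on [n]."""
    totals: Dict[int, int] = {}

    def visit(segment: Tuple[int, ...]) -> Optional[int]:
        """Rank of the root of the tree of `segment` if it is balanced, else None."""
        i = segment.index(max(segment))
        kid_ranks = [visit(part) for part in (segment[:i], segment[i + 1:]) if part]
        if not kid_ranks:
            rank: Optional[int] = 0
        elif None in kid_ranks or len(set(kid_ranks)) > 1:
            rank = None
        else:
            rank = kid_ranks[0] + 1
        if rank is not None:
            totals[rank] = totals.get(rank, 0) + 1
        return rank

    for p in permutations(range(1, n + 1)):
        visit(p)
    return totals


for n in range(2, 7):
    counts = count_balanced_by_rank(n)
    print(n, counts.get(0, 0), factorial(n + 1) // 3, counts.get(1, 0))
```

Brute force is only feasible for small $n$, but that is enough to confirm the first few coefficients of both generating functions.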

Further generating functions $A_k(x)$ can theoretically be computed, but one runs out of computing power very fast. A crucial difference from earlier work such as [2] is that for all $k$, the generating function $B_k(x)$ is a polynomial function, since if the root of a tree is balanced and is of rank $k$, then that tree cannot have more than $2^{k+1}-1$ vertices. We are now going to show that this implies that for all $k$, the generating function $A_k(x)$ is always a rational function of denominator $(1-x)^2$.

Corollary 1.2.

Let $k$ be a fixed nonnegative integer. Let $p_{n,k}$ be the probability that a vertex chosen uniformly from the set of all vertices of all decreasing binary trees on $[n]$ is balanced, and is of rank $k$. Then for any fixed $k$, the limit

$$c_k = \lim_{n\to\infty} p_{n,k}$$

exists. Furthermore, let $A_k(x) = \frac{P_k(x)}{(1-x)^2}$, that is, let $P_k(x)$ be the numerator of $A_k(x)$. Then $c_k = P_k(1)$.

Note that the fact that the limits exist can also be proved using the techniques of additive functionals as explained in [6]. (The number of balanced vertices in a rooted tree is an additive functional.) However, we provide a self-contained proof here.

Proof.

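Solving the linear differential equation (1) for $A_k(x)$, we get that

$$A_k(x) = \frac{\int (1-x)^2 B_k'(x)\,dx}{(1-x)^2}, \tag{2}$$

where the constant of integration is to be chosen so that the initial condition $A_k(0)=0$ is satisfied. Recall that $B_k(x)$, and therefore, $B_k'(x)$, is a polynomial function. Therefore, the numerator of the right-hand side of (2) is a polynomial function.

Now let $A_k(x) = q(x) + \frac{\alpha}{1-x} + \frac{\beta}{(1-x)^2}$, where $q$ is a polynomial, and $\alpha$ and $\beta$ are complex numbers. Note that $\beta = P_k(1)$, where $P_k(x)$ denotes the numerator of $A_k(x)$. Then

$$A_k(x) = q(x) + \alpha\sum_{n\ge 0} x^n + \beta\sum_{n\ge 0}(n+1)x^n.$$

If $n$ is larger than the degree of the polynomial $q$, then equating coefficients of $x^n$ on both sides, we get that

$$\frac{a_{n,k}}{n!} = \alpha + \beta(n+1).$$

So, since the $n!$ trees on $[n]$ have $n\cdot n!$ vertices in total,

$$c_k = \lim_{n\to\infty}\frac{a_{n,k}}{n\cdot n!} = \lim_{n\to\infty}\frac{\alpha + \beta(n+1)}{n} = \beta = P_k(1).$$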

Note that it is easy to see that $c_k > 0$ for all $k$. Indeed, $p_{n,k}$ is certainly at least as large as the probability that a randomly selected vertex is of rank $k$ and is the root of a perfect binary tree (one in which every non-leaf vertex has exactly two children), and even this last probability stays above a positive constant as $n$ goes to infinity. See [1] for details.

The simple form of $A_k(x)$ is the reason that decreasing binary trees are easier to analyze from this aspect than other tree varieties. Using the above corollary, we get that $c_0 = 1/3$, $c_1 = 1/5$, $c_2 = 52/567 \approx 0.0917$, and $c_3 \approx 0.0322$. This shows that for large $n$, about 65.7 percent of all vertices of decreasing binary trees are balanced and of rank at most three. More computation shows that for $n$ sufficiently large, about 66.62 percent of all vertices are balanced and of rank at most four, and about 66.84 percent are balanced and of rank at most five.
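The limiting probabilities can be computed exactly with rational arithmetic. The sketch below rests on two assumptions that are our reading of the text rather than formulas quoted from it: first, that the limit for rank $k$ equals $\int_0^1 (1-x)^2 B_k'(x)\,dx$ (the numerator of $A_k(x)$ evaluated at $x=1$), and second, that the root polynomials satisfy $B_k'(x) = 2B_{k-1}(x) + B_{k-1}(x)^2$ with $B_0(x)=x$, which encodes the fact that a root is balanced of rank $k$ exactly when all of its one or two children are balanced of rank $k-1$. All function names are hypothetical.

```python
from fractions import Fraction
from typing import List

Poly = List[Fraction]  # coefficient list, lowest degree first


def poly_mul(a: Poly, b: Poly) -> Poly:
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_integrate(a: Poly) -> Poly:
    """Antiderivative with zero constant term."""
    return [Fraction(0)] + [c / (i + 1) for i, c in enumerate(a)]


def limit_probabilities(max_rank: int) -> List[Fraction]:
    """Exact limits for ranks 0..max_rank under the assumptions stated above."""
    weight = [Fraction(1), Fraction(-2), Fraction(1)]  # (1 - x)^2
    b = [Fraction(0), Fraction(1)]                     # B_0(x) = x
    limits = []
    for k in range(max_rank + 1):
        if k == 0:
            db = [Fraction(1)]                         # B_0'(x) = 1
        else:
            db = poly_mul(b, b)                        # B_{k-1}(x)^2 ...
            for i, c in enumerate(b):
                db[i] += 2 * c                         # ... + 2 B_{k-1}(x)
            b = poly_integrate(db)                     # B_k(x), with B_k(0) = 0
        numerator = poly_integrate(poly_mul(weight, db))
        limits.append(sum(numerator))                  # value of the integral at x = 1
    return limits


values = limit_probabilities(5)
for k, c in enumerate(values):
    print(k, c, float(sum(values[: k + 1])))
```

Because every $B_k(x)$ is a polynomial, the whole computation stays inside exact fractions; the degrees double with each rank, which is why the computation becomes expensive quickly.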

1.2. A result about monotonicity

As decreasing binary trees are in bijection with permutations, we have additional tools for analyzing them. This enables us to prove the following strong result. We could not prove similar results for other labeled rooted trees.

Theorem 1.3.

Let $p_n$ be the probability that a vertex chosen uniformly at random from the set of all vertices of all decreasing binary trees on $[n]$ is balanced. Then the sequence $p_1, p_2, \ldots$ is weakly decreasing.
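For small sizes, the monotonicity claim can be observed directly. The following sketch computes the exact probability that a uniformly chosen vertex of a uniformly chosen decreasing binary tree on $[n]$ is balanced, by enumerating all permutations. It is a sanity check for small $n$ only, not a substitute for the proof, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial
from typing import Optional, Tuple


def balanced_vertex_fraction(n: int) -> Fraction:
    """Fraction of balanced vertices among all vertices of all decreasing binary trees on [n]."""
    balanced = 0

    def visit(segment: Tuple[int, ...]) -> Optional[int]:
        nonlocal balanced
        i = segment.index(max(segment))
        kid_ranks = [visit(part) for part in (segment[:i], segment[i + 1:]) if part]
        if not kid_ranks:
            rank: Optional[int] = 0
        elif None in kid_ranks or len(set(kid_ranks)) > 1:
            rank = None
        else:
            rank = kid_ranks[0] + 1
        if rank is not None:
            balanced += 1
        return rank

    for p in permutations(range(1, n + 1)):
        visit(p)
    return Fraction(balanced, n * factorial(n))


probs = [balanced_vertex_fraction(n) for n in range(1, 8)]
print([str(q) for q in probs])
```

For $n \le 3$ every vertex is balanced, so the sequence starts with three ones; the first unbalanced vertices appear at $n=4$, as roots with one leaf child and one child of rank 1.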

Let be the probability that the root of a randomly selected tree on vertices is balanced, and is of rank . Set for all . We start by an inequality for the numbers for fixed .

Lemma 1.4.

For all and all fixed , the inequality holds.

Proof.

We prove the statement by induction on . The statement is true for all if , since in that case, for all and all . Now let us assume that the statement is true for and prove it for .

Let be a permutation of length . The probability that the largest entry of is in position for any is . The root of is balanced of rank if and only if all its children are balanced of rank , so

(3)

Replacing by , we get the analogous formula

(4)

The initial difficulty is that while the summands in (3) are smaller than their counterparts in (4), there is one more of them. The crucial observation is that if we remove the smallest summand from the numerator of (3), then the remaining summands of that numerator can be matched with the summands of the numerator of (4), so that in each pair, the summand coming from (4) is at least as large as the summand coming from (3). That will prove that the numerator of (3) is at most times as large as the numerator of (4).

Let be the index for which is minimal, so that last product is the minimal summand in the numerator of (3). First we look at indices smaller than . Note that by the induction hypothesis, , so

(5)

Let us sum these inequalities for , to get

(6)

Now we consider indices larger than . Again, by the induction hypothesis, , so

(7)

Let us sum these inequalities for , to get

(8)

Adding (6) and (8), we get an inequality whose right-hand side agrees with the sum in (4), and whose left-hand side is the sum in (3), except the summand of the latter indexed by . However, that summand was the smallest of the summands in the sum in (3), which implies that

This is equivalent to our claim. ∎

Corollary 1.5.

Let be the probability that the root of a decreasing binary tree on is balanced. Then .

Proof.

It follows from our definitions that and . As Lemma 1.4 shows that for , the only issue that we must consider is that the sum that provides has one more summand than the sum that provides . However, this is not a problem, since for all , we have , while and , so

This inequality, and applying Lemma 1.4 for all , proves our claim. ∎

Proof.

(of Theorem 1.3.) Induction on , the initial case of being obvious. In order to prove that , note that a random vertex of a tree of size has probability to be the root. Furthermore, if , then there is an probability that a random vertex of a random tree is part of the left subtree of that root, and that left subtree has vertices. Indeed, the left subtree of the root has vertices if and only if is in position of the corresponding permutation, and each of the vertices of that left subtree are equally likely to be chosen. The same argument applies for right subtrees. This proves that

Therefore, the inequality is equivalent to the inequality

(9)

In order to prove (9), note that the first equality in (9) shows that is obtained as the average of the summands in the numerator of the fraction that is equal to . The average value of a set of real numbers does not change if we add the average value of the set to the set as a new element. In this case, that average value is , proving that

(10)

So (9) will be proved if we can show that

Noting that by Corollary 1.5, it suffices to prove that

which simplifies to the inequality

(11)

Finally, (11) holds, because the right-hand side of (10) grows if we replace by , (since in any given tree, the root can only be balanced if all vertices are balanced), which leads to the inequality

which is clearly equivalent to (11).

As the sequence is weakly decreasing, its limit exists. Note that as we mentioned in Section 1.1. On the other hand, the number of balanced vertices of rank larger than five is certainly at most as large as the number of all vertices of rank larger than five, and it is known [2] that the latter is about 0.00125 times the total number of vertices. This proves that

2. Non-plane 1-2 trees

In these trees, the vertices are still bijectively labeled by the elements of $[n]$, each vertex has a smaller label than its parent, and each non-leaf vertex has one or two children, but “left” and “right” do not matter anymore. See Figure 2 for an illustration. It is well known [3] that the number of such trees is the Euler number $E_n$, and that the exponential generating function of the Euler numbers is

$$E(x) = \sum_{n\ge 0} E_n \frac{x^n}{n!} = \sec x + \tan x.$$

See sequence A000111 in the On-line Encyclopedia of Integer Sequences [7] for the many occurrences of these numbers in Combinatorics.
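The Euler numbers are easy to generate. The sketch below uses the classical recurrence $2E_{n+1} = \sum_{k=0}^{n}\binom{n}{k}E_kE_{n-k}$ for $n\ge 1$, which follows from the identity $2y' = 1 + y^2$ satisfied by $y = \sec x + \tan x$; the recurrence is a standard fact rather than something stated in this paper, and the function name is hypothetical. The last lines compare $E_{40}$ with the well-known asymptotic estimate $n!\,\frac{4}{\pi}\left(\frac{2}{\pi}\right)^n$.

```python
from math import comb, factorial, pi
from typing import List


def euler_zigzag(n_max: int) -> List[int]:
    """Return [E_0, ..., E_{n_max}] (OEIS A000111)."""
    e = [1, 1]
    for n in range(1, n_max):
        total = sum(comb(n, k) * e[k] * e[n - k] for k in range(n + 1))
        e.append(total // 2)  # the sum is always even for n >= 1
    return e[: n_max + 1]


e = euler_zigzag(40)
print(e[:7])
estimate = factorial(40) * (4 / pi) * (2 / pi) ** 40
print(e[40] / estimate)
```

The value $E_4 = 5$ matches the five trees on four vertices.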

Figure 2. The five rooted non-plane 1-2 trees on vertex set $[4]$.

Let $A_k(x)$ be the exponential generating function for the number of all balanced vertices of rank $k$ in all non-plane 1-2 trees on $[n]$, and let $B_k(x)$ be the exponential generating function for the number of non-plane 1-2 trees on $[n]$ in which the root is balanced and of rank $k$.

Theorem 2.1.

The differential equation

$$A_k'(x) = (\sec x + \tan x)\,A_k(x) + B_k'(x) \tag{12}$$

holds, with the initial condition $A_k(0) = 0$.

Proof.

Let $(t, v)$ be an ordered pair in which $t$ is a non-plane 1-2 tree on vertex set $[n]$ and $v$ is a balanced vertex of rank $k$ of $t$. Then $A_k(x)$ is the exponential generating function counting such pairs. Let us first assume that $v$ is not the root of $t$, and let us remove the root of $t$. On the one hand, this leaves a structure that is counted by $A_k'(x)$. On the other hand, this leaves an ordered pair consisting of a non-plane 1-2 tree with a balanced vertex of rank $k$ marked, and a (possibly empty) non-plane 1-2 tree. By the Product formula of exponential generating functions, such ordered pairs are counted by the generating function $A_k(x)E(x)$, where $E(x) = \sec x + \tan x$. Finally, if $v$ was the root of $t$, then the root of $t$ was balanced and of rank $k$. Such trees are counted by $B_k(x)$, or, after the removal of their root, by $B_k'(x)$. ∎

Crucially, the generating function $B_k(x)$, and hence, its derivative $B_k'(x)$, are polynomials, which enables us to explicitly solve the linear differential equation (12). Indeed, the solution is

$$A_k(x) = \frac{\int (1-\sin x)\,B_k'(x)\,dx}{1-\sin x}, \tag{13}$$

where the integral in the numerator is an elementary function since the integral of $x^m \sin x$ is an elementary function for all positive integers $m$. (The constant of integration is chosen so that the initial condition $A_k(0)=0$ is satisfied.) Note that we are not able to count all vertices of rank $k$ in a similar fashion, since the generating function for the number of non-plane 1-2 trees in which the root is of rank $k$ is not a polynomial, and the solutions analogous to (13) will not be elementary functions for . Even and are not elementary functions.

2.1. The number of all vertices

In order to compute the probability that a randomly selected vertex of a randomly selected non-plane 1-2 tree on $[n]$ is balanced and of rank $k$, we need to know the total number of vertices in all such trees. The asymptotics of the Euler numbers are well known (see [5], for example), but to keep the paper self-contained, we provide an argument here at the level of precision that we will need. See any introductory textbook on complex analysis, such as [4], for the relevant notions.

As $E(x) = \sec x + \tan x$, the dominant singularities of $E(x)$ are at $x = \pi/2$ and $x = -\pi/2$, so the coefficients of $E(x)$ are of exponential order $2/\pi$. (Note that the singularity at $x = -\pi/2$ is removable.) However, we need a little more precision. The following proposition will provide that.

Proposition 2.2.

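Let $f(x) = \frac{g(x)}{h(x)}$ be a function so that $g$ and $h$ are analytic functions, $h(x_0) = 0$, while $g(x_0) \neq 0$ and $h'(x_0) \neq 0$. Then

$$f(x) \sim \frac{g(x_0)}{h'(x_0)}\cdot\frac{1}{x-x_0} \quad \text{as } x \to x_0.$$

We can apply Proposition 2.2 to $E(x)$ at $x_0 = \pi/2$ with $g(x) = 1+\sin x$ and $h(x) = \cos x$ if we note that $\sec x + \tan x = \frac{1+\sin x}{\cos x}$. Then Proposition 2.2 implies that $E(x) \sim \frac{-2}{x-\pi/2}$. The singularity of $E(x)$ at $x = -\pi/2$ is removable, since $\lim_{x\to -\pi/2} E(x)$ exists, so $x_0 = \pi/2$ is the only singularity of $E(x)$ of smallest modulus.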

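Now observe that

$$\begin{aligned}
\frac{c}{x-a} &= -\frac{c}{a}\cdot\frac{1}{1-\frac{x}{a}} &\quad (14)\\
&= -\frac{c}{a}\sum_{n\ge 0}\frac{x^n}{a^n}. &\quad (15)
\end{aligned}$$

Applying this to $E(x)$ with $a = \pi/2$ and $c = -2$, we get that the dominant term of $E(x)$ is of the form $\frac{4}{\pi}\sum_{n\ge 0}\left(\frac{2}{\pi}\right)^n x^n$, so

$$E_n \sim n!\cdot\frac{4}{\pi}\left(\frac{2}{\pi}\right)^n. \tag{16}$$

So the total number of all vertices in all non-plane 1-2 trees of size $n$ is

$$nE_n \sim n\cdot n!\cdot\frac{4}{\pi}\left(\frac{2}{\pi}\right)^n. \tag{17}$$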

2.2. Leaves

Let $L(x)$ denote the exponential generating function for the total number of leaves in all non-plane 1-2 trees on vertex set $[n]$. It is then easy to verify that $L(x) = A_0(x)$, since leaves are exactly the balanced vertices of rank 0.

Theorem 2.3.

The equality

$$L(x) = \frac{x - 1 + \cos x}{1 - \sin x} \tag{18}$$

holds.

Proof.

Let us apply Theorem 2.1 with $k=0$, to get the linear differential equation

$$L'(x) = E(x)\,L(x) + 1.$$

Indeed, $B_0(x) = x$, since the only tree whose root is balanced and of rank 0 is the one-vertex tree. Recalling that $E(x) = \sec x + \tan x$, and the initial condition $L(0) = 0$, we can solve the last displayed linear differential equation to get what was to be proved. ∎

Note that $x_0 = \pi/2$ is the unique nonremovable singularity of smallest modulus of $L(x)$, and that at that point, $L(x)$ has a pole of order two, since the derivative $-\cos x$ of the denominator $1-\sin x$ also has a zero at that point. Therefore, we cannot apply Proposition 2.2 directly. Instead, we use the following lemma.

Lemma 2.4.

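Let $f(x) = \frac{g(x)}{h(x)}$ be a function so that $g$ and $h$ are analytic functions, $h(x_0) = h'(x_0) = 0$, while $g(x_0) \neq 0$, and $h''(x_0) \neq 0$. Then

$$f(x) \sim \frac{2g(x_0)}{h''(x_0)}\cdot\frac{1}{(x-x_0)^2} \quad \text{as } x \to x_0.$$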

Proof.

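The conditions directly imply that $h(x)$ has a double root, and hence $f(x)$ has a pole of order two, at $x_0$. In order to find the coefficient that belongs to that pole, let $h(x) = (x-x_0)^2 r(x)$. Now differentiate both sides twice with respect to $x$, to get

$$h''(x) = 2r(x) + 4(x-x_0)r'(x) + (x-x_0)^2 r''(x).$$

Setting $x = x_0$, we get

$$h''(x_0) = 2r(x_0). \tag{19}$$

By our definitions, in a neighborhood of $x_0$, the function $f(x)$ behaves like

$$\frac{g(x_0)}{r(x_0)}\cdot\frac{1}{(x-x_0)^2},$$

and our claim follows by (19). ∎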

Let $\ell_n$ be the number of all leaves in all non-plane 1-2 trees on vertex set $[n]$.

Theorem 2.5.

The equality

$$\lim_{n\to\infty}\frac{\ell_n}{nE_n} = 1 - \frac{2}{\pi}$$

holds. In other words, for large $n$, the probability that a vertex chosen uniformly at random from all vertices of all non-plane 1-2 trees is a leaf is $1 - \frac{2}{\pi} \approx 0.363$.

Proof.

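Note that $L(x)$ has a unique singularity of smallest modulus, at $x_0 = \pi/2$, so the exponential growth rate of its coefficients is $2/\pi$. Also note that at that point, the denominator of $L(x)$ has a double root. Therefore, Lemma 2.4 applies, with $g(x) = x - 1 + \cos x$ and $h(x) = 1 - \sin x$, yielding that the coefficient of the $(x-\pi/2)^{-2}$ term in the Laurent series of $L(x)$ about $x_0 = \pi/2$ is

$$\frac{2g(\pi/2)}{h''(\pi/2)} = \pi - 2.$$

Now observe that

$$\frac{c}{(x-a)^2} = \frac{c}{a^2}\cdot\frac{1}{\left(1-\frac{x}{a}\right)^2} = \frac{c}{a^2}\sum_{n\ge 0}(n+1)\frac{x^n}{a^n}. \tag{20}$$

Applying this to the dominant term of $L(x)$ with $a = \pi/2$ and $c = \pi - 2$, we get that

$$\ell_n \sim n!\,(n+1)\,\frac{4(\pi-2)}{\pi^2}\left(\frac{2}{\pi}\right)^n. \tag{21}$$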

The proof of our claim is now immediate by comparing formulas (21) and (17). ∎
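The limit can also be observed numerically. The sketch below assumes that the leaf generating function satisfies $L'(x) = (\sec x + \tan x)L(x) + 1$ with $L(0)=0$ (our reading of the case $k=0$ of Theorem 2.1), which translates into the coefficient recurrence $\ell_{n+1} = \sum_{j=0}^{n}\binom{n}{j}E_{n-j}\ell_j + [n=0]$. The function names are hypothetical, and the Euler numbers are generated by the classical recurrence $2E_{n+1} = \sum_k \binom{n}{k}E_kE_{n-k}$.

```python
from math import comb, pi
from typing import List, Tuple


def euler_zigzag(n_max: int) -> List[int]:
    """Return [E_0, ..., E_{n_max}]."""
    e = [1, 1]
    for n in range(1, n_max):
        e.append(sum(comb(n, k) * e[k] * e[n - k] for k in range(n + 1)) // 2)
    return e[: n_max + 1]


def leaf_totals(n_max: int) -> Tuple[List[int], List[int]]:
    """Return (E, leaves), where leaves[n] counts all leaves of all non-plane 1-2 trees on [n]."""
    e = euler_zigzag(n_max)
    leaves = [0]
    for n in range(n_max):
        # coefficient extraction from L' = E * L + 1
        total = sum(comb(n, j) * e[n - j] * leaves[j] for j in range(n + 1))
        leaves.append(total + (1 if n == 0 else 0))
    return e, leaves


e, leaves = leaf_totals(200)
for n in (5, 20, 200):
    print(n, leaves[n] / (n * e[n]), 1 - 2 / pi)
```

The ratio approaches the limit slowly, with an error of order $1/n$, because the leaf count grows like $(n+1)$ times a constant while the vertex count grows like $n$ times a constant.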

2.3. Balanced vertices of rank 1

Let $A_1(x)$ be the exponential generating function for the total number of balanced vertices of rank 1 in all non-plane 1-2 trees on vertex set $[n]$. Note that such vertices are exactly the non-leaf vertices all of whose children are leaves.

Theorem 2.6.

The differential equation

(22)

holds, with the initial condition .

Proof.

Let us apply Theorem 2.1 with $k=1$, to get the linear differential equation

$$A_1'(x) = E(x)\,A_1(x) + x + \frac{x^2}{2}.$$

Indeed, $B_1(x) = \frac{x^2}{2} + \frac{x^3}{6}$, since there are only two trees whose root is balanced and of rank 1; one of them has two vertices and the other one has three vertices. Recalling that $E(x) = \sec x + \tan x$, and the initial condition $A_1(0) = 0$, we can solve the last displayed differential equation to prove our claim. ∎

Theorem 2.7.

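The equality

$$\lim_{n\to\infty}\frac{a_{n,1}}{nE_n} = \frac{\pi}{4} + \frac{\pi^2}{24} - 1$$

holds, where $a_{n,1}$ denotes the number of balanced vertices of rank 1 in all non-plane 1-2 trees on $[n]$. In other words, for large $n$, the probability that a vertex chosen uniformly at random from all vertices of all non-plane 1-2 trees is balanced and of rank 1 is $\frac{\pi}{4} + \frac{\pi^2}{24} - 1 \approx 0.197$.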

Proof.

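Note that $A_1(x)$ has a unique singularity of smallest modulus at $x_0 = \pi/2$, and that that singularity is a pole of order two. Therefore, we can apply Lemma 2.4 with $g(x) = \int_0^x (1-\sin t)\left(t + \frac{t^2}{2}\right)dt$ and $h(x) = 1 - \sin x$. At $x_0 = \pi/2$, this yields $g(\pi/2) = \frac{\pi^2}{8} + \frac{\pi^3}{48} - \frac{\pi}{2}$. Furthermore, $h''(x) = \sin x$, so $h''(\pi/2) = 1$. Therefore, the coefficient of the $(x-\pi/2)^{-2}$ term in the Laurent series of $A_1(x)$ about $x_0 = \pi/2$ is

$$\frac{2g(\pi/2)}{h''(\pi/2)} = \frac{\pi^2}{4} + \frac{\pi^3}{24} - \pi.$$

Applying (20) with $a = \pi/2$ and $c = \frac{\pi^2}{4} + \frac{\pi^3}{24} - \pi$, we get that

$$a_{n,1} \sim n!\,(n+1)\left(1 + \frac{\pi}{6} - \frac{4}{\pi}\right)\left(\frac{2}{\pi}\right)^n. \tag{23}$$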

We can now prove our claim by comparing formulae (17) and (23). ∎
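A numerical check of the rank 1 case is sketched below. It assumes the differential equation $A_1'(x) = (\sec x + \tan x)A_1(x) + x + \frac{x^2}{2}$ with $A_1(0)=0$, that is, our reading of Theorem 2.1 for $k=1$ with $B_1(x) = \frac{x^2}{2} + \frac{x^3}{6}$. The constant $\frac{\pi}{4} + \frac{\pi^2}{24} - 1$ used for comparison is the value we obtain by applying Lemma 2.4 to this equation; it is a derived value, not one quoted from the text. Function names are hypothetical.

```python
from math import comb, pi
from typing import List, Tuple


def euler_zigzag(n_max: int) -> List[int]:
    """Return [E_0, ..., E_{n_max}]."""
    e = [1, 1]
    for n in range(1, n_max):
        e.append(sum(comb(n, k) * e[k] * e[n - k] for k in range(n + 1)) // 2)
    return e[: n_max + 1]


def rank_one_totals(n_max: int) -> Tuple[List[int], List[int]]:
    """Return (E, totals), where totals[n] counts balanced rank-1 vertices over all trees on [n]."""
    e = euler_zigzag(n_max)
    # n! [x^n] B_1'(x) for B_1'(x) = x + x^2/2: one tree on 2 vertices, one on 3 vertices
    root_term = {1: 1, 2: 1}
    totals = [0]
    for n in range(n_max):
        conv = sum(comb(n, j) * e[n - j] * totals[j] for j in range(n + 1))
        totals.append(conv + root_term.get(n, 0))
    return e, totals


e, totals = rank_one_totals(200)
limit = pi / 4 + pi ** 2 / 24 - 1
for n in (5, 20, 200):
    print(n, totals[n] / (n * e[n]), limit)
```

As with leaves, the convergence is of order $1/n$, so the agreement at $n=200$ is only to about two decimal places.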

2.4. Balanced vertices of higher rank

For any fixed , we can compute the probability that a vertex selected from all vertices of all non-plane 1-2 trees uniformly at random is balanced and of rank . For instance, for , we get that

where denotes the number of balanced vertices in all non-plane 1-2 trees on .

It is a direct consequence of (13), the well-known fact that (easy to prove by induction) that

for some polynomials and , and Lemma 2.4 that for all positive integers , there exists a polynomial with rational coefficients so that

3. Plane 1-2 trees

Plane 1-2 trees are similar to non-plane 1-2 trees, except that the children of each vertex are linearly ordered, left to right. The difference between plane 1-2 trees and decreasing binary trees is that in plane 1-2 trees, if a vertex has only one child, then that child has no “direction”, that is, it is not a “left child” or a “right child”. See Figure 3 for an illustration.

Figure 3. The three rooted plane 1-2 trees on vertex set $[3]$.

So plane 1-2 trees are “in between” decreasing binary trees (where left or right matters for every child) and non-plane 1-2 trees (where left or right does not matter for any vertex). Indeed, in plane 1-2 trees, left or right matters, except for vertices that have no siblings.

Let $T(x)$ be the exponential generating function for the number of plane 1-2 trees on vertex set $[n]$. Then $T(x)$ satisfies the differential equation

$$T'(x) = T^2(x) - T(x) + 1 \tag{24}$$

with initial condition $T(0) = 1$. Indeed, removing the root of a plane 1-2 tree $t$ that has more than one vertex, that tree falls apart into an ordered pair of two such trees, except when the root of $t$ has only one child.

See sequence A080635 in [7] for many other combinatorial problems whose solution involves the power series .

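Solving (24), we get the explicit formula

$$T(x) = \frac{\sqrt{3}}{2}\tan\left(\frac{\sqrt{3}}{2}x + \frac{\pi}{6}\right) + \frac{1}{2}.$$

Noting that $\tan z = \frac{\sin z}{\cos z}$, and that the summand $\frac{1}{2}$ at the end does not influence the growth rate of the coefficients of $T(x)$, we can proceed as in Section 2. That is, we can use Proposition 2.2 to compute that the number $t_n$ of plane 1-2 trees on vertex set $[n]$ satisfies

$$t_n \sim n!\left(\frac{3\sqrt{3}}{2\pi}\right)^{n+1}.$$

Therefore, the total number of all vertices of all such trees satisfies

$$n\,t_n \sim n\cdot n!\left(\frac{3\sqrt{3}}{2\pi}\right)^{n+1}. \tag{25}$$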

Let $A_k(x)$ be the exponential generating function for the number of all balanced vertices of rank $k$ in all plane 1-2 trees on $[n]$, and let $B_k(x)$ be the exponential generating function for the number of plane 1-2 trees on $[n]$ in which the root is balanced and of rank $k$.

Theorem 3.1.

The differential equation

$$A_k'(x) = \left(2T(x) - 1\right)A_k(x) + B_k'(x) \tag{26}$$

holds, with the initial condition $A_k(0) = 0$.

Proof.

Let $(t, v)$ be an ordered pair in which $t$ is a plane 1-2 tree on vertex set $[n]$ and $v$ is a balanced vertex of rank $k$ of $t$. Then $A_k(x)$ is the exponential generating function counting such pairs. Let us first assume that $v$ is not the root of $t$, and let us remove the root of $t$. On the one hand, this leaves a structure that is counted by $A_k'(x)$. On the other hand, this leaves an ordered pair consisting of a plane 1-2 tree with a balanced vertex of rank $k$ marked, and a plane 1-2 tree. If the tree without the marked vertex was not empty, then by the Product formula of exponential generating functions, such ordered pairs are counted by the generating function $2A_k(x)\left(T(x)-1\right)$, since the order of the two trees matters. If the tree without the marked vertex was empty, then there is only one way to “order” the 1-element set of subtrees of the root, consisting of the subtree with the marked vertex. This results in the correction term $A_k(x)$.

Finally, if $v$ was the root of $t$, then the root of $t$ was balanced and of rank $k$. Such trees are counted by $B_k(x)$, or, after the removal of their root, by $B_k'(x)$. ∎

3.1. Leaves

Setting $k=0$ in (26), noting that $B_0(x) = x$, and $A_0(0) = 0$, then solving the resulting differential equation, we get the following result for the number of all leaves.

Corollary 3.2.

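The equality

$$A_0(x) = \frac{\frac{x}{2} + \frac{1}{2\sqrt{3}}\sin\left(\sqrt{3}\,x + \frac{\pi}{3}\right) - \frac{1}{4}}{\cos^2\left(\frac{\sqrt{3}}{2}x + \frac{\pi}{6}\right)} \tag{27}$$

holds.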

We can apply Lemma 2.4 to the numerator and the denominator of $A_0(x)$ in (27) to compute the growth rate of the coefficients of that power series. Indeed, $A_0(x)$ has a unique singularity of smallest modulus at $x_0 = \frac{2\pi}{3\sqrt{3}}$, and at that point the numerator of $A_0(x)$ is nonzero, the denominator and its first derivative are zero, while the second derivative of the denominator is not 0. Then a computation analogous to that immediately following the proof of Theorem 2.7 shows that if $\ell_n$ is the number of all leaves in all plane 1-2 trees on vertex set $[n]$, then

$$\ell_n \sim n!\,(n+1)\left(\frac{4\pi}{9\sqrt{3}} - \frac{1}{3}\right)\left(\frac{3\sqrt{3}}{2\pi}\right)^{n+2}. \tag{28}$$

Comparing (28) with (25), we obtain the following result.

Theorem 3.3.

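The equality

$$\lim_{n\to\infty}\frac{\ell_n}{n\,t_n} = \frac{2}{3} - \frac{\sqrt{3}}{2\pi}$$

holds.

In other words, for large $n$, the probability that a vertex selected uniformly at random from all vertices of all plane 1-2 trees of size $n$ will be a leaf is about 0.391.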
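The value 0.391 can be checked numerically. The sketch below is built on two assumptions that are our reading of the text: that the tree-counting series satisfies $T'(x) = T(x)^2 - T(x) + 1$ with $T(0)=1$ (so the empty tree is counted once), and that the leaf series satisfies $A_0'(x) = (2T(x)-1)A_0(x) + 1$ with $A_0(0)=0$. Both are turned into coefficient recurrences. The comparison constant $\frac{2}{3} - \frac{\sqrt{3}}{2\pi}$ is the value we derive from these equations; function names are hypothetical.

```python
from math import comb, pi, sqrt
from typing import List, Tuple


def plane_tree_counts(n_max: int) -> List[int]:
    """t[n] = number of plane 1-2 trees on [n], with t[0] = 1 for the empty tree (assumed convention)."""
    t = [1]
    for n in range(n_max):
        conv = sum(comb(n, j) * t[j] * t[n - j] for j in range(n + 1))
        t.append(conv - t[n] + (1 if n == 0 else 0))  # T' = T^2 - T + 1
    return t


def plane_leaf_totals(n_max: int) -> Tuple[List[int], List[int]]:
    """Return (t, leaves), where leaves[n] counts all leaves of all plane 1-2 trees on [n]."""
    t = plane_tree_counts(n_max)
    leaves = [0]
    for n in range(n_max):
        conv = sum(comb(n, j) * t[n - j] * leaves[j] for j in range(n + 1))
        leaves.append(2 * conv - leaves[n] + (1 if n == 0 else 0))  # A' = (2T - 1) A + 1
    return t, leaves


t, leaves = plane_leaf_totals(200)
limit = 2 / 3 - sqrt(3) / (2 * pi)
for n in (5, 20, 200):
    print(n, leaves[n] / (n * t[n]), limit)
```

The first values of `t` reproduce the beginning of sequence A080635, and the three trees on $[3]$ have five leaves in total, which gives a quick consistency check of both recurrences.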

3.2. Vertices of higher rank

Note that remarkably, we can obtain an explicit formula for for every . This is in contrast to the case when we want to compute the generating function of all vertices of a given rank, where we run into non-elementary functions for .

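Indeed, the standard form of (26) is

$$A_k'(x) - \left(2T(x) - 1\right)A_k(x) = B_k'(x).$$

In order to solve this linear differential equation, first we multiply both sides by the integrating factor

$$\cos^2\left(\frac{\sqrt{3}}{2}x + \frac{\pi}{6}\right),$$

which is an elementary function.

After that multiplication, we need to integrate both sides of the obtained equation to get $A_k(x)$. However, we are able to do so, since on the right-hand side, we have $\cos^2\left(\frac{\sqrt{3}}{2}x + \frac{\pi}{6}\right)B_k'(x)$, that is, by the identity $\cos^2 z = \frac{1+\cos 2z}{2}$, a polynomial plus a cosine function times a polynomial function, and it is well known that such products have elementary integrals.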

4. Further directions

The method that we used in this paper may well be applicable to count balanced vertices in other tree varieties, as long as the number of children each vertex can have is bounded.

It seems intuitively very likely that in any tree variety, as $n$ goes to infinity, the probability that a vertex chosen uniformly at random from all vertices of all trees of size $n$ is balanced will be monotone decreasing. Still, in the one case where we could prove this, the case of decreasing binary trees, our proof heavily depended on the simple bijection between these trees and permutations. New ideas are needed for other tree varieties.

We mentioned that plane 1-2 trees are “in between” the other two tree varieties studied in this paper. So it is perhaps interesting that their vertices are the most likely to be leaves. Indeed, a random vertex of a decreasing binary tree has a one-third chance to be a leaf, while the same probability is about 36.3 percent for non-plane 1-2 trees, and 39.1 percent for plane 1-2 trees. Understanding this phenomenon could lead to new insights.

References

  • [1] M. Bóna, $k$-protected vertices in binary search trees. Adv. in Appl. Math. 53 (2014), 1–11.
  • [2] M. Bóna, B. Pittel, On a random search tree: asymptotic enumeration of vertices by distance from leaves. J. Appl. Prob. to appear. Preprint available at arXiv:1412.2796.
  • [3] M. Bóna, Introduction to Enumerative and Analytic Combinatorics, CRC Press, 2016.
  • [4] J. Brown, R. V. Churchill, Complex Variables and Applications, 9th edition. McGraw-Hill, 2013.
  • [5] P. Flajolet and R. Sedgewick, Analytic Combinatorics, Cambridge University Press, Cambridge, UK, 2009.
  • [6] S. Janson and C. Holmgren, Limit laws for functions of fringe trees for binary search trees and recursive trees. Electronic J. Probability 20 (2015), no. 4, 1-51.
  • [7] Online Encyclopedia of Integer Sequences, online database, www.oeis.org.