Protection numbers in simply generated trees and Pólya trees
We determine the limit of the expected value and the variance of the protection number of the root in simply generated trees, in Pólya trees, and in unlabelled non-plane binary trees, when the number of vertices tends to infinity. Moreover, we compute the expectation and variance of the protection number of a randomly chosen vertex in all those tree classes. We obtain exact formulas as sum representations; the sums converge rapidly and therefore allow efficient numerical computation with high accuracy. Most proofs are based on a singularity analysis of generating functions.
The protection number of a tree is the length of the shortest path from the root to a leaf. It is interchangeably called the protection number of the root. We define the protection number of a vertex $v$ in a tree $T$ as the protection number of the maximal subtree of $T$ having $v$ as its root. We say that a vertex is $k$-protected if $k$ does not exceed its protection number.
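The definition can be made concrete with a small sketch. The dictionary representation of the tree and the function names below are hypothetical choices made for this illustration only; the code simply performs a breadth-first search from a vertex down to its nearest leaf.

```python
from collections import deque

def protection_number(children, v):
    """Length of the shortest downward path from v to a leaf.

    `children` maps every vertex to the list of its children
    (an assumed representation); leaves map to an empty list.
    """
    queue = deque([(v, 0)])
    while queue:
        node, depth = queue.popleft()
        if not children[node]:
            # Breadth-first search reaches a nearest leaf first.
            return depth
        for child in children[node]:
            queue.append((child, depth + 1))

def is_k_protected(children, v, k):
    """A vertex is k-protected if k does not exceed its protection number."""
    return k <= protection_number(children, v)

# Root 0 has children 1 and 2; 1 -> 3 -> 6 is a path; 2 has the leaves 4 and 5.
tree = {0: [1, 2], 1: [3], 2: [4, 5], 3: [6], 4: [], 5: [], 6: []}
root_protection = protection_number(tree, 0)
```

In this example the nearest leaves below the root are the vertices 4 and 5, so the root is 2-protected but not 3-protected, while vertex 2 has protection number 1.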
Previous research concerning protection numbers has been conducted in two closely related directions: (i) the number of $k$-protected vertices in a tree of size $n$, and (ii) the protection number of the root or of a random vertex.
Cheon and Shapiro were the first to investigate the number of $2$-protected nodes in trees. They stated results for unlabelled ordered trees and Motzkin trees. Later on Mansour complemented their work by solving the $k$-ary tree case. Over the next several years these results were followed by a series of papers examining the number of $k$-protected nodes (usually for small values of $k$) in various models of random trees. To mention just a few, Du and Prodinger analysed the average number of $2$-protected nodes in random digital search trees, Mahmoud and Ward presented a central limit theorem as well as exact moments of all orders for the number of $2$-protected nodes in binary search trees, and three years later they found the number of $2$-protected nodes in recursive trees (consult ). The family of binary search trees was also investigated by Bóna and Pittel, who showed that the number of its $k$-protected nodes decays exponentially in $k$.
In 2015 Holmgren and Janson obtained more general results. Using probabilistic methods, they derived a normal limit law for the number of $k$-protected nodes in a binary search tree and in a random recursive tree.
Soon after, two particular parameters attracted the attention of the algorithmic community. These were (as mentioned above) the protection number of the root and the protection number of a random vertex. In 2017 Copenhaver found that in a random unlabelled plane tree the expected value of the protection number of the root and the expected value of the protection number of a random vertex approach $1.62297$ and $0.727649$, respectively, as the size of the tree tends to infinity. These results were extended by Heuberger and Prodinger, who gave exact formulas for the first terms of the expectation, the variance and the probability of the respective protection numbers.
The aim of this paper is to generalize the protection number results to a larger class of rooted trees. We study both the root protection number and the protection number of a random vertex for the family of simply generated trees (introduced by Meir and Moon ) and their non-plane counterparts: unlabelled non-plane rooted trees, also called Pólya trees due to their first extensive treatment by Pólya , and examined further by Otter , including numerical results and the binary case. The present paper broadens the results from , while maintaining the emphasis on formulas that are as concrete as possible.
For simply generated trees a general theory of the asymptotics of certain functionals was developed recently in , but this theory does not cover local functionals such as the number of protected nodes. Devroye and Janson presented a unified approach to obtaining the number of $k$-protected nodes in various classes of random trees by putting them in the general context of fringe subtrees introduced by Aldous in . We have obtained analogous results for simply generated trees, but employing a different methodology. This allows an efficient numerical treatment and may serve as a basis for random generation in the framework of Boltzmann sampling . Parts of our investigations fall into the general framework of additive functionals treated in , but our focus on concrete expressions allows easy access to the numerical evaluation of the considered parameters.
Plan of the paper
In Sections 2, 3, and 4 we consider simply generated trees, Pólya trees and non-plane binary trees, respectively. In each section the expected value and the variance of the protection number of the root and of the protection number of a random vertex are computed. All these quantities tend to constants when the tree size tends to infinity. The emphasis is on deriving exact expressions for these constants in terms of characteristic parameters of the considered tree class. We obtain them in terms of sums that converge at an exponential rate and therefore enable us to compute accurate numerical values efficiently. We provide numerical values for several well-known simply generated tree classes as well as for the two non-plane classes studied in Sections 3 and 4.
2. Simply generated trees
2.1. Protection number of the root
The class of simply generated trees was introduced in  and can be described as the class of plane rooted trees whose generating function satisfies a functional equation of a particular type: If $t_n$ denotes the sum of the weights of all trees with $n$ vertices, then the generating function $T(z)=\sum_{n\ge 1} t_n z^n$ satisfies
\[ T(z) = z\,\varphi(T(z)), \]
where the power series $\varphi(t)=\sum_{j\ge 0}\varphi_j t^j$ has only non-negative coefficients, $\varphi_0>0$, and there is a $j\ge 2$ such that $\varphi_j>0$. Moreover, it is required that the equation $t\varphi'(t)=\varphi(t)$ has a unique positive solution $\tau$.
We are interested in the asymptotic protection number of a random simply generated tree, sampled according to the weights from all simply generated trees with $n$ vertices, where $n$ tends to infinity.
For the sake of simplicity we assume that is non-periodic, meaning that there are integers such that are all positive and satisfy . The periodic case can be dealt with in the very same way, but the calculations leading to the desired number have to be done repeatedly (for analogous situations) in order to collect several contributions to the final value.
Within this paper the primary tool that is used will be singularity analysis (see [15, 16]), which provides a direct connection between the singularities of a generating function and the asymptotic behaviour of its coefficients. By Pringsheim’s theorem [16, p. 240] we know that a generating function must have a singularity at , if denotes the radius of convergence. Our assumption that is non-periodic guarantees furthermore that this is the only singularity on the circle of convergence. Throughout this paper we will call the dominant singularity of the generating function. In particular, we denote the dominant singularity of by . Furthermore, we say that a function has an algebraic singularity of type at , if there is a constant such that as tends to in such a way that . In this case admits a Puiseux expansion in terms of powers for some positive integer . For instance, it is well known that the generating function associated to some class of simply generated trees has an algebraic singularity of type 1/2 (for obvious reasons also called square root singularity) the location of which is determined by the system , , cf. . For further information on this theory we refer the reader to  and .
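The link between a square root singularity and coefficient asymptotics can be illustrated numerically. The sketch below assumes the standard transfer result that a function behaving like $\tau - c\sqrt{1-z/\rho}$ near $\rho$ has coefficients asymptotic to $\frac{c}{2\sqrt{\pi}}\rho^{-n}n^{-3/2}$, and tests it on plane trees, where $T(z)=\frac{1-\sqrt{1-4z}}{2}$, so that $\rho=1/4$ and $c=1/2$. The function names are hypothetical.

```python
import math

def plane_tree_count(n):
    """Number of plane trees with n vertices (the Catalan number C_{n-1})."""
    return math.comb(2 * n - 2, n - 1) // n

def sqrt_singularity_estimate(n, rho, c):
    """Leading-order estimate of [z^n] f(z) for f(z) ~ tau - c*sqrt(1 - z/rho).

    Assumes the standard transfer: [z^n] sqrt(1 - z/rho) ~ -rho^(-n) n^(-3/2) / (2 sqrt(pi)).
    """
    return c / (2.0 * math.sqrt(math.pi)) * rho ** (-n) * n ** (-1.5)

n = 200
exact = plane_tree_count(n)
approx = sqrt_singularity_estimate(n, rho=0.25, c=0.5)
ratio = exact / approx   # tends to 1 as n grows
```

The ratio of the exact coefficient to the estimate approaches 1 at rate $O(1/n)$, which is why quotients of such coefficients have clean limits.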
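Let $T_k(z)$ denote the generating function of the class of simply generated trees that have protection number at least $k$, where $z$ marks the total number of nodes. Furthermore, let be non-periodic. Then, writing $\varphi$ for the power series in the functional equation of $T(z)$ and $\varphi_0$ for its constant term, $T_k(z)$ can be defined by
\[ T_0(z) = T(z), \qquad T_k(z) = z\bigl(\varphi(T_{k-1}(z)) - \varphi_0\bigr) \quad (k\ge 1). \tag{1} \]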
Note that .
Lemma 1. All generating functions $T_k(z)$ have the same dominant singularity as $T(z)$, and it is a square root singularity.
Proof. First let us observe that the generating function $T_k(z)$ reads as
where and denotes the -fold composition. Since is analytic at , inserting a function admitting a Puiseux expansion results in
again being a Puiseux expansion at . It is well known that admits a Puiseux expansion with nonzero numbers and . Moreover, we always insert one of the functions , thus attains the positive values , , implying that is always positive, as is a power series with only non-negative coefficients. By induction it is guaranteed that is always negative and thus all the functions have a unique dominant singularity of square root type at . ∎
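In order to derive the expected value of the protection number $X_n$ of a random simply generated tree of size $n$ (i.e. with $n$ nodes) asymptotically, we use the well-known formula
\[ \mathbb{E}(X_n) = \sum_{k\ge 1} \mathbb{P}(X_n \ge k). \]
Thus, we need to calculate the probability $\mathbb{P}(X_n\ge k)$, which is given by
\[ \mathbb{P}(X_n \ge k) = \frac{[z^n]\,T_k(z)}{[z^n]\,T(z)}. \]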
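Theorem 1. Let $X_n$ be the protection number of a random simply generated tree of size $n$. Then the expected value and the variance satisfy
\[ \lim_{n\to\infty}\mathbb{E}(X_n) = \sum_{k\ge 1} \rho^{k}\prod_{i=0}^{k-1}\varphi'(T_i(\rho)), \qquad \lim_{n\to\infty}\mathbb{V}(X_n) = \sum_{k\ge 1}(2k-1)\,\rho^{k}\prod_{i=0}^{k-1}\varphi'(T_i(\rho)) - \Bigl(\sum_{k\ge 1}\rho^{k}\prod_{i=0}^{k-1}\varphi'(T_i(\rho))\Bigr)^{2}, \]
with $\rho$ denoting the dominant singularity of the generating function $T(z)$ of the class of simply generated trees.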
Proof. We know that the asymptotic behaviour of the generating function, namely , implies
as tends to infinity. In order to derive the asymptotics of the -th coefficient of , observe that we know from Lemma 1 that all generating functions have the same dominant singularity of type . Setting , the Puiseux expansions of and read as
Plugging these expansions into (1) and using we get
Expanding and comparing coefficients of and yields
Obviously, the ’s match exactly the , , as they are the constant terms in the Puiseux expansions of the functions , with . Thus, the equation for can be rewritten as .
As , we get
Applying a transfer lemma  directly gives the asymptotics of the coefficients of and plugging them in conjunction with (3) into Equation (2) yields the asymptotic value for the mean. In order to derive the formula for the asymptotic variance we use the equation
and immediately get the asserted result. ∎
It is easy to see that the sequence $(T_k(\rho))_{k\ge 0}$ is monotonically decreasing, since the number of trees with protection number at least $k-1$ is always greater than the number of trees that have a $k$-protected root, i.e. protection number at least $k$. Since $\varphi'$ is monotonically increasing on the positive real axis, this implies that $\varphi'(T_i(\rho)) \le \varphi'(T_1(\rho))$ for all $i\ge 1$. Thus, using $\rho\,\varphi'(T_0(\rho))=1$, we can estimate the sum for the expected value by
\[ \sum_{k\ge 1}\rho^{k}\prod_{i=0}^{k-1}\varphi'(T_i(\rho)) \le \sum_{k\ge 1}\bigl(\rho\,\varphi'(T_1(\rho))\bigr)^{k-1}, \]
which converges, since $\rho\,\varphi'(T_1(\rho)) < \rho\,\varphi'(T_0(\rho)) = 1$. As the last sum is a convergent geometric series and the inequality even holds term-wise, we can efficiently calculate the asymptotic mean and variance for all classes of simply generated trees with arbitrary accuracy. We will now exemplify this by calculating the limits of mean and variance of the protection number of some prominent classes of simply generated trees.
Example (Plane trees).
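The generating function of plane trees is the unique power series solution of
\[ T(z) = \frac{z}{1-T(z)}. \]
Thus, its dominant singularity is $\rho=\frac14$, and $T(\rho)=\frac12$.
The recursion for the $T_k(\rho)$'s reads as
\[ T_k(\rho) = \frac{T_{k-1}(\rho)}{4\,\bigl(1-T_{k-1}(\rho)\bigr)}, \qquad k\ge 1. \]
In case of plane trees the recursion can be solved explicitly, leading to
\[ T_k(\rho) = \frac{3}{2\cdot 4^{k}+4}. \]
The limits of expected value and variance are therefore given by
\[ \lim_{n\to\infty}\mathbb{E}(X_n) = \sum_{k\ge 1}\frac{9\cdot 4^{k}}{(4^{k}+2)^{2}} \approx 1.62297, \qquad \lim_{n\to\infty}\mathbb{V}(X_n) = \sum_{k\ge 1}(2k-1)\frac{9\cdot 4^{k}}{(4^{k}+2)^{2}} - \Bigl(\sum_{k\ge 1}\frac{9\cdot 4^{k}}{(4^{k}+2)^{2}}\Bigr)^{2}, \]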
which has already been calculated by Heuberger and Prodinger in .
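The numerical procedure described above can be sketched in a few lines of Python. The sketch assumes the recursion $T_k(\rho)=\rho\,(\varphi(T_{k-1}(\rho))-\varphi_0)$ and the limit $\mathbb{P}(X_n\ge k)\to\rho^{k}\prod_{i<k}\varphi'(T_i(\rho))$; both are reconstructed from the derivation in this section rather than quoted, and the function and parameter names are hypothetical.

```python
def root_protection_limits(phi, dphi, phi0, rho, tau, terms=200):
    """Limits of mean and variance of the root protection number.

    Assumed ingredients (reconstructed, not quoted from the text):
      tau_0 = tau,  tau_k = rho * (phi(tau_{k-1}) - phi0),
      lim P(X_n >= k) = rho^k * prod_{i<k} dphi(tau_i).
    """
    tau_k = tau
    prob = 1.0        # lim P(X_n >= 0)
    mean = 0.0
    second = 0.0      # sum of (2k-1) * P(X >= k), which equals E(X^2)
    for k in range(1, terms + 1):
        prob *= rho * dphi(tau_k)            # tau_k still holds tau_{k-1} here
        tau_k = rho * (phi(tau_k) - phi0)    # advance to tau_k
        mean += prob
        second += (2 * k - 1) * prob
    return mean, second - mean ** 2

# Plane trees: phi(t) = 1/(1-t), rho = 1/4, tau = 1/2.
mean, var = root_protection_limits(
    phi=lambda t: 1.0 / (1.0 - t),
    dphi=lambda t: 1.0 / (1.0 - t) ** 2,
    phi0=1.0, rho=0.25, tau=0.5)
```

Because the terms decay geometrically, a few dozen terms already exhaust double precision; the default of 200 terms is deliberately generous. Other simply generated classes can be handled by passing their own `phi`, `dphi`, `rho` and `tau`.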
Example (Motzkin trees).
The generating function of Motzkin trees is defined by
\[ T(z) = z\bigl(1+T(z)+T(z)^2\bigr), \]
which can be solved to result in
\[ T(z) = \frac{1-z-\sqrt{1-2z-3z^2}}{2z}. \]
Thus, its dominant singularity is $\rho=\frac13$ and $T(\rho)=1$.
The recursion for the $T_k(\rho)$'s reads as
\[ T_k(\rho) = \frac{T_{k-1}(\rho)+T_{k-1}(\rho)^2}{3}, \qquad k\ge 1. \]
This recursion can be transformed into another one for the numerators of the rational numbers $T_k(\rho)$: Indeed, if we write $T_k(\rho)=a_k/3^{2^k-1}$, then $a_0=1$ and $a_k = 3^{2^{k-1}-1}a_{k-1}+a_{k-1}^2$, for $k\ge 1$. The recurrence for the $a_k$'s does not fall into the scheme of Aho and Sloane and we are not aware of any method to solve it explicitly. But as stated before, the sequence $(T_k(\rho))_{k\ge 0}$ is exponentially decreasing and estimates are easily obtained. Thus we can calculate the limits of mean and variance for the protection number numerically with arbitrary accuracy:
Example (Incomplete binary trees).
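The generating function of incomplete binary trees is defined by
\[ T(z) = z\,\varphi(T(z)) \quad\text{with}\quad \varphi(t) = (1+t)^2. \tag{4} \]
The dominant singularity is therefore at $\rho=\frac14$ and $T(\rho)=1$.
The recursion for the $T_k(\rho)$'s reads as
\[ T_k(\rho) = \frac{2\,T_{k-1}(\rho)+T_{k-1}(\rho)^2}{4}, \qquad k\ge 1. \]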
This recursion cannot be solved explicitly, but the numerical values can be easily computed: They are
Example (Cayley trees).
Though, in a strict sense, Cayley trees do not belong to the class of simply generated trees (cf. the discussions in  and ), they are usually listed as an example for that class. In fact, they are closely related (see  for a thorough analysis and  for an analysis of the differences) and in many contexts (like the one considered here), quotients of coefficients are computed which makes the fact that in this case the generating functions are exponential ones irrelevant.
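The (exponential) generating function of Cayley trees is defined by
\[ T(z) = z\,e^{T(z)}, \]
which has its dominant singularity at $\rho=1/e$. Moreover, we have $T(\rho)=1$.
The recursion for the $T_k(\rho)$'s reads as
\[ T_k(\rho) = \frac{e^{T_{k-1}(\rho)}-1}{e}, \qquad k\ge 1. \]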
As in the two previous examples the recursion for the ’s cannot be solved explicitly, but the numerical values are
Example (Binary trees).
This is the class of complete binary trees with only internal vertices contributing to the size. The generating function is then defined by the functional equation with where is the function displayed in (4). Though this class does not strictly fall into the simply generated framework, the functional equation is of the form , which reflects the fact that incomplete binary trees with all nodes counted are in bijection to complete binary trees with only internal vertices counted. For the protection number this causes some shifts within the tree. But the methodology presented above works here as well. We get and Since we have , for all , and then finally , as tends to infinity. Thus we obtain
2.2. Protection number of a random vertex
In the first part of this section we studied the average protection number of a simply generated tree, that is, the protection number of the root of the simply generated tree. Now we are interested in the average protection number of a randomly chosen vertex in a simply generated tree of size $n$. We denote this sequence of random variables by $Y_n$.
As in the previous section we calculate the mean via $\mathbb{E}(Y_n)=\sum_{k\ge 1}\mathbb{P}(Y_n\ge k)$, where $Y_n$ denotes the protection number of a random vertex. In order to do so we proceed analogously to Heuberger and Prodinger in  and define $S_k(z)$ to be the generating function of the number of $k$-protected vertices summed over all trees of size $n$. As in  this generating function can be calculated by
by means of the bivariate generating function of simply generated trees, where marks the size and the number of leaves, and the generating function of simply generated trees with protection number at least . The formula for arises from considering a -protected vertex in the following way: First point at a leaf in a simply generated tree (which yields the factor ), then remove this leaf (which explains the ) and finally attach a tree with protection number at least (giving the factor ).
The procedure works also for complete binary trees, where only internal vertices contribute to the tree size. The only difference is that for complete binary trees the factor in (5) must be removed, because removing a leaf does not change the size.
Using the generating function we can express the probability by
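Theorem 2. Let $Y_n$ be the protection number of a randomly chosen vertex in a random simply generated tree of size $n$. Then,
\[ \lim_{n\to\infty}\mathbb{E}(Y_n) = \frac{1}{T(\rho)}\sum_{k\ge 1}T_k(\rho), \qquad \lim_{n\to\infty}\mathbb{V}(Y_n) = \frac{1}{T(\rho)}\sum_{k\ge 1}(2k-1)\,T_k(\rho) - \Bigl(\frac{1}{T(\rho)}\sum_{k\ge 1}T_k(\rho)\Bigr)^{2}. \]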
First we need to determine the -th coefficient of . We have
Using and we get
Therefore (7) transforms to
Thus, altogether we have
Finally, we get
For the variance we use again the formula and (6). ∎
| Tree class | Limit of the mean | Limit of the variance |
|---|---|---|
| Incomplete binary trees | 1.991819588602741 | 3.638259051495130 |
| Complete binary trees | 1.265686036087572 | 0.226591112528581 |
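The tabulated values for incomplete binary trees can be reproduced with a short Python sketch. It assumes that the limiting probability of a random vertex being $k$-protected is $T_k(\rho)/T(\rho)$, with $T_k(\rho)$ given by the recursion $T_k(\rho)=\rho\,(\varphi(T_{k-1}(\rho))-\varphi_0)$; this form is reconstructed from the derivation above and is an assumption, as are the function and parameter names.

```python
def vertex_protection_limits(phi, phi0, rho, tau, terms=200):
    """Limits of mean and variance of the protection number of a random vertex.

    Assumes lim P(Y_n >= k) = tau_k / tau with
    tau_0 = tau and tau_k = rho * (phi(tau_{k-1}) - phi0).
    """
    tau_k = tau
    mean = 0.0
    second = 0.0      # sum of (2k-1) * P(Y >= k), which equals E(Y^2)
    for k in range(1, terms + 1):
        tau_k = rho * (phi(tau_k) - phi0)
        prob = tau_k / tau
        mean += prob
        second += (2 * k - 1) * prob
    return mean, second - mean ** 2

# Incomplete binary trees: phi(t) = (1 + t)^2, rho = 1/4, tau = 1.
mean, var = vertex_protection_limits(lambda t: (1.0 + t) ** 2, 1.0, 0.25, 1.0)
```

Under these assumptions the two returned numbers agree with the row for incomplete binary trees, which also confirms that the two numeric columns are the limits of the mean and of the variance.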
3. Pólya trees
3.1. Protection number of the root
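Let $T(z)$ be the generating function of Pólya trees, which reads as
\[ T(z) = z\exp\Bigl(\sum_{i\ge 1}\frac{T(z^i)}{i}\Bigr), \]
and in correspondence to the previous section let us denote by $T_k(z)$ the generating function of the class of Pólya trees that have protection number at least $k$. This generating function can be specified by
\[ T_k(z) = z\Bigl(\exp\Bigl(\sum_{i\ge 1}\frac{T_{k-1}(z^i)}{i}\Bigr)-1\Bigr), \qquad k\ge 1, \tag{8} \]
with $T_0(z)=T(z)$. From the classical results of Pólya  we know that $T(z)$ has a unique dominant singularity $\rho$ of type 1/2 and admits a Puiseux series expansion there, which starts as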
Lemma 2. All the generating functions $T_k(z)$ have their (unique) dominant singularity at $\rho$, and the singularity is a square root singularity.
Proof. First let us recall that $T_0(z)=T(z)$. Thus, for $k=0$ the lemma is trivial. For $k\ge 1$ we proceed by induction. Therefore let us assume that $T_{k-1}(z)$ has the dominant singularity $\rho$, which is of type $\frac12$. Then the dominant singularity of $T_k(z)$, satisfying the recurrence relation (8), comes from $T_{k-1}(z)$, since $\sum_{i\ge 2}T_{k-1}(z^i)/i$ is analytic in $|z|<\rho+\varepsilon$ with $\varepsilon>0$ sufficiently small. Applying the exponential function to a function having an algebraic singularity changes neither the location nor the type of the singularity, which proves the assertion. ∎
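The goal of this section is to derive an asymptotic value for the average protection number of Pólya trees. Writing $X_n$ for the protection number of a random Pólya tree of size $n$, we use again the formula $\mathbb{E}(X_n)=\sum_{k\ge 1}\mathbb{P}(X_n\ge k)$, but rewrite this equation as
\[ \mathbb{E}(X_n) = \sum_{k\ge 1}\prod_{i=1}^{k}\mathbb{P}(X_n\ge i \mid X_n\ge i-1), \]
where the conditional probabilities can be obtained by
\[ \mathbb{P}(X_n\ge i \mid X_n\ge i-1) = \frac{[z^n]\,T_i(z)}{[z^n]\,T_{i-1}(z)}. \]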
Lemma 3. The asymptotic expansions of the $n$-th coefficients of $T_k(z)$ and $T_{k-1}(z)$ read as
as , with a constant .
Let the Puiseux expansion of be given by .
Then behaves asymptotically as where . Applying the asymptotic relation and using the equation completes the proof. ∎
which directly yields the following theorem.
Theorem 3. Let $X_n$ be the protection number of a random Pólya tree of size $n$. Then
Proof. The proof for the asymptotic mean follows directly from Lemma 3. In order to determine the variance we use the representation . ∎
Note that in order to get accurate numerical values, we must not compute $T_k(\rho)$ by insertion into a (truncated) series expansion for $T_k(z)$. The reason is that $\rho$ lies on the circle of convergence and thus the convergence is very slow at $z=\rho$. Instead, $T_k(\rho)$ can be directly computed using the recurrence relation (8). The values for $T_{k-1}(\rho^i)$ with $i\ge 2$, which appear in that recurrence relation, can be computed with the help of the series expansion of $T_{k-1}(z)$, because then $\rho^i$ lies in the interior of the region of convergence, where the series converges at an exponential rate.
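The difference in convergence speed can be seen in a small experiment. The sketch below computes the numbers of Pólya trees with the classical divisor-sum recurrence (a standard method, not taken from this text) and compares truncations of the series for $T$ at $\rho$ and at $\rho^2$. The numerical value used for $\rho$ is an approximation from the literature and, like the function names, is an assumption of this illustration.

```python
def polya_tree_counts(n_max):
    """t[n] = number of Pólya trees with n vertices, for 1 <= n <= n_max."""
    t = [0] * (n_max + 1)
    s = [0] * (n_max + 1)   # s[k] = sum over divisors d of k of d * t[d]
    t[1] = 1
    for n in range(1, n_max):
        # All t[d] with d dividing n are already known at this point.
        s[n] = sum(d * t[d] for d in range(1, n + 1) if n % d == 0)
        t[n + 1] = sum(s[k] * t[n - k + 1] for k in range(1, n + 1)) // n
    return t

def truncated_series(coeffs, x):
    """Evaluate sum of coeffs[n] * x^n over the given coefficients."""
    return sum(c * x ** n for n, c in enumerate(coeffs))

RHO = 0.3383218569   # approximate dominant singularity (assumed value)

t = polya_tree_counts(200)
# Change when going from 30 to 200 terms, inside the disc and on its boundary.
fast_gap = truncated_series(t, RHO ** 2) - truncated_series(t[:31], RHO ** 2)
slow_gap = truncated_series(t, RHO) - truncated_series(t[:31], RHO)
```

At $\rho^2$ thirty terms already give full double precision, whereas at $\rho$ the partial sums still move noticeably between 30 and 200 terms, which is the reason for evaluating the recurrence (8) directly at $z=\rho$.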
We could also have used the same approach as for simply generated trees in order to get the asymptotic mean. Then the resulting formula looks like
where . One can show that tends to 1 and and tends to 0 exponentially fast and get the constant given in Theorem 3. However, since this approach requires more technical calculations, we decided to switch to the more direct strategy using the conditional probabilities. Moreover note that the equivalence of (11) and (12) is immediate from (8).
3.2. Protection number of a random vertex
The method of marking a leaf and replacing it by a tree with protection number does not work here. Due to possible symmetries in non-plane trees, this would result in wrong counting: Indeed, if there are -protected vertices which can be mapped to each other by some automorphisms of the tree (i.e., they lie in the same vertex class), then only one of them is counted. Though this is counterbalanced by trees having leaves in the same vertex class one of which is replaced by a tree with protection number (the root of this tree is then counted times), there are further overcounts: As all leaves are marked, trees having several leaves in the same vertex class are counted several times, and so are their -protected vertices.
Thus we appeal to the proof of [28, Theorem 3.1] here: For a tree let
Moreover, we define to be the number of -protected nodes in . Then the generating function satisfies (cf. [28, Equ. (3.1)])
As in Section 2.2 we utilize the formula and express the occurring probabilities as with being the generating function whose th coefficient is the cumulative number of -protected nodes in all trees of size . Obviously, and thus by differentiating (13) with respect to and inserting we obtain
and from (15) we get
Since decreases exponentially (cf. remark after Theorem 3), and so does , these probabilities decrease exponentially and thus the series for , namely
For numerical purposes, however, it is not necessary to have an explicit expression for . If we write with being the operator on the ring of formal power series defined by
then is a contraction on the metric space equipped with the formal topology (cf. [16, Appendix A.5]). Indeed, if and coincide up to their th coefficient, then the first coefficients of and coincide.
As there is exactly one tree with vertices which possesses -protected vertices at all (namely the path of length has a -protected root) whereas all smaller trees do not possess any -protected vertices, we know that the (one-term) series coincides with in its first coefficients. Applying to a few times, with each application more than doubling the number of known coefficients of , gives quickly a fairly accurate expression for . We obtain the following theorem:
Let be the protection number of a random vertex in a random Pólya tree of size . Then
4. Non-plane binary trees
4.1. Protection number of the root
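We denote by $B(z)$ the generating function of non-plane binary trees, where $z$ marks the number of internal nodes. Then $B(z)$ satisfies
\[ B(z) = 1 + \frac{z}{2}\bigl(B(z)^2 + B(z^2)\bigr). \]
The generating function $B_k(z)$ of non-plane binary trees with protection number at least $k$ fulfills
\[ B_k(z) = \frac{z}{2}\bigl(B_{k-1}(z)^2 + B_{k-1}(z^2)\bigr), \qquad k\ge 1, \qquad B_0(z)=B(z). \]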
In order to obtain the asymptotic mean and variance for the protection number of a random non-plane binary tree of size we proceed analogously as in the previous section for Pólya trees. Thus, we use
Let be the protection number of a random non-plane binary tree of size . Then
Let the Puiseux expansion of and read as
Using singularity analysis yields the desired result for the mean. For the variance we use again the formula . ∎
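For numerical work the coefficients of the generating function of non-plane binary trees are needed. A minimal sketch, assuming the functional equation $B(z)=1+\frac{z}{2}\bigl(B(z)^2+B(z^2)\bigr)$ with $z$ marking internal nodes (the symbol $B$ and the function name are hypothetical), extracts them coefficient by coefficient:

```python
def nonplane_binary_counts(n_max):
    """b[n] = number of non-plane binary trees with n internal nodes."""
    b = [0] * (n_max + 1)
    b[0] = 1                                   # the tree consisting of a single leaf
    for n in range(1, n_max + 1):
        m = n - 1                              # index of the coefficient inside the bracket
        square = sum(b[i] * b[m - i] for i in range(m + 1))   # [z^m] B(z)^2
        sym = b[m // 2] if m % 2 == 0 else 0                   # [z^m] B(z^2)
        b[n] = (square + sym) // 2
    return b

b = nonplane_binary_counts(200)
# Successive quotients give a rough estimate of the dominant singularity.
rho_estimate = b[199] / b[200]
```

The quotient of consecutive coefficients converges only at rate $O(1/n)$, so it serves as a sanity check rather than as a way to obtain the dominant singularity to high accuracy.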
4.2. Protection number of a random internal vertex
The asymptotic mean and variance for the protection number of a randomly chosen internal vertex in a random non-plane binary tree can be obtained in the same way as in the previous section for Pólya trees.
Thus, we again set up an equation for the generating function where the coefficients count the number of non-plane binary trees of size with -protected vertices:
Differentiating this equation with respect to and setting yields
Therefore we get
The asymptotic expansion of is given by
By denoting we can use the same arguments as in the Pólya case to efficiently obtain numerical values for the probabilities . Finally, we are able to calculate the asymptotic mean and variance for the protection number of a random node in non-plane binary trees.
Let be the protection number of a random internal vertex in a random non-plane binary tree of size . Then