Asymptotic Enumeration of Compacted Binary Trees

This research was supported by the Austrian Science Fund (FWF) grant SFB F50-03.

Antoine Genitrini, Sorbonne Universités, UPMC Univ Paris 06, CNRS, LIP6 UMR 7606, 4 place Jussieu 75005 Paris. Antoine.Genitrini@lip6.fr

Bernhard Gittenberger, Technische Universität Wien, Wiedner Hauptstraße 8-10/104, 1040 Wien, Austria. {Bernhard.Gittenberger,Michael.Wallner}@tuwien.ac.at

Manuel Kauers, Institute for Algebra, Johannes Kepler University, Altenberger Strasse 69, 4040 Linz, Austria. Manuel@Kauers.de

Michael Wallner
Abstract

A compacted tree is a graph created from a binary tree such that repeatedly occurring subtrees in the original tree are represented by pointers to existing ones, and hence every subtree is unique. Such representations form a special class of directed acyclic graphs. We are interested in the asymptotic number of compacted trees of given size, where the size of a compacted tree is given by the number of its internal nodes. Due to its superexponential growth this problem poses many difficulties. Therefore we restrict our investigations to compacted trees of bounded right height, which is the maximal number of edges going to the right on any path from the root to a leaf.

We solve the asymptotic counting problem for this class as well as a closely related, further simplified class.

For this purpose, we develop a calculus on exponential generating functions for compacted trees of bounded right height and for relaxed trees of bounded right height, which differ from compacted trees by dropping the above described uniqueness condition. This enables us to derive a recursively defined sequence of differential equations for the exponential generating functions. The coefficients can then be determined by performing a singularity analysis of the solutions of these differential equations.

Our main results are the computation of the asymptotic numbers of relaxed as well as compacted trees of bounded right height and given size, when the size tends to infinity.

Keywords: Compacted trees, Enumeration, D-finiteness, Analytic Combinatorics, Directed Acyclic Graphs, Chebyshev Polynomials.

1 Introduction

Most trees contain redundant information in the form of repeated occurrences of the same subtree¹. In order to obtain an efficient representation in memory, these trees can be compacted by representing each repeated subtree only once. The removed subtrees are replaced by pointers that link to the shared subtree. Such structures are classically called directed acyclic graphs, or DAGs for short.

¹ In the rest of the paper, a subtree of a given tree consists of a node together with all its descendants in the original tree. Such a substructure is sometimes called a fringe subtree.

Flajolet et al., in their extended abstract [15], analyzed in detail the memory gained by compaction. Some proofs were omitted there and never published later. This gap was closed in [10], where the framework was extended to other DAG structures and analyzed in the context of XML compression. Furthermore, Ralaivaosaona and Wagner [26] extended the analysis of the memory gain to simply generated families of trees.

The latter two papers on the quantitative analysis of the compaction process studied the transformation of the set of trees of a given size into the set of compacted trees in order to determine the average rate of compaction. We focus on a different aspect, namely the enumeration of compacted binary trees. On the one hand, enumerating combinatorial structures is important for understanding the shape characteristics of large random structures and for generating such structures uniformly at random. On the other hand, the enumeration of particular classes of DAGs is in general a difficult problem that requires extending the combinatorial methodology, and is therefore interesting in its own right.

One of the difficulties in the enumeration of compacted binary trees lies in the fact that a compacted binary tree of size $n$ may arise from a binary tree whose size lies anywhere in the interval $[n, 2^n - 1]$. Thus, a brute-force approach is hopeless.

The first papers on the enumeration of DAGs appeared in the 1970s. Robinson presented two distinct approaches [27, 29], based either on an inclusion-exclusion method or on Pólya's enumeration theory [25]. Combining combinatorial arguments and analytic methods, the asymptotic number of labeled DAGs was determined in [4], and that of connected ones in [5]. The first investigation of shape parameters seems to go back to McKay [22]. More recently, enumeration results for many particular classes of DAGs have appeared in the literature, see for instance [7, 8, 9, 16, 20, 21, 32, 33, 34], as well as studies of the (random) generation of particular DAGs, see [3, 11, 23, 24].

A now classical tool for enumeration is the generating function. For labeled DAGs (see [28]), Robinson designed generating functions of a very particular nature to solve an asymptotic counting problem, since the classical ordinary and exponential generating functions were not suited to it.

We face the same problem in the enumeration of compacted trees. Since compacted trees are unlabeled combinatorial structures that are, moreover, closely related to plane trees, ordinary generating functions would be the natural first choice. However, the fast growth of the counting sequence requires the use of exponential generating functions. In order to obtain asymptotic results, we confine ourselves to certain subclasses of compacted trees, as well as to related classes obtained by relaxing certain conditions. Moreover, we develop a calculus for exponential generating functions tailored to these classes. Bounding the right height of our DAGs leads to a sequence of D-finite functions (see [19, 31] for introductions to the subject) whose differential equations we can analyze, which finally yields our main result. Similarly, in other enumeration problems for particular classes of DAGs, bounding a certain parameter turned intractable recurrences into D-finite ones. Examples are the enumeration of certain classes of lambda-terms [7, 9, 8] and of increasing series-parallel DAGs [6].

Plan of the paper

Our combinatorial structures are based on the fundamental properties of the compaction procedure. We will first analyze some properties of this classical procedure (linked to the common subexpression problem) in Section 2.

Then we will define the basic concepts and state our main results in Section 3, see Theorems 3.3 and 3.4.

Some basic observations concerning the structure of compacted trees will then be presented in Section 4.

These observations will help us to state a combinatorial and, most importantly, recursive specification of the problem in Section 5. Another important result there is a recurrence relation for the number of compacted binary trees, see Theorem 5.1. This recurrence is far from classical, and we are not able to solve it explicitly.

We therefore follow a different approach in the remaining part of this work: as the superexponential growth rate of the counting sequence suggests, we model our problem with exponential generating functions, even though we are dealing with unlabeled combinatorial structures. To this end, a new calculus translating certain set operations on classes of compacted trees into algebraic operations on exponential generating functions is developed in Section 6.

Section 7 is devoted to a simplified problem, namely counting relaxed binary trees. These DAGs are, in a sense, compacted trees in which the uniqueness restriction on subtrees is relaxed. In particular, compacted trees form a subset of relaxed binary trees. With the same methods as for compacted trees we are able to derive a recurrence relation; however, it is as difficult as the one for compacted trees.

A natural restriction for compacted trees is to bound a specific depth parameter, the so-called right height. This is the maximal number of edges directed to the right that appear on any path from the root to a leaf. In Section 7, the calculus developed in Section 6 enables us to derive a differential equation for the generating function of relaxed trees for each bound on the right height. This sequence of D-finite differential equations follows a rather explicit recursive scheme, presented in Theorem 7.11, which allows us to analyze the dominant singularities of the solutions of the differential equations for any bound $k$ on the right height. This strategy eventually succeeds, and we are able to determine the asymptotic number of relaxed binary trees of bounded right height.

Finally, in Section 8 we adapt the results of the previous section to compacted trees. Again, we derive a sequence of D-finite differential equations where, as in Section 7, the dominant singularities of the generating functions are regular singularities of the differential equations. This allows us to extract the asymptotic behavior of the counting sequence, which involves irrational powers of $n$. The necessary information is extracted directly from the differential equations. With a few exceptions, they do not have closed-form solutions.

2 Creating a compacted tree

Many problems in computer science and computer algebra involve redundant information. A strategy to save memory is to store every instance only once and to point to the already existing instance whenever it appears again. In [15, Proposition 1] a compression algorithm was presented, and it was shown that for a given tree of size $n$, its compacted form can be computed in expected time $O(n)$. However, such procedures have been known since the 1970s (see [15, 13] and especially the “value-number method” in compiling [2, Section 6.1.2]). Figure 1 shows this procedure, which follows a top-down decomposition scheme (i.e., a post-order traversal) of labeled binary trees. Every node (or rather, the subtree rooted at that node) is associated with a “unique identifier” (uid). Two subtrees are equivalent if and only if their uids are the same.

 
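  function UID(T : tree) : integer;
  global counter : integer, Table : list;
  begin
     if T = nil
        then return(0);                  { the empty tree nil has uid 0 }
        else
           { post-order: both subtrees receive their uids first }
           triple := <root(T), UID(left(T)), UID(right(T))>;
           if Found(triple, Table)
              then return(value_found);  { subtree seen before: reuse its uid }
              else counter := counter + 1; { new subtree: next uid }
                   Insert pair (triple, counter) in Table;
                   return(counter);
           fi
     fi
  end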

 
 

Figure 1: The UID procedure from [15, Fig. 2], which computes “unique identifiers” for all (fringe) subtrees of a given binary tree $T$. It is assumed that counter is initially set to $0$. Table is a global list that maintains the associations between triples and already computed UIDs; it is also initially empty. The function root(T) extracts the label of the root of tree $T$.

We now give an example of the behavior of the procedure for an arithmetic expression.

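Example 2.1: Consider the labeled tree needed to store the arithmetic expression (* (- (* x x) (* y y)) (+ (* x x) (* y y))), which represents $(x^2 - y^2)(x^2 + y^2)$. The “Table” built by the UID procedure contains the pairs

(<x, 0, 0>, 1), (<*, 1, 1>, 2), (<y, 0, 0>, 3), (<*, 3, 3>, 4), (<-, 2, 4>, 5), (<+, 2, 4>, 6), (<*, 5, 6>, 7),

and the tree in its full and compacted versions is shown in Figure 2.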

Figure 2: Tree and compacted tree associated with (* (- (* x x) (* y y)) (+ (* x x) (* y y))) computed by the UID procedure from Figure 1.
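The UID procedure of Figure 1 is essentially hash-consing. The following minimal Python sketch assumes that trees are encoded as nested tuples (label, left, right) with None playing the role of nil; this encoding and the name compute_uids are our own, and a dictionary stands in for the list Table so that each lookup takes expected constant time. Applied to the expression of Example 2.1, it reproduces the table listed there.

```python
from typing import Dict, Optional, Tuple

# A labeled binary tree: (label, left, right); None plays the role of nil.
Tree = Optional[Tuple[str, "Tree", "Tree"]]

def compute_uids(tree: Tree) -> Tuple[int, Dict[Tuple[str, int, int], int]]:
    """Return the uid of the root and the table of triples -> uids.

    nil gets uid 0 and the counter starts at 0, so the first new
    subtree receives uid 1 (as in Figure 1).
    """
    table: Dict[Tuple[str, int, int], int] = {}

    def uid(t: Tree) -> int:
        if t is None:
            return 0
        label, left, right = t
        # Post-order: the left and then the right subtree get their uids first.
        triple = (label, uid(left), uid(right))
        if triple not in table:
            table[triple] = len(table) + 1  # counter := counter + 1
        return table[triple]

    return uid(tree), table

# Example 2.1: (* (- (* x x) (* y y)) (+ (* x x) (* y y)))
x2 = ("*", ("x", None, None), ("x", None, None))
y2 = ("*", ("y", None, None), ("y", None, None))
expr = ("*", ("-", x2, y2), ("+", x2, y2))
root, table = compute_uids(expr)
print(root)  # 7
```

The shared subtrees (* x x) and (* y y) receive a single uid each, which is exactly what the pointers of the compacted tree in Figure 2 express.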

Motivated by this procedure, based on a post-order traversal of the tree, we define an ad hoc DAG-structure, which we call a compacted binary tree, that encodes the result of the compaction of the tree. The trees under consideration are full binary in the sense that their nodes have either 0 or 2 children. Furthermore, in the definition we refer to subtrees: a fringe subtree or short subtree is the tree which corresponds to a node and all its descendants. In this paper we only consider such subtrees.

Definition 2.2.

A compacted binary tree is a DAG computed by the UID procedure from a given full binary tree. Every edge leading to a subtree that has already been seen during the traversal is replaced by a new kind of edge, a pointer, to the already existing subtree. The size of the compacted binary tree is defined by the number of its internal nodes.

In the sequel we only consider full binary trees and their compacted forms. Thus, the term compacted trees always means compacted binary trees. Figure 3 shows all compacted trees of the smallest sizes.

Figure 3: All compacted trees of the smallest sizes. The labels in the nodes are the uids of the corresponding subtrees. Note, however, that the labels are not part of the combinatorial objects: compacted trees are unlabeled graphs.

The subclass of DAGs we are interested in is strongly influenced by properties of trees. In particular, compacted trees are connected and plane. The out-degree² of each vertex is equal to $2$, except for the unique sink (the leaf), whose out-degree is $0$. Furthermore, there is a unique source, which is the root.

² For the terms out- and in-degree, source, sink, and so on, we interpret an undirected edge as directed away from the root, in accordance with the node-child relation.

These properties are induced by the full binary tree structure. Next, we treat the specific properties of the UID procedure. The result of the algorithm strongly depends on the chosen traversal. Here the post-order traversal is used, but one could also consider a different one. This choice has two important consequences. The first one concerns the pointers:

Proposition 2.3.

In a compacted tree the pointers only point to previously discovered trees.

In other words, the ordering imposed by the traversal restricts the possible choices of the pointers.

Definition 2.4.

For any compacted tree of size $n$, the spine is the structure (with $n$ nodes) obtained from the compacted tree by deleting all pointers and the leaf.

Figure 4 shows, from left to right, a compacted tree (without details on the pointers) and its spine. The second consequence of the UID procedure is that every distinct subtree is stored only once; for the corresponding compacted trees this translates into the uniqueness of every subtree.

Figure 4: A compacted tree and its spine.

3 Main results

Before stating our main results we need to define further combinatorial classes. Indeed, the uniqueness condition for compacted trees causes difficulties in their enumeration. We will therefore first analyze a simpler class in which this condition is dropped.

Definition 3.1.

A relaxed compacted binary tree (relaxed binary tree or just relaxed tree, for short) of size $n$ is a directed acyclic graph consisting of a binary tree with $n$ internal nodes, one leaf, and $n$ pointers. It is constructed from a binary tree of size $n$, where the first leaf in a post-order traversal is kept and all other leaves are replaced by pointers. These pointers may point to any node that has already been visited by the post-order traversal.

Obviously, the notion of spine adapts to the class of relaxed trees.

In fact, this gives another way to interpret compacted trees: compacted trees are relaxed trees with the restriction that all subtrees in the spine are unique. This condition does not hold for all relaxed trees; Figure 5 shows the smallest relaxed tree that is not a compacted tree.

Figure 5: Left: the smallest relaxed tree that is not a compacted tree; right: the corresponding unique compacted tree.

Even the asymptotic enumeration of relaxed trees is still too complicated. We will derive recurrence relations for their counting sequence as well as for that of compacted trees, but in order to obtain asymptotic results, we restrict the right height.

Definition 3.2.

For any relaxed tree, we define its right height to be the maximal number of right edges on any path from the root to another node in the spine (of the relaxed or compacted tree under consideration). The level of a node is the number of right edges on the path from the root to this node.

Figure 6 shows an example and a natural way of drawing a relaxed tree that emphasizes these notions. It proves convenient to rotate the trees by 45 degrees.

Figure 6: A compacted tree with right height $2$. Nodes of level $0$ are colored red, nodes of level $1$ blue, and the node of level $2$ green.
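Definition 3.2 only involves the spine, so levels and the right height can be read off a plain binary tree. The small Python sketch below assumes spines are encoded as nested pairs (left, right) with None for a missing child (a hypothetical encoding, not from the paper); pointers and the leaf are simply not represented, since they do not contribute to the right height.

```python
from typing import List, Optional, Tuple

# A spine is a plain binary tree: None (no node) or a pair (left, right).
# Pointers and the leaf are not part of the spine, so they are ignored here.
Spine = Optional[Tuple["Spine", "Spine"]]

def levels(spine: Spine, level: int = 0) -> List[int]:
    """Levels of all spine nodes in post-order (level = right edges from the root)."""
    if spine is None:
        return []
    left, right = spine
    # A left edge keeps the level, a right edge increases it by one.
    return levels(left, level) + levels(right, level + 1) + [level]

def right_height(spine: Spine) -> int:
    """Maximal number of right edges on a path from the root to a spine node."""
    return max(levels(spine), default=0)

# Root with a left child A (A has a right child B) and a right child C
# (C has a right child D); post-order visits B, A, D, C, root.
example = ((None, (None, None)), (None, (None, None)))
print(levels(example), right_height(example))  # [1, 0, 2, 1, 0] 2
```

A left chain, such as the spines of the trees in Figure 8, has right height $0$ regardless of its length.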

Bounding the right height defines a sequence of classes that follows a recursive construction principle. We will eventually exploit this structure to obtain our main results: the asymptotic number of relaxed trees with $n$ internal vertices and the analogous result for compacted trees.

Theorem 3.3 (Asymptotics of relaxed trees with bounded right height).

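The number $r_{k,n}$ of relaxed trees of size $n$ with right height at most $k$ is, for $n \to \infty$, asymptotically equivalent to

$$r_{k,n} \sim \gamma_k\, n!\, \left(4\cos\left(\frac{\pi}{k+3}\right)^2\right)^n n^{-\frac{k}{2}},$$

where $\gamma_k \in \mathbb{R}$ is independent of $n$.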

Theorem 3.4 (Asymptotics of compacted trees with bounded right height).

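The number $c_{k,n}$ of compacted trees of size $n$ with right height at most $k$ is, for $n \to \infty$, asymptotically equivalent to

$$c_{k,n} \sim \kappa_k\, n!\, \left(4\cos\left(\frac{\pi}{k+3}\right)^2\right)^n n^{-\frac{k}{2} - \frac{1}{k+3} - \left(\frac{1}{4} - \frac{1}{k+3}\right)\frac{1}{\cos\left(\frac{\pi}{k+3}\right)^2}},$$

where $\kappa_k \in \mathbb{R}$ is independent of $n$.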

Therefore, we can also answer the question (at least asymptotically) of how many relaxed trees are actually compacted trees. Combining Theorems 3.3 and 3.4 we get the following result.

Corollary 3.5 (Proportion of compacted among relaxed trees).

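Let $c_{k,n}$ ($r_{k,n}$) be the number of compacted (relaxed) binary trees of size $n$ with right height at most $k$. Then, for $n \to \infty$ we have

$$\frac{c_{k,n}}{r_{k,n}} = \Theta\left(n^{-\frac{1}{k+3} - \left(\frac{1}{4} - \frac{1}{k+3}\right)\frac{1}{\cos\left(\frac{\pi}{k+3}\right)^2}}\right).$$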

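Thus, for large $n$, compacted trees make up only a negligible proportion of relaxed trees (for $k \ge 1$; for $k = 0$ both classes consist of the same $n!$ trees of size $n$). This result quantifies the effect of the uniqueness restriction on subtrees in compacted trees.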

4 On the structure of compacted trees

In this section we discuss some basic observations concerning the structure of compacted trees. First note that pointers may point to vertices lying outside the subtree of the pointer's start node (compare Figures 2 and 3). Such subtrees of compacted trees are therefore not compacted trees themselves. For this reason, we define the concept of c-subtrees.

Definition 4.1.

A c-subtree is a subtree of a compacted tree. A cherry is a c-subtree where both children of the root are pointers.

A cherry is, in a sense, the “minimal” construction that creates a new subtree. It consists of a node and two pointers, which point to already existing c-subtrees. An example is given in Figure 3: in the rightmost tree, the c-subtree whose root has two pointers as children is a cherry. Such a cherry is not a compacted tree in the sense of Definition 2.2, as its root has two pointers that point to an external structure. The only compacted tree of size $1$ is also shown in the same figure.

With this terminology we are able to analyze some aspects of the DAG-structure of compacted trees. First, we look at the spine.

Lemma 4.2.

The spine of a compacted tree of size $n$ is a binary tree of size $n$.

Proof.

Obviously, by deleting the leaf and the pointers we get a rooted, acyclic graph. It remains to show that this graph is connected. Assume that there exists a pointer which is the only connection between two parts of the compacted tree. By the UID procedure a pointer corresponds to a multiple occurrence of a subtree. Therefore we get a contradiction, as this subtree must already exist in the tree and is, therefore, connected with the root via internal edges. ∎

Let us remark that the spine is binary in the sense that its nodes have out-degree $0$, $1$ (with two possibilities: either a left child or a right child), or $2$.

Proposition 4.3.

From any binary tree of size $n$, we can build a compacted tree of size $n$ with the following operations:

  1. Add a leaf as the left child of the leftmost node of the binary tree.

  2. Add pointers to the nodes such that every node except the leaf has out-degree $2$.

  3. Let the pointers point to internal nodes that precede the node under consideration in the post-order traversal, such that the corresponding subtree is unique (not already existing).

Every compacted tree of size $n$ can be constructed this way.

Proof.

A simple way to build a compacted tree from the spine is the following. Add the leaf to the leftmost node of the binary tree. Then traverse the binary tree in post-order, and each time a node with out-degree less than $2$ is met, add $1$ or $2$ pointers to the last node visited (this ensures the uniqueness of the new subtree).

The last statement is obvious, since every compacted tree can be reconstructed from its spine using only the operations listed above. By Lemma 4.2 the spine has the same size as the compacted tree. ∎

Note that in the previous proposition the compacted tree need not be the result of compacting the initial binary tree (even if the latter is a full binary tree). Furthermore, in many cases several compacted trees can be constructed by enriching the same binary tree. Hence the map sending a compacted tree to its spine is not one-to-one.

The last result also tells us that cherries are the fundamental structures that guarantee the uniqueness of c-subtrees. Indeed, if a cherry violates the condition implicit in the third operation listed in Proposition 4.3, the structure is not a compacted tree according to our definition, but only a relaxed tree.
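Proposition 4.3 suggests a brute-force check for small sizes: enumerate all spines, turn every missing child into a pointer, and keep exactly those configurations in which all subtrees are distinct. The Python sketch below is our own illustration (all names are hypothetical) and rests on one stated assumption about “previously discovered”: a missing left child may point to the leaf or to any node finished before the current node's subtree starts, while a missing right child may also point into the node's left subtree, as in the pool description of Section 5. The search is exponential and only meant for sizes up to about 5.

```python
from itertools import product
from typing import List, Optional, Tuple

# Spine shapes: None (no node) or a pair (left, right) of shapes.
Shape = Optional[Tuple["Shape", "Shape"]]

def spines(n: int) -> List[Shape]:
    """All binary-tree shapes with n nodes (Catalan many)."""
    if n == 0:
        return [None]
    return [(l, r) for j in range(n) for l in spines(j) for r in spines(n - 1 - j)]

def post_order(shape: Shape) -> List[Tuple[Optional[int], Optional[int], int]]:
    """Entries (left index, right index, first index of the subtree); node k is entry k-1."""
    nodes: List[Tuple[Optional[int], Optional[int], int]] = []

    def visit(t: Shape) -> Optional[int]:
        if t is None:
            return None
        first = len(nodes) + 1  # a subtree occupies consecutive post-order indices
        left, right = visit(t[0]), visit(t[1])
        nodes.append((left, right, first))
        return len(nodes)

    visit(shape)
    return nodes

def count_relaxed_and_compacted(n: int) -> Tuple[int, int]:
    """Brute-force counts of relaxed and compacted trees of size n (index 0 = leaf)."""
    relaxed = compacted = 0
    for shape in spines(n):
        nodes = post_order(shape)
        slots = []  # (node index, side 0=left / 1=right, admissible targets)
        for k, (left, right, first) in enumerate(nodes, start=1):
            if left is None:
                slots.append((k, 0, range(first)))  # leaf and nodes before the subtree
            if right is None:
                slots.append((k, 1, range(k)))      # leaf and all nodes before k
        for choice in product(*(targets for _, _, targets in slots)):
            children = [[l, r] for l, r, _ in nodes]
            for (k, side, _), target in zip(slots, choice):
                children[k - 1][side] = target
            relaxed += 1
            # Hash-cons subtrees in post-order; a repeated key is a duplicate subtree.
            table, uid = {}, [0] * (n + 1)
            for k in range(1, n + 1):
                key = (uid[children[k - 1][0]], uid[children[k - 1][1]])
                uid[k] = table.setdefault(key, len(table) + 1)
            if len(table) == n:
                compacted += 1
    return relaxed, compacted

for n in range(6):
    print(n, *count_relaxed_and_compacted(n))  # for n = 3 this prints "3 16 15"
```

At size $3$ the two counts differ by exactly one tree, the one with two identical cherries, in line with Figure 5.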

A different explanation of why cherries are the crucial objects for uniqueness comes from the fact that the compaction procedure generates an increasing set of elements, i.e., of already seen subtrees. Here we mean that each new element is constructed from a new internal node and previous, already built elements. In particular, the first element is always a leaf, and the second one is always an internal node with two leaves as children (a “classical cherry”). As a third element one then has a new internal node with this cherry as its left child, as its right child, or on both sides. How can further elements be built such that the uniqueness property is maintained? Let us focus on the bad ways to do so, i.e., we ask: what is forbidden? There are two cases, according to the type of the current node (in the post-order traversal of the tree):

  • The current node is a cherry: The only forbidden way to place the two pointers is by choosing an already generated subtree and letting the two pointers of the cherry point to the children of the subtree. Note that the children of an already generated subtree must have been generated before. Thus, for any already generated subtree there is one forbidden configuration for the placement of the pointers.

  • The current node is not a cherry: In this case at least one edge is not a pointer. But then it can easily be seen by induction that the subtree of the corresponding child is unique. Hence, there is no restriction on placing the pointer, since the current node always generates a new subtree.

This idea will be picked up in the next section and used to derive a recurrence relation for the number of compacted trees of size $n$. Moreover, it shows that we only have to be careful when dealing with nodes having two pointers (see Section 8).

5 Counting compacted structures by recurrence

Using the properties stated in the last section for compacted trees, we are now able to exhibit a combinatorial recurrence based on a decomposition of the structures under consideration.

5.1 A recurrence relation for compacted trees

Let $c_n$ be the number of compacted binary trees of size $n$. Recall that Figure 3 showed all compacted trees of small size. It is easily checked that the first few terms of the sequence are given by

$$(c_n)_{n \ge 0} = (1, 1, 3, 15, 111, 1119, 14487, \dots).$$

Note that this sequence is not found in Sloane’s Online Encyclopedia of Integer Sequences. As a first step we derive a recursion representing this sequence.

Suppose that we perform a post-order traversal on a tree and that $p$ c-subtrees have already been discovered. Then the current node is the root of another c-subtree. Let $\mathcal{B}_{i,p}$ denote the class of all c-subtrees of size $i$ that may show up as such a c-subtree. We may then think of the already compacted subtrees as an external pool of trees to which our pointers may additionally point when continuing the traversal. For an illustration see Figure 7. Note that the leaf is always part of this pool but is not counted, and all subtrees in the pool must be constructed from elements of the pool. In this sense the pool is closed, and its evolution during the compaction procedure is an increasing sequence of sets.

Figure 7: The two cases of the pool construction of Theorem 5.1. The pool (circled elements) represents the already visited c-subtrees the pointers may point to. In the second case it may also point to the c-subtrees of the left sibling.

We define the size of the pool to be the number of distinct subtrees with at least one internal node. Thus, the pool for the trees in $\mathcal{B}_{i,p}$ has size $p$ and consists of $p+1$ distinct c-subtrees (including the leaf). This artificial-looking convention will simplify the later analysis.

Theorem 5.1.

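Let $i, p \in \mathbb{N}$, and let $\mathcal{B}_{i,p}$ be as above. Moreover, we denote the cardinality of $\mathcal{B}_{i,p}$ by $b_{i,p}$. Then

$$b_{i+1,p} = \sum_{j=0}^{i} b_{j,p}\, b_{i-j,p+j} \quad \text{for } i \ge 1, \tag{1}$$
$$b_{0,p} = p + 1, \tag{2}$$
$$b_{1,p} = p^2 + p + 1. \tag{3}$$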
Proof.

An element of $\mathcal{B}_{i,p}$ consists of $i$ internal nodes connected by internal edges. The remaining edges of the compacted binary tree are pointers (the possible edge to the leaf may be interpreted as a pointer). These must be chosen in such a way that no subtree is generated twice. Additionally, they may point either to a c-subtree of the pool or to a c-subtree of a left sibling, see Figure 7. The second condition is due to the post-order traversal of the tree by the UID procedure.

Now we can give a recursive decomposition of such trees. Let $T \in \mathcal{B}_{i+1,p}$ be a c-subtree with $i+1$ nodes and a pool of size $p$. The root of $T$ has a left and a right subtree with $j$ and $i-j$ internal nodes (for $0 \le j \le i$), respectively. Note that every internal node also represents a c-subtree. For the left child the pool remains the same as for its parent. However, for the right child the pointers may additionally point to c-subtrees of its left sibling. Hence, the pool is increased by the size $j$ of its left sibling. These considerations directly give Equation (1).

Next, let us consider the initial conditions (2) and (3). The c-subtrees with no internal nodes can be interpreted as pointers. These may point to any element of the pool (including the leaf), hence $b_{0,p} = p + 1$.

The c-subtrees with $1$ internal node are cherries, neither of whose children is an internal node. Hence, they consist either of two pointers or of a leaf and a pointer. As the pool always contains the leaf, it suffices to consider the first case. Then each of the two pointers has $p+1$ possible targets. Among these $(p+1)^2$ cases, $p$ must be excluded, as they are the ones already found in the pool: each of them is recreated by letting the pointers point to the same children as the corresponding c-subtree in the pool. Hence, we get $b_{1,p} = (p+1)^2 - p = p^2 + p + 1$. ∎

Corollary 5.2.

The number of compacted trees of size $n$ is equal to $c_n = b_{n,0}$.

By Theorem 5.1, the number $b_{n,0}$ depends on the numbers $b_{i,p}$ with $i + p \le n$. Thus computing $c_n$ requires $O(n^2)$ stored values, i.e., quadratic memory, and $O(n^3)$ arithmetic operations.
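Reading Theorem 5.1 as $b_{0,p} = p+1$, $b_{1,p} = (p+1)^2 - p$ and $b_{i+1,p} = \sum_{j=0}^{i} b_{j,p}\, b_{i-j,p+j}$ for $i \ge 1$, the numbers $c_n = b_{n,0}$ can be evaluated by memoized recursion. The Python sketch below is our own illustration (function names are hypothetical); it reproduces the first terms of $(c_n)$.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def b(i: int, p: int) -> int:
    """Number of c-subtrees with i internal nodes and a pool of size p."""
    if i == 0:
        return p + 1              # a pointer to one of the p pool elements or to the leaf
    if i == 1:
        return (p + 1) ** 2 - p   # two pointers, minus the p cherries already in the pool
    # Root: left c-subtree with j nodes (pool p), right one with i-1-j nodes (pool p+j).
    return sum(b(j, p) * b(i - 1 - j, p + j) for j in range(i))

def compacted_counts(n_max: int) -> list:
    """[c_0, ..., c_{n_max}] with c_n = b(n, 0)."""
    return [b(n, 0) for n in range(n_max + 1)]

print(compacted_counts(6))  # [1, 1, 3, 15, 111, 1119, 14487]
```

The cache holds one entry per pair $(i, p)$ with $i + p \le n$, matching the quadratic memory bound; for large $n$ a bottom-up loop over $i + p$ would avoid Python's recursion limit.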

Lemma 5.3.

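The number $c_n$ of compacted binary trees of size $n \ge 1$ satisfies the following bounds:

$$n! \le c_n \le \mathrm{Cat}_n\, n^{n+1}, \qquad \text{where } \mathrm{Cat}_n = \frac{1}{n+1}\binom{2n}{n}.$$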

Proof.

Let us first consider the lower bound. Consider the subclass of chains. These are trees where the left child is always reached by an internal edge and the right child is a pointer, see Figure 8. Let $h_n$ be the number of chains with $n$ internal nodes. The leaf is the only such object of size $0$. Hence, we have $h_0 = 1$. A chain of size $n+1$ can be constructed from a chain of size $n$ by appending a new root node with a pointer. The pointer has $n+1$ possible targets (the leaf or one of the $n$ internal nodes). This implies $h_{n+1} = (n+1)\,h_n$, and we get the lower bound $c_n \ge h_n = n!$.

Figure 8: The number of compacted trees of size $n$ with right height at most $0$ is equal to $n!$.

Let us now briefly turn to the upper bound. Consider all possible spines. There are $\mathrm{Cat}_n$ (Catalan numbers) such structures, as they are binary trees. Next, note that a binary tree of size $n$ has $n+1$ leaves. In our case these are pointers. By Proposition 2.3 pointers can only point to previously discovered trees. Hence, every pointer has at most $n$ possible targets. This proves the upper bound. ∎
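The bounds of Lemma 5.3 directly explain the choice of generating functions. As a sketch under our reading of the counting argument ($\mathrm{Cat}_n$ spines, $n+1$ pointer slots, at most $n$ targets per slot) and using the elementary estimates $\mathrm{Cat}_n \le 4^n$ and $n^n / n! \le e^n$, one gets for $n \ge 1$:

```latex
n! \;\le\; c_n \;\le\; \mathrm{Cat}_n\, n^{n+1}
   \;\le\; 4^n \cdot n \cdot n^n
   \;\le\; n\,(4e)^n\, n!,
\qquad \text{since } \frac{n^n}{n!} \le \sum_{k \ge 0} \frac{n^k}{k!} = e^n .
```

Consequently $\sum_n c_n z^n$ diverges for every $z \neq 0$, whereas $\sum_n c_n z^n / n!$ converges at least for $|z| < 1/(4e)$.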

The last result implies that the number of compacted trees grows at most like $n!\,e^{O(n)}$, while it is also bounded from below by $n!$. Thus, an ordinary generating function for $c_n$ would have radius of convergence zero. Hence, we need exponential generating functions in order to ensure a non-zero radius of convergence. This idea will be used in the next sections. But first, let us state a simplified problem, which also proves very difficult to solve, but is less technical.

5.2 A recurrence relation for relaxed compacted trees

Let $r_n$ be the number of relaxed trees of size $n$. The first few terms of the sequence are given by

$$(r_n)_{n \ge 0} = (1, 1, 3, 16, 127, 1363, 18628, \dots).$$

This is sequence A082161 in the OEIS, which counts deterministic, completely defined, initially connected, acyclic automata with $2$ inputs and $n$ transient unlabeled states and a unique absorbing state, see [20]. The bijection between these automata and our (enriched) trees is obvious, by traversing relaxed trees from the root to the leaf. Note that the asymptotic behavior of the number of such structures seems not to be known.

Let $d_{i,p}$ be the number of relaxed c-subtrees of size $i$ with a pool of size $p$. We directly obtain a recurrence relation for these numbers, which is closely linked to the one for $b_{i,p}$:

Corollary 5.4.

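Let $i, p \in \mathbb{N}$. Then

$$d_{i+1,p} = \sum_{j=0}^{i} d_{j,p}\, d_{i-j,p+j} \quad \text{for } i \ge 0, \tag{4}$$
$$d_{0,p} = p + 1. \tag{5}$$

The number of relaxed trees of size $n$ is equal to $r_n = d_{n,0}$.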

Proof.

This is a direct consequence of Theorem 5.1 and the fact that we dropped the uniqueness restriction enforced by (3). ∎
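Dropping the uniqueness condition means that the recurrence $d_{i+1,p} = \sum_{j=0}^{i} d_{j,p}\, d_{i-j,p+j}$ also covers $i = 0$, so that $d_{1,p} = (p+1)^2$ instead of $p^2 + p + 1$. A minimal Python sketch of the corresponding computation (hypothetical names, memoized recursion as for compacted trees):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def d(i: int, p: int) -> int:
    """Number of relaxed c-subtrees with i internal nodes and a pool of size p."""
    if i == 0:
        return p + 1   # a pointer to one of the p pool elements or to the leaf
    # No uniqueness condition: the same split applies for every i >= 1.
    return sum(d(j, p) * d(i - 1 - j, p + j) for j in range(i))

def relaxed_counts(n_max: int) -> list:
    """[r_0, ..., r_{n_max}] with r_n = d(n, 0)."""
    return [d(n, 0) for n in range(n_max + 1)]

print(relaxed_counts(6))  # [1, 1, 3, 16, 127, 1363, 18628]
```

Compared with the compacted count $c_3 = 15$, the value $r_3 = 16$ differs by exactly the one relaxed tree of Figure 5.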

Note that the nature of the recurrence relation did not change compared to that of the compacted case. Unfortunately, we were not able to find an explicit solution or to proceed further from here. A more promising approach is the one based on generating functions, introduced in the next section.

6 Operations on trees

We have seen in the previous sections that the numbers $c_n$ and $r_n$ grow like $n!\,e^{O(n)}$ (compare Lemma 5.3, which also holds in the relaxed case). We therefore introduce exponential generating functions in order to get a non-zero radius of convergence. This, however, raises a problem in the construction: exponential generating functions are designed for labeled objects, whereas we are dealing with unlabeled ones. Thus, we first investigate how exponential generating functions reflect the construction of such enriched trees.

The use of non-standard generating functions in the enumeration of DAGs is not new. Robinson [28] introduced the so-called “special generating function”

$$A(z) = \sum_{n \ge 0} \frac{a_n}{2^{\binom{n}{2}}\, n!}\, z^n,$$

where $a_n$ is the number of labeled DAGs on $n$ vertices, to derive nice expressions for such generating functions. This ad hoc generating function does not seem applicable in our context, but exponential generating functions are.

For this purpose, we restrict ourselves to a subclass, relaxed trees of bounded right height, and derive their exponential generating functions. In this context we introduce the following notation: let $\mathcal{A}$ be a combinatorial class. Its exponential generating function is given by $A(z) = \sum_{n \ge 0} a_n \frac{z^n}{n!}$, where $a_n$ denotes the number of elements of $\mathcal{A}$ of size $n$.

Lemma 6.1.

(Adding a new root) Let $\mathcal{A}$ be a combinatorial subclass of relaxed trees, and let $\mathcal{B}$ be the combinatorial class whose elements consist of a new root node, with an element of $\mathcal{A}$ as its left child and a pointer as its right child. Then $B(z) = z\,A(z)$.

Proof.

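Consider a relaxed tree in $\mathcal{A}$ of size $n$. Adding a new root node with this tree as its left child creates a tree of size $n+1$. The new pointer has $n+1$ possible targets: one of the $n$ internal nodes or the leaf. Hence $\mathcal{B}$ contains $(n+1)\,a_n$ trees of size $n+1$, and on the level of generating functions this implies

$$B(z) = \sum_{n \ge 0} (n+1)\, a_n \frac{z^{n+1}}{(n+1)!} = z \sum_{n \ge 0} a_n \frac{z^n}{n!} = z\,A(z). \qquad ∎$$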

With the help of this lemma, we can construct the generating function of relaxed trees of right height equal to $0$. Let $\mathcal{R}_0$ be the respective combinatorial class, and $R_0(z)$ the associated generating function.

Corollary 6.2.

The generating function of relaxed trees of right height equal to 0 is

$$R_0(z) = \frac{1}{1-z};$$

hence the number of such trees of size $n$ is $n!$.

Proof.

Such a tree is either just a leaf of size 0 or it is constructed from an element of $\mathcal{R}_0$ by appending a new root node. Obviously, this construction does not increase the right height, and it constructs all such trees. On the level of generating functions this directly translates, by Lemma 6.1, into

$$R_0(z) = 1 + zR_0(z).$$

Solving the equation and extracting coefficients gives the result. ∎
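As a small illustration of Lemma 6.1 and Corollary 6.2, the following Python sketch (our own; the function names are hypothetical) stores a truncated exponential generating function as the exact coefficients $a_n/n!$, iterates $R_0(z) = 1 + zR_0(z)$ until the first $N+1$ coefficients stabilize, and converts back to counts.

```python
from fractions import Fraction
from math import factorial

def add_root_with_pointer(egf):
    """Lemma 6.1 on a truncated EGF: B(z) = z*A(z).
    egf[n] holds a_n/n!; the result keeps the same truncation order."""
    return [Fraction(0)] + egf[:-1]

def r0_counts(N):
    """Counts r_{0,n}, n = 0..N, from the fixed point of R_0 = 1 + z*R_0."""
    egf = [Fraction(0)] * (N + 1)
    for _ in range(N + 1):  # each pass fixes one more coefficient
        egf = [Fraction(1)] + add_root_with_pointer(egf)[1:]
    return [int(factorial(n) * c) for n, c in enumerate(egf)]

print(r0_counts(6))  # [1, 1, 2, 6, 24, 120, 720]
```

On the level of counts, multiplying by $z$ turns $a_n$ into $(n+1)a_n$ objects of size $n+1$, which is exactly the number of pointer choices in the proof of Lemma 6.1.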

This gives an alternative proof of the lower bound in Lemma 5.3. It nicely exemplifies how exponential generating functions model operations on compacted trees.

We now proceed with further operations on combinatorial classes and generating functions. The next two might seem “strange” at first glance, as they do not produce relaxed trees. However, they are the basic operations for constructing other ones.

Lemma 6.3 (Adding/deleting the root while ignoring pointers).

Let $\mathcal{A}$ be a class of relaxed trees. Let $\mathcal{A}^{\uparrow}$ be the class of objects obtained from $\mathcal{A}$ by adding a new root node without pointer (as its right child), and let $\mathcal{A}^{\downarrow}$ be the class obtained from $\mathcal{A}$ by deleting the root node but (if existent) keeping its pointer.³ Then

$$A^{\uparrow}(z) = \int_0^z A(t)\,dt \qquad\text{and}\qquad A^{\downarrow}(z) = A'(z).$$

³ This means in particular that a single leaf, being the root of a size-0 object, simply disappears. Furthermore, an object whose root has no pointer becomes disconnected at the root; the pointers from the right to the left subtree remain. However, this construction will only be used when the root has a pointer.

Proof.

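Adding a new root node increases the size by one, whereas deleting it decreases the size by one. Hence, elements of $\mathcal{A}$ of size $n$ are in bijection with elements of $\mathcal{A}^{\uparrow}$ of size $n+1$ as well as with elements of $\mathcal{A}^{\downarrow}$ of size $n-1$, compare Figure 9. Therefore, we get

$$A^{\uparrow}(z) = \sum_{n\ge0} a_n\frac{z^{n+1}}{(n+1)!} = \int_0^z A(t)\,dt, \qquad A^{\downarrow}(z) = \sum_{n\ge1} a_n\frac{z^{n-1}}{(n-1)!} = A'(z).$$ ∎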

Figure 9: Adding a new root node without pointer, deleting a root node while preserving its (possible) pointer, and adding a new pointer to the existing root node.

These constructions can then be used to derive the following two operations:

Proposition 6.4 (Sequences and pointers).

The generating function corresponding to the class obtained from a class $\mathcal{A}$ by appending an arbitrary (possibly empty but finite) sequence of nodes, each with one pointer, to the root is given by

$$\sum_{j\ge0} z^j A(z) = \frac{A(z)}{1-z}.$$

The generating function of the class obtained by adding a new, additional pointer to the root nodes of the objects of a class $\mathcal{A}$ is given by

$$zA'(z).$$

Proof.

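This is a direct consequence of Lemmas 6.1 and 6.3, compare Figures 9 and 10: appending $j$ nodes with pointers means applying Lemma 6.1 $j$ times, and adding a pointer to the root amounts to deleting the root while keeping its pointer (Lemma 6.3) and then adding a new root with a pointer (Lemma 6.1). ∎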
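The elementary constructions can also be checked directly on counting sequences. The sketch below is our own illustration (all function names are hypothetical): it writes Lemmas 6.1 and 6.3 and Proposition 6.4 as maps on lists $a_0, \ldots, a_N$, so that one can confirm that adding a pointer equals deleting the root and then adding a root with a pointer, and that appending a sequence equals summing iterated root additions.

```python
from math import factorial

# A counting sequence is a list a[0..N]: a[n] objects of size n (truncated at N).

def add_root_with_pointer(a):
    """Lemma 6.1, B(z) = z A(z): b_{n+1} = (n+1) a_n."""
    return [0] + [(n + 1) * a[n] for n in range(len(a) - 1)]

def add_root_without_pointer(a):
    """Lemma 6.3, integration: b_{n+1} = a_n."""
    return [0] + a[:-1]

def delete_root_keep_pointer(a):
    """Lemma 6.3, derivative: b_n = a_{n+1}; the unknown last entry is padded with 0."""
    return a[1:] + [0]

def add_pointer(a):
    """Proposition 6.4, z A'(z): b_n = n a_n."""
    return [n * a[n] for n in range(len(a))]

def append_sequence(a):
    """Proposition 6.4, A(z)/(1-z): b_n = sum_k a_k n!/k!."""
    return [sum(a[k] * (factorial(n) // factorial(k)) for k in range(n + 1))
            for n in range(len(a))]

def append_sequence_via_roots(a):
    """The same construction spelled out as sum_j (add_root_with_pointer)^j."""
    total, cur = [0] * len(a), list(a)
    for _ in range(len(a)):  # more than N new roots would exceed the truncation
        total = [x + y for x, y in zip(total, cur)]
        cur = add_root_with_pointer(cur)
    return total

print(append_sequence([1, 0, 0, 0, 0]))  # [1, 1, 2, 6, 24]: R_0 built from a leaf
```

The padded entry in `delete_root_keep_pointer` is harmless here, because the subsequent `add_root_with_pointer` drops the last entry again.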

Figure 10: Appending a finite (possibly empty) sequence to the root node.

Now we have all operations needed to continue our investigation of trees with bounded right height. In the next sections we show how this calculus is used to derive differential equations for relaxed and compacted trees of bounded right height.

In the sequel, it will prove convenient to work with operators on generating functions. For this purpose, we will use the same letters for the operators as were used for the combinatorial classes (or generating functions).

7 Relaxed binary trees

We will now show how to use the calculus developed in Section 6 to derive ordinary differential equations for the exponential generating functions of relaxed trees of bounded right height. In this context we introduce the following notation: Let $\mathcal{R}$ be the combinatorial class of relaxed trees. Its exponential generating function is given by $R(z) = \sum_{n\ge0} r_n\frac{z^n}{n!}$, where $r_n$ denotes the number of elements in $\mathcal{R}$ of size $n$. We denote the class of relaxed trees of right height at most $k$ by $\mathcal{R}_k$ and its corresponding exponential generating function by $R_k(z) = \sum_{n\ge0} r_{k,n}\frac{z^n}{n!}$.

We have derived $R_0(z)$ in Corollary 6.2 as

$$R_0(z) = \frac{1}{1-z}.$$

Let us now consider relaxed trees of right height at most one.

7.1 Relaxed trees of right height at most 1

Let $\mathcal{R}_1$ be the combinatorial class of relaxed trees with right height at most 1, compare Figure 11. The corresponding generating function is given by $R_1(z) = \sum_{n\ge0} r_{1,n}\frac{z^n}{n!}$.

Figure 11: A relaxed tree from $\mathcal{R}_1$, i.e. with right height at most 1.

We break the problem into smaller parts by decomposing $R_1(z)$ according to

$$R_1(z) = \sum_{k\ge0} E_k(z), \tag{6}$$

where $E_k(z)$ is the exponential generating function of the class $\mathcal{E}_k$ of relaxed trees of right height at most 1 with exactly $k$ right subtrees, i.e. $k$ right edges in the spine going from level 0 to level 1. Obviously, we have $E_0(z) = R_0(z)$. In order to get $E_1(z)$, we apply the previously developed constructions. An illustration of such a tree is shown in Figure 12.

Figure 12: A relaxed tree with exactly one right edge in the spine.
Proposition 7.1.

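The generating function of relaxed trees of right height at most 1 with exactly one right edge in the spine is given by

$$E_1(z) = \frac{1}{1-z}\int_0^z \frac{t}{1-t}\,\frac{d}{dt}\big(t\,E_0(t)\big)\,dt.$$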

Proof.

The idea is to decompose the structure of $\mathcal{E}_1$ into smaller parts which are in bijection with constructible classes.

  1. On level 0 there is a unique node with one right edge, see Figure 12. Before this node there is a possibly empty sequence of nodes, corresponding to the sequence construction of Proposition 6.4. Call this the initial sequence. First consider a relaxed tree with empty initial sequence, see Figure 13.

    Figure 13: Step 1: An element of $\mathcal{E}_1$ with empty sequence of initial nodes on level 0.
  2. On level 0, the left child of the unique node with two children (and without pointer) is followed by a sequence of nodes whose pointers may only point to vertices of this sequence. This is an element of $\mathcal{R}_0$ and is thus counted by $R_0(z)$.

    Furthermore, the elements on level 1 form a sequence with a cherry as its last element. Their pointers may also point to nodes of the sequence discussed in the previous paragraph, which is in bijection with $\mathcal{R}_0$. By moving the $\mathcal{R}_0$-instance of level 0 to the end of the sequence on level 1, we get a sequence containing one special node which has two pointers. Then we delete the last node on level 0, compare Figure 14.

    In terms of generating functions we get (the cherry is a new root with a pointer, giving a factor $z$, equipped with an additional pointer, giving $z\frac{d}{dz}$, and preceded by a sequence, giving $\frac{1}{1-z}$)

    $$\frac{1}{1-z}\, z\frac{d}{dz}\big(zR_0(z)\big). \tag{7}$$

    Note that due to the cherry every element has at least one internal node.

    Figure 14: Step 2: Nodes of level 0 can only point to nodes on level 0 (left); moving these nodes to level 1 and deleting the remaining node at level 0 gives the object on the right.
  3. Furthermore, notice that the node on level 0 containing a right child (and not a right pointer) has no pointers. However, elements of the initial sequence may point to it. Therefore, we reinsert this node by adding it as a new root without pointer. The constructed object bijectively corresponds to the elements of $\mathcal{E}_1$ with empty initial sequence.

  4. Finally, we append an initial sequence (cf. Step 1).

After these steps, the resulting object looks as shown in Figure 15: a sequence with two special nodes, one having no pointer, the other one having two pointers. The class of all such elements is in bijection with $\mathcal{E}_1$, as all the steps above can be reverted.

Figure 15: Step 4: The final sequence-like object bijectively corresponding to $\mathcal{E}_1$.

Now we have to translate the operations performed in the four steps into algebraic operations on generating functions. As already mentioned, after Step 2 the class of objects we obtain has generating function (7). By Lemma 6.3, the operation in Step 3 corresponds to integrating the generating function. The final step is the application of the sequence operation of Proposition 6.4 and therefore generates a factor $\frac{1}{1-z}$. Since $E_0(z) = R_0(z)$, this completes the proof. ∎
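The four steps of the proof can be mirrored on truncated exponential generating functions. In the Python sketch below (our own; the helper names and the truncation order `N` are hypothetical), each helper implements one operation from Section 6 on exact coefficients $a_n/n!$; composing them as in Steps 2 to 4 gives the counts of $\mathcal{E}_1$, which agree with $n!\,n(n-1)/4$, i.e. with the explicit form $E_1(z) = \frac{z^2}{2(1-z)^3}$.

```python
from fractions import Fraction
from math import factorial

N = 7  # keep the EGF coefficients of z^0, ..., z^N

def times_z(f):
    """Lemma 6.1: add a root with a pointer, A(z) -> z A(z)."""
    return [Fraction(0)] + f[:-1]

def d_dz(f):
    """Lemma 6.3: delete the root but keep its pointer, A(z) -> A'(z).
    The top coefficient is not determined by the truncation and is padded with 0;
    in this sketch it is immediately shifted out again by times_z."""
    return [n * f[n] for n in range(1, len(f))] + [Fraction(0)]

def integrate(f):
    """Lemma 6.3: add a root without pointer, A(z) -> int_0^z A(t) dt."""
    return [Fraction(0)] + [f[n - 1] / n for n in range(1, len(f))]

def seq(f):
    """Proposition 6.4: append a sequence, A(z) -> A(z)/(1-z) (partial sums)."""
    out, s = [], Fraction(0)
    for c in f:
        s += c
        out.append(s)
    return out

def counts(f):
    """Turn EGF coefficients a_n/n! back into counts a_n."""
    return [int(factorial(n) * c) for n, c in enumerate(f)]

R0 = [Fraction(1)] * (N + 1)                 # E_0(z) = R_0(z) = 1/(1-z)
step2 = seq(times_z(d_dz(times_z(R0))))      # (7): (1/(1-z)) z (z R_0)'
E1 = seq(integrate(step2))                   # Step 3 (integrate), Step 4 (sequence)
print(counts(E1))  # [0, 0, 1, 9, 72, 600, 5400, 52920]
```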

The main idea of the previous proof was to cut and glue the $\mathcal{R}_0$-instance in such a way that a sequence-like object appears, so that the process forms a bijection from $\mathcal{E}_1$ to the class of sequence-like objects of the form shown in Figure 15. This new object has the advantage of being constructible by the operations introduced in Section 6.

Of course, one can easily compute $E_1(z)$ explicitly, namely $E_1(z) = \frac{z^2}{2(1-z)^3}$. Yet, the representation of Proposition 7.1 is easier to generalize to $E_k(z)$.

Corollary 7.2.

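The generating function of relaxed trees of right height at most 1 with exactly $k \ge 1$ right edges in the spine from level 0 to level 1 is given by

$$E_k(z) = \frac{1}{1-z}\int_0^z \frac{t}{1-t}\,\frac{d}{dt}\big(t\,E_{k-1}(t)\big)\,dt.$$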

Proof.

By cutting at the first right edge from level 0 to level 1, we observe a decomposition into an initial sequence, a right edge from level 0 to level 1 together with its two end nodes, the sequence on level 1 starting at the lower end node, and an instance counted by $E_{k-1}(z)$. The decomposition is exhibited in Figure 16. Thus, we may reuse the construction from Proposition 7.1 by replacing the initial value $E_0(z)$ by $E_{k-1}(z)$. ∎

Figure 16: A recursive decomposition of elements from $\mathcal{E}_k$.
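Corollary 7.2 can be iterated numerically. The following Python sketch (our own illustration; `next_E` and the truncation order `N` are hypothetical names) applies the integral recursion to truncated exponential generating functions, sums $E_0, \ldots, E_{\lfloor N/2\rfloor}$ according to (6), and compares the resulting counts with the values $(2n-1)!!$ stated in Theorem 7.3. Only these terms are needed because $E_k(z)$ starts with $z^{2k}$.

```python
from fractions import Fraction
from math import factorial

N = 9  # keep the EGF coefficients of z^0, ..., z^N

def next_E(E):
    """Corollary 7.2 on truncated EGFs:
    E_k(z) = 1/(1-z) * int_0^z t/(1-t) * (t E_{k-1}(t))' dt."""
    tE = [Fraction(0)] + E[:-1]                                   # t * E_{k-1}(t)
    g = [Fraction(0)] + [n * tE[n] for n in range(1, N + 1)]      # t * (t E_{k-1})'
    g = [sum(g[:n + 1]) for n in range(N + 1)]                    # divide by (1 - t)
    g = [Fraction(0)] + [g[n - 1] / n for n in range(1, N + 1)]   # integrate
    return [sum(g[:n + 1]) for n in range(N + 1)]                 # divide by (1 - z)

E = [Fraction(1)] * (N + 1)      # E_0(z) = R_0(z) = 1/(1-z)
R1 = list(E)
for k in range(1, N // 2 + 1):   # E_k starts with z^(2k); larger k do not contribute
    E = next_E(E)
    R1 = [a + b for a, b in zip(R1, E)]

r1 = [int(factorial(n) * c) for n, c in enumerate(R1)]
print(r1)
```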

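Finally, we are able to combine the previous results to derive the generating function $R_1(z)$. We need the classical notation of double factorials: $(2n-1)!! = 1\cdot3\cdot5\cdots(2n-1) = \frac{(2n)!}{2^n n!}$, with the convention $(-1)!! = 1$.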

Theorem 7.3.

The exponential generating function $R_1(z)$ of relaxed trees of right height at most 1 is D-finite and satisfies

$$(1-2z)R_1'(z) - R_1(z) = 0.$$

The closed form formula and the coefficients are given by

$$R_1(z) = \frac{1}{\sqrt{1-2z}} \qquad\text{and}\qquad r_{1,n} = (2n-1)!!.$$

Remark 7.4: The general background of D-finite functions is well presented in Stanley’s book [31].

Proof.

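We start with the result of Corollary 7.2. But instead of the integral representation, we use the following differential equation, valid for $k \ge 1$:

$$\big((1-z)E_k(z)\big)' = \frac{z}{1-z}\big(zE_{k-1}(z)\big)'.$$

Remembering the initial decomposition (6) and summing over all $k \ge 1$ we get

$$\Big((1-z)\sum_{k\ge1}E_k(z)\Big)' = \frac{z}{1-z}\Big(z\sum_{k\ge0}E_k(z)\Big)'.$$

Rearranging this equation, using $\sum_{k\ge1}E_k(z) = R_1(z) - R_0(z)$ and replacing $\sum_{k\ge0}E_k(z)$ by $R_1(z)$, we get

$$\big((1-z)R_1(z)\big)' - \frac{z}{1-z}\big(zR_1(z)\big)' = \big((1-z)R_0(z)\big)'. \tag{8}$$

Now, $(1-z)R_0(z) = 1$, so the right-hand side vanishes; after multiplying by $1-z$, the differential equation simplifies to

$$(1-2z)R_1'(z) - R_1(z) = 0.$$

Solving this equation by separation of variables with $R_1(0) = 1$ yields the closed form expression. Finally, the coefficient extraction is easy using $[z^n](1-2z)^{-1/2} = \binom{2n}{n}2^{-n}$, which gives $r_{1,n} = n!\binom{2n}{n}2^{-n} = \frac{(2n)!}{2^n n!} = (2n-1)!!$. ∎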
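The coefficients can also be read off directly from the differential equation $(1-2z)R_1'(z) = R_1(z)$, without solving it. Since $R_1'(z) = \sum_n r_{1,n+1}\frac{z^n}{n!}$ and $zR_1'(z) = \sum_n n\,r_{1,n}\frac{z^n}{n!}$, comparing coefficients of $\frac{z^n}{n!}$ gives a first-order recurrence (a short worked derivation added for illustration):

```latex
r_{1,n+1} - 2n\, r_{1,n} = r_{1,n}
\;\Longrightarrow\;
r_{1,n+1} = (2n+1)\, r_{1,n}, \qquad r_{1,0} = 1,
\;\Longrightarrow\;
r_{1,n} = 1 \cdot 3 \cdots (2n-1) = (2n-1)!!.
```

This agrees with the extraction via $[z^n](1-2z)^{-1/2}$.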

7.2 Relaxed trees of right height at most 2

Let $\mathcal{R}_2$ be the combinatorial class of relaxed trees with right height at most 2, compare Figure 17. The corresponding generating function is given by $R_2(z) = \sum_{n\ge0} r_{2,n}\frac{z^n}{n!}$.

Figure 17: A relaxed tree from $\mathcal{R}_2$, i.e. with right height at most 2.

In the same fashion as before, we break the problem into smaller parts by decomposing $R_2(z)$ into

$$R_2(z) = \sum_{k\ge0} F_k(z), \tag{9}$$

where $F_k(z)$ is the exponential generating function of relaxed trees of right height at most 2 with exactly $k$ right edges in the spine going from level 0 to level 1. Obviously, we have $F_0(z) = R_0(z)$.

Remark 7.5: Note that, as seen in the sequel, the functions $E_0(z)$ and $F_0(z)$ are in fact the perturbations of the recursively built differential equations. Moreover, they also uniquely determine the initial condition of this recurrence. Therefore, we will sloppily call these functions, as well as others in the same role, “initial conditions”. This should not be confused with the initial conditions of the differential equations themselves. Those do not play any role in our arguments, so the risk of confusion should be low.

Proposition 7.6.

The exponential generating function $F_1(z)$ of relaxed trees of right height at most 2 with exactly one right edge from level 0 to level 1 in the spine satisfies

(10)
Proof.

The main idea is to decompose the structure of $\mathcal{F}_1$ again into parts (compare Figure 18): an initial sequence, the first right edge from level 0 to level 1, the sequence on level 0 after this right edge, and an instance of $\mathcal{R}_1$ starting on level 1 after this right edge. Then we use the same transformation idea as in the proof of Proposition 7.1: we take the sequence on level 0 after the right edge and move it to the end of the $\mathcal{R}_1$-instance. This is legitimate concerning the pointers, but it generates a node with two pointers within a sequence of the $\mathcal{R}_1$-instance. With respect to the $\mathcal{R}_1$-instance, this change happens on its top level, at the very left.

We can now delete the initial sequence and the level-0 node of the right edge, as they can be created again by known operations. Let us denote the class of objects obtained by performing the above operations on objects from $\mathcal{F}_1$ and then deleting these two parts by $\mathcal{G}$, and its generating function by $G(z)$. By Lemma 6.3 and Proposition 6.4 we get

$$F_1(z) = \frac{1}{1-z}\int_0^z G(t)\,dt.$$

Note that $\mathcal{G}$ is associated with structures of right height at most 1. It is nearly an instance of $\mathcal{R}_1$. There are only two differences:

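First, it has a special construction after its last right edge. With respect to the differential equation (8), which corresponds to the class $\mathcal{R}_1$, this change affects the initial condition $R_0(z)$ (recall Remark 7.5). Thus, we can reuse this specification by replacing the initial condition. On the level of generating functions this corresponds to replacing $R_0(z)$ by $\frac{1}{1-z}\frac{d}{dz}\frac{z}{1-z}$, because a (possibly empty) sequence is followed by a node with a double pointer and another sequence (compare Figure 14). Hence, by (8), the corresponding combinatorial class $\tilde{\mathcal{G}}$ has generating function $\tilde G(z)$ given by

$$\big((1-z)\tilde G(z)\big)' - \frac{z}{1-z}\big(z\tilde G(z)\big)' = \Big((1-z)\,\frac{1}{1-z}\,\frac{d}{dz}\frac{z}{1-z}\Big)' = \frac{d^2}{dz^2}\frac{z}{1-z}.$$

Second, due to the unique right edge from level 0 to level 1, every object in $\mathcal{G}$ has at least one node. The elements in $\tilde{\mathcal{G}}$ which do not satisfy this condition are leaves and they belong to $\mathcal{R}_0$. On the level of generating functions, the elements of $\mathcal{R}_0$ which are not leaves correspond to $zR_0(z)$, as $\mathcal{R}_0$ is a sequence construction. This gives

$$G(z) = \tilde G(z) - R_0(z)\,\frac{d}{dz}\frac{z}{1-z} + zR_0(z)\,\frac{d}{dz}\frac{z}{1-z}.$$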

This yields

(11)

Finally, putting everything together some tedious calculations show (10). ∎

Figure 18: Transforming a structure of $\mathcal{F}_1$ into an instance of $\mathcal{G}$.

Remark 7.7: We want to comment on the last reasoning in the previous proof. It might seem complicated and awkward to delete the leaf by subtracting $R_0(z)$ and adding the shifted version $zR_0(z)$. Another solution would obviously be to subtract only the generating function associated with a single leaf. This is of course legitimate; however, it leads to an inhomogeneous differential equation. We will see that it is crucial to have a homogeneous equation, because we want to sum over infinitely many of them.
As in the case of right height at most 1, we obtain $F_k(z)$ for $k \ge 2$ by a recursive application of the previous arguments.

Corollary 7.8.

The generating function $F_k(z)$ of relaxed trees with right height at most 2 and exactly $k \ge 1$ right edges in the spine from level 0 to level 1 is given by

Proof.

By cutting at the first right edge from level 0 to level 1, we observe a decomposition into an initial sequence, a right edge from level 0 to level 1 with its end nodes, the structure on level 1 below this edge, and an instance counted by $F_{k-1}(z)$. Thus, we may reuse the construction from the proof of Proposition 7.6 by replacing the initial value $R_0(z)$ with $F_{k-1}(z)$. ∎

Note that for the final result it is crucial that we found homogeneous differential equations.

Theorem 7.9.

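The exponential generating function $R_2(z)$ of relaxed trees of right height at most 2 is D-finite and satisfies

$$(1-3z+z^2)R_2''(z) + (2z-3)R_2'(z) = 0.$$

A closed form formula and the coefficients are given by

$$R_2(z) = 1 + \frac{1}{\sqrt5}\ln\frac{1-\frac{3-\sqrt5}{2}z}{1-\frac{3+\sqrt5}{2}z}, \qquad r_{2,n} = \frac{(n-1)!}{\sqrt5}\left(\Big(\frac{3+\sqrt5}{2}\Big)^n - \Big(\frac{3-\sqrt5}{2}\Big)^n\right) \quad (n\ge1),$$

i.e., $r_{2,n}$ equals $(n-1)!$ times the Fibonacci number with index $2n$.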

Proof.

Again, let us take the result of Corollary 7.8 and sum over all $k \ge 1$, while remembering the decomposition (9). By linearity this gives

(12)

A simplification gives

Inserting the initial value $F_0(z) = R_0(z) = \frac{1}{1-z}$ we get the D-finite expression. The correctness of the closed form formula can then easily be checked with a computer algebra system.

In order to extract the coefficients of $R_2(z)$, we observe that the differential equation can be simplified further by an integration with respect to $z$, since its left-hand side equals $\big((1-3z+z^2)R_2'(z)\big)'$. Thus, it is equivalent to

$$(1-3z+z^2)R_2'(z) = 1,$$

as $R_2'(0) = r_{2,1} = 1$. Next, observe that as we are dealing with exponential generating functions, the derivative is just a shift on the level of coefficients. In other words, $[z^n]R_2'(z) = \frac{r_{2,n+1}}{n!}$. Therefore, a partial fraction decomposition enables a direct extraction of the coefficients. ∎
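The coefficient extraction in the proof can be made concrete. Comparing coefficients of $z^n$ in $(1-3z+z^2)R_2'(z) = 1$ gives a three-term recurrence for $u_n = [z^n]R_2'(z) = r_{2,n+1}/n!$. The Python sketch below (our own; the names `r2_counts` and `closed_form` are hypothetical) runs this recurrence and compares the result with the closed form $(n-1)!\,(a^n - b^n)/\sqrt5$, where $a, b = (3\pm\sqrt5)/2$.

```python
from math import factorial, sqrt, isclose

def r2_counts(N):
    """r_{2,n}, n = 0..N, from (1 - 3z + z^2) R_2'(z) = 1 and R_2(0) = 1.

    u_n = [z^n] R_2'(z) = r_{2,n+1}/n! satisfies u_0 = 1, u_1 = 3 and
    u_n = 3 u_{n-1} - u_{n-2} for n >= 2 (compare coefficients of z^n)."""
    u = [1, 3]
    while len(u) < N:
        u.append(3 * u[-1] - u[-2])
    return [1] + [factorial(n) * u[n] for n in range(N)]

def closed_form(n):
    """(n-1)! (a^n - b^n)/sqrt(5) with a, b = (3 +- sqrt(5))/2, for n >= 1."""
    a, b = (3 + sqrt(5)) / 2, (3 - sqrt(5)) / 2
    return factorial(n - 1) * (a**n - b**n) / sqrt(5)

r2 = r2_counts(10)
print(r2[:6])  # [1, 1, 3, 16, 126, 1320]
print(all(isclose(r2[n], closed_form(n), rel_tol=1e-9) for n in range(1, 11)))
```

The sequence $u_n$ is $1, 3, 8, 21, 55, \ldots$, the Fibonacci numbers of even index, which is where the factor $F_{2n}$ in $r_{2,n}$ comes from.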

7.3 Relaxed trees of right height at most

The approach from the previous section can be generalized to an arbitrary bound $k$ for the right height. Let $R_k(z)$ be the corresponding generating function. The idea is to use the previous construction and to derive a differential equation for $R_k(z)$ from the one for $R_{k-1}(z)$.

Figure 19: A relaxed tree from $\mathcal{R}_k$, i.e. with right height at most $k$.

We introduce a family of linear differential operators $L_k$, $k \ge 0$, which describe the differential equations constructed for $R_k(z)$. Let $D$ denote the differential operator $\frac{d}{dz}$ and $I$ the identity operator, i.e. $DA(z) = A'(z)$ and $IA(z) = A(z)$. For example, $R_0(z) = \frac{1}{1-z}$ satisfies $\big((1-z)D - I\big)R_0(z) = 0$.

Theorem 7.10 (Differential operators).

Let be a family of differential operators given by

Then the exponential generating function of relaxed binary trees with right height at most satisfies for

(13)
Proof.

For we derive two families of operators: The differential operator and an auxiliary operator for the inhomogeneity such that

For $k=1$ we derived the claimed form in (8).

We continue with the case $k=2$. The explicit differential operator is given in Theorem 7.9. We will now show how the operator can be constructed from the ones for $k=0$ and $k=1$ in the language of operators.

In Proposition 7.6 we have derived the necessary substitution to get the differential equation of $F_1$ from the one of $R_1$. The idea was to decompose $\mathcal{R}_2$ with respect to the number of right edges from level 0 to level 1, see Figure 18. This transformation creates an $\mathcal{R}_1$-like structure with a new initial condition and the constraint not to be empty.

From (8) we get the generic differential equation for -like structures with generating function as

First, the new initial condition is given by

Second, the -like class being in bijection to cannot be empty, and the initial sequence on level has to be appended. Thus, the substitution (11) has to be used where is replaced by , and by . This gives for

Summing over and recalling that we get

On the left we see the differential operator applied to and on the right the inhomogeneity operator applied to . Inserting shows the claim for .

Finally, for larger $k$, we can recycle the previous arguments for $k=2$ and apply them recursively. This holds, as we may again cut an instance of $\mathcal{R}_k$ at the first right edge in the spine from level 0 to level 1 and decompose it in the repeatedly shown fashion, compare Figure 18. Then the same reasoning as in Section 7.2 allows us to extract the differential equation of $R_k$ from the one of $R_{k-1}$ by

(14)

Hence, by induction the claim holds. ∎

Let us apply the last theorem and compute the first few differential equations.

The initial conditions of the differential equations can be obtained successively from lower order solutions. In particular, note that, due to the construction, the coefficients $r_{k,n}$ with $n \le k+1$ coincide with the numbers $r_n$ of all relaxed trees of size $n$, as a tree of size $n$ always has right height at most $n-1$. Thus, with $R_k(z)$ we can enumerate all relaxed trees up to size $k+1$.
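For small sizes, the numbers $r_{k,n}$ can be cross-checked by a direct count. The Python sketch below is our own brute-force illustration, not the method of this paper. It assumes the definition of relaxed trees from the earlier sections as we read it: the first leaf in postorder is kept, every other leaf is a pointer to an already visited node (an internal node or the leaf), and right height counts only right edges to internal nodes. Memoization over (size, completed nodes, bound) keeps it fast; all names are hypothetical.

```python
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def weight(size, done, h, first):
    """Number of ways to build a relaxed subtree with `size` internal nodes and
    right height at most h (h=None: no bound).

    done  -- internal nodes already completed (in postorder) before this subtree
    first -- True if this subtree contains the globally first (leftmost) leaf
    """
    if size == 0:
        # the first leaf is kept; any other leaf is a pointer to one of the
        # `done` completed internal nodes or to the leaf: done + 1 choices
        return 1 if first else done + 1
    total = 0
    for left in range(size):
        right = size - 1 - left
        if right > 0 and h == 0:
            continue  # a right edge to an internal node would exceed the bound
        h_right = h if (h is None or right == 0) else h - 1
        total += weight(left, done, h, first) * weight(right, done + left, h_right, False)
    return total

def relaxed_count(n, h=None):
    """Number of relaxed trees of size n with right height at most h."""
    return weight(n, 0, h, True)

print([relaxed_count(n) for n in range(6)])
print([relaxed_count(n, 0) for n in range(6)] == [factorial(n) for n in range(6)])
```

The counts for right height at most 0, 1 and 2 reproduce $n!$, $(2n-1)!!$ and $(n-1)!$ times the Fibonacci number of index $2n$, and for $n \le k+1$ the bounded and unbounded counts coincide, as stated above.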

Next, we take a closer look at these operators.

Theorem 7.11 (Properties of ).

For any , let be as in Theorem 7.10. Let be such that

(15)

Then we have

The initial polynomials are , , and .

Proof.

The initial polynomials are given by Theorem 7.10. The shape (15) of the operator follows by induction using its recursive definition. Using an ansatz and comparing coefficients gives the recurrence relations for . ∎

The asymptotic behavior (as $n \to \infty$) of the number of relaxed trees with right height at most $k$ is governed by these differential equations. These differential equations belong to a known class [14, Chapter VII.9]. Consider an ordinary differential equation of the kind

$$\frac{d^r}{dz^r}Y(z) + d_1(z)\frac{d^{r-1}}{dz^{r-1}}Y(z) + \cdots + d_r(z)Y(z) = 0, \tag{16}$$

where the $d_j(z)$ are meromorphic in a simply connected domain $\Omega$. Given a meromorphic function $f(z)$, let $\omega_\zeta(f)$ be the order of the pole of $f$ at $\zeta$, with $\omega_\zeta(f) = 0$ meaning that $f$ is analytic at $\zeta$.

Definition 7.12 (Regular singularity, [14, p. 519]).

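The differential equation (16) is said to have a singularity at $\zeta$ if at least one of the $\omega_\zeta(d_j)$ is positive. The point $\zeta$ is said to be a regular singularity if

$$\omega_\zeta(d_j) \le j \qquad\text{for all } 1 \le j \le r,$$

and an irregular singularity otherwise.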

Definition 7.13 (Indicial polynomial, [14, p. 520]).

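Given an equation of the form (16) and a regular singular point $\zeta$, the indicial polynomial $I(\theta)$ at $\zeta$ is defined as

$$I(\theta) = \theta^{\underline{r}} + \delta_1\theta^{\underline{r-1}} + \cdots + \delta_r,$$

where

$$\delta_j = \lim_{z\to\zeta}(z-\zeta)^j d_j(z)$$

and $\theta^{\underline{\ell}} = \theta(\theta-1)\cdots(\theta-\ell+1)$. The indicial equation at $\zeta$ is the algebraic equation $I(\theta) = 0$.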
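To illustrate Definition 7.13 in the situation of Lemma 7.14 (a simple factor $z-\zeta$ in the leading coefficient), the following Python sketch, which is our own and uses hypothetical names, writes the equation with polynomial coefficients $p_r(z)Y^{(r)} + \cdots + p_0(z)Y = 0$, so that $d_j = p_{r-j}/p_r$ in the notation of (16). At a simple zero $\zeta$ of $p_r$ only $\delta_1 = p_{r-1}(\zeta)/p_r'(\zeta)$ can be non-zero. Applied to $(1-2z)R_1' - R_1 = 0$ and $(1-3z+z^2)R_2'' + (2z-3)R_2' = 0$, it gives the exponent $-1/2$ at $\zeta = 1/2$ and the double root $0$ (hence a logarithmic singularity) at $\zeta = \frac{3-\sqrt5}{2}$.

```python
import math

def poly_eval(p, z):
    """Evaluate p[0] + p[1]*z + p[2]*z**2 + ... at z."""
    return sum(c * z**i for i, c in enumerate(p))

def poly_deriv(p):
    """Coefficient list of the derivative of p."""
    return [i * c for i, c in enumerate(p)][1:]

def falling(theta, l):
    """Falling factorial theta*(theta-1)*...*(theta-l+1)."""
    out = 1.0
    for j in range(l):
        out *= theta - j
    return out

def indicial_at_simple_root(p, zeta):
    """Indicial data at zeta for p[r](z) Y^(r) + ... + p[0](z) Y = 0.

    Assumes zeta is a simple zero of the polynomial p[r]. Then d_j = p[r-j]/p[r],
    delta_1 = p[r-1](zeta)/p[r]'(zeta) and delta_j = 0 for j >= 2, so
    I(theta) = theta^(r falling) + delta_1 * theta^(r-1 falling), whose roots
    are 0, 1, ..., r-2 and r-1-delta_1.
    """
    r = len(p) - 1
    delta1 = poly_eval(p[r - 1], zeta) / poly_eval(poly_deriv(p[r]), zeta)
    indicial = lambda theta: falling(theta, r) + delta1 * falling(theta, r - 1)
    exponents = list(range(r - 1)) + [r - 1 - delta1]
    return delta1, indicial, exponents

# (1-2z) R_1' - R_1 = 0 at zeta = 1/2
d1, I1, ex1 = indicial_at_simple_root([[-1], [1, -2]], 0.5)
print(d1, ex1)  # 0.5 [-0.5]

# (1-3z+z^2) R_2'' + (2z-3) R_2' = 0 at zeta = (3 - sqrt(5))/2
zeta = (3 - math.sqrt(5)) / 2
d2, I2, ex2 = indicial_at_simple_root([[0], [-3, 2], [1, -3, 1]], zeta)
print(d2, ex2)
```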

The following technical lemma will be needed to derive the asymptotics for the solutions of the special type of differential equations given in Theorem 7.15.

Lemma 7.14.

Let and consider the differential operator

Suppose that is a simple factor of , and suppose that for some , a solution of admits a generalized series solution . Then the coefficient sequence satisfies a recurrence of the form

where are certain polynomials in  and is some fixed nonnegative integer.

Proof.

We have for all .
Write for , in the understanding that runs through all integers, but is zero for all negative and almost all positive indices . By assumption, we know that .

It follows that