# Chain Rotations: a New Look at Tree Distance

###### Abstract

As is well known, the rotation distance $D(T_1,T_2)$ between two binary trees $T_1$, $T_2$ of $n$ vertices is the minimum number of rotations of pairs of vertices needed to transform $T_1$ into $T_2$. We introduce the new operation of chain rotation on a tree, involving two chains of vertices, which, like a standard rotation, requires changing exactly three pointers in the data structure, and we define the corresponding chain distance $C(T_1,T_2)$. As for $D(T_1,T_2)$, no polynomial time algorithm to compute $C(T_1,T_2)$ is known. We prove a constructive upper bound and an analytical lower bound on $C(T_1,T_2)$ based on the number of maximal chains in the two trees. In terms of $n$ we prove the general upper bound $C(T_1,T_2) \le n-1$ and we show that there are pairs of trees for which this bound is tight. No similar result is known for $D(T_1,T_2)$, whose best known upper bound is $2n-6$, while the best known lower bounds are those of Dehornoy.

### Keywords:

Binary tree, Rotation distance, Chain distance, Upper and lower bounds, Design of algorithms.

## 1 A new definition of tree distance

Consider a rooted binary tree $T$ of $n$ vertices identified with the integers from 1 to $n$ in infix order, as for a binary search tree (in the following the term tree always refers to trees of this form). A subtree $T_v$ of $T$ is a tree rooted in a vertex $v$ of $T$ and containing all the descendants of $v$ in $T$. The vertices of $T_v$ correspond to an integer interval, e.g., in the tree $T_1$ of Figure 1 the subtree rooted at 7 corresponds to the interval [4,8]. A rotation of two adjacent vertices is an operation preserving the infix order through the change of three pointers. In Figure 1 the rotation between vertices 5 and 7 produces a tree $T_2$ where the right pointers of 3 and 5, and the left pointer of 7, have been changed. The inverse rotation between 7 and 5 transforms $T_2$ into $T_1$.

Rotations were originally defined to keep binary search trees balanced. In [2] Culik and Wood defined the rotation distance $D(T_1,T_2)$ between two trees $T_1$ and $T_2$ as the minimum number of rotations needed to transform $T_1$ into $T_2$. $D$ has since been adopted in combinatorics as a standard measure of distance between trees, and plays a role in computational biology, where evolutionary trees are compared on the basis of subtree transfer [3]. A transformation requiring $D(T_1,T_2)$ rotations is called optimal.

A constructive upper bound $D(T_1,T_2) \le 2n-2$ was given in [2] and was improved to $2n-6$ in the seminal work of Sleator, Tarjan, and Thurston [9], where the authors transformed the problem into one of polygon transformation via diagonal flips. Since then a rich literature has appeared on the subject; nevertheless, no efficient algorithm has been proposed to determine $D(T_1,T_2)$, nor is it known whether the problem is NP-hard. In particular, interesting estimates of $D(T_1,T_2)$ have been given in [7, 8], a significant approximation algorithm has been proposed in [3], and other works have been directed at establishing significant lower bounds [4, 6]. All in all, rotation distance is a classical topic and has led to many elegant results.

An important concept is that of equivalent edges, that is, pairs of edges, one in $T_1$ and one in $T_2$, whose deletion splits $T_1$ and $T_2$ into two parts $T_1', T_1''$ and $T_2', T_2''$, where $T_1', T_2'$ are subtrees of $T_1, T_2$ containing the same subset of vertices (hence corresponding to the same integer interval), and $T_1'', T_2''$ contain the remaining vertices. In Figure 1 the edges (3,7) of $T_1$ and (3,5) of $T_2$ are equivalent, with the resulting subtrees $T_1'$ and $T_2'$ respectively rooted in 7 and 5 and corresponding to the interval [4,8]. The remaining portions $T_1'', T_2''$ contain vertices 1, 2, 3, 9, 10. If a pair of equivalent edges exists, any optimal transformation of $T_1$ into $T_2$ can be done independently on $T_1', T_2'$ and on $T_1'', T_2''$. Note that the equivalent edges can be determined in linear time. Letting $e$ denote the number of pairs of equivalent edges, the two trees are split accordingly into $e+1$ pairs of trees to be processed independently. It has been proved that, for $e = 0$, at least $n-1$ rotations are needed to transform $T_1$ into $T_2$ [1, 6], from which the lower bound $D(T_1,T_2) \ge n-e-1$ follows. We assume that the splitting of $T_1, T_2$ has been done beforehand, so we shall work on trees without equivalent edges.
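A minimal sketch of how equivalent edges can be found, in Python; the `Node` class and all function names are our own, hypothetical choices. Each edge is identified by the integer interval of the subtree it detaches, so equivalent edges are exactly the intervals occurring in both trees, and a hash map gives expected linear time. The two 10-vertex trees are also hypothetical: they follow what the text says about Figure 1 (the subtree [4,8] rooted at 7, the edge (3,7), the rotation between 5 and 7), while the placement of vertices 1, 2, 9, 10 is assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(spec):
    """Build a tree from nested tuples (key, left_spec, right_spec); None is empty."""
    if spec is None:
        return None
    key, left, right = spec
    return Node(key, build(left), build(right))


def subtree_intervals(root):
    """Map each non-root vertex v to the interval (lo, hi) of its subtree.

    Deleting the edge from v's parent to v detaches exactly vertices lo..hi."""
    intervals = {}

    def visit(node, is_root):
        lo = visit(node.left, False)[0] if node.left else node.key
        hi = visit(node.right, False)[1] if node.right else node.key
        if not is_root:
            intervals[node.key] = (lo, hi)
        return lo, hi

    visit(root, True)
    return intervals


def equivalent_edges(t1, t2):
    """Pairs (v1, v2): the edge entering v1 in t1 is equivalent to the edge entering v2 in t2."""
    by_interval = {iv: v for v, iv in subtree_intervals(t2).items()}
    return sorted((v, by_interval[iv])
                  for v, iv in subtree_intervals(t1).items() if iv in by_interval)


# Hypothetical trees: t2 is t1 after the standard rotation between 5 and 7.
t1 = build((9, (3, (2, (1, None, None), None),
                (7, (5, (4, None, None), (6, None, None)), (8, None, None))),
            (10, None, None)))
t2 = build((9, (3, (2, (1, None, None), None),
                (5, (4, None, None), (7, (6, None, None), (8, None, None)))),
            (10, None, None)))
print(equivalent_edges(t1, t2))
# [(1, 1), (2, 2), (3, 3), (4, 4), (6, 6), (7, 5), (8, 8), (10, 10)]
```

Besides the pair (7, 5), which corresponds to the edges (3,7) and (3,5) discussed above, the sketch also reports the edges of all subtrees left untouched by the rotation, e.g., those entering vertices that are leaves in both trees.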

Rotations were defined in the field of data structures, with the unquestionable merit of being local operations that require exactly three pointer changes. This implies the replacement of two vertices (5 and 7 in Figure 1) and one subtree transfer (subtree [6], here composed of one vertex, migrates from the right subtree of 5 to the left subtree of 7). We now propose a more general operation called c-rotation, where c stands for chain, which acts on chains instead of single vertices and also requires three pointer changes and one subtree transfer. A standard rotation is the special case of a c-rotation in which the chain contains only one vertex.

Terminology and notation. A left chain [$a$-$b$], $a \ge b$, in a tree $T$ is a sequence of vertices connected to one another by left pointers, from $a$ (the highest) to $b$ (the lowest). A maximal left chain is one that is not contained in any other left chain. The complete left chain [$n$-1] contains all the vertices, linked by left pointers in the order $n, n-1, \ldots, 1$. A right chain, a maximal right chain, and the complete right chain [1-$n$] are similarly defined. A left or right chain containing only one vertex $a$ is denoted by [$a$]. $\lambda(T)$ and $\rho(T)$ respectively denote the number of maximal left chains and of maximal right chains in $T$.

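In the tree $T_1$ of Figure 1, [7-4] is a maximal left chain; [5-4] is a non-maximal left chain, being contained in [7-4]; [5-6] and [4] are maximal right chains. Counting all the maximal chains of $T_1$ yields $\lambda(T_1)$ and $\rho(T_1)$, whose sum is 11. In general we have: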

###### Proposition 1

In a tree $T$ of $n$ vertices we have $\lambda(T) + \rho(T) = n + 1$.

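Proposition 1 is proved by simple induction on $n$, since every tree of $n+1$ vertices is obtained from a tree of $n$ vertices by adding a leaf. The basis is $n = 1$, for which $\lambda(T) = \rho(T) = 1$. Assuming the proposition true for $n$, insert a new leaf $v$ in $T$ as a child of an existing vertex $u$. If $v$ is a left child of $u$, $\lambda(T)$ is unchanged ($v$ extends the maximal left chain ending in $u$) and $\rho(T)$ is increased by 1 ($v$ forms the new maximal right chain [$v$]). If $v$ is a right child of $u$, $\rho(T)$ is unchanged and $\lambda(T)$ is increased by 1. Then the proposition is true for $n+1$. Similarly, since every maximal left (respectively right) chain starts either at the root or at a right (respectively left) child, $\lambda(T)$ and $\rho(T)$ are respectively equal to the numbers of non-null right and left pointers in $T$ plus 1. As a consequence the values of $\lambda(T)$ and $\rho(T)$ can be computed in $O(n)$ time with a tree traversal.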
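The pointer-counting remark translates into a single traversal. In this Python sketch the `Node` class and `chain_counts` are hypothetical names, and the tree is a hypothetical one consistent with what is said about Figure 1 ([7-4] is a maximal left chain, [5-6] and [4] are maximal right chains); vertices 1, 2, 9, 10 are placed by assumption, so the counts are not claimed to be those of the actual figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(spec):
    """Build a tree from nested tuples (key, left_spec, right_spec); None is empty."""
    if spec is None:
        return None
    key, left, right = spec
    return Node(key, build(left), build(right))


def chain_counts(root):
    """Return (lambda, rho, n) with one iterative traversal.

    A maximal left chain starts at the root or at a right child, hence
    lambda = (non-null right pointers) + 1; symmetrically for rho."""
    left_ptrs = right_ptrs = n = 0
    stack = [root]
    while stack:
        node = stack.pop()
        n += 1
        if node.left is not None:
            left_ptrs += 1
            stack.append(node.left)
        if node.right is not None:
            right_ptrs += 1
            stack.append(node.right)
    return right_ptrs + 1, left_ptrs + 1, n


# Hypothetical tree: [7-4] is a maximal left chain, [5-6] and [4] are
# maximal right chains; the positions of 1, 2, 9, 10 are assumed.
t = build((9, (3, (2, (1, None, None), None),
               (7, (5, (4, None, None), (6, None, None)), (8, None, None))),
           (10, None, None)))
lam, rho, n = chain_counts(t)
print(lam, rho, n)              # 5 6 10
assert lam + rho == n + 1       # Proposition 1
```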

As for standard rotations, a c-rotation can be inverted. If needed we shall distinguish between direct and inverse c-rotations, defined as follows:

###### Definition 1

A (direct) c-rotation rot([$a$-$b$],$c$) in a tree $T$, where [$a$-$b$] is a left chain and $a$ is the right child of $c$, is a local operation where: (i) $a$ takes the place of $c$ (i.e., $a$ becomes a child of the parent of $c$, if any); (ii) $c$ becomes the left child of $b$; and (iii) the left subtree of $b$, if any (i.e., if [$a$-$b$] is not maximal), becomes the right subtree of $c$. The definition also holds for a right chain [$a$-$b$], exchanging the terms “left” and “right” wherever they occur.

In the tree $T_1$, repeated in Figure 2, the c-rotation rot([7-5],3) produces the other tree shown in the figure. Note that a direct c-rotation merges two chains into one, and [$a$-$b$] may be a maximal or a non-maximal chain.
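A minimal pointer-based sketch of Definition 1 in Python, assuming each vertex stores only its key and two child pointers (no parent pointers, so the parents of $a$ and $c$ are found by a search on the infix keys). The names `c_rotate`, `find`, and the `side` parameter are hypothetical. The three assignments at the end are the pointer changes (i)-(iii) listed in Proposition 2; the example applies rot([7-5],3) to a hypothetical completion of $T_1$ in which vertices 1, 2, 9, 10 are placed by assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(spec):
    """Build a tree from nested tuples (key, left_spec, right_spec); None is empty."""
    if spec is None:
        return None
    key, left, right = spec
    return Node(key, build(left), build(right))


def inorder(node):
    return inorder(node.left) + [node.key] + inorder(node.right) if node else []


def find(root, key):
    """Search by key (vertices are numbered in infix order); return (node, parent)."""
    parent, node = None, root
    while node is not None and node.key != key:
        parent, node = node, (node.left if key < node.key else node.right)
    return node, parent


def c_rotate(root, a, b, side="left"):
    """Direct c-rotation rot([a-b], c), where c is the parent of a.

    side="left": [a-b] is a left chain and a is the right child of c;
    side="right" is the mirror image. Returns the (possibly new) root."""
    other = "right" if side == "left" else "left"
    node_a, c = find(root, a)
    if c is None or getattr(c, other) is not node_a:
        raise ValueError(f"{a} is not the {other} child of a vertex")
    node_b = node_a
    while node_b is not None and node_b.key != b:
        node_b = getattr(node_b, side)
    if node_b is None:
        raise ValueError(f"[{a}-{b}] is not a {side} chain")
    _, p = find(root, c.key)
    moved = getattr(node_b, side)   # subtree below b (None if [a-b] is maximal)
    setattr(node_b, side, c)        # pointer (ii): c becomes the child of b
    setattr(c, other, moved)        # pointer (iii): the subtree moves under c
    if p is None:                   # pointer (i): a takes the place of c
        return node_a
    if p.left is c:
        p.left = node_a
    else:
        p.right = node_a
    return root


t = build((9, (3, (2, (1, None, None), None),
               (7, (5, (4, None, None), (6, None, None)), (8, None, None))),
           (10, None, None)))
t = c_rotate(t, 7, 5)            # rot([7-5], 3)
print(inorder(t))                # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(t.left.key, find(t, 5)[0].left.key, find(t, 3)[0].right.key)   # 7 3 4
```

The last line mirrors the pointer changes described for Figure 2: the parent of 3 now points to 7, the left pointer of 5 points to 3, and the right pointer of 3 points to 4. A call with `side="right"` handles right chains, where $a$ is the left child of $c$.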

###### Definition 2

An (inverse) c-rotation rot($c$,[$a$-$b$]) in a tree $T$, where [$a$-$b$] is a left chain and $c$ is the left child of $b$ (so that $c$ belongs to the same left chain as [$a$-$b$]), is a local operation where: (i) $c$ takes the place of $a$ (i.e., $c$ becomes a child of the parent of $a$, if any) and $a$ becomes the right child of $c$; and (ii) the right subtree of $c$, if any, becomes the left subtree of $b$. Again, the definition also holds for a right chain [$a$-$b$], exchanging the terms “left” and “right” wherever they occur.

In Figure 2 the inverse c-rotation rot(3,[7-5]), applied to the tree obtained above, produces $T_1$ again. Note that an inverse c-rotation splits a chain in two, and the chain [$a$-$b$] cannot be maximal. We immediately have:
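A companion Python sketch of Definition 2, again assuming a pointer-only representation (`c_rotate_inverse`, `find`, and `to_spec` are hypothetical names). Starting from the tree that rot([7-5],3) produces on a hypothetical completion of $T_1$ (vertices 1, 2, 9, 10 placed by assumption), the inverse c-rotation rot(3,[7-5]) should restore the original shape.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(spec):
    """Build a tree from nested tuples (key, left_spec, right_spec); None is empty."""
    if spec is None:
        return None
    key, left, right = spec
    return Node(key, build(left), build(right))


def to_spec(node):
    """Inverse of build, used to compare tree shapes."""
    return None if node is None else (node.key, to_spec(node.left), to_spec(node.right))


def find(root, key):
    """Search by key (vertices are numbered in infix order); return (node, parent)."""
    parent, node = None, root
    while node is not None and node.key != key:
        parent, node = node, (node.left if key < node.key else node.right)
    return node, parent


def c_rotate_inverse(root, c, a, b, side="left"):
    """Inverse c-rotation rot(c, [a-b]): c is the child of b along the chain.

    side="left": [a-b] is a left chain; side="right" is the mirror image.
    Returns the (possibly new) root."""
    other = "right" if side == "left" else "left"
    node_a, p = find(root, a)
    node_b = node_a
    while node_b is not None and node_b.key != b:
        node_b = getattr(node_b, side)
    node_c = getattr(node_b, side) if node_b is not None else None
    if node_c is None or node_c.key != c:
        raise ValueError(f"{c} is not the {side} child of {b} in chain [{a}-{b}]")
    moved = getattr(node_c, other)   # subtree of c on the other side (may be None)
    setattr(node_c, other, node_a)   # pointer (ii): a becomes the child of c
    setattr(node_b, side, moved)     # pointer (iii): the subtree moves under b
    if p is None:                    # pointer (i): c takes the place of a
        return node_c
    if p.left is node_a:
        p.left = node_c
    else:
        p.right = node_c
    return root


ORIGINAL = (9, (3, (2, (1, None, None), None),
                (7, (5, (4, None, None), (6, None, None)), (8, None, None))),
            (10, None, None))
ROTATED = (9, (7, (5, (3, (2, (1, None, None), None), (4, None, None)),
                   (6, None, None)), (8, None, None)),
           (10, None, None))
t = c_rotate_inverse(build(ROTATED), 3, 7, 5)   # rot(3, [7-5])
print(to_spec(t) == ORIGINAL)                   # True
```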

###### Proposition 2

In a (direct or inverse) c-rotation three pointers change. Namely, for rot([$a$-$b$],$c$), where $p$ is the parent of $c$ if any, we change: (i) the pointer from $p$ to $c$ (or the outside pointer to the tree if $c$ is the tree root); (ii) the left (respectively right) pointer of $b$; (iii) the right (respectively left) pointer of $c$. For rot($c$,[$a$-$b$]), where $p$ is the parent of $a$ if any, we change: (i) the pointer from $p$ to $a$ (or the outside pointer to the tree if $a$ is the tree root); (ii) the right (respectively left) pointer of $c$; (iii) the left (respectively right) pointer of $b$.

In rot([7-5],3) of Figure 2 the left pointer of the parent of 3 now points to 7; the left pointer of 5 now points to 3; the right pointer of 3 now points to 4, the former left child of 5. The inverse c-rotation rot(3,[7-5]) reverses these three changes. Note that if [$a$-$b$] is a maximal chain, i.e., $b$ has no left (respectively right) child, then in a direct c-rotation rot([$a$-$b$],$c$) the right (respectively left) pointer of $c$ becomes null.

###### Definition 3

Given two trees $T_1, T_2$ of $n$ vertices, the chain distance $C(T_1,T_2)$ is the minimum number of c-rotations needed to transform $T_1$ into $T_2$.

As a trivial example, the chain distance between the two trees of Figure 2 is 1; in the general case, however, no polynomial time algorithm is known for determining the chain distance.

## 2 Upper and lower bounds on $C(T_1,T_2)$

We start by giving a transformation algorithm between two trees based on c-rotations. The possibility of inverting a c-rotation suggests a strategy often used for regular rotations, see e.g. [5]. First it is decided how to transform both $T_1$ and $T_2$ into a suitable target tree $T$. Then the overall procedure will be $T_1 \to T \to T_2$, where $T \to T_2$ is done by inverting the c-rotations of $T_2 \to T$ and applying them in reverse order. $C(T_1,T_2)$ is upper bounded by the total number of c-rotations in $T_1 \to T$ and $T_2 \to T$. The target tree chosen here is either the complete left chain [$n$-1] or the complete right chain [1-$n$].

In Figure 3 we show the structure of two algorithms for transforming a tree $T$ of $n$ vertices into the chain [$n$-1] (ROTLEFT), or into the chain [1-$n$] (ROTRIGHT). Clearly both algorithms can be implemented to run in linear time. We have:
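Figure 3 is not reproduced here, so the following Python sketch of ROTLEFT is only one possible implementation, written under the assumption that it applies direct c-rotations rot([$a$-$b$],$c$) on maximal left chains hanging to the right of the left chain that starts at the root. Each such c-rotation removes one non-null right pointer, so the number of c-rotations equals $\lambda(T)-1$, the count used in the proof of Proposition 3; every vertex is scanned at most once by the inner loop, so the running time is linear. The names `rot_left` and `left_chain_keys` and the example tree are hypothetical; ROTRIGHT would be the mirror image.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(spec):
    """Build a tree from nested tuples (key, left_spec, right_spec); None is empty."""
    if spec is None:
        return None
    key, left, right = spec
    return Node(key, build(left), build(right))


def rot_left(root):
    """Hypothetical ROTLEFT: turn the tree into the complete left chain [n-1].

    Returns (new_root, number of direct c-rotations performed)."""
    count = 0
    parent, c = None, root          # c walks down the left chain from the root
    while c is not None:
        if c.right is None:         # nothing to the right of c: move down
            parent, c = c, c.left
            continue
        a = b = c.right
        while b.left is not None:   # [a-b] is the maximal left chain from a
            b = b.left
        b.left = c                  # c becomes the left child of b
        c.right = None              # [a-b] is maximal: c's right pointer becomes null
        if parent is None:          # a takes the place of c
            root = a
        else:
            parent.left = a
        c = a                       # a..b now lie on the left chain from the root
        count += 1
    return root, count


def left_chain_keys(root):
    """Keys along the left chain from the root, checking that no right pointer remains."""
    keys = []
    while root is not None:
        assert root.right is None, "not a complete left chain"
        keys.append(root.key)
        root = root.left
    return keys


t = build((9, (3, (2, (1, None, None), None),
               (7, (5, (4, None, None), (6, None, None)), (8, None, None))),
           (10, None, None)))     # hypothetical tree with lambda = 5
t, rotations = rot_left(t)
print(rotations, left_chain_keys(t))   # 4 [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```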

###### Proposition 3

$C(T_1,T_2) \le \min\bigl(\lambda(T_1)+\lambda(T_2)-2,\ \rho(T_1)+\rho(T_2)-2\bigr)$.

Proposition 3 has a simple constructive proof. The transformation $T_1 \to T_2$ can be performed as a combination of $T_1 \to$ [$n$-1] and $T_2 \to$ [$n$-1], or as a combination of $T_1 \to$ [1-$n$] and $T_2 \to$ [1-$n$], using ROTLEFT or ROTRIGHT. Since the numbers of c-rotations executed by ROTLEFT($T$) and by ROTRIGHT($T$) are $\lambda(T)-1$ and $\rho(T)-1$ respectively, the proposition follows. Since ROTLEFT and ROTRIGHT use only direct c-rotations, the overall transformation consists of direct c-rotations in $T_1 \to T$ and inverse c-rotations in $T \to T_2$. We can now derive an upper bound on $C(T_1,T_2)$ as a function of $n$, namely:

###### Proposition 4

$C(T_1,T_2) \le n-1$.

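Proof. To prove an upper bound valid for all trees we must compute the maximum of the function $f = \min\bigl(\lambda(T_1)+\lambda(T_2)-2,\ \rho(T_1)+\rho(T_2)-2\bigr)$ given in Proposition 3 as the parameters vary. Recalling from Proposition 1 that $\rho(T) = n+1-\lambda(T)$ for any tree and letting $\lambda = \lambda(T_1)+\lambda(T_2)$, we can reformulate the function as $f = \min(\lambda-2,\ 2n-\lambda)$, with $\lambda$ ranging over $[2, 2n]$. The first term increases and the second decreases with $\lambda$, so $f$ is maximized for $\lambda = n+1$, where both terms equal $n-1$, and the given bound follows. Q.E.D.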
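Spelled out, with $\lambda = \lambda(T_1)+\lambda(T_2)$ as in the proof, the maximization reads:

```latex
\begin{aligned}
\rho(T_1)+\rho(T_2)-2 &= \bigl(n+1-\lambda(T_1)\bigr)+\bigl(n+1-\lambda(T_2)\bigr)-2 = 2n-\lambda,\\
f(\lambda) &= \min(\lambda-2,\ 2n-\lambda), \qquad 2 \le \lambda \le 2n,\\
\lambda-2 \le 2n-\lambda &\iff \lambda \le n+1,\\
\max_{2 \le \lambda \le 2n} f(\lambda) &= f(n+1) = n-1 .
\end{aligned}
```

The value $\lambda = n+1$ is attainable, e.g., with $\lambda(T_1) = 1$ and $\lambda(T_2) = n$.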

To prove a lower bound on $C(T_1,T_2)$ we must rely on the properties of c-rotations. Working on the maximal chains of $T_1, T_2$ we have:

###### Proposition 5

$C(T_1,T_2) \ge |\lambda(T_1)-\lambda(T_2)|$.

Proof. In a tree $T$ a direct c-rotation rot([$a$-$b$],$c$), where [$a$-$b$] is a maximal left (respectively right) chain, induces the changes $\lambda(T) \to \lambda(T)-1$ and $\rho(T) \to \rho(T)+1$ (respectively $\lambda(T) \to \lambda(T)+1$ and $\rho(T) \to \rho(T)-1$). If instead [$a$-$b$] is not maximal, the values of $\lambda(T)$ and $\rho(T)$ remain unchanged (see for example the c-rotation in Figure 2). An inverse c-rotation rot($c$,[$a$-$b$]), where [$a$-$b$] is a left (respectively right) chain and $c$ has no right (respectively left) child, induces the changes $\lambda(T) \to \lambda(T)+1$ and $\rho(T) \to \rho(T)-1$ (respectively $\lambda(T) \to \lambda(T)-1$ and $\rho(T) \to \rho(T)+1$). If instead $c$ has a right (respectively left) child, the values of $\lambda(T)$ and $\rho(T)$ remain unchanged (see for example the inverse of the c-rotation in Figure 2). Therefore a c-rotation may change the value of $\lambda(T)$ by at most one. The given bound immediately follows because, after the transformation of $T_1$ into $T_2$, the two trees must have the same number of maximal left chains. Q.E.D.

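Note that from Proposition 1 we have $\lambda(T_1) = n+1-\rho(T_1)$ and $\lambda(T_2) = n+1-\rho(T_2)$, hence $|\lambda(T_1)-\lambda(T_2)| = |\rho(T_1)-\rho(T_2)|$. This implies that $|\lambda(T_1)-\lambda(T_2)|$ can be replaced with $|\rho(T_1)-\rho(T_2)|$ in Proposition 5. There are pairs of trees without equivalent edges for which the lower bound of Proposition 5 is particularly significant. For the trees of Figure 4, for example, $\lambda(T_1)$ and $\lambda(T_2)$ are expressed in terms of an arbitrary integer constant, and Proposition 5 gives a correspondingly large lower bound on $C(T_1,T_2)$. In fact there are many ways of building pairs of trees with a prescribed value of $|\lambda(T_1)-\lambda(T_2)|$. In particular, since $1 \le \lambda(T) \le n$, the largest value $n-1$ is attained only by the complete left chain [$n$-1] and the complete right chain [1-$n$], which have no equivalent edges (every subtree below an edge of the former covers an interval $[1,i]$ with $i < n$, and every subtree below an edge of the latter an interval $[i,n]$ with $i > 1$). Combining Propositions 4 and 5 we have: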
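The bounds of Propositions 3 and 5 depend only on $\lambda$ and $\rho$, so both can be computed in linear time. A minimal Python sketch (hypothetical names; both trees are assumed to have the same $n$), applied to the complete chains [6-1] and [1-6], where the two bounds coincide:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def lambda_rho(root):
    """(lambda, rho) from pointer counts, as remarked after Proposition 1."""
    left_ptrs = right_ptrs = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if node.left is not None:
            left_ptrs += 1
            stack.append(node.left)
        if node.right is not None:
            right_ptrs += 1
            stack.append(node.right)
    return right_ptrs + 1, left_ptrs + 1


def chain_bounds(t1, t2):
    """(lower, upper) bounds on C(t1, t2) for two trees with the same n."""
    l1, r1 = lambda_rho(t1)
    l2, r2 = lambda_rho(t2)
    lower = abs(l1 - l2)                    # Proposition 5
    assert lower == abs(r1 - r2)            # same value, by Proposition 1
    upper = min(l1 + l2 - 2, r1 + r2 - 2)   # Proposition 3
    return lower, upper


def left_chain(n):
    """Complete left chain [n-1]: each new root k takes the old root as left child."""
    root = None
    for k in range(1, n + 1):
        root = Node(k, left=root)
    return root


def right_chain(n):
    """Complete right chain [1-n]."""
    root = None
    for k in range(n, 0, -1):
        root = Node(k, right=root)
    return root


print(chain_bounds(left_chain(6), right_chain(6)))   # (5, 5)
```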

###### Proposition 6

There are pairs of trees without equivalent edges for which $C(T_1,T_2) = n-1$.

By Proposition 6, the upper bound of Proposition 4 is therefore tight. A comparable result is unknown for the standard rotation distance, where the upper bound $2n-6$ must be compared with the highest known lower bounds, proved in [4]. It is also worth noting that for the trees of Figure 4 the upper bound given by Proposition 3 matches the lower bound shown in the figure for any value of the integer constant.
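As a sanity check of Propositions 4, 5, and 6 on tiny instances, the following brute-force Python sketch computes the exact chain distance by breadth-first search over all trees of $n = 6$ vertices. It is exponential (the number of trees is a Catalan number) and is in no way an algorithm for $C(T_1,T_2)$ in general. Trees are represented by their shapes only, since the keys are fixed by the infix order, and we rely on inverse c-rotations being exactly the reversed direct ones (Definitions 1 and 2); all names are hypothetical.

```python
from collections import deque
from functools import lru_cache


@lru_cache(maxsize=None)
def all_trees(n):
    """All shapes of n vertices as nested (left, right) tuples; None is the empty tree."""
    if n == 0:
        return (None,)
    return tuple((l, r) for k in range(n)
                 for l in all_trees(k) for r in all_trees(n - 1 - k))


def splice_left(x, c_left):
    """x plays a; for every b on the left chain from a, hang c = (c_left, S)
    as the left child of b, S being b's former left subtree."""
    xl, xr = x
    yield ((c_left, xl), xr)                  # b = x
    if xl is not None:
        for new_left in splice_left(xl, c_left):
            yield (new_left, xr)


def splice_right(x, c_right):
    """Mirror image of splice_left, for right chains."""
    xl, xr = x
    yield (xl, (xr, c_right))                 # b = x
    if xr is not None:
        for new_right in splice_right(xr, c_right):
            yield (xl, new_right)


def direct_moves(t):
    """All shapes reachable from t by one direct c-rotation."""
    if t is None:
        return
    l, r = t
    if r is not None:      # t plays c, its right child plays a (left chain)
        yield from splice_left(r, l)
    if l is not None:      # t plays c, its left child plays a (right chain)
        yield from splice_right(l, r)
    for new_l in direct_moves(l):
        yield (new_l, r)
    for new_r in direct_moves(r):
        yield (l, new_r)


def chain_graph(n):
    """Undirected graph: an inverse c-rotation is a direct one traversed backwards."""
    adj = {t: set() for t in all_trees(n)}
    for t in list(adj):
        for u in direct_moves(t):
            adj[t].add(u)
            adj[u].add(t)
    return adj


def bfs(adj, src):
    dist = {src: 0}
    queue = deque([src])
    while queue:
        t = queue.popleft()
        for u in adj[t]:
            if u not in dist:
                dist[u] = dist[t] + 1
                queue.append(u)
    return dist


def right_pointers(t):
    if t is None:
        return 0
    l, r = t
    return (r is not None) + right_pointers(l) + right_pointers(r)


n = 6
adj = chain_graph(n)
left_chain = right_chain = None
for _ in range(n):
    left_chain, right_chain = (left_chain, None), (None, right_chain)

lam = {t: right_pointers(t) + 1 for t in adj}
diameter = 0
for t in adj:
    dist = bfs(adj, t)
    diameter = max(diameter, max(dist.values()))
    assert all(dist[u] >= abs(lam[t] - lam[u]) for u in adj)   # Proposition 5
print(len(adj), bfs(adj, left_chain)[right_chain], diameter)   # 132 5 5
```

For $n = 6$ the largest chain distance over all pairs is $n-1 = 5$, reached by the two complete chains, in agreement with Propositions 4 and 6.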

## 3 Concluding remarks

Standard rotations have been mainly studied in the framework of data organization and computational biology. When rotations are used as a measure of tree distance, however, different rules may apply to different cases. For example, finding pairs of trees that meet the upper bound on $D(T_1,T_2)$ is outside the realm of balanced search trees, where long chains of pointers prevent such a bound from being met [5]. Therefore it seems reasonable to investigate other concepts of distance going beyond standard rotations, and in this respect we have proposed here the chain distance. In fact we believe that a possible merit of this note, if any, is to stimulate a discussion on the concept of distance between trees.

When looking for alternatives we must observe three main properties, fulfilled by both rotations and chain rotations. First, the transformation relies on subtree transfer and on the replacement of a constant number of vertices; second, an invariant is maintained in the tree (the infix ordering of vertices in our case); and finally, the basic operation requires constant time (a rotation or a c-rotation requires changing three pointers). An alternative to rotations and c-rotations could be moving a single vertex along a chain with a jump of any length, to insert it above one of its ancestors, thus defining a long distance between two trees. It can easily be seen that in this case, too, one subtree is relocated, the infix ordering of the vertices is maintained, and only three pointers are changed. Again, the rotation distance is a special case of the long distance if only vertex jumps of length one are allowed. Upper and lower bounds on the long distance remain to be established.

In fact, a wealth of possibilities is open.

## References

- [1] S. Cleary and K. St. John, A linear-time approximation for rotation distance, Journal of Graph Algorithms and Applications 14 (2010) 385-390.
- [2] K. Culik and D. Wood, A note on some tree similarity measures, Information Processing Letters 15 (1982) 39-42.
- [3] B. DasGupta, X. He, T. Jiang, M. Li, J. Tromp, L. Wang, and L. Zhang, Computing distances between evolutionary trees, in: D.Z. Du and P.M. Pardalos, Eds., Handbook of Combinatorial Optimization, Kluwer Academic Publ. (1998) 35-76.
- [4] P. Dehornoy, On the rotation distance between binary trees, Advances in Mathematics 223 (4) (2010) 1316-1355.
- [5] F. Luccio and L. Pagli, On the upper bound on the rotation distance of binary trees, Information Processing Letters 31 (2) (1989) 57-60.
- [6] F. Luccio, A. Mesa Enriquez, and L. Pagli, Lower bounds on the rotation distance of binary trees, Information Processing Letters 110 (2010) 934-938.
- [7] J. Pallo, An efficient upper bound of the rotation distance of binary trees, Information Processing Letters 73 (3-4) (2000) 87-92.
- [8] R. Rogers, On finding shortest paths in the rotation graph of binary trees, in: Proc. Southeastern Internat. Conf. on Combinatorics, Graph Theory, and Computing, Vol. 137 (1999) 77-95.
- [9] D.D. Sleator, R.E. Tarjan, W.R. Thurston, Rotation distance, triangulations, and hyperbolic geometry, J. Amer. Math. Soc. 1 (3) (1988) 647-681.