Rooted Tree Arithmetic and Equations
Acknowledgment: Many thanks are due to Federico Poloni and Mahdi Amani for their comments and suggestions.
We propose a new arithmetic for non-empty rooted unordered trees, simply called trees. After discussing tree representation and enumeration, we define the operations of tree addition, multiplication and stretch, prove their properties, and show that all trees can be generated from a starting tree of one vertex. We then show how a given tree can be obtained as the sum or product of two trees, thus defining prime trees with respect to addition and multiplication. In both cases we show how primality can be decided in time polynomial in the number of vertices and we prove that factorization is unique. We then define negative trees and suggest dealing with tree equations, giving some preliminary results. Finally we comment on how our arithmetic might be useful, and discuss preceding studies that have some relation with ours. To the best of our knowledge our approach and results are completely new, aside from a similar proposal deposited as an arXiv manuscript [6].
1 Basic properties and notation
We refer to rooted unordered trees, simply called trees. Our trees are non-empty. The symbol 1 denotes the tree containing exactly one vertex, and is the basic element of our theory.
In a tree $T$, $r(T)$ denotes the root of $T$; $x$ denotes any of its vertices; $n_T$ and $\ell_T$ respectively denote the numbers of vertices and leaves. A subtree is the tree composed of a vertex $x$ and all its descendants in $T$. The subtrees rooted at the children of $x$ are called subtrees of $x$. $s_T$ denotes the number of subtrees of $r(T)$.
A tree $T$ can be represented as a binary sequence $S_T$ (the original reference for ordered trees is [10]). In our scheme $T$ is traversed in left-to-right preorder, inserting 1 in the sequence for each vertex encountered and 0 for each move backwards. Then $S_T$ is composed of $2n_T$ bits as shown in Figure 1, and has the recursive structure $1\, S_1 \ldots S_k\, 0$, where the $S_i$ are the sequences representing the subtrees of $r(T)$. The sequence for tree 1 is 10. Note that all the proper prefixes of $S_T$ have more 1's than 0's, while the whole sequence has as many 1's as 0's.
Since $T$ is unordered, the order in which the subsequences $S_1, \ldots, S_k$ appear in $S_T$ is immaterial (i.e., in general many different sequences represent $T$). However, a canonical form for trees is established so that their sequences are uniquely determined, and turn out to be ordered by increasing value if interpreted as binary numbers. To this end the trees are grouped into consecutive families $F_1, F_2, \ldots$ as shown in Figure 2, where $F_n$ contains the trees of $n$ vertices. So the trees are ordered by increasing number of vertices, and inside each family the ordering is determined by the canonical form as follows. Trees and sequences are then numbered with increasing natural numbers.
If the sequences are interpreted as binary numbers, for two trees $T \in F_i$, $U \in F_j$ with $i < j$ we have $S_T < S_U$ because the initial character of each sequence is 1 and $S_T$ is shorter than $S_U$. This is consistent with the property that the trees of $F_i$ precede the trees of $F_j$ in the ordering.
The families $F_1, F_2$ contain one tree each, numbered 1 and 2.
The ordering of the trees in $F_n$, $n \ge 3$, is based on the ordering of the preceding families. Consider the multisets of positive integers whose sum is $n - 1$. E.g., for $n = 6$ these multisets are: 1,1,1,1,1 - 1,1,1,2 - 1,1,3 - 1,2,2 - 1,4 - 2,3 - 5, ordered by non-decreasing value of the digits left to right. Each multiset corresponds to a group of consecutive trees in $F_n$, where the digits in the multiset indicate the number of vertices of the subtrees of the root. For $F_6$ in Figure 2, multiset 1,1,1,1,1 refers to tree 18; multiset 1,1,1,2 refers to tree 19; multiset 1,1,3 refers to trees 20 and 21 that have the two trees of $F_3$ as third subtree, following the ordering in $F_3$; ...; multiset 2,3 refers to trees 27 and 28; the last multiset 5 refers to trees 29 to 37 whose roots have only one child.
So the first tree in $F_n$ is the one of height 2 with $n - 1$ subtrees of the root of one vertex each and sequence 1 1 0 1 0 1 0 . . . 1 0 0; and the last tree is the “chain” of $n$ vertices and sequence 1 1 . . . 1 0 0 . . . 0. As said, the binary sequences representing the trees in $F_n$ are ordered by increasing value; see the listing for the first six canonical families in the Appendix.
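The multisets that drive the ordering inside a family can be listed mechanically. The following is a minimal sketch; the function name `multisets` is hypothetical, and the generator simply produces non-decreasing tuples in lexicographic order, which reproduces the list given above for sum 5.

```python
def multisets(total, smallest=1):
    """Yield the multisets of positive integers summing to `total`,
    as non-decreasing tuples in lexicographic order."""
    # The first part can be at most total // 2 when other parts follow it.
    for first in range(smallest, total // 2 + 1):
        for rest in multisets(total - first, first):
            yield (first,) + rest
    # The multiset made of `total` alone always comes last.
    yield (total,)


for m in multisets(5):
    print(",".join(map(str, m)))
```

For `total = 5` the loop prints the seven multisets 1,1,1,1,1 through 5 in the order used for the family of six-vertex trees.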
Many of the trees (not necessarily all) of each family $F_n$ can be generated from the ones in $F_{n-1}$ using the following:
Doubling Rule DR. From each tree $T$ in $F_{n-1}$ build two trees in $F_n$ by adding a new vertex as the leftmost child of $r(T)$, or adding a new root and appending $T$ to it as a unique subtree.
For example the four trees of $F_4$ in Figure 2 can be built by DR from the two trees of $F_3$. The nine trees of $F_5$ can be built by DR from the four trees of $F_4$, with the exception of tree 13. The twenty trees of $F_6$ can be built by DR from the nine trees of $F_5$, with the exception of trees 27 and 28. In fact the number of extra trees that cannot be built with DR increases sharply with $n$. Letting $f_n$ denote the number of trees in $F_n$ we immediately have $f_n \ge 2^{n-2}$ for $n \ge 2$. But a deep analysis [3, 7] has shown that the asymptotic value of this function is much higher, and can be approximated as:
$$f_n \approx 0.4399 \cdot 2.9558^n \cdot n^{-3/2} \qquad (1)$$
Then the minimum length of the sequences representing the trees of $F_n$ is given approximately by:
$$\log_2 f_n \approx 1.56\, n - 1.5 \log_2 n$$
much less than the $2n$ bits of our proposal. We only note that for $n \ge 2$ all the binary sequences representing our trees begin with two 1's and end with two 0's (see the listing in the Appendix), so these four digits could be removed, leaving a sequence of $2n - 4$ bits to represent a tree. We shall see that our representation is amenable to working easily on the trees, so we maintain it, leaving the construction of a shorter efficient coding as a challenging open problem.
An arbitrary tree can be transformed into its canonical form with Algorithm CF of Figure 3. An elementary analysis shows that the algorithm is correct and each of its steps 1,2 can be executed in total O() time. The algorithm can be possibly improved, however, our present aim is just showing that the problem can be solved in polynomial time.
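As an illustration only, and not a transcription of Algorithm CF (Figure 3 is not reproduced here), canonicalization can be sketched by sorting the subtrees of every vertex recursively. The sort key below is one plausible reading of the ordering described in the text: number of vertices first, then the multiset of subtree sizes, then the keys of the subtrees. The nested-tuple representation and the name `canon` are assumptions.

```python
def canon(tree):
    """Return (key, canonical_tree) for a tree given as a nested tuple.

    key = (number of vertices, sizes of the root subtrees, keys of the
    root subtrees); subtrees are arranged by non-decreasing key.
    """
    parts = sorted(canon(child) for child in tree)
    sizes = tuple(key[0] for key, _ in parts)
    key = (1 + sum(sizes), sizes, tuple(k for k, _ in parts))
    return key, tuple(t for _, t in parts)


a = (((),), ())       # chain of two vertices listed before a leaf
b = ((), ((),))       # the same unordered tree, subtrees swapped
print(canon(a)[1] == canon(b)[1])   # True
```

Two nested tuples describe the same unordered tree exactly when their canonical forms coincide, which is what the later comparisons between trees rely on.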
2 Operators and tree generation
Our basic operations are addition (symbol +) and multiplication (symbol $\cdot$, or simple concatenation), defined as follows. Referring to Figure 4, let $A, B$ be two arbitrary trees:
Addition. $T = A + B$ is built by merging the two roots $r(A)$, $r(B)$ into a new root $r(T)$. That is, the subtrees of $r(A)$ and $r(B)$ (if any) become the subtrees of $r(T)$. We have $A + 1 = 1 + A = A$.
Multiplication. $T = A \cdot B$ is built by merging $r(B)$ with each vertex $x \in A$ so that all the subtrees of $r(B)$ become new subtrees of $x$. We have $A \cdot 1 = 1 \cdot A = A$.
In both operations it is immaterial in which order the subtrees are attached to the new parents. We also define the operation stretch (symbol over-bar), whose interest will be made clear in the following:
Stretch. $\overline{A}$ consists of a new root $r(\overline{A})$ with $A$ attached as a subtree.
In the notation stretch has precedence over multiplication, and multiplication has precedence over addition. Two propositions immediately follow:
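Proposition 1. For $T = A + B$ we have $n_T = n_A + n_B - 1$. For $T = A \cdot B$ we have $n_T = n_A n_B$. For $T = \overline{A}$ we have $n_T = n_A + 1$.
Proposition 2. Addition is commutative and associative. That is $A + B = B + A$ and $(A + B) + C = A + (B + C)$.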
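The three operations and the vertex counts that follow from their definitions can be checked on small cases with the sketch below. It assumes the nested-tuple representation (a tree is the tuple of its root subtrees, kept sorted so that equal unordered trees compare equal); the function names are hypothetical, and the sorting uses Python's tuple order rather than the canonical order of the paper.

```python
def norm(t):
    """Sort the subtrees of every vertex so equal unordered trees compare equal."""
    return tuple(sorted(norm(c) for c in t))

def size(t):
    """Number of vertices of t."""
    return 1 + sum(size(c) for c in t)

def add(a, b):
    """A + B: merge the two roots, pooling their subtrees."""
    return tuple(sorted(a + b))

def mul(a, b):
    """A . B: attach the root subtrees of B to every vertex of A."""
    return tuple(sorted(tuple(mul(c, b) for c in a) + b))

def stretch(a):
    """A new root with A as its unique subtree."""
    return (a,)


two, three = ((),), ((), ())          # chain of 2 vertices; root with 2 leaves
print(size(add(two, three)))          # 4 = 2 + 3 - 1
print(size(mul(two, three)))          # 6 = 2 * 3
print(size(stretch(three)))           # 4 = 3 + 1
print(mul(two, three) == mul(three, two))   # False
```

The last line shows on the smallest possible case that the two products of the same pair of trees need not coincide.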
For a positive integer and a tree we can define the product (not to be confused with the product of trees) as the sum of copies of . Due to Propositions 2 and 1, the copies of can be combined in any order and we have . However, for any given , the different trees of vertices obtained as a product are only , that is they constitute an exponentially small fraction of all the trees in . For example the “even” trees (obtained for even) are a small minority among all the trees with the same number of vertices. Similarly we can define the stretch-product as stretched times, and we have . Again for any given , the trees of vertices obtained as a stretch-product are only and constitute an exponentially small fraction of all the trees in .
For tree multiplication, associativity is simple but commutativity is more complicated. From the definition of multiplication we have with simple reasoning:
Proposition 3. Multiplication is associative.
That is . For a positive integer and a tree we can define the power as the product of copies of . Due to Propositions 3 and 1 the multiplications can be done in any order and we have . Again, for any given , the different trees of vertices obtained as are only .
Multiplication is generally not commutative. For a product $AB$ we consider the cases $n_A = n_B$ and $n_A < n_B$ (the case $n_A > n_B$ is symmetric), and pose the conditions below. Recall that, for any tree $T$, $\ell_T$ and $s_T$ respectively denote the number of leaves of $T$ and the number of subtrees of $r(T)$. For $n_A < n_B$ our conditions are only necessary.
Proposition 4. For $n_A = n_B$ we have $AB = BA$ if and only if $A = B$.
The if part is immediate. For the only if part let and . From the construction of the two products we immediately have and . If we have then , then since . Note that and contain subtrees rooted in the former leaves of and respectively, each coinciding with and respectively. Each of these subtrees contains vertices, while all the other subtrees of contain a different number of vertices. Then for having the former two groups of subtrees should be identical, that is each subtree coinciding with in must be equal to a subtree coinciding with in . That is . ∎
For we have only if the following conditions are all verified:
(ii) is a proper subtree of ;
(iii) if all the subtrees of r must be equal to some subtrees of r.
Let and .
Condition (i). Immediate from the observation that implies (see the proof of Proposition 5).
Condition (ii). As in the proof of Proposition 5, consider the subtrees of respectively attached to the former leaves of in and of in . Since (see the proof above) and we have . In there are such subtrees of vertices and in there are such subtrees of vertices. For having the above subtrees of (all coinciding with ) should be present also in where, by the construction of , they must appear as subtrees of the copies of in .
Condition (iii). By construction the subtrees of r appear also in as subtrees of r where they are the ones with fewer vertices because all the others have at least vertices. And the subtrees of r appear also in as subtrees of r where they are the ones with fewer vertices because all the others have at least vertices. Note that all these other subtrees of r have more vertices than the subtrees of r since . For having the subtrees of r that appear as subtrees of r must be equal to subtrees of r and, for what just seen about these subtrees, they must be equal to subtrees among the ones with fewer vertices, i.e. with subtrees of r. This also implies that if then . ∎
The trees and of Figure 2 do not comply with conditions (i) and (ii) of Proposition 5 and we have different from . Commutative products are in fact quite rare. An example with is shown in Figure 5 where the three conditions of Proposition 5 are verified. In this particular case we have hence . Finally multiplication is generally not distributive over addition. From Proposition 1 we can immediately prove:
if and only if 1.
A basic fact about our arithmetic is that all trees can be generated by the single generator 1 using addition and stretch. (Stretch has been included in the operation set to allow the construction of all trees starting from a finite set of generators. The reader may check that addition and multiplication, or stretch and multiplication, are not sufficient for this purpose.) Namely:
Tree 1 is the generator of itself.
Assuming inductively that each of the trees in $F_i$ with $i < n$ can be generated by the trees of the preceding families, then each tree $T$ in $F_n$ can also be generated. In fact if $r(T)$ has one subtree $T_1$ then $T$ can be generated as $\overline{T_1}$; if $r(T)$ has $k > 1$ subtrees $T_1, \ldots, T_k$ then $T$ can be generated as $A + B$, where $A$ is $T$ deprived of $T_1$ and $B$ is $T$ deprived of $T_2, \ldots, T_k$.
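The inductive argument translates directly into a recursive procedure that writes a tree as an expression over the generator 1. The sketch below assumes the nested-tuple representation (a tree is the tuple of its root subtrees) and uses the hypothetical textual notation `stretch(...)` for the over-bar; it unfolds the two-term sum of the argument into one sum over all root subtrees.

```python
def generate(tree):
    """Return an expression over 1, + and stretch(...) that builds `tree`."""
    if not tree:
        return "1"
    # A root with subtrees T1..Tk is the sum of the stretches of T1..Tk;
    # with k = 1 the sum reduces to a single stretch.
    return " + ".join(f"stretch({generate(child)})" for child in tree)


print(generate(()))                # 1
print(generate((((),),)))          # stretch(stretch(1))
print(generate(((), ((), ()))))    # stretch(1) + stretch(stretch(1) + stretch(1))
```

Writing stretch as a function call avoids any reliance on operator precedence in the printed expression.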
3 Prime trees
In the arithmetic of natural numbers $\mathbb{N}$ the basic operations are addition and multiplication, with $n + 0 = n$ and $n \cdot 1 = n$. Prime numbers under addition make no sense, since all $n$ greater than 1 can be constructed as the sum of two smaller terms other than 0 and $n$. In our arithmetic for trees, instead, primality occurs in relation with both addition and multiplication. In this whole section we refer to trees $T$ with $T \neq 1$. We pose:
(i) $T$ is prime under addition (shortly add-prime) if $T$ can be generated by addition only if the terms are 1 and $T$ (tree 1 has a companion role of integer 0 in $\mathbb{N}$).
(ii) $T$ is prime under multiplication (shortly mult-prime) if $T$ can be generated by multiplication only if the factors are 1 and $T$.
The definition of mult-primality is the natural counterpart of the one of primality in $\mathbb{N}$. As may be expected, its consequences are not easy to study. For add-primality, instead, the situation is quite simple. We have:
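Proposition 7. $T$ is add-prime if and only if $r(T)$ has only one subtree.
By contradiction. If part: for an arbitrary tree $A + B$ with $A, B \neq 1$, the root has at least two subtrees; then $T \neq A + B$ for any pair $A, B \neq 1$. Only if part: if $r(T)$ has $k > 1$ subtrees $T_1, \ldots, T_k$ then $T = A + B$, where $A$ is equal to $T$ deprived of $T_1$ and $B$ is equal to $T$ deprived of $T_2, \ldots, T_k$, with $A, B \neq 1$. ∎
Proposition 8. For $n \ge 2$ the number of add-prime trees in $F_n$ is $f_{n-1}$.
From Equation (1) we have: $f_{n-1}/f_n \approx 1/3$ for $n \to \infty$, that is the add-prime trees in $F_n$ are asymptotically about one third of the total. Each of the remaining add-composite (i.e., non add-prime) trees $T$ can be uniquely factorized in $s_T$ factors.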
For mult-primality we start with two immediate statements respectively derived from Proposition 1, and from the definition of multiplication for trees with at least two vertices. More complex conditions for mult-primality can be found in .
Proposition 9. If $n$ is a prime number all the trees with $n$ vertices are mult-prime.
Proposition 10. If $r(T)$ has only one subtree then $T$ is mult-prime.
The converses of Propositions 9 and 10 do not hold in our arithmetic. That is, if $n_T$ is a composite number or $r(T)$ has more than one subtree, tree $T$ may still be mult-prime. In a sense mult-prime trees are more numerous than primes in $\mathbb{N}$. For example, out of the twenty trees in $F_6$ (see Figure 2) only trees 20, 22, 24, and 28 are mult-composite (i.e. non mult-prime), as they can be built as the products of trees 2 and 3, 3 and 2, 4 and 2, and 2 and 4, respectively.
Since $T$ is mult-prime whenever $n_T$ is prime, and deciding whether $n_T$ is prime takes time polynomial in $n_T$, deciding if $T$ is mult-prime is straightforward for $n_T$ prime. However, the problem is difficult for $n_T$ composite because $T$ may be mult-prime or mult-composite. An algorithm for $n_T$ composite may consist of building all the products $AB$ and $BA$ of two trees of $a, b$ vertices respectively, for all the factorizations of $n_T$ as $a \cdot b$, and comparing $T$ with these products looking for a match. However, this method is impracticable unless $n_T$ is very small, so we must find a different way to decide mult-primality. To this end consider a property of product trees based on the observation that, if $T = AB$, all the subtrees of $r(B)$ are also subtrees of $r(T)$. Namely:
Proposition 11. Let $T = AB$ with $A, B \neq 1$, and let $M$ be a subtree of $r(B)$ with maximum number of vertices $n_M$. Then the subtrees of $r(B)$ are exactly the subtrees of $r(T)$ with at most $n_M$ vertices.
Since $T = AB$, the subtree $M$ has been inserted at $r(T)$ as the largest subtree of $r(B)$. Then also the subtrees of $r(T)$ with at most $n_M$ vertices must have been inserted at $r(T)$ as subtrees of $r(B)$, since they have too few vertices to derive from former subtrees of $r(A)$, whose vertices are merged with $r(B)$ in $T$. Furthermore the remaining subtrees of $r(T)$ cannot be subtrees of $r(B)$ since they have too many vertices, by the hypothesis that $M$ is a largest subtree of $r(B)$. ∎
In the mult-composite tree of Figure 5, if the first subtree of r (containing one vertex) is a subtree of maximal cardinality of one of the factors, in this case, then consists of a root plus the first two subtrees of r. Similarly, if the third subtree of r is a subtree of maximal cardinality of one of the factors, in this case, then consists of a root plus the first four subtrees of r. We pose:
For an arbitrary tree : (i) are the groups of subtrees of r with the same number of vertices, ; (ii) , , i.e. each is the group of subtrees of r with up to vertices.
Based on Proposition 11 and Notation 1 we can build the primality Algorithm MP of Figure 6, which requires time polynomial in the number $n_T$ of vertices. Since all trees with a prime number of vertices are mult-prime, MP is intended for testing trees with $n_T$ composite. However, MP works for all trees and can always be applied, avoiding a preliminary test for the primality of $n_T$.
Proposition 12. Mult-primality of a tree $T$ can be decided in time polynomial in $n_T$.
Refer to Algorithm MP. Correctness. Only step 3 requires an analysis. is the changing version of and is restored at each -th cycle. If one of the groups of subtrees can be erased from at all vertices encountered in the traversal, the cycle is completed and the algorithm terminates declaring that is mult-composite. In fact tree , whose root has the subtrees in , is one of the factors of (see Proposition 11). If none of the -cycles can be completed, that is no can be found as being the group of subtrees of in all vertices of , the tree is mult-prime as declared in step 4.
Complexity. A superficial analysis of the algorithm is the following. Step 1 requires O() time as discussed for Algorithm CF. Step 2 is executed with a linear time scan because the tree is now in canonical form and the number of vertices in each subtree of the root has been computed by algorithm CF in step 1. Step 3 requires O() copy operations of into in O() time, and O() traversals each composed of O() steps, for a total of O() steps. At each step at vertex the subtrees in must be compared with the subtrees of with the same cardinality; this can be done by representing such subtrees with their binary sequences and comparing these sequences. In the worst case vertex has O() subtrees of length O(), so that building and comparing all the sequences takes time O(), and the total time required by step 3 is O(). Note that this analysis is very rough because the number of vertices of decreases during the traversal, so the stated bound O() is exceedingly high. ∎
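The following sketch is not Algorithm MP (Figure 6 is not reproduced here) and makes no attempt to match its stated bounds. It only illustrates the idea suggested by Proposition 11: each candidate right factor is the root together with all root subtrees up to some size, and a candidate is accepted when it can be stripped from every vertex. The nested-tuple representation and all function names are assumptions. The division step relies on the fact that, in a product of A by B, the subtrees inherited from A have at least as many vertices as B, while the subtrees of the root of B have fewer.

```python
def norm(t):
    """Sort subtrees recursively so equal unordered trees compare equal."""
    return tuple(sorted(norm(c) for c in t))

def size(t):
    return 1 + sum(size(c) for c in t)

def divide(t, b):
    """Return A with A . B == t, or None if none exists (t, b normalized)."""
    nb = size(b)
    # Subtrees smaller than B must be exactly the root subtrees of B.
    if tuple(c for c in t if size(c) < nb) != b:
        return None
    quotient = []
    for c in t:
        if size(c) >= nb:              # inherited from A: divide recursively
            q = divide(c, b)
            if q is None:
                return None
            quotient.append(q)
    return tuple(sorted(quotient))

def mult_factor(t):
    """Return (A, B) with A . B == t and A, B != 1, or None if t is mult-prime."""
    t = norm(t)
    n = size(t)
    for m in sorted({size(c) for c in t}):
        b = tuple(c for c in t if size(c) <= m)   # candidate right factor
        nb = size(b)
        if nb < n and n % nb == 0:
            a = divide(t, b)
            if a is not None:
                return a, b
    return None


print(mult_factor(((), (), ((), ()))))   # a chain of 2 times a root with 2 leaves
print(mult_factor(((((),),),)))          # None: a chain has one root subtree
```

The sketch recomputes subtree sizes many times for simplicity; caching them would be the first step toward an efficient version.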
Note that if is mult-composite Algorithm MP allows to find a pair of factors at no extra cost, with mult-prime. In fact, if a cycle of step 3 is completed, the algorithm is interrupted on the return statement and the group contains exactly the subtrees of r, while the tree is reduced to . In particular is the last factor of a product of mult-prime trees, with . If Algorithm MP is not interrupted with the return statement, all these factors can be detected. As a consequence we have:
Mult-factorization of any tree is unique.
By contradiction assume that $T$ has two different factorizations $A_1 A_2 \cdots A_h$ and $B_1 B_2 \cdots B_k$ in mult-prime factors. Tracing back from $A_h$ and $B_k$, let $A_i$ and $B_j$ be the first pair of factors encountered with $A_i \neq B_j$. Then we have $A_1 \cdots A_i = B_1 \cdots B_j$. By Proposition 11 $A_i$ must contain $B_j$ as one of its factors (or vice-versa), against the hypothesis that $A_i$ is mult-prime. ∎
Finally note that counting the number of add-prime trees is simple (Proposition 8), but an even approximate count for mult-prime trees is much more difficult. We pose:
Open problem. For a composite integer $n$ determine the number of mult-prime trees of $n$ vertices.
4 Negative trees and tree equations
Once addition and multiplication are known, it is natural to define the inverse operations.
We define the subtraction $T = A - B$ if and only if all the subtrees of $r(B)$ are also subtrees of $r(A)$. Then $T$ equals $A$ deprived of such subtrees. This is the inverse of the addition $A = T + B$. We have $A - 1 = A$.
We define the division $T = A / B$ if and only if there exists a subset $V$ of the vertices of $A$ such that each $x \in V$ has exactly the subtrees of $r(B)$, and the tree obtained as $A$ deprived of such subtrees has exactly the vertices of $V$. Then $T$ is the tree so obtained. This is the inverse of the multiplication $A = T \cdot B$. We have $A / 1 = A$.
Also the operation of stretch has an inverse. We define the un-stretch $T = \underline{A}$ (symbol underline) if and only if $r(A)$ has exactly one subtree $A_1$, and we pose $T = A_1$. In the notation un-stretch has precedence over multiplication and stretch has precedence over un-stretch.
As negative numbers arose from subtraction in integer arithmetic, the more intriguing concept of negative trees arises here from tree subtractions. We propose the following definition. All the vertices of a tree $T$ are either positive (then $T$ is positive) or negative (then $T$ is negative), except for the root, which is neutral. Positive and negative vertices are respectively indicated with a black dot or an empty circle. The root is also indicated with a black dot. Changing the sign of a tree amounts to changing the nature of all its vertices except for the root. Tree 1 is neutral and we have $-1 = 1$.
Addition and subtraction between and keep their definition with the additional condition that if is positive and is negative all the subtrees of r are also subtrees of r or vice-versa, and positive and negative subtrees with identical shape cancel each other out in the result (See Figure 7). Multiplication and division between and also keep their definition with the additional condition that if and are both positive or both negative the result is positive, otherwise is negative.
At this point we may open a window on tree equations whose terms have all the nature of a tree, but integers may appear as multiplicative coefficients or exponents. In a sense they are companions of the Diophantine equations with integers, but the solutions are now required to be trees. We may consider equations of different degrees with different number of variables, ask questions on the existence and on the number of solutions, study the computational complexity of finding them. In fact we give only some examples, leaving the field essentially open.
Denote trees and integers with capital and lower case letters respectively. The simplest equation is linear and has only one unknown . We put:
1, i.e. (2)
Equation (2) admits exactly one solution if and only if the subtrees of r can be divided in groups of identical subtrees, where each has cardinality for , see example E1 in Figure 8. In this case has subtrees that can be divided in groups of subtrees identical to the ones of . This solution can be easily built in time polynomial in starting with the transformation of in canonical form. Note that and have opposite sign.
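The grouping argument can be illustrated on positive trees only. The sketch below ignores signs and assumes that, up to sign, Equation (2) asks for a tree X such that the sum of a copies of X equals a given tree; both this reading and the names `solve_linear` and `norm` are assumptions, as is the nested-tuple representation (a tree is the tuple of its root subtrees).

```python
from collections import Counter

def norm(t):
    """Sort subtrees recursively so identical subtrees compare equal."""
    return tuple(sorted(norm(c) for c in t))

def solve_linear(a, t):
    """Return X such that the sum of a copies of X equals t, or None.

    The root subtrees of t are grouped by shape; a solution exists
    exactly when the size of every group is a multiple of a.
    """
    groups = Counter(norm(c) for c in t)
    if any(count % a for count in groups.values()):
        return None
    return tuple(sorted(c for c, count in groups.items()
                        for _ in range(count // a)))


t = ((), (), ((),), ((),))      # two leaves and two chains below the root
print(solve_linear(2, t))       # ((), ((),))
print(solve_linear(2, ((), (), ())))   # None: three identical subtrees
```

When a solution exists it is unique, since the number of copies of each shape in X is forced by the corresponding group size.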
A standard linear tree equation in two unknowns can be expressed as:
1, i.e (3)
This equation is the companion of the diophantine equation widely used in modular algebra, that admits an integer solution if and only if is divided by . So applying Proposition 1 to the trees of equation (3) we have and a necessary condition for the existence of a tree solution is that divides , as in the examples E2, E3 of Figure 8 where . In general equation (3) admits a solution if and only if one of the two non trivial Conditions 1 and 2 below hold, corresponding respectively to trees of equal sign or of opposite sign. In both cases the solution can be built in time polynomial in . We have:
Condition 1. The subtrees of r can be divided in groups and groups of identical subtrees, where each has cardinality for and each has cardinality for . In this case has subtrees divided in groups of subtrees identical to the ones of ; and has subtrees divided in groups of subtrees identical to the ones of . This solution can be built in time polynomial in . Note that and have the same sign, and has opposite sign. See Equation E2 in Figure 8.
Condition 2. Let the unknown trees and have opposite sign. W.l.o.g. let the subtrees of r be divided in groups of identical subtrees, and the subtrees of r be divided in groups of identical subtrees, with and . And let the subtrees of r be divided in groups of identical subtrees. , , respectively denote the cardinalities of .
To allow the addition the subtrees in must be identical to the ones in for ; the subtrees in must be identical to the ones in for ; and we have the system of diophantine equations:
whose integer solutions (if any) state that the copies of the subtrees of suffice to elide the copies of the subtrees of in , for ; and copies of the subtrees in appear as subtrees of , for . The system can be solved under the conditions:
integer for (iii)
integer for (iv)
for a value of established as the minimum value for which condition (iv) holds (this fixes also the value of ). Then if all conditions (iii) hold the system is solved in time polynomial in and two trees , satisfying equation (3) are immediately built from the values of , , out a potentially infinite number of solutions. In particular note that, for all , the values , must be both positive to represent subset cardinalities. If this does not happen, an alternative positive solution is built from the other by standard methods. See equation E3 in Figure 8.
Higher degree equations are more difficult to handle. For the quadratic tree equation:
1, i.e (4)
a necessary condition for the solution is the existence of two integers satisfying the algebraic equation , a well known NP-complete problem. To find a reasonably interesting approach for deciding whether equation (4) has a solution is left as an open problem.
A “more ambitious” problem can be expressed as:
with the question of deciding if equation (5) has a tree solution for any . In fact even for the problem is not simple. Due to Proposition 1 we have the necessary condition for its solution, i.e. the existence of a “quasi-Pythagorean” triple of integers. In fact such triples exist, as for example , but the existence of Pythagorean trees with such numbers of vertices is left as an open problem.
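The necessary condition on vertex counts can be explored numerically. Assuming that the case of exponent 2 reads X^2 + Y^2 = Z^2 on trees, the vertex counts of a sum and of a product (Proposition 1) give x^2 + y^2 - 1 = z^2 for the numbers of vertices x, y, z. The search below lists small integer triples of this kind; the function name is hypothetical, and finding a triple says nothing about whether trees with those numbers of vertices actually solve the equation.

```python
from math import isqrt

def quasi_pythagorean(limit):
    """Yield (x, y, z) with 2 <= x <= y <= limit and x*x + y*y - 1 == z*z."""
    for x in range(2, limit + 1):
        for y in range(x, limit + 1):
            s = x * x + y * y - 1
            z = isqrt(s)
            if z * z == s:
                yield (x, y, z)


print(list(quasi_pythagorean(7)))   # [(4, 7, 8), (5, 5, 7)]
```

The search starts at 2 because a one-vertex term makes the condition hold trivially.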
5 Possible applications and extensions
While the major purpose of the present study is the one of defining arithmetic concepts outside the realm of numbers, let us briefly discuss what the role of our proposal in applications might be.
Essentially all trees used in computer algorithms are rooted, and different families have been defined among them to deal with particular problems. We do not put any restriction on the tree structure. The trees considered here simply correspond to nested sets, as for example hierarchical structures in computer science, office plans in business organization, or phylogenetic trees in biology. Note that the subtrees are essentially unordered at any vertex, although they must be stored in some standard form to be represented, e.g. following an alphanumeric label order or similar. Or, of course, in our canonical order.
Two main actions are generally required in a hierarchical structure. Namely: (i) add a new subtree to the root of a tree ; or (ii) join two independent trees to form a new tree with subtrees of the root. In our arithmetic, action (i) is represented as ; and action (ii) is represented as . Both actions can be respectively undone as: ; and , .
An important extension of action (i) is inserting a new subtree at a given vertex of . This is obtained by an iterative operation along the path , from r to . Letting be the subtrees rooted at vertices , hence , we set for ; then we set ; then we set for , where gives the transformed tree. A similar operation is required to extract a subtree at vertex . Propositions 2 and 3 hold for the subtrees rooted at , with obvious effects on the whole tree.
Other operations can be considered and their representation investigated along the lines above. In particular multiplication may be performed on subtrees only, and even be limited at leaves. Note that, even though multiplication could find fewer applications than addition and stretch, it may be useful in data compression because the information contained in a product is fully present in its factors, thereby reducing the storage space needed for the product from to . So the concept of primality may be of practical interest in the reverse-engineering operation of deciding if a tree has been generated as a product.
6 Other studies on tree arithmetic
Up to now only one major line of research, that we call LBY, has been directed to defining arithmetic on trees. Opened by J.L. Loday et al. in connection with dendriform algebras [4], it was then developed by J.L. Loday himself, who gave a full description of arithmetic operations on binary trees and their properties, showing an embedding of $\mathbb{N}$ in the subsets of all binary trees of $n$ vertices [5]. A. Bruno and D. Yasaki worked on Loday’s theory, introducing primality and counting properties on subsets of trees in [2]. LBY is limited to binary trees, which carries simpler consequences than in our general case. A non-commutative tree addition is defined in LBY, attaching the second addend to a deepest leaf of the first one, and this operation is given in two versions to express any tree by addition from one generator (as in our proposal, two different operations are needed). From this construction stems a definition of tree multiplication that produces trees different from our products. Several interesting properties are derived, including some counting arguments on the different families of trees built. The most relevant extension made by Bruno and Yasaki to Loday’s theory is the definition and treatment of prime trees under multiplication. Aside from proceeding with similar purposes, none of the definitions and results of LBY applies to our theory, or vice-versa.
Another study on tree arithmetic, due to R. Sainudiin, is aimed at using binary trees for treating mapped partitions of a special class of intervals [9], and has nothing to share with LBY and with our theory. None of these works deals with aspects of computational complexity related to the operations on trees.
Along an independent line of research several papers have been directed to define graph multiplication, from the seminal work of G. Sabidussi [8] to the one of B. Zmazek and J. Zerownik [11]. In this context prime graphs and graph factorization have been considered under various operations of multiplication, see [1]. Again, if applied to trees as special graphs, all the definitions and results on graph multiplication are unrelated to ours.
We finally note that a preliminary work partially overlapping with the present paper was deposited as an earlier arXiv manuscript [6]. That version did not include negative trees and tree equations.
[1] N. Bray and E.W. Weisstein. Graph Product. MathWorld - A Wolfram Web Resource. http://mathworld.wolfram.com/GraphProduct.html
[2] A. Bruno and D. Yasaki. The Arithmetic of Trees. Involve 4 (1) (2011) 1-11.
[3] S. Finch. Two Asymptotic Series. www.people.fas.harvard.edu/ sfinch/
[4] J.L. Loday, A. Frabetti, F. Chapoton, and F. Goichot. Dialgebras and Related Operads. Lecture Notes in Mathematics 1763, Springer-Verlag, Berlin (2001).
[5] J.L. Loday. Arithmetree. J. Algebra 258 (1) (2002) 275-309.
[6] F. Luccio. Arithmetic for Rooted Trees. arXiv:1510.05512v2 (2015).
[7] J.M. Plotkin and J.W. Rosenthal. How to obtain an asymptotic expansion of a sequence from an analytic identity satisfied by its generating function. J. Australian Math. Soc. Ser. A 56 (1994) 131-143.
[8] G. Sabidussi. Graph Multiplication. Math. Z. 72 (1960) 446-457.
[9] R. Sainudiin. Algebra and Arithmetic of Plane Binary Trees: Theory & Applications of Mapped Regular Pavings.
[10] S. Zaks. Lexicographic Generation of Ordered Trees. Theoretical Computer Science 10 (1980) 63-82.
[11] B. Zmazek and J. Zerownik. Weak Reconstruction of Small Product Graphs. Discrete Mathematics 307 (2007) 641-649.
Appendix. The binary sequences representing the trees of the first six canonical families.