Improved Methods for Computing Distances between Unordered Trees Using Integer Programming
Abstract
Kondo et al. (DS 2014) proposed methods for computing distances between unordered rooted trees by transforming an instance of the distance computing problem into an instance of the integer programming problem. They showed that the tree edit distance, segmental distance, and bottom-up segmental distance problems can each be transformed into an integer program with $O(nm)$ variables and $O(n^2m^2)$ constraints, where $n$ and $m$ are the numbers of nodes of the input trees. In this work, we propose new integer programming formulations for these three distances and the bottom-up distance by applying a dynamic programming approach. We divide the tree edit distance problem into $O(nm)$ subproblems, each of which has only $O(n+m)$ constraints. For the other three distances, each subproblem can be reduced to a maximum weighted matching problem in a bipartite graph, which can be solved in polynomial time. To evaluate our methods, we compare them with the previous method due to Kondo et al. The experimental results show that our methods perform remarkably better than the previous method.
1 Introduction
In machine learning applications, it is important to measure (dis)similarities between tree-structured data such as XML documents and RNA secondary structures. There are many measures of similarity between two trees. The tree edit distance [16] is one of the most widely used measures; it is defined as the minimum cost of edit operations that transform one tree into another. Computing it is equivalent to finding a minimum-cost Tai mapping between the two trees. However, the tree edit distance may not be appropriate in applications where structure-sensitivity is required. In this context, many variants of Tai mapping have been proposed (see [12], for example). This study covers four measures: the edit distance, the segmental distance [9], the bottom-up segmental distance [9], and the bottom-up distance [17].
It is known that most distances between ordered rooted trees can be computed in polynomial time. For example, Tai [16] showed that the tree edit distance between ordered rooted trees can be computed in $O(n^3m^3)$ time, where $n$ and $m$ are the numbers of nodes of the input trees, and Demaine et al. [4] improved the running time to $O(n^2m(1+\log(m/n)))$ for $n \le m$. However, if the input trees are unordered, the problems of computing the above four distances are known to be not only NP-hard [20] but also MAX SNP-hard [9, 17, 19]. Akutsu et al. studied the tree edit distance problem between unordered trees from a theoretical algorithmic perspective and gave an approximation algorithm and exact algorithms [1, 2, 3]. From the practical point of view, much research has been done. Horesh et al. [7] proposed an A* algorithm for unlabeled unordered trees, and Higuchi et al. [6] extended it to labeled trees. Fukagawa et al. [5] proposed a method that reduces the edit distance problem to the maximum vertex weighted clique problem, which can be solved by an algorithm due to [15]; they showed that the clique-based method is as fast as the A*-based method. Mori et al. [14] improved it by applying a dynamic programming approach and showed that their method is faster than the previous clique-based method. Kondo et al. [11] proposed a method that reduces an instance of the edit distance problem to an instance of the integer linear programming (IP) problem with $O(nm)$ variables and $O(n^2m^2)$ constraints, where $n$ and $m$ are the numbers of nodes of the input trees. However, their IP formulation has a large number of constraints, and hence their method may not be applicable to moderate-sized instances. Although they showed that their method is faster than the clique-based method of Mori et al. [14] when the input trees have high-degree nodes, their IP-based method is not very effective when the input trees have no high-degree nodes or when the trees are large.
An advantage of the IP-based method is that we can easily obtain IP formulations representing variations of the edit distance by adding constraints. In fact, Kondo et al. gave IP formulations for the segmental distance and the bottom-up segmental distance by adding appropriate constraints. Another advantage is that we can use state-of-the-art IP solvers (e.g., Gurobi, CPLEX), which solve many hard problems quickly.
In this paper, we propose improved methods to compute the edit distance, segmental distance, bottom-up segmental distance, and bottom-up distance between unordered rooted trees. The improvement in computational efficiency is obtained by applying the dynamic programming approach due to [14]. However, applying the dynamic programming alone is not sufficient; it is also necessary to exploit a structural property of rooted trees. Combining their dynamic programming with this property allows us to drastically reduce the number of constraints in our IP formulations for the above distances. For the edit distance problem, our method solves $O(nm)$ subproblems, each of which has only $O(n+m)$ constraints. For the other distances, each subproblem, except the final problem of combining the solutions of subproblems, can be reduced to the maximum weighted matching problem in a bipartite graph, which can be solved in polynomial time using the Hungarian method [13].
The rest of the paper is organized as follows. We give notation and preliminary results in Sect. 2 and briefly explain the previous method in Sect. 3. In Sect. 4, we introduce our new methods. To evaluate our methods, we implemented the previous methods and ours and conducted experiments using the Glycan dataset [10] and the CSLOGS dataset [18]. The results of our experiments are shown in Sect. 5. Finally, we conclude the paper with some discussion.
2 Preliminaries
Let $T$ be a rooted tree. The root of $T$ is denoted by $r(T)$. In this paper, we simply write $T$ to represent the set of nodes of $T$. For $u, v \in T$, $u \le v$ means that $u$ is on the unique path between the root and $v$. If $u \le v$ and $u \ne v$, we write $u < v$ and say that $u$ is an ancestor of $v$ and $v$ is a descendant of $u$. It is easy to see that the relation $\le$ is a partial order on $T$. The parent of $v$, denoted by $p(v)$, is the closest ancestor of $v$. The set of children of $v$, denoted by $ch(v)$, consists of the closest nodes to $v$ among all descendants of $v$. We call the number of children of $v$ the degree of $v$. A node is called a leaf if it has no children. The set of all leaves of a tree $T$ is denoted by $L(T)$. Nodes $u$ and $v$ are siblings if they have the same parent. A tree is called an unordered tree if there is no order among siblings. Let $\Sigma$ be a finite alphabet and $\ell : T \to \Sigma$ a labeling function. A tuple $(T, \ell)$ is called a labeled tree. For $v \in T$, we use $T(v)$ to denote the subtree of $T$ rooted at $v$. For notational convenience, we simply write $T - v$ to denote the subgraph of $T$ obtained by removing a node $v$.
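The notation above can be made concrete with a few helpers. The following Python sketch is illustrative only and not part of the original work; it assumes a tree is stored as a dictionary that maps every node name to the list of its children (leaves map to an empty list, and list order is irrelevant because the trees are unordered). All function names are hypothetical.

```python
from typing import Dict, List, Set

# Assumed representation: node -> list of children; every node is a key.
Tree = Dict[str, List[str]]


def parent_map(t: Tree) -> Dict[str, str]:
    """p(v) for every non-root node v."""
    return {c: v for v, cs in t.items() for c in cs}


def root(t: Tree) -> str:
    """r(T): the unique node that is nobody's child."""
    return (set(t) - set(parent_map(t))).pop()


def is_ancestor(t: Tree, u: str, v: str) -> bool:
    """True iff u < v, i.e. u is a proper ancestor of v."""
    par = parent_map(t)
    while v in par:
        v = par[v]
        if v == u:
            return True
    return False


def leaves(t: Tree) -> Set[str]:
    """L(T): nodes without children."""
    return {v for v, cs in t.items() if not cs}


def subtree(t: Tree, u: str) -> Set[str]:
    """Node set of T(u), the subtree rooted at u (u included)."""
    out, stack = set(), [u]
    while stack:
        x = stack.pop()
        out.add(x)
        stack.extend(t[x])
    return out
```

For example, with `T = {"r": ["a", "b"], "a": ["c"], "b": [], "c": []}`, `is_ancestor(T, "r", "c")` holds, `leaves(T)` is `{"b", "c"}`, and `subtree(T, "a")` is `{"a", "c"}`.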
2.1 Tree Edit Distance
The tree edit distance between two trees is defined as the minimum cost of edit operations to transform a tree into another.
Definition 1 (Edit Operations).
Let T be a tree. Edit operations on T consist of the following three operations.
Substitution: Replace the label of a node $v$ in $T$ with a new label.
Deletion: Delete a non-root node $v$ of $T$, making all children of $v$ the children of $p(v)$.
Insertion: Insert a new node $v$ as a child of some node $u$ in $T$, making some children of $u$ the children of $v$.
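Let $\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}$, where $\varepsilon$ is a blank symbol not in $\Sigma$. To describe the costs of edit operations, we denote each edit operation by a pair in $\Sigma_\varepsilon \times \Sigma_\varepsilon \setminus \{(\varepsilon, \varepsilon)\}$. Substituting a node labeled $a$ by a node labeled $b$ is denoted by $(a, b)$. Inserting a node labeled $b$ is denoted by $(\varepsilon, b)$. Deleting a node labeled $a$ is denoted by $(a, \varepsilon)$. Let $\gamma : \Sigma_\varepsilon \times \Sigma_\varepsilon \setminus \{(\varepsilon, \varepsilon)\} \to \mathbb{R}$ be a cost function on edit operations and assume, in this paper, that $\gamma$ is a metric. In the following, for $u \in T_1$ and $v \in T_2$ we simply write $\gamma(u, v)$ to represent $\gamma(\ell_1(u), \ell_2(v))$, where $\ell_1$ and $\ell_2$ are the labeling functions of the two trees $T_1$ and $T_2$, respectively; similarly, $\gamma(u, \varepsilon) = \gamma(\ell_1(u), \varepsilon)$ and $\gamma(\varepsilon, v) = \gamma(\varepsilon, \ell_2(v))$.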
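Let $S = (s_1, \ldots, s_k)$ be a sequence of edit operations, where $s_i \in \Sigma_\varepsilon \times \Sigma_\varepsilon \setminus \{(\varepsilon, \varepsilon)\}$ for $1 \le i \le k$. The cost of the sequence is defined as $\gamma(S) = \sum_{i=1}^{k} \gamma(s_i)$.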
Definition 2 (Tree Edit Distance [16]).
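Let $T_1$ and $T_2$ be trees and $\mathcal{S}(T_1, T_2)$ be the set of all sequences of edit operations that transform $T_1$ into $T_2$. The tree edit distance between $T_1$ and $T_2$, denoted by $d(T_1, T_2)$, is defined as
$$d(T_1, T_2) = \min_{S \in \mathcal{S}(T_1, T_2)} \gamma(S).$$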
A mapping $M$ between $T_1$ and $T_2$ is a subset of $T_1 \times T_2$. The set of nodes that belong to a mapping $M$ is denoted by $V(M) = \{u : (u, v) \in M\} \cup \{v : (u, v) \in M\}$. Tai [16] gave a combinatorial characterization of the tree edit distance by means of a mapping, which is called a Tai mapping.
Definition 3 (Tai Mapping [16]).
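Let $T_1$ and $T_2$ be trees. A mapping $M \subseteq T_1 \times T_2$ is called a Tai mapping if it satisfies the following constraints for every $(u_1, v_1), (u_2, v_2)$ in $M$:
One-to-one correspondence: $u_1 = u_2 \iff v_1 = v_2$,
Preserving ancestor-descendant relationship: $u_1 < u_2 \iff v_1 < v_2$.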
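The cost of a Tai mapping $M$ is defined as
$$\gamma(M) = \sum_{(u, v) \in M} \gamma(u, v) + \sum_{u \in T_1 \setminus V(M)} \gamma(u, \varepsilon) + \sum_{v \in T_2 \setminus V(M)} \gamma(\varepsilon, v).$$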
Let $\mathcal{M}_{\mathrm{Tai}}(T_1, T_2)$ be the set of all Tai mappings between $T_1$ and $T_2$. Tai [16] showed the following theorem.
Theorem 1 ([16]).
For two trees $T_1$ and $T_2$, $d(T_1, T_2) = \min_{M \in \mathcal{M}_{\mathrm{Tai}}(T_1, T_2)} \gamma(M)$.
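To make Definition 3 and Theorem 1 concrete, the sketch below checks the two Tai-mapping conditions, evaluates $\gamma(M)$, and computes $d(T_1, T_2)$ by exhaustively enumerating mappings. It is a hypothetical illustration, exponential in the input size and usable only on toy trees; the node-to-children representation, the use of `None` for the blank symbol $\varepsilon$, and all function names are our assumptions.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Set, Tuple

Tree = Dict[str, List[str]]        # node -> children (assumed representation)
Labels = Dict[str, str]            # node -> label
Cost = Callable[[Optional[str], Optional[str]], float]  # None plays the role of epsilon


def proper_ancestor_pairs(t: Tree) -> Set[Tuple[str, str]]:
    """All (a, b) with a < b."""
    kids = {c for cs in t.values() for c in cs}
    rel: Set[Tuple[str, str]] = set()

    def walk(v: str, anc: List[str]) -> None:
        rel.update((a, v) for a in anc)
        for c in t[v]:
            walk(c, anc + [v])

    walk((set(t) - kids).pop(), [])
    return rel


def is_tai(M, anc1, anc2) -> bool:
    """Definition 3, checked for every two pairs of M in both orders."""
    for (u1, v1), (u2, v2) in combinations(M, 2):
        if (u1 == u2) != (v1 == v2):                    # one-to-one
            return False
        if ((u1, u2) in anc1) != ((v1, v2) in anc2):    # u1 < u2 iff v1 < v2
            return False
        if ((u2, u1) in anc1) != ((v2, v1) in anc2):    # u2 < u1 iff v2 < v1
            return False
    return True


def tai_cost(M, l1: Labels, l2: Labels, g: Cost) -> float:
    """gamma(M): substitutions for mapped pairs, deletions/insertions otherwise."""
    m1, m2 = {u for u, _ in M}, {v for _, v in M}
    return (sum(g(l1[u], l2[v]) for u, v in M)
            + sum(g(l1[u], None) for u in l1 if u not in m1)
            + sum(g(None, l2[v]) for v in l2 if v not in m2))


def brute_force_distance(t1: Tree, l1: Labels, t2: Tree, l2: Labels, g: Cost) -> float:
    """Theorem 1: minimum of gamma(M) over all Tai mappings (toy sizes only)."""
    anc1, anc2 = proper_ancestor_pairs(t1), proper_ancestor_pairs(t2)
    pairs = [(u, v) for u in t1 for v in t2]
    best = float("inf")
    for k in range(min(len(t1), len(t2)) + 1):
        for M in combinations(pairs, k):
            if is_tai(M, anc1, anc2):
                best = min(best, tai_cost(M, l1, l2, g))
    return best
```

Under a unit cost, a three-node path labeled x, y, z (from the root down) and a root x with two leaf children y and z are at distance 2: no Tai mapping can pair all three nodes, because the path is deeper than the other tree.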
2.2 Variants of Edit Distance
The tree edit distance is one of the most widely used measures of similarity between two trees. However, it may not be appropriate for some applications, because one may need a distance that reflects some specific structure of trees. Many variants of the tree edit distance have been proposed in the literature [9, 17]. We work on the following three variants, which are defined by mappings rather than by edit operations.
Definition 4 (Segmental Mapping [9]).
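Let $T_1$ and $T_2$ be trees. A Tai mapping $M$ between $T_1$ and $T_2$ is called a segmental mapping if for any $(u_1, v_1), (u_2, v_2) \in M$ with $u_1 < u_2$ and $v_1 < v_2$, we have $(p(u_2), p(v_2)) \in M$.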
Definition 5 (Bottomup Segmental Mapping [9]).
Let $T_1$ and $T_2$ be trees. A segmental mapping $M$ between $T_1$ and $T_2$ is called a bottom-up segmental mapping if for any $(u, v) \in M$, there is $(u', v') \in M$ such that $u'$ and $v'$ are leaves with $u \le u'$ and $v \le v'$.
Definition 6 (Bottomup Mapping [17]).
Let $T_1$ and $T_2$ be trees. A Tai mapping $M$ between $T_1$ and $T_2$ is called a bottom-up mapping if for any $(u, v) \in M$, the submapping obtained from $M$ by restricting it to $T_1(u) \times T_2(v)$ forms a bijection between $T_1(u)$ and $T_2(v)$.
Note that the condition in Definition 6 can be restated as follows: $M$ is a bottom-up mapping if for any $(u, v) \in M$, the submapping obtained from $M$ by restricting it to $T_1(u) \times T_2(v)$ is an isomorphism between $T_1(u)$ and $T_2(v)$ when labels are ignored.
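Definitions 4 to 6 can be checked mechanically. The Python sketch below is illustrative only and not part of the original method: it assumes the node-to-children dictionary representation and a mapping given as a list of node pairs that is already a Tai mapping; the function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Tree = Dict[str, List[str]]        # node -> children (assumed representation)
Mapping = List[Tuple[str, str]]


def parents(t: Tree) -> Dict[str, str]:
    return {c: v for v, cs in t.items() for c in cs}


def nodes_below(t: Tree, u: str) -> Set[str]:
    """Node set of T(u), u included."""
    out, stack = set(), [u]
    while stack:
        x = stack.pop()
        out.add(x)
        stack.extend(t[x])
    return out


def is_segmental(M: Mapping, t1: Tree, t2: Tree) -> bool:
    """Definition 4 (M is assumed to be a Tai mapping already)."""
    p1, p2, S = parents(t1), parents(t2), set(M)
    for (u1, v1), (u2, v2) in product(M, M):
        below = (u1 != u2 and u2 in nodes_below(t1, u1)
                 and v1 != v2 and v2 in nodes_below(t2, v1))
        if below and (p1[u2], p2[v2]) not in S:
            return False
    return True


def is_bottom_up_segmental(M: Mapping, t1: Tree, t2: Tree) -> bool:
    """Definition 5: segmental, and every pair lies above a mapped leaf pair."""
    leaf_pairs = [(a, b) for a, b in M if not t1[a] and not t2[b]]
    return is_segmental(M, t1, t2) and all(
        any(a in nodes_below(t1, u) and b in nodes_below(t2, v) for a, b in leaf_pairs)
        for u, v in M)


def is_bottom_up(M: Mapping, t1: Tree, t2: Tree) -> bool:
    """Definition 6: M restricted to T1(u) x T2(v) is a bijection for each (u, v)."""
    for u, v in M:
        d1, d2 = nodes_below(t1, u), nodes_below(t2, v)
        size = sum(1 for a, b in M if a in d1 and b in d2)
        # M is one-to-one, so the restriction is a bijection iff it covers both sides.
        if size != len(d1) or size != len(d2):
            return False
    return True
```

For instance, mapping a root with two leaf children onto a root with one leaf child, pairing the roots and one leaf pair, is bottom-up segmental but not bottom-up, since the unmatched leaf breaks the bijection at the roots.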
Definition 7 ([9, 17]).
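Let $T_1$ and $T_2$ be trees. Denote the sets of all segmental mappings, bottom-up segmental mappings, and bottom-up mappings between $T_1$ and $T_2$ by $\mathcal{M}_{\mathrm{Sg}}(T_1, T_2)$, $\mathcal{M}_{\mathrm{BotSg}}(T_1, T_2)$, and $\mathcal{M}_{\mathrm{Bot}}(T_1, T_2)$, respectively. The segmental distance, bottom-up segmental distance, and bottom-up distance between $T_1$ and $T_2$, denoted by $d_{\mathrm{Sg}}(T_1, T_2)$, $d_{\mathrm{BotSg}}(T_1, T_2)$, and $d_{\mathrm{Bot}}(T_1, T_2)$, respectively, are defined as follows:
$$d_{\mathrm{Sg}}(T_1, T_2) = \min_{M \in \mathcal{M}_{\mathrm{Sg}}(T_1, T_2)} \gamma(M), \qquad d_{\mathrm{BotSg}}(T_1, T_2) = \min_{M \in \mathcal{M}_{\mathrm{BotSg}}(T_1, T_2)} \gamma(M), \qquad d_{\mathrm{Bot}}(T_1, T_2) = \min_{M \in \mathcal{M}_{\mathrm{Bot}}(T_1, T_2)} \gamma(M).$$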
3 Previous Method [11]
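In the rest of this paper, we fix input trees $T_1$ and $T_2$, and let $n = |T_1|$ and $m = |T_2|$. Kondo et al. [11] proposed an integer linear programming formulation for the tree edit distance. For the tree edit distance between $T_1$ and $T_2$, we introduce a binary variable $x_{u,v}$ for every $(u, v) \in T_1 \times T_2$, which takes value 1 if and only if $(u, v) \in M$. Then, we can reformulate the cost of a Tai mapping as:
$$\gamma(M) = \sum_{(u,v) \in T_1 \times T_2} \gamma(u, v)\, x_{u,v} + \sum_{u \in T_1} \gamma(u, \varepsilon) \Bigl(1 - \sum_{v \in T_2} x_{u,v}\Bigr) + \sum_{v \in T_2} \gamma(\varepsilon, v) \Bigl(1 - \sum_{u \in T_1} x_{u,v}\Bigr).$$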
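The two constraints of Tai mapping are directly formulated as the following inequalities:
$$\sum_{v \in T_2} x_{u,v} \le 1 \quad \forall u \in T_1,$$
$$\sum_{u \in T_1} x_{u,v} \le 1 \quad \forall v \in T_2,$$
$$x_{u_1,v_1} + x_{u_2,v_2} \le 1 \quad \forall (u_1, v_1), (u_2, v_2) \in T_1 \times T_2 \text{ such that } (u_1 < u_2) \not\Leftrightarrow (v_1 < v_2).$$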
The first two constraints are equivalent to the one-to-one correspondence of Tai mapping: for any node $u \in T_1$ (resp. $v \in T_2$), at most one node of $T_2$ (resp. $T_1$) may be paired with it. The third constraint is equivalent to the preservation of the ancestor-descendant relationship: two pairs that do not preserve the ancestor-descendant relationship cannot both be included in $M$. This formulation contains $O(nm)$ variables and $O(n^2m^2)$ constraints.
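The quadratic blow-up comes from the third family of constraints, one for every conflicting pair of pairs. The sketch below enumerates them for small trees; it is a hypothetical illustration (representation and names are ours), and it skips pairs that share a node, because those are already excluded by the first two constraint families.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Tree = Dict[str, List[str]]  # node -> children (assumed representation)


def proper_ancestor_pairs(t: Tree) -> Set[Tuple[str, str]]:
    """All (a, b) with a < b, via an explicit stack of (node, ancestors)."""
    kids = {c for cs in t.values() for c in cs}
    rel: Set[Tuple[str, str]] = set()
    stack = [((set(t) - kids).pop(), [])]
    while stack:
        v, anc = stack.pop()
        rel.update((a, v) for a in anc)
        stack.extend((c, anc + [v]) for c in t[v])
    return rel


def conflict_constraints(t1: Tree, t2: Tree):
    """Each returned item ((u1, v1), (u2, v2)) stands for the row
    x_{u1,v1} + x_{u2,v2} <= 1 of the third constraint family."""
    anc1, anc2 = proper_ancestor_pairs(t1), proper_ancestor_pairs(t2)
    pairs = [(u, v) for u in t1 for v in t2]
    rows = []
    for (u1, v1), (u2, v2) in combinations(pairs, 2):
        if u1 == u2 or v1 == v2:
            continue  # covered by the one-to-one rows
        if (((u1, u2) in anc1) != ((v1, v2) in anc2)
                or ((u2, u1) in anc1) != ((v2, v1) in anc2)):
            rows.append(((u1, v1), (u2, v2)))
    return rows
```

For two single-edge trees r→a and s→c, the only conflict is pairing r with c together with a with s, since that would reverse the ancestor relation.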
Kondo et al. also gave IP formulations for the segmental distance and the bottom-up segmental distance. These distances can be formulated by imposing additional constraints on the above formulation. For segmental mappings, the additional constraints can be represented as follows:
$$x_{u_1,v_1} + x_{u_2,v_2} - x_{p(u_2),p(v_2)} \le 1 \quad \forall u_1, u_2 \in T_1,\ v_1, v_2 \in T_2 \text{ with } u_1 < u_2 \text{ and } v_1 < v_2.$$
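The constraints of bottom-up segmental mapping can also be represented as follows:
$$x_{u,v} \le \sum_{u' \in L(T_1(u))} \sum_{v' \in L(T_2(v))} x_{u',v'} \quad \forall (u, v) \in T_1 \times T_2.$$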
The above two formulations also contain $O(nm)$ variables and $O(n^2m^2)$ constraints.
4 Improved Method
4.1 Improved Method for Tree Edit Distance
In this section, we propose a new IP formulation for the edit distance problem by combining it with the dynamic programming approach due to [14]. The dynamic programming computes, in a bottom-up manner, a minimum-cost Tai mapping $M$ between $T_1(u)$ and $T_2(v)$ with $(u, v) \in M$ for each $(u, v) \in T_1 \times T_2$. Once we have the solutions for all pairs $(u, v)$, we can construct a minimum-cost Tai mapping between $T_1$ and $T_2$.
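First, by expanding the products in the cost $\gamma(M)$ of Sect. 3, we rewrite the objective function as
$$\gamma(M) = -\sum_{(u,v) \in T_1 \times T_2} w(u, v)\, x_{u,v} + \sum_{u \in T_1} \gamma(u, \varepsilon) + \sum_{v \in T_2} \gamma(\varepsilon, v),$$
where $w(u, v) = \gamma(u, \varepsilon) + \gamma(\varepsilon, v) - \gamma(u, v)$. Since the second and third terms are constants that do not depend on $x$, they do not affect the optimization, and minimizing $\gamma(M)$ is equivalent to maximizing $\sum_{(u,v) \in T_1 \times T_2} w(u, v)\, x_{u,v}$. Moreover, $w(u, v) \ge 0$ by the triangle inequality, since $\gamma$ is a metric.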
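The identity behind this rewriting is easy to check numerically. The sketch below is a hypothetical illustration, not code from the paper: it evaluates $\gamma(M)$ directly and via the rewritten form for a one-to-one mapping given as a list of node pairs, with `None` standing for $\varepsilon$ and labels stored in dictionaries.

```python
from typing import Callable, Dict, Optional

Cost = Callable[[Optional[str], Optional[str]], float]  # None stands for epsilon


def gamma_M(M, l1: Dict[str, str], l2: Dict[str, str], g: Cost) -> float:
    """Original objective: substitutions + deletions + insertions."""
    m1, m2 = {u for u, _ in M}, {v for _, v in M}
    return (sum(g(l1[u], l2[v]) for u, v in M)
            + sum(g(l1[u], None) for u in l1 if u not in m1)
            + sum(g(None, l2[v]) for v in l2 if v not in m2))


def w(u: str, v: str, l1: Dict[str, str], l2: Dict[str, str], g: Cost) -> float:
    """w(u, v) = gamma(u, eps) + gamma(eps, v) - gamma(u, v)."""
    return g(l1[u], None) + g(None, l2[v]) - g(l1[u], l2[v])


def gamma_M_rewritten(M, l1: Dict[str, str], l2: Dict[str, str], g: Cost) -> float:
    """Constant part (delete all, insert all) minus the weight of the mapped pairs."""
    const = sum(g(a, None) for a in l1.values()) + sum(g(None, b) for b in l2.values())
    return const - sum(w(u, v, l1, l2, g) for u, v in M)
```

The two functions agree for every one-to-one mapping, so a mapping that maximizes the total weight also minimizes $\gamma(M)$.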
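Since the solution of our subproblem for $u$ and $v$ must contain the root pair $(u, v)$, the objective function on the input trees $T_1(u)$ and $T_2(v)$ can be represented as
$$w(u, v) + \sum_{u' \in T_1(u) - u}\ \sum_{v' \in T_2(v) - v} w(u', v')\, x_{u',v'}. \qquad (1)$$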
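We denote by $W(u, v)$ the maximum value of (1) over all Tai mappings between $T_1(u)$ and $T_2(v)$ that contain $(u, v)$. If at least one of $u$ and $v$ is a leaf, $W(u, v) = w(u, v)$. Thus, in the following, we assume that neither $u$ nor $v$ is a leaf. The idea of our dynamic programming is that $W(u, v)$ can be recursively computed from the values $W(u', v')$ for $u' \in T_1(u) - u$ and $v' \in T_2(v) - v$. To be precise, for a Tai mapping $M$, we let $M|_1$ and $M|_2$ denote $\{u' : (u', v') \in M\}$ and $\{v' : (u', v') \in M\}$, respectively, and let $\mathcal{A}(u, v)$ be the set of all Tai mappings $M$ between $T_1(u) - u$ and $T_2(v) - v$ such that $M|_1$ and $M|_2$ are antichains in $T_1(u) - u$ and $T_2(v) - v$, respectively. The following lemma is a key ingredient of our formulation.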
Lemma 1.
$W(u, v) = w(u, v) + \max_{M \in \mathcal{A}(u, v)} \sum_{(u', v') \in M} W(u', v')$.
Proof.
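We first show that the left-hand side is at most the right-hand side. Let $M$ be a Tai mapping between $T_1(u)$ and $T_2(v)$ with $(u, v) \in M$. Then, $M \setminus \{(u, v)\}$ can be uniquely decomposed into $M_1, \ldots, M_k$ such that, for any $1 \le i \le k$, $M_i$ is a Tai mapping between $T_1(u_i)$ and $T_2(v_i)$ with $(u_i, v_i) \in M_i$, and $\{(u_1, v_1), \ldots, (u_k, v_k)\} \in \mathcal{A}(u, v)$. Such a decomposition can be obtained by choosing the minimal node pairs $(u_1, v_1), \ldots, (u_k, v_k)$ of $M \setminus \{(u, v)\}$ with respect to $\le$: for any $(u', v') \in M \setminus \{(u, v)\}$ and any $i$, either $u_i \le u'$ and $v_i \le v'$, or $u'$ and $v'$ are not comparable to $u_i$ and $v_i$, respectively. For each $i$, we have $\sum_{(u', v') \in M_i} w(u', v') \le W(u_i, v_i)$. Therefore, $\sum_{(u', v') \in M} w(u', v') \le w(u, v) + \sum_{i=1}^{k} W(u_i, v_i)$, which is at most the right-hand side.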
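To show the converse, let $M' \in \mathcal{A}(u, v)$ maximize the right-hand side. For each $(u', v') \in M'$, let $M_{u',v'}$ be a Tai mapping between $T_1(u')$ and $T_2(v')$ such that $(u', v') \in M_{u',v'}$ and $\sum_{(a, b) \in M_{u',v'}} w(a, b) = W(u', v')$. Since $M'|_1$ and $M'|_2$ are antichains, $\{(u, v)\} \cup \bigcup_{(u', v') \in M'} M_{u',v'}$ is a Tai mapping between $T_1(u)$ and $T_2(v)$. Therefore, we have $W(u, v) \ge w(u, v) + \sum_{(u', v') \in M'} W(u', v')$, and hence the lemma holds. ∎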
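Mori et al. [14] reduced the problem of finding a maximum-weight Tai mapping in $\mathcal{A}(u, v)$ to the maximum vertex weight clique problem, which corresponds to the maximum weight independent set problem on the complement graph. Their reduction can be interpreted as the following constraint:
$$x_{u_1,v_1} + x_{u_2,v_2} \le 1 \quad \text{for all distinct } (u_1, v_1), (u_2, v_2) \in (T_1(u) - u) \times (T_2(v) - v) \text{ such that } u_1, u_2 \text{ are comparable or } v_1, v_2 \text{ are comparable}.$$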
However, this formulation contains $O(n^2m^2)$ constraints.
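To reduce the number of constraints, we exploit a structural property of rooted trees. For a node $u$ and a leaf $l \in L(T_1(u))$, let $P(u, l)$ be the set of nodes on the unique path between $u$ and $l$ in $T_1(u)$; paths $P(v, l)$ in $T_2$ are defined in the same way. Then, for any $l \in L(T_1(u))$ (resp. $l \in L(T_2(v))$), at most one node of $P(u, l) \setminus \{u\}$ (resp. $P(v, l) \setminus \{v\}$) can be chosen in $M$, that is,
$$\sum_{u' \in P(u, l) \setminus \{u\}}\ \sum_{v' \in T_2(v) - v} x_{u',v'} \le 1 \quad \forall l \in L(T_1(u)), \qquad \sum_{u' \in T_1(u) - u}\ \sum_{v' \in P(v, l) \setminus \{v\}} x_{u',v'} \le 1 \quad \forall l \in L(T_2(v)).$$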
This is formalized by the following lemma.
Lemma 2.
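Let $u \in T_1$ and $v \in T_2$ be nodes that are not leaves. Then, $W(u, v)$ can be computed by the following IP:
$$\begin{aligned} \text{maximize}\quad & w(u, v) + \sum_{u' \in T_1(u) - u}\ \sum_{v' \in T_2(v) - v} W(u', v')\, x_{u',v'} \\ \text{subject to}\quad & \sum_{u' \in P(u, l) \setminus \{u\}}\ \sum_{v' \in T_2(v) - v} x_{u',v'} \le 1 && \forall l \in L(T_1(u)), \\ & \sum_{u' \in T_1(u) - u}\ \sum_{v' \in P(v, l) \setminus \{v\}} x_{u',v'} \le 1 && \forall l \in L(T_2(v)), \\ & x_{u',v'} \in \{0, 1\} && \forall u' \in T_1(u) - u,\ v' \in T_2(v) - v. \end{aligned}$$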
Proof.
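By Lemma 1, it suffices to prove that $M$ is in $\mathcal{A}(u, v)$ if and only if its indicator vector $x$ (with $x_{u',v'} = 1$ if and only if $(u', v') \in M$) is a feasible solution.
Suppose first that $M \in \mathcal{A}(u, v)$. Since $M|_1$ forms an antichain in $T_1(u) - u$ and $M$ is one-to-one, $M$ contains at most one pair whose first node lies in $P(u, l) \setminus \{u\}$ for each $l \in L(T_1(u))$. Therefore, the binary variables $x$ do not violate the first type of constraints. A symmetric argument for $M|_2$ implies that $x$ is a feasible solution of the IP.
Conversely, suppose, for contradiction, that $x$ is a feasible solution and there are $(u_1, v_1) \ne (u_2, v_2)$ in $M$ that violate the condition of $\mathcal{A}(u, v)$. There are two possibilities: $(u_1, v_1)$ and $(u_2, v_2)$ violate the one-to-one correspondence of Tai mapping, or (after possibly swapping the two pairs) at least one of $u_1 < u_2$ or $v_1 < v_2$ holds. For the former case, assume without loss of generality that $u_1 = u_2$ and $v_1 \ne v_2$. In this case, the two pairs contribute at least two to the constraint for each $l \in L(T_1(u_1))$, which contradicts the feasibility of $x$. For the latter case, assume without loss of generality that $u_1 < u_2$. In this case, every path $P(u, l)$ with $l \in L(T_1(u_2))$ contains both $u_1$ and $u_2$. The two pairs contribute at least two to the constraint for such $l$, which also contradicts the feasibility of $x$. Therefore, the lemma holds. ∎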
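The constraint rows of Lemma 2 can be generated directly from the leaf paths. The sketch below is a hypothetical illustration (representation and names are ours): it builds one row per leaf of $T_1(u)$ and per leaf of $T_2(v)$, so a subproblem has $|L(T_1(u))| + |L(T_2(v))| = O(n+m)$ rows, and it tests whether a set of chosen pairs satisfies all rows, which by the lemma happens exactly for members of $\mathcal{A}(u, v)$.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Tree = Dict[str, List[str]]  # node -> children (assumed representation)
Var = Tuple[str, str]        # stands for the variable x_{u', v'}


def paths_below(t: Tree, u: str) -> List[FrozenSet[str]]:
    """For each leaf l of T(u), the node set of P(u, l) with u removed."""
    out: List[FrozenSet[str]] = []

    def walk(x: str, path: List[str]) -> None:
        if not t[x]:
            out.append(frozenset(path))
        for c in t[x]:
            walk(c, path + [c])

    walk(u, [])
    return out


def lemma2_rows(t1: Tree, u: str, t2: Tree, v: str) -> List[Set[Var]]:
    """One row per leaf of T1(u) and per leaf of T2(v); the variables in a
    row must sum to at most 1."""
    p1, p2 = paths_below(t1, u), paths_below(t2, v)
    below1, below2 = set().union(*p1), set().union(*p2)
    rows = [{(a, b) for a in p for b in below2} for p in p1]
    rows += [{(a, b) for a in below1 for b in q} for q in p2]
    return rows


def feasible(chosen: List[Var], rows: List[Set[Var]]) -> bool:
    """True iff the 0/1 vector selecting `chosen` satisfies every row."""
    picked = set(chosen)
    return all(len(row & picked) <= 1 for row in rows)
```

With $T_1$ = r(a(b), c) and $T_2$ = s(d, e), the selection {(a, d), (c, e)} is accepted because a and c (and d and e) are incomparable, whereas {(a, d), (b, e)} breaks the antichain condition and {(a, d), (c, d)} breaks the one-to-one condition.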
For each $u \in T_1$ and $v \in T_2$, processed in a bottom-up order, we can compute $W(u, v)$ by using the formulation of Lemma 2. The remaining task is to compute $d(T_1, T_2)$ from the values $W(u, v)$.
Theorem 2.
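Let $W^*$ be the optimal value of the following IP:
$$\begin{aligned} \text{maximize}\quad & \sum_{(u,v) \in T_1 \times T_2} W(u, v)\, x_{u,v} \\ \text{subject to}\quad & \sum_{u \in P(r(T_1), l)}\ \sum_{v \in T_2} x_{u,v} \le 1 && \forall l \in L(T_1), \\ & \sum_{u \in T_1}\ \sum_{v \in P(r(T_2), l)} x_{u,v} \le 1 && \forall l \in L(T_2), \\ & x_{u,v} \in \{0, 1\} && \forall (u, v) \in T_1 \times T_2. \end{aligned}$$
Then, $d(T_1, T_2) = \sum_{u \in T_1} \gamma(u, \varepsilon) + \sum_{v \in T_2} \gamma(\varepsilon, v) - W^*$.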
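The whole pipeline of Sect. 4.1 fits in a short sketch. The Python code below is an illustration under stated assumptions, not the authors' Java/CPLEX implementation: it uses the node-to-children representation, replaces every IP (Lemma 2 and Theorem 2) by a hypothetical exhaustive search `path_packing` that is exponential and only suitable for toy trees, and assumes a metric cost so that all weights are non-negative and the empty selection (value 0) is a valid baseline.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Set, Tuple

Tree = Dict[str, List[str]]   # node -> children (assumed representation)
Cost = Callable[[Optional[str], Optional[str]], float]  # None stands for epsilon


def below(t: Tree, u: str) -> List[str]:
    """Nodes of T(u) - u."""
    out, stack = [], list(t[u])
    while stack:
        x = stack.pop()
        out.append(x)
        stack.extend(t[x])
    return out


def leaf_paths(t: Tree, u: str, keep_u: bool) -> List[Set[str]]:
    """Node sets of the paths from u to every leaf of T(u)."""
    paths: List[Set[str]] = []

    def walk(x: str, path: List[str]) -> None:
        if not t[x]:
            paths.append(set(path))
        for c in t[x]:
            walk(c, path + [c])

    walk(u, [u] if keep_u else [])
    return paths


def path_packing(cands, W, rows1, rows2) -> float:
    """Brute-force stand-in for an IP solver: maximize the total W over chosen
    pairs such that each leaf path of either tree holds at most one of them."""
    best = 0
    for k in range(1, min(len(rows1), len(rows2)) + 1):
        for S in combinations(cands, k):
            ok1 = all(sum(a in r for a, _ in S) <= 1 for r in rows1)
            ok2 = all(sum(b in r for _, b in S) <= 1 for r in rows2)
            if ok1 and ok2:
                best = max(best, sum(W[x] for x in S))
    return best


def dp_edit_distance(t1: Tree, l1, t2: Tree, l2, g: Cost) -> float:
    w = {(u, v): g(l1[u], None) + g(None, l2[v]) - g(l1[u], l2[v])
         for u in t1 for v in t2}
    W: Dict[Tuple[str, str], float] = {}
    # Smaller subtrees first, so W of all descendant pairs is ready (bottom-up).
    for u in sorted(t1, key=lambda x: len(below(t1, x))):
        for v in sorted(t2, key=lambda x: len(below(t2, x))):
            if not t1[u] or not t2[v]:
                W[u, v] = w[u, v]              # a leaf: nothing below to map
            else:                              # Lemma 2 subproblem
                cands = [(a, b) for a in below(t1, u) for b in below(t2, v)]
                W[u, v] = w[u, v] + path_packing(
                    cands, W, leaf_paths(t1, u, False), leaf_paths(t2, v, False))
    r1 = next(x for x in t1 if all(x not in cs for cs in t1.values()))
    r2 = next(x for x in t2 if all(x not in cs for cs in t2.values()))
    # Theorem 2: combine all W(u, v) under root-to-leaf path constraints.
    w_star = path_packing(list(W), W, leaf_paths(t1, r1, True), leaf_paths(t2, r2, True))
    const = sum(g(a, None) for a in l1.values()) + sum(g(None, b) for b in l2.values())
    return const - w_star
```

On a three-node path labeled x, y, z versus a root x with leaf children y and z, it returns 2 under a unit cost, which agrees with exhaustive enumeration of Tai mappings on the same pair.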
4.2 Improved Methods for Variants of Edit Distance
In the same manner as the edit distance in the previous section, the other distances can be computed as follows: for each $u \in T_1$ and $v \in T_2$, compute the corresponding value $W(u, v)$, and then combine the solutions of the subproblems as in Theorem 2.
Segmental Distance
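Let $u$ and $v$ be nodes of $T_1$ and $T_2$, respectively. We denote here by $W_{\mathrm{Sg}}(u, v)$ the maximum weight, that is, the maximum value of (1), of segmental mappings $M$ between $T_1(u)$ and $T_2(v)$ with $(u, v) \in M$. If either $u$ or $v$ is a leaf, we have $W_{\mathrm{Sg}}(u, v) = w(u, v)$. Thus, we suppose otherwise, and suppose that we have already computed $W_{\mathrm{Sg}}(u', v')$ for each $u' \in T_1(u) - u$ and $v' \in T_2(v) - v$. Observe that for any segmental mapping $M$ with $(u, v) \in M$, a child of $u$ can only be paired with a child of $v$ in $M$. Moreover, if a descendant $u'$ of $u$ that is not a child of $u$ is paired in $M$, then the child of $u$ that is an ancestor of $u'$ must also be paired in $M$. These observations imply that $M$ can be constructed as the union of $\{(u, v)\}$ and mappings $M_{u',v'}$ for pairs $(u', v') \in ch(u) \times ch(v)$ that form a matching, where $M_{u',v'}$ is a segmental mapping between $T_1(u')$ and $T_2(v')$ with $(u', v') \in M_{u',v'}$. Therefore, to compute $W_{\mathrm{Sg}}(u, v)$, we construct a bipartite graph as follows. For each node in $ch(u) \cup ch(v)$, we create a vertex, and for each $u' \in ch(u)$ and $v' \in ch(v)$, we add an edge between $u'$ and $v'$ whose weight equals $W_{\mathrm{Sg}}(u', v')$, as in Fig. 1. Then $W_{\mathrm{Sg}}(u, v)$ is $w(u, v)$ plus the weight of a maximum weight matching in this graph. It is well known that a maximum weight bipartite matching can be found in polynomial time using the Hungarian method [13].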
Once $W_{\mathrm{Sg}}(u, v)$ is computed for each $u \in T_1$ and $v \in T_2$, we can compute the segmental distance between $T_1$ and $T_2$ by Theorem 2, with $W$ replaced by $W_{\mathrm{Sg}}$.
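The recursion for $W_{\mathrm{Sg}}$ can be sketched as follows. This is an illustrative Python sketch under our own assumptions (node-to-children dictionaries, `None` for $\varepsilon$, hypothetical function names). In particular, `max_weight_matching` is a simple exponential search over column subsets used in place of the Hungarian method [13], so it only suits nodes of small degree. The final combination step of Theorem 2 is omitted.

```python
from functools import lru_cache
from typing import Callable, Dict, List, Optional, Tuple

Tree = Dict[str, List[str]]   # node -> children (assumed representation)
Cost = Callable[[Optional[str], Optional[str]], float]  # None stands for epsilon


def max_weight_matching(weights: List[List[float]]) -> float:
    """Maximum-weight bipartite matching (vertices may stay unmatched) by
    memoized search over subsets of columns; exponential in the number of
    columns, a stand-in for the Hungarian method."""
    n_rows = len(weights)
    n_cols = len(weights[0]) if weights else 0

    @lru_cache(maxsize=None)
    def best(i: int, used: int) -> float:
        if i == n_rows:
            return 0
        res = best(i + 1, used)                      # leave row i unmatched
        for j in range(n_cols):
            if not (used >> j) & 1:
                res = max(res, weights[i][j] + best(i + 1, used | (1 << j)))
        return res

    return best(0, 0)


def size(t: Tree, x: str) -> int:
    return 1 + sum(size(t, c) for c in t[x])


def segmental_W(t1: Tree, l1, t2: Tree, l2, g: Cost) -> Dict[Tuple[str, str], float]:
    """W_Sg(u, v) for every pair, computed bottom-up (smaller subtrees first)."""
    W: Dict[Tuple[str, str], float] = {}
    for u in sorted(t1, key=lambda x: size(t1, x)):
        for v in sorted(t2, key=lambda x: size(t2, x)):
            w_uv = g(l1[u], None) + g(None, l2[v]) - g(l1[u], l2[v])
            if not t1[u] or not t2[v]:
                W[u, v] = w_uv
            else:
                # Edge (u', v') between children carries weight W_Sg(u', v').
                G = [[W[a, b] for b in t2[v]] for a in t1[u]]
                W[u, v] = w_uv + max_weight_matching(G)
    return W
```

In a production setting, the matching step would be delegated to a polynomial-time assignment solver rather than this subset search.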
Bottomup Segmental Distance
Because every bottom-up segmental mapping is a segmental mapping, the above observations also hold, and each subproblem can be reduced to a maximum weight matching problem in a bipartite graph as well. The only difference from the case of the segmental distance is that every segment must reach at least one pair of leaves. To this end, we need to exclude the following two cases from our solution. If exactly one of $u$ and $v$ is a leaf, then $W_{\mathrm{BotSg}}(u, v)$ must be zero, since $(u, v)$ violates the condition of bottom-up segmental mapping. The other case is that neither $u$ nor $v$ is a leaf and the value of the maximum weight matching equals zero. This implies that an optimal mapping between $T_1(u)$ and $T_2(v)$ consists of the single pair $(u, v)$, which also violates the condition of bottom-up segmental mapping. Therefore, we set $W_{\mathrm{BotSg}}(u, v) = 0$ in this case.
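Only the base cases change relative to the segmental recursion. The sketch below is again a hypothetical Python illustration (our representation and names, with an exponential matching routine standing in for the Hungarian method). It encodes the two exclusions described above and assumes that every usable pair has positive weight, as under the unit cost used in Sect. 5, so that a matching value of zero means that no child pair can be used.

```python
from functools import lru_cache
from typing import Callable, Dict, List, Optional, Tuple

Tree = Dict[str, List[str]]   # node -> children (assumed representation)
Cost = Callable[[Optional[str], Optional[str]], float]  # None stands for epsilon


def max_weight_matching(weights: List[List[float]]) -> float:
    """Exponential stand-in for the Hungarian method (small degrees only)."""
    n_cols = len(weights[0]) if weights else 0

    @lru_cache(maxsize=None)
    def best(i: int, used: int) -> float:
        if i == len(weights):
            return 0
        res = best(i + 1, used)                      # leave row i unmatched
        for j in range(n_cols):
            if not (used >> j) & 1:
                res = max(res, weights[i][j] + best(i + 1, used | (1 << j)))
        return res

    return best(0, 0)


def size(t: Tree, x: str) -> int:
    return 1 + sum(size(t, c) for c in t[x])


def bottom_up_segmental_W(t1: Tree, l1, t2: Tree, l2, g: Cost) -> Dict[Tuple[str, str], float]:
    W: Dict[Tuple[str, str], float] = {}
    for u in sorted(t1, key=lambda x: size(t1, x)):
        for v in sorted(t2, key=lambda x: size(t2, x)):
            w_uv = g(l1[u], None) + g(None, l2[v]) - g(l1[u], l2[v])
            leaf_u, leaf_v = not t1[u], not t2[v]
            if leaf_u and leaf_v:
                W[u, v] = w_uv                 # a leaf pair closes a segment
            elif leaf_u or leaf_v:
                W[u, v] = 0                    # exactly one leaf: infeasible
            else:
                m = max_weight_matching([[W[a, b] for b in t2[v]] for a in t1[u]])
                W[u, v] = w_uv + m if m > 0 else 0   # (u, v) alone is infeasible
    return W
```

For example, pairing the root of r(a) with the root of s(c(d)) yields 0, because the only child pair (a, c) mixes a leaf with an internal node.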
Bottomup Distance
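First, we propose a naive IP formulation for computing the bottom-up distance. A straightforward implication of Definition 6 is that if $x_{u,v} = 1$, the mapping between $T_1(u)$ and $T_2(v)$ must be a bijection. The formulation can be obtained from that of Tai mapping by adding the following constraints:
$$x_{u,v} \le \sum_{v' \in T_2(v)} x_{u',v'} \quad \forall (u, v) \in T_1 \times T_2,\ \forall u' \in T_1(u),$$
$$x_{u,v} \le \sum_{u' \in T_1(u)} x_{u',v'} \quad \forall (u, v) \in T_1 \times T_2,\ \forall v' \in T_2(v).$$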
This formulation contains $O(nm)$ variables and $O(n^2m^2)$ constraints.
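Since bottom-up mappings form a subclass of bottom-up segmental mappings, we can apply the above technique as well. The only additional requirement is that, for each pair $(u, v)$ in the mapping, the subtrees $T_1(u)$ and $T_2(v)$ be structurally isomorphic, where two trees are structurally isomorphic if they are isomorphic when labels are ignored. Thus, for $u \in T_1$ and $v \in T_2$, we set $W_{\mathrm{Bot}}(u, v) = 0$ if the two subtrees $T_1(u)$ and $T_2(v)$ are not structurally isomorphic.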
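The structural-isomorphism test can be implemented with a canonical encoding of unordered trees. The sketch below is illustrative and hypothetical (names and representation are ours): it uses the classic Aho, Hopcroft and Ullman style parenthesis encoding, sets $W_{\mathrm{Bot}}(u, v) = 0$ for non-isomorphic subtrees, and, because isomorphic subtrees have equally many children, enumerates perfect matchings of the children by brute force instead of running the Hungarian method.

```python
from itertools import permutations
from typing import Callable, Dict, List, Optional, Tuple

Tree = Dict[str, List[str]]   # node -> children (assumed representation)
Cost = Callable[[Optional[str], Optional[str]], float]  # None stands for epsilon


def canon(t: Tree, u: str) -> str:
    """Parenthesis encoding of T(u) ignoring labels: two subtrees get the same
    string iff they are isomorphic as unordered trees."""
    return "(" + "".join(sorted(canon(t, c) for c in t[u])) + ")"


def size(t: Tree, x: str) -> int:
    return 1 + sum(size(t, c) for c in t[x])


def bottom_up_W(t1: Tree, l1, t2: Tree, l2, g: Cost) -> Dict[Tuple[str, str], float]:
    W: Dict[Tuple[str, str], float] = {}
    for u in sorted(t1, key=lambda x: size(t1, x)):
        for v in sorted(t2, key=lambda x: size(t2, x)):
            if canon(t1, u) != canon(t2, v):
                W[u, v] = 0            # not structurally isomorphic
                continue
            w_uv = g(l1[u], None) + g(None, l2[v]) - g(l1[u], l2[v])
            # Isomorphic subtrees have equally many children; try every perfect
            # matching of them (exponential stand-in for a matching algorithm).
            kids1, kids2 = t1[u], t2[v]
            best = max(sum(W[a, b] for a, b in zip(kids1, p))
                       for p in permutations(kids2))
            W[u, v] = w_uv + best
    return W
```

For two differently drawn copies of the same unlabeled shape, the root pair collects the weight of the full bijection, while every pair of non-isomorphic subtrees is zeroed out.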
For these three distances, our improved methods involve $O(nm)$ subproblems, each of which can be solved in polynomial time. To combine the solutions of these subproblems, we need to solve the IP in Theorem 2. Such an IP has $O(nm)$ variables and only $O(n+m)$ constraints.
5 Experiments
To compare the practical performance of our methods and the previous methods, we applied them to real tree-structured data. We used glycan data obtained from the KEGG/Glycan database [10] and the CSLOGS dataset [18], which consists of web log files. In our experiments, we adopt the unit cost function, defined as:
$$\gamma(a, b) = \begin{cases} 0 & \text{if } a = b, \\ 1 & \text{otherwise,} \end{cases} \qquad \text{for } a, b \in \Sigma_\varepsilon.$$
We implemented the previous methods for computing the edit distance (IP_Edit), segmental distance (IP_Sg), and bottom-up segmental distance (IP_BotSg) given by Kondo et al. [11], and the naive method for computing the bottom-up distance (IP_Bot) described in the previous section. We also implemented our methods for computing these four distances (DpIP_Edit, DpIP_Sg, DpIP_BotSg, and DpIP_Bot). In addition, we intended to compare our methods with the algorithm due to Mori et al. [14], which reduces the tree edit distance problem to the maximum weight clique problem and uses the maximum weight clique algorithm due to [15]. However, the purpose of our experiments is to compare formulations or reductions rather than the performance of specific IP or other solvers. Therefore, we used an ordinary IP formulation of the maximum weight clique problem instead of the algorithm of [15]; this method is denoted by IP_DpClique_E.
We implemented the methods mentioned above in Java 1.8 with IBM ILOG CPLEX 12.7. We forced CPLEX to run in sequential mode by setting the parameter IloCplex.IntParam.Threads to one, and every implementation of the presented methods is also single-threaded. The experiments were performed on a computer with a 3.7 GHz Quad-Core Intel Xeon E5 and 32 GB RAM, running Mac OS X.
5.1 Glycan Dataset
The results for the edit distance on the Glycan dataset are shown in Table 1. "# of nodes" in the table means the total number of nodes of the two input trees. For each range of the total number of nodes, we randomly selected at most 100 input tree pairs from the Glycan dataset. "avg" and "t.o." stand for the average execution time (in seconds) and the number of instances that timed out, respectively. The table shows that DpIP_Edit is much faster than IP_Edit. IP_DpClique_E outperforms IP_Edit when the inputs are small trees, but it is not faster than IP_Edit when the inputs are large. DpIP_Edit also outperforms IP_DpClique_E. This implies that adopting a dynamic programming approach is not sufficient by itself to improve practical performance; the revised IP formulation derived from the dynamic programming is of great importance for reducing the running time on the tree edit distance problem.
Table 2 shows the results for the variants of the edit distance. For the segmental distance and the bottom-up segmental distance, the proposed methods (DpIP_Sg and DpIP_BotSg) finished within 1 second, whereas the naive methods (IP_Sg and IP_BotSg) exceed the 30-second time limit on many instances when the total size of the input trees is large. For the bottom-up distance, the naive method (IP_Bot) was fast, computing all instances within 30 seconds. However, our improved method (DpIP_Bot) is still much faster than the naive method.
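Table 1: Edit distance on the Glycan dataset (avg: average time in seconds; t.o.: number of timed-out instances).
# of nodes | # of instances | IP_Edit avg | IP_Edit t.o. | DpIP_Edit avg | DpIP_Edit t.o. | IP_DpClique_E avg | IP_DpClique_E t.o.
50–54 | 100 | 2.393 | 0 | 0.308 | 0 | 0.994 | 0
55–59 | 100 | 4.661 | 0 | 0.417 | 0 | 1.576 | 0
60–64 | 88 | 11.661 | 6 | 0.576 | 0 | 2.894 | 0
65–69 | 36 | 17.774 | 4 | 0.669 | 0 | 3.433 | 0
70–74 | 100 | 13.209 | 7 | 0.654 | 0 | 11.799 | 7
75–79 | 29 | 20.771 | 9 | 0.823 | 0 | 11.411 | 7
80–84 | 9 | 18.705 | 8 | 1.094 | 0 | 14.941 | 6
85–89 | 5 | 0 | 5 | 1.330 | 0 | 21.838 | 3
90–94 | 4 | 0 | 4 | 1.442 | 0 | 0 | 4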
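Table 2: Variants of the edit distance on the Glycan dataset (avg: average time in seconds; t.o.: number of timed-out instances).
# of nodes | # of instances | IP_Sg avg | IP_Sg t.o. | DpIP_Sg avg | DpIP_Sg t.o. | IP_BotSg avg | IP_BotSg t.o. | DpIP_BotSg avg | DpIP_BotSg t.o. | IP_Bot avg | IP_Bot t.o. | DpIP_Bot avg | DpIP_Bot t.o.
50–54 | 100 | 5.306 | 0 | 0.135 | 0 | 1.545 | 0 | 0.136 | 0 | 0.569 | 0 | 0.131 | 0
55–59 | 100 | 9.070 | 5 | 0.135 | 0 | 2.539 | 0 | 0.139 | 0 | 0.785 | 0 | 0.131 | 0
60–64 | 88 | 13.983 | 41 | 0.137 | 0 | 4.767 | 0 | 0.142 | 0 | 1.258 | 0 | 0.132 | 0
65–69 | 36 | 23.813 | 27 | 0.140 | 0 | 6.219 | 0 | 0.147 | 0 | 1.544 | 0 | 0.133 | 0
70–74 | 100 | 20.408 | 97 | 0.145 | 0 | 10.252 | 4 | 0.150 | 0 | 1.453 | 0 | 0.134 | 0
75–79 | 29 | 21.274 | 27 | 0.148 | 0 | 12.794 | 5 | 0.154 | 0 | 2.021 | 0 | 0.137 | 0
80–84 | 9 | 0 | 9 | 0.152 | 0 | 17.606 | 3 | 0.160 | 0 | 3.002 | 0 | 0.137 | 0
85–89 | 5 | 0 | 5 | 0.157 | 0 | 29.157 | 4 | 0.163 | 0 | 3.869 | 0 | 0.142 | 0
90–94 | 4 | 0 | 4 | 0.161 | 0 | 0 | 4 | 0.166 | 0 | 4.476 | 0 | 0.145 | 0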
5.2 CSLOGS Dataset
We divided the CSLOGS dataset into two subsets, SUBLOG3 and SUBLOG49. Every tree in SUBLOG3 (resp. SUBLOG49) has maximum degree at most 3 (resp. 49). From each subset, we randomly selected at most 100 pairs for each specified range of the total number of nodes.
The results for SUBLOG3 are shown in Tables 3 and 4, and Tables 5 and 6 show the results for SUBLOG49. Compared to the results on SUBLOG3, the naive methods (IP_Edit, IP_Sg, IP_BotSg, and IP_Bot) work faster on SUBLOG49. This agrees with what was observed in the previous work by Kondo et al. As for IP_DpClique_E, it outperforms IP_Edit when the degrees of the trees are small, though their performances are scarcely different for high-degree inputs.
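Table 3: Edit distance on SUBLOG3 (avg: average time in seconds; t.o.: number of timed-out instances).
# of nodes | # of instances | IP_Edit avg | IP_Edit t.o. | DpIP_Edit avg | DpIP_Edit t.o. | IP_DpClique_E avg | IP_DpClique_E t.o.
50–54 | 100 | 2.478 | 0 | 0.435 | 0 | 3.853 | 0
55–59 | 100 | 3.892 | 0 | 0.510 | 0 | 5.393 | 2
60–64 | 100 | 6.641 | 0 | 0.633 | 0 | 8.243 | 17
65–69 | 100 | 9.921 | 1 | 0.760 | 0 | 7.191 | 34
70–74 | 100 | 15.077 | 9 | 0.917 | 0 | 8.244 | 44
75–79 | 100 | 16.534 | 29 | 1.112 | 0 | 6.352 | 47
80–84 | 100 | 19.024 | 45 | 1.247 | 0 | 5.144 | 44
85–89 | 100 | 21.249 | 70 | 1.449 | 0 | 4.711 | 48
90–94 | 100 | 23.946 | 91 | 1.872 | 0 | 6.863 | 59
95–99 | 100 | 26.599 | 92 | 2.136 | 0 | 7.971 | 61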
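Table 4: Variants of the edit distance on SUBLOG3 (avg: average time in seconds; t.o.: number of timed-out instances).
# of nodes | # of instances | IP_Sg avg | IP_Sg t.o. | DpIP_Sg avg | DpIP_Sg t.o. | IP_BotSg avg | IP_BotSg t.o. | DpIP_BotSg avg | DpIP_BotSg t.o. | IP_Bot avg | IP_Bot t.o. | DpIP_Bot avg | DpIP_Bot t.o.
50–54 | 100 | 5.978 | 0 | 0.136 | 0 | 1.970 | 0 | 0.140 | 0 | 0.568 | 0 | 0.131 | 0
55–59 | 100 | 10.208 | 7 | 0.136 | 0 | 2.922 | 0 | 0.141 | 0 | 0.764 | 0 | 0.132 | 0
60–64 | 100 | 13.791 | 31 | 0.141 | 0 | 5.245 | 0 | 0.145 | 0 | 1.076 | 0 | 0.134 | 0
65–69 | 100 | 18.372 | 57 | 0.144 | 0 | 6.562 | 1 | 0.148 | 0 | 1.390 | 0 | 0.135 | 0
70–74 | 100 | 20.195 | 75 | 0.146 | 0 | 8.513 | 15 | 0.151 | 0 | 1.856 | 0 | 0.137 | 0
75–79 | 100 | 22.485 | 87 | 0.149 | 0 | 11.003 | 10 | 0.154 | 0 | 2.372 | 0 | 0.138 | 0
80–84 | 100 | 22.865 | 91 | 0.150 | 0 | 12.489 | 18 | 0.157 | 0 | 3.031 | 0 | 0.139 | 0
85–89 | 100 | 26.028 | 94 | 0.154 | 0 | 14.864 | 25 | 0.160 | 0 | 3.746 | 0 | 0.140 | 0
90–94 | 100 | 26.866 | 98 | 0.158 | 0 | 17.244 | 48 | 0.167 | 0 | 4.861 | 0 | 0.144 | 0
95–99 | 100 | 0 | 100 | 0.160 | 0 | 18.644 | 57 | 0.170 | 0 | 5.808 | 0 | 0.147 | 0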
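Table 5: Edit distance on SUBLOG49 (avg: average time in seconds; t.o.: number of timed-out instances).
# of nodes | # of instances | IP_Edit avg | IP_Edit t.o. | DpIP_Edit avg | DpIP_Edit t.o. | IP_DpClique_E avg | IP_DpClique_E t.o.
50–54 | 100 | 1.275 | 0 | 0.263 | 0 | 1.643 | 0
55–59 | 100 | 2.323 | 0 | 0.317 | 0 | 3.014 | 0
60–64 | 100 | 4.032 | 0 | 0.395 | 0 | 5.452 | 3
65–69 | 100 | 4.756 | 0 | 0.402 | 0 | 6.721 | 6
70–74 | 100 | 6.231 | 1 | 0.450 | 0 | 7.188 | 10
75–79 | 100 | 8.808 | 10 | 0.567 | 0 | 9.787 | 19
80–84 | 100 | 11.850 | 6 | 0.583 | 0 | 10.037 | 28
85–89 | 100 | 12.429 | 21 | 0.665 | 0 | 10.145 | 34
90–94 | 100 | 13.595 | 33 | 0.678 | 0 | 11.228 | 34
95–99 | 100 | 15.711 | 30 | 0.829 | 0 | 12.084 | 39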
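Table 6: Variants of the edit distance on SUBLOG49 (avg: average time in seconds; t.o.: number of timed-out instances).
# of nodes | # of instances | IP_Sg avg | IP_Sg t.o. | DpIP_Sg avg | DpIP_Sg t.o. | IP_BotSg avg | IP_BotSg t.o. | DpIP_BotSg avg | DpIP_BotSg t.o. | IP_Bot avg | IP_Bot t.o. | DpIP_Bot avg | DpIP_Bot t.o.
50–54 | 100 | 2.130 | 0 | 0.143 | 0 | 0.739 | 0 | 0.142 | 0 | 0.376 | 0 | 0.130 | 0
55–59 | 100 | 4.704 | 0 | 0.147 | 0 | 1.521 | 0 | 0.145 | 0 | 0.514 | 0 | 0.133 | 0
60–64 | 100 | 6.795 | 11 | 0.151 | 0 | 2.863 | 3 | 0.150 | 0 | 0.707 | 0 | 0.153 | 0
65–69 | 100 | 7.741 | 8 | 0.162 | 0 | 2.544 | 1 | 0.154 | 0 | 0.830 | 0 | 0.135 | 0
70–74 | 100 | 9.277 | 19 | 0.158 | 0 | 3.257 | 2 | 0.159 | 0 | 1.036 | 0 | 0.139 | 0
75–79 | 100 | 12.421 | 38 | 0.162 | 0 | 5.143 | 6 | 0.162 | 0 | 1.376 | 0 | 0.139 | 0
80–84 | 100 | 12.707 | 39 | 0.167 | 0 | 5.788 | 7 | 0.169 | 0 | 1.644 | 0 | 0.142 | 0
85–89 | 100 | 14.817 | 46 | 0.170 | 0 | 7.136 | 3 | 0.176 | 0 | 2.129 | 0 | 0.144 | 0
90–94 | 100 | 13.267 | 65 | 0.175 | 0 | 8.479 | 8 | 0.179 | 0 | 2.361 | 0 | 0.147 | 0
95–99 | 100 | 16.752 | 65 | 0.181 | 0 | 8.776 | 16 | 0.184 | 0 | 2.881 | 0 | 0.148 | 0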
We can observe that the proposed methods (DpIP_Edit, DpIP_Sg, DpIP_BotSg, and DpIP_Bot) remarkably improve on the previous methods (IP_Edit, IP_Sg, IP_BotSg, and IP_Bot); in particular, the average running times of DpIP_Sg, DpIP_BotSg, and DpIP_Bot stay below 0.2 seconds in every range. To measure the scalability of the proposed methods, we used a wider range of the dataset, selecting input tree pairs so that the total number of nodes ranges from nearly 0 to about 850. The results are shown in Fig. 2. For the segmental distance and the bottom-up segmental distance, the smallest instance that exceeds our time limit of 30 seconds appears when the total number of nodes is in the range 450–500, whereas for the tree edit distance it appears when the total number of nodes is in the range 150–200. For the bottom-up distance, all instances selected in this experiment are solved within 7 seconds.
6 Conclusion and Discussion
We have proposed improved methods for computing the tree edit distance and its variants. While the naive IP formulation proposed by Kondo et al. [11] has $O(n^2m^2)$ constraints, our IP formulation requires solving $O(nm)$ subproblems, but each of them has only $O(n+m)$ constraints. For the segmental distance, bottom-up segmental distance, and bottom-up distance, each subproblem, except for the problem of combining the solutions of subproblems, can be reduced to the maximum weighted matching problem in a bipartite graph, which can be solved in polynomial time.
We performed experiments using real tree-structured datasets. While the previous methods only work for small trees, our methods remain effective for large trees. In particular, for the segmental distance and the bottom-up segmental distance, our methods handle trees whose total size is up to 450, and for the bottom-up distance, every instance is solved within 7 seconds.
An advantage of the IP-based method is that we can easily give an IP formulation for another distance by adding constraints to the IP formulation for the edit distance. Therefore, extending our method to other important distance measures between unordered trees, such as the tree alignment distance [8], is future work. It would also be interesting to develop practical algorithms for computing these distances without using general-purpose solvers such as IP solvers or SAT solvers.
References
[1] Akutsu, T., Fukagawa, D., Halldórsson, M.M., Takasu, A., Tanaka, K.: Approximation and parameterized algorithms for common subtrees and edit distance between unordered trees. Theoretical Computer Science 470, 10–22 (2013)
[2] Akutsu, T., Fukagawa, D., Takasu, A., Tamura, T.: Exact algorithms for computing the tree edit distance between unordered trees. Theoretical Computer Science 412(4–5), 352–364 (2011)
[3] Akutsu, T., Tamura, T., Fukagawa, D., Takasu, A.: Efficient exponential-time algorithms for edit distance between unordered trees. Journal of Discrete Algorithms 25, 79–93 (2014)
[4] Demaine, E.D., Mozes, S., Rossman, B., Weimann, O.: An optimal decomposition algorithm for tree edit distance. ACM Transactions on Algorithms 6(1), 1–19 (2009)
[5] Fukagawa, D., Tamura, T., Takasu, A., Tomita, E., Akutsu, T.: A clique-based method for the edit distance between unordered trees and its application to analysis of glycan structures. BMC Bioinformatics 12(Suppl 1), S13 (2011)
[6] Higuchi, S., Kan, T., Yamamoto, Y., Hirata, K.: An A* Algorithm for Computing Edit Distance between Rooted Labeled Unordered Trees. In: New Frontiers in Artificial Intelligence, pp. 186–196. Springer Berlin Heidelberg (2012)
[7] Horesh, Y., Mehr, R., Unger, R.: Designing an A* Algorithm for Calculating Edit Distance between Rooted-Unordered Trees. Journal of Computational Biology 13(6), 1165–1176 (2006)
[8] Jiang, T., Wang, L., Zhang, K.: Alignment of trees: an alternative to tree edit. Theoretical Computer Science 143(1), 137–148 (1995)
[9] Kan, T., Higuchi, S., Hirata, K.: Segmental Mapping and Distance for Rooted Labeled Ordered Trees. In: Algorithms and Computation, pp. 485–494. Springer Berlin Heidelberg (2012)
[10] Kanehisa, M., Goto, S.: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28(1), 27–30 (2000)
[11] Kondo, S., Otaki, K., Ikeda, M., Yamamoto, A.: Fast Computation of the Tree Edit Distance between Unordered Trees Using IP Solvers. In: Discovery Science, pp. 156–167. Springer International Publishing (2014)
[12] Kuboyama, T.: Matching and Learning in Trees. Ph.D. thesis, The University of Tokyo (2007)
[13] Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1–2), 83–97 (1955)
[14] Mori, T., Tamura, T., Fukagawa, D., Takasu, A., Tomita, E., Akutsu, T.: A Clique-Based Method Using Dynamic Programming for Computing Edit Distance Between Unordered Trees. Journal of Computational Biology 19(10), 1089–1104 (2012)
[15] Nakamura, T., Tomita, E.: Efficient algorithms for finding a maximum clique with maximum vertex weight (in Japanese). Technical Report, The University of Electro-Communications (2005)
[16] Tai, K.C.: The Tree-to-Tree Correction Problem. Journal of the ACM 26(3), 422–433 (1979)
[17] Valiente, G.: An efficient bottom-up distance between trees. In: Proceedings Eighth Symposium on String Processing and Information Retrieval. IEEE (2001)
[18] Zaki, M.: Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Transactions on Knowledge and Data Engineering 17(8), 1021–1035 (2005)
[19] Zhang, K., Jiang, T.: Some MAX SNP-hard results concerning unordered labeled trees. Information Processing Letters 49(5), 249–254 (1994)
[20] Zhang, K., Statman, R., Shasha, D.: On the editing distance between unordered labeled trees. Information Processing Letters 42(3), 133–139 (1992)