Towards a theory of statistical tree-shape analysis
Abstract
In order to develop statistical methods for shapes with a tree-structure, we construct a shape space framework for tree-like shapes and study metrics on the shape space. This shape space has singularities, corresponding to topological transitions in the represented trees. We study two closely related metrics on the shape space, TED and QED. QED is a quotient Euclidean distance arising naturally from the shape space formulation, while TED is the classical tree edit distance. Using Gromov’s metric geometry, we gain new insight into the geometries defined by TED and QED. We show that the new metric QED has nice geometric properties which facilitate statistical analysis, such as existence and local uniqueness of geodesics and averages. TED, on the other hand, does not share the geometric advantages of QED, but has nice algorithmic properties. We provide a theoretical framework and experimental results on synthetic data trees as well as airway trees from pulmonary CT scans. In this way, we illustrate that our framework has both the theoretical and qualitative properties necessary to build a theory of statistical tree-shape analysis.
Keywords: Trees, Tree metric, Shape, Anatomical structure, Pattern matching, Pattern recognition, Geometry
1 Introduction
Tree-shaped objects are fundamental in nature, where they appear, e.g., as delivery systems for gases and fluids [23], as skeletal structures, or as descriptions of hierarchies. Examples encountered in image analysis and computational biology are airway trees [38, 24, 39], vascular systems [5], shock graphs [31, 2, 36], scale space hierarchies [20, 6] and phylogenetic trees [3, 27, 14].
Statistical methods for tree-structured data would have many applications. For instance, one could make more consistent studies of changes in airway geometry and structure related to airway disease [42, 33] to improve tools for computer-aided diagnosis and prognosis.
Due to the wide range of applications, extensive work has been done in recent years on comparison of trees and graphs in terms of matching [18, 24, 38], object recognition [6, 19] and machine learning [11, 28, 15] based on inter-tree distances. However, the existing tree-distance frameworks are algorithmic rather than geometric. Very few attempts [41, 1, 26] have been made to build analogues of the theory for landmark point shape spaces using manifold statistics and Riemannian submersions [17, 32, 12]. There exists no principled approach to studying the space of tree-structured data, and as a consequence, the standard statistical properties are not well-defined. As we shall see, difficulties appear even in the basic problem of finding the average of two tree-shapes. This paper fills the gap by introducing a shape-theoretical framework for geometric trees, which is suitable for statistical analysis.
Most statistical measurements are based on a concept of distance. The most fundamental statistic is the mean (or prototype) $\mu$ of a dataset $\{x_1, \dots, x_n\}$, which can be defined as the minimizer of the sum of squared distances to the data points:
$$\mu = \arg\min_{x \in X} \sum_{i=1}^{n} d(x, x_i)^2 \tag{1}$$
This definition of the mean, called the Fréchet mean, assumes a space $X$ of tree-shapes endowed with a distance $d$, and is closely connected to geodesics, or shortest paths, between tree-shapes. For example, the midpoint of a geodesic from $x_1$ to $x_2$ is a mean for the two-point dataset $\{x_1, x_2\}$. Hence, if there are multiple geodesics connecting $x_1$ to $x_2$, with different midpoints, then there will also be multiple means. As a consequence, without (local, generic) uniqueness of geodesics, basic statistics such as the mean are fundamentally ill-posed!
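The link between non-unique geodesics and non-unique means can be checked numerically in a toy setting. The sketch below is not tree-space: it assumes two edges with scalar attributes, so that shapes are simply points of $\mathbb{R}^2$, and it compares a summed per-coordinate ($\ell_1$) distance, in the spirit of edit distances, with the Euclidean distance. The grid search and the function names are illustrative assumptions.

```python
import math


def d1(p, q):
    """Sum of per-coordinate differences (an l1, edit-distance-like metric)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def d2(p, q):
    """Euclidean metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def frechet_minimizers(data, dist, n=20, tol=1e-9):
    """Grid points of [0, 1]^2 minimizing F(m) = sum_i dist(m, x_i)^2."""
    grid = [(i / n, j / n) for i in range(n + 1) for j in range(n + 1)]
    values = {m: sum(dist(m, x) ** 2 for x in data) for m in grid}
    best = min(values.values())
    return [m for m, v in values.items() if v <= best + tol]


data = [(0.0, 0.0), (1.0, 1.0)]
means_l1 = frechet_minimizers(data, d1)  # every grid point with x + y = 1
means_l2 = frechet_minimizers(data, d2)  # only the midpoint (0.5, 0.5)
print(len(means_l1), means_l2)
```

Running it prints `21 [(0.5, 0.5)]`: under the summed metric every grid point on the segment $x + y = 1$ is a mean, each one the midpoint of a different shortest path, whereas the Euclidean mean is unique.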
Thus, geometry enters the picture, and the idea of a geodesic tree-space gives a constraint on the possible geometric structure of tree-space. In a shape space where distances are given by path length, we must be able to continuously deform any given tree-shape into any other by traveling along the shortest path that connects the two shapes. The deformation paths are easy to describe when only branch shape is changed, while tree topology (branch connectivity) is fixed. Such deformations take place in portions of tree-space where all trees have the same topological structure. It is more challenging to describe deformation paths in which the tree-topological structure is changed, for instance through a collapsed internal branch as in fig. 1(b). We model topologically intermediate trees as collapsed versions of trees with differing tree topology, and glue the portions of tree-space together along subspaces that correspond to collapsed trees, as in fig. 2. As a consequence, tree-space has self-intersections and is not smooth, but has singularities!
The main theoretical contributions of this paper are the construction of a mathematical tree-shape framework along with a geometric analysis of two natural metrics on the shape space. One of these metrics is the classical Tree Edit Distance (TED), into which we gain new insight. The second metric, called Quotient Euclidean Distance (QED), is induced from a Euclidean metric. Using Gromov’s approach to metric geometry [13], we show that QED generically gives locally unique geodesics and means, whereas finding geodesics and means for TED is ill-posed even locally. We explain why using TED for computing average trees must always be accompanied by a carefully engineered choice of edit paths in order to give well-defined results; such choices can yield average trees which are substantially different from the trees in the dataset [37]. The QED approach, on the contrary, allows us to investigate statistical methods for tree-like structures which have previously not been possible, like different well-defined concepts of average tree. This is our motivation for studying the QED metric!
The paper is organized as follows: In section 1.1 we discuss related work. The tree-space is defined in section 2, and the statistical properties of tree-space are analyzed in section 3. In section 4, we discuss how to overcome the computational complexity of both metrics, and present a simple QED approximation. In section 5, we illustrate the properties of QED by computing geodesics and different types of average tree for synthetic planar data trees as well as by computing QED means for sets of airway trees from human lungs.
1.1 Related work
Metrics on sets of tree-structured data have been studied by different research communities for the past 20 years. The best-known approach is perhaps Tree Edit Distance (TED), which has been used extensively for shape matching and recognition based on medial axes and shock graphs [19, 37, 35]. TED and, more generally, graph edit distance (GED), are also popular in the pattern recognition community, and are still used for distance-based pattern recognition approaches to trees and graphs [11, 28]. The TED and GED metrics will nearly always have infinitely many shortest edit paths, or geodesics, between two given trees, since edit operations can be performed in different orders and increments. As a result, even the problem of finding the average of two trees is not well-posed. Without any kind of uniqueness of geodesics, it becomes hard to meaningfully define and compute average shapes or modes of variation. This problem can be solved to some extent by choosing a preferred edit path [11, 28, 37], but there will always be a risk that the choice has negative consequences in a given setting. Trinh and Kimia [37] face this problem when they use TED for computing average medial axes using the simplest possible edit paths, leading to average shapes which can be substantially different from most of the dataset shapes.
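The non-uniqueness of shortest edit paths can be seen already for two edges with scalar attributes. The toy sketch below (an illustration, not an implementation of TED) measures two deformations of the attribute vector $(0, 0)$ into $(1, 1)$: changing both edges at once, and changing one edge after the other. A summed per-edge cost, as used by edit distances, cannot tell the two apart, while the Euclidean distance can. The function names and the step count are assumptions.

```python
import math


def d1(p, q):
    """Sum of per-edge attribute changes: the cost model behind edit distances."""
    return sum(abs(a - b) for a, b in zip(p, q))


def d2(p, q):
    """Euclidean distance between attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def path_length(points, dist):
    """Length of a polygonal path: sum of distances between consecutive points."""
    return sum(dist(p, q) for p, q in zip(points, points[1:]))


# Deform the attribute vector (0, 0) into (1, 1) in two different ways.
steps = 10
simultaneous = [(t / steps, t / steps) for t in range(steps + 1)]
one_edge_then_other = (
    [(t / steps, 0.0) for t in range(steps + 1)]
    + [(1.0, t / steps) for t in range(1, steps + 1)]
)

for name, path in [("simultaneous", simultaneous), ("sequential", one_edge_then_other)]:
    print(name, round(path_length(path, d1), 6), round(path_length(path, d2), 6))
```

Under the summed cost both paths have length 2, so both are shortest paths (as is every staircase path in between); under the Euclidean distance only the straight path attains the minimal length $\sqrt{2}$.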
Statistics on tree-shaped objects receive growing interest in the statistical community. Wang and Marron [41] study metric spaces of trees and define a notion of average tree called the median-mean as well as a version of PCA, which finds modes of variation in terms of tree-lines, encoding the maximum amount of structural and attribute variation. Aydin et al. [1] extend this work by finding efficient algorithms for PCA, which they apply to the analysis of brain blood vessels. The metric defined by Wang and Marron does not give a natural geodesic structure on the space of trees, as it places a large emphasis on the tree-topological structure of the trees. The metric has discontinuities in the sense that a sequence of trees with a shrinking branch will not converge to a tree that does not have that branch. Such a metric is not suitable for studying trees with continuous topological variations and noise, such as anatomical tree-structures extracted from medical images, since the emphasis on topology makes the metric sensitive to structural noise.
A different approach is that of Jain and Obermayer [15], who study metrics on attributed graphs, represented as incidence matrices. The space of graphs is defined as a quotient of the Euclidean space of incidence matrices by the group of vertex permutations. The graph-space inherits the Euclidean metric, giving it the structure of an orbifold. This graph-space construction is similar to the tree-space presented in this paper in the sense that both spaces are constructed as quotients of a Euclidean space. The graph-space framework does not, however, give continuous transitions in internal graph-topological structure, which leads to large differences between the geometries of the tree- and graph-spaces.
Trees also appear in genetics. Hillis et al. [14] visualize large sets of phylogenetic trees using multidimensional scaling. Billera et al. [3] have introduced a phylogenetic tree-space suitable for geodesic analysis of phylogenetic trees, and Owen and Provan [27] have developed fast algorithms for computing geodesics in phylogenetic tree-space. Nye [26] has developed a notion of PCA in phylogenetic tree-space, but is forced to make strict assumptions on possible principal components being "simple lines" for the sake of computability. Phylogenetic trees are not geometric, and have fixed, labeled leaf sets, making the space of phylogenetic trees much simpler than the space of tree-like shapes.
We have previously [10, 9] studied geodesics between small tree-shapes in the same type of singular shape space as studied here, but most proofs were left out. In [8], we study different algorithms for computing average trees based on the QED metric. This paper extends and continues [9], giving proofs, in-depth explanations and more extensive examples illustrating the potential of the QED metric.
2 The space of tree-like shapes
Let us discuss which properties are desirable for a tree-shape model. As previously discussed, we require, at the very least, local existence and uniqueness of geodesics in order to compute average trees and analyze variation in datasets. When geodesics exist, we want the topological structure of the intermediate trees along the geodesic to reflect the resemblance in structure of the trees being compared. In particular, a geodesic passing through the trivial one-vertex tree should indicate that the trees being compared are maximally different. Perhaps more importantly, we would like to compare trees where the desired edge matching is inconsistent with tree topology, as in fig. 1(a). Specifically, we would like to find geodesic deformations in which the tree topology changes when we have such edge matchings, for instance as in fig. 1(b).
2.1 Representation of trees
In this paper we shall work with two different tree-spaces: A tree-space $X$, which is the space of all trees of a certain size, and a subspace $Y \subset X$, which is a restricted space of trees whose exact definition is flexible (see definition 10). The large tree-space $X$ is the natural space for geometric trees. However, the available mathematical tools only allow us to prove our results locally, where the locality assumptions become very strict in $X$. Using a set of natural assumptions, we can restrict to a tree-space $Y$ where our results hold in larger regions of tree-space. This is discussed in detail in section 3.6. We also believe that our results hold in $X$, as described in conjecture 3.6.
In this paper, a "tree-shape" is an embedded tree in $\mathbb{R}^2$ or $\mathbb{R}^3$, and consists of a series of edge embeddings, glued together according to a rooted combinatorial tree. Tree-shapes are invariant to translation, but our definition of tree-shape does not remove scale and rotation. However, tree-shapes can always be aligned with respect to scale and rotation prior to comparison, if this is important.
Any tree-like shape is represented as a pair $(T, x)$ consisting of a rooted, planar tree $T = (V, E, r)$ with edge attributes $x$. In $T$, $V$ is the vertex set, $E$ is the edge set, and $r$ is the root vertex. The tree $T$ describes the tree-shape topology, and the attributes $x$ describe edge shape, as illustrated in fig. 3. The shape attributes $x = (x_e)_{e \in E}$, represented by a point in the product space $\prod_{e \in E} A$, are a concatenation of edgewise attributes $x_e$ from an attribute space $A$. The attributes could, e.g., be edge length, landmark points or edge parametrizations. In this work, we mostly use open curves translated to start at the origin, described by a fixed number of landmark points along the edge. Thus, throughout the paper, the attribute space is $A = (\mathbb{R}^d)^n$, where $d = 2$ or $3$ and $n$ is the number of landmark points per edge. Collapsed edges are represented by a sequence of $n$ origin points. In some illustrations, we shall use scalar attributes for the sake of visualization, in which case $A = \mathbb{R}$.
In order to compare trees of different sizes and structures, we need to represent them in a unified way. We describe all shapes using the same tree $T$ to encode tree topology. By choosing a sufficiently large $T$, we can represent all the trees in our dataset by filling out $T$ with empty (collapsed) edges. We call $T$ the maximal tree.
We model tree-shapes using binary maximal trees $T$. Tree-shapes which are not binary are represented by the binary tree $T$ in a natural way by allowing constant, or collapsed, edges, represented by the zero scalar or vector attribute. In this way an arbitrary attributed tree can be represented as an attributed binary tree, see fig. 4. This is geometrically very natural. Binary trees are geometrically stable in the sense that small perturbations of a binary tree-shape do not change the tree-topological structure of the shape. Conversely, a trifurcation or higher-order vertex can always be turned into a series of bifurcations sitting close together by an arbitrarily small perturbation. In our representation, thus, trifurcations are represented as two bifurcations sitting infinitely close together, etc.
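A minimal sketch of this representation, under stated assumptions: trees are given as nested (attribute, children) pairs with a single root edge, landmark attributes are NumPy arrays of shape $(n, d)$, and the maximal tree is stored in heap order. The right-comb split of a trifurcation is only one of the possible binary representations; the alternatives are exactly the representations that get identified in the quotient construction of section 2.2. The helper names are hypothetical.

```python
import numpy as np


def to_binary(node, zero):
    """Binarize an ordered tree given as (attribute, [children]).

    Vertices with more than two children are split by inserting collapsed
    (zero-attribute) edges; a single child gets a collapsed sibling.
    """
    attr, children = node
    kids = [to_binary(c, zero) for c in children]
    while len(kids) > 2:
        # group the two rightmost siblings under a new collapsed edge
        kids = kids[:-2] + [(zero, [kids[-2], kids[-1]])]
    if len(kids) == 1:
        kids.append((zero, []))
    return (attr, kids)


def to_preshape(node, depth, zero):
    """Place a tree into a maximal binary tree of the given depth.

    Edges are stored in heap order (children of edge i are 2i+1 and 2i+2);
    edges absent from the tree keep the zero attribute.
    """
    n_edges = 2 ** (depth + 1) - 1
    x = np.zeros((n_edges,) + zero.shape)

    def place(nd, i):
        if i >= n_edges:
            raise ValueError("tree does not fit into the maximal tree")
        attr, kids = nd
        x[i] = attr
        for k, child in enumerate(kids):
            place(child, 2 * i + 1 + k)

    place(to_binary(node, zero), 0)
    return x


def edge(v):
    """Hypothetical edge attribute: n = 2 landmarks in R^2, all equal to v."""
    return np.full((2, 2), float(v))


# A root edge followed by a trifurcation.
zero = np.zeros((2, 2))
tree = (edge(1), [(edge(2), []), (edge(3), []), (edge(4), [])])
x = to_preshape(tree, depth=2, zero=zero)
print(x.shape)  # (7, 2, 2)
```

Here the trifurcation becomes edge 1 followed by a collapsed internal edge (heap index 2) that carries the last two branches, i.e., two bifurcations sitting infinitely close together.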
Trees embedded in the plane have a natural edge order induced by the left-right order on the children of any edge. We say that a tree is ordered whenever each set of sibling edges in the tree is endowed with such a total order. Conversely, an ordered combinatorial tree always has a unique, implicit embedding in the plane where siblings are ordered from left to right. For this reason we use the terms "planar tree" and "ordered tree" interchangeably. We initially study metrics on the set of ordered binary trees; later we use them to induce distances between unordered trees by considering all possible orders. This allows us to model trees in $\mathbb{R}^3$. Considering all orders leads to potential computational challenges, which are discussed in section 4.
In order to build a space of tree-shapes, fix an ordered maximal binary tree $T$ with edges $E$, which encodes the connectivity of all our trees. Any attributed tree is now represented by a point $x$ in $\mathbb{X} = \prod_{e \in E} A$, where the coordinate $x_e \in A$ describes the shape of the edge $e$. Since we allow zero-attributed edges, as discussed above, some tree-shapes will be represented by several points in $\mathbb{X}$ (fig. 4). As a result, some natural tree-deformations are not found as continuous paths in $\mathbb{X}$. In figs. 4 and 5, the paths in $\mathbb{X}$ corresponding to the indicated deformations require a "teleportation" between two representations of the intermediate tree-shape. We tackle this by using a refined tree-shape space $X$, where different representations are identified as being the same point. The original space $\mathbb{X}$ is called the tree preshape space, analogous to Kendall's terminology [17].
2.2 The singular space of ordered tree-shapes
We go from preshapes to shapes by identifying those preshapes which define the same shape.
Consider two ordered tree-shapes $x, y \in \mathbb{X}$ with collapsed edges. Replace their binary representations by collapsed representations, where the zero-attributed edges have been removed. The orders of the original trees induce well-defined orders on the collapsed trees. We say that two ordered tree-shapes are the same when their collapsed ordered, attributed versions are identical, as in fig. 4. Tree identifications come with an inherent bijection of subsets of the edge set $E$: If we identify $x$ and $y$, denote by
$$E_x = \{ e \in E : x_e \neq 0 \}, \tag{2}$$
$$E_y = \{ e \in E : y_e \neq 0 \} \tag{3}$$
the sets of non-collapsed edges with nonzero attributes. The identification of $x$ and $y$ is equivalent to an order-preserving bijection $\varphi\colon E_x \to E_y$, identifying those edges that correspond to the same edge in the collapsed tree-shape. Varying the attributes $x_e$, $e \in E_x$, spans a family of tree-shapes with fixed topology and several representatives. Thus, the edge sets $E_x$ and $E_y$ induce linear subspaces
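$$V_x = \{ z \in \mathbb{X} : z_e = 0 \text{ for all } e \notin E_x \}, \tag{4}$$
$$V_y = \{ z \in \mathbb{X} : z_e = 0 \text{ for all } e \notin E_y \} \tag{5}$$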
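of $\mathbb{X}$ where, except on the axes (where further edges collapse), the topological tree structure is constant. The tree-shapes represented in $V_x$ are exactly the same as those represented in $V_y$. The bijection $\varphi$ induces a bijection $\Phi\colon V_x \to V_y$ given by $\Phi(z)_{\varphi(e)} = z_e$ for $e \in E_x$, which connects each representation $z \in V_x$ to the representation $\Phi(z) \in V_y$ of the same shape. Note that $V_x$ and $V_y$ are spanned by axes in $\mathbb{X}$.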
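The identification can be sketched for ordered trees with scalar attributes ($A = \mathbb{R}$), assuming each tree is a nested list of (attribute, children) edges and attributes are tuples. Collapsing contracts every zero-attributed edge while keeping the left-to-right order of the remaining edges; two preshapes are then identified exactly when their collapsed forms agree. The function names are illustrative, not taken from the paper.

```python
def collapse(edges):
    """Collapsed ordered form of a list of (attribute, children) edges.

    Zero-attribute edges are contracted: their (collapsed) children take
    their place among the siblings, so the left-to-right order is kept.
    """
    out = []
    for attr, children in edges:
        kids = collapse(children)
        if all(v == 0 for v in attr):
            out.extend(kids)
        else:
            out.append((attr, kids))
    return out


def same_ordered_shape(x, y):
    """Two preshapes are identified iff their collapsed forms coincide."""
    return collapse(x) == collapse(y)


r, a, b, c, z = (1.0,), (2.0,), (3.0,), (4.0,), (0.0,)
left = [(r, [(z, [(a, []), (b, [])]), (c, [])])]     # (a b) c
right = [(r, [(a, []), (z, [(b, []), (c, [])])])]    # a (b c)
swapped = [(r, [(z, [(b, []), (c, [])]), (a, [])])]  # (b c) a

print(same_ordered_shape(left, right))    # True: the same trifurcation
print(same_ordered_shape(left, swapped))  # False: different sibling order
```

The first pair are the two binary representations of one ordered trifurcation; the third tree has the same branches in a different left-to-right order, so as an ordered tree-shape it is not identified with them.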
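Define such a map $\Phi$ for each pair of identified tree structures, and form the equivalence relation $\sim$ on $\mathbb{X}$ generated by setting $z \sim \Phi(z)$ for all maps $\Phi$ and all $z$ in the domain of $\Phi$. For each $x \in \mathbb{X}$, $\bar{x}$ is the equivalence class $\bar{x} = \{ y \in \mathbb{X} : y \sim x \}$. The quotient space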
$$X = \mathbb{X}/\!\sim \; = \{ \bar{x} : x \in \mathbb{X} \} \tag{6}$$
of equivalence classes is the space of ordered tree-like shapes.
Quotient spaces are standard constructions from topology and geometry, where they are used to glue spaces together [4, chapter 1.5]. The geometric interpretation of the identification in the tree-space quotient is that we fold and glue the preshape space $\mathbb{X}$ along the identified subspaces; i.e., when $y = \Phi(x)$, we glue the two points $x$ and $y$ together. See fig. 5 for an illustration.
2.3 Metrics on the space of ordered trees
Given a metric $d$ on the Euclidean preshape space $\mathbb{X}$, we induce the standard quotient pseudometric [4] $\bar{d}$ on the quotient space $X$ by setting
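$$\bar{d}(\bar{x}, \bar{y}) = \inf \left\{ \sum_{i=1}^{k} d(x_i, y_i) \;:\; x_1 \in \bar{x},\; y_i \sim x_{i+1} \text{ for } i = 1, \dots, k-1,\; y_k \in \bar{y},\; k \in \mathbb{N} \right\} \tag{7}$$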
This corresponds to finding the optimal path from $\bar{x}$ to $\bar{y}$, consisting of any number of concatenated Euclidean lines, passing through identified subspaces, as shown in fig. 5. It is clear from the definition that the distance function $\bar{d}$ is symmetric and satisfies the triangle inequality. It is, however, an infimum, so there is a risk that the distance between two distinct tree-shapes is zero, as occurs with some intuitive shape distance functions [25]. This is why $\bar{d}$ is called a pseudometric, and it remains to prove that it actually is a metric; i.e., that $\bar{d}(\bar{x}, \bar{y}) = 0$ implies $\bar{x} = \bar{y}$.
We prove this for two specific metrics on $\mathbb{X}$, which come from two different ways of combining individual edge distances: The metrics $d_1$ and $d_2$ on $\mathbb{X}$ are given by the norms
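$$d_1(x, y) = \sum_{e \in E} \|x_e - y_e\|, \tag{8}$$
$$d_2(x, y) = \Big( \sum_{e \in E} \|x_e - y_e\|^2 \Big)^{1/2}, \tag{9}$$
where $\|\cdot\|$ is the Euclidean norm on the attribute space $A$.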
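Both norms are straightforward to evaluate on preshapes stored as arrays. The sketch below assumes that a preshape with landmark attributes $A = (\mathbb{R}^d)^n$ is stored as a NumPy array of shape (number of edges, $n$, $d$), with collapsed edges as zero blocks; the function names are hypothetical. Note that these are distances on the preshape space $\mathbb{X}$ only; the quotient metrics additionally require the infimum in (7).

```python
import numpy as np


def edge_norms(x, y):
    """||x_e - y_e|| for every edge e; x, y have shape (edges, landmarks, dim)."""
    diff = (x - y).reshape(x.shape[0], -1)
    return np.linalg.norm(diff, axis=1)


def d1(x, y):
    """Sum of per-edge distances, as in (8)."""
    return float(edge_norms(x, y).sum())


def d2(x, y):
    """Square root of the sum of squared per-edge distances, as in (9)."""
    return float(np.sqrt(np.sum(edge_norms(x, y) ** 2)))


x = np.zeros((3, 2, 2))  # three collapsed edges, 2 landmarks in R^2 each
y = np.zeros((3, 2, 2))
y[0] = 1.0               # ||y_0 - x_0|| = 2
y[1] = 0.5               # ||y_1 - x_1|| = 1
print(d1(x, y), d2(x, y))  # 3.0 and sqrt(5)
```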
From now on, $d$ and $\bar{d}$ will denote either the distance functions $d_1$ and $\bar{d}_1$, or $d_2$ and $\bar{d}_2$. We prove the following theorem in section 2.6:
Theorem
The distance function $\bar{d}$ is a metric on $X$, and $(X, \bar{d})$ is a contractible, complete, proper geodesic space.
Thus, given any two trees, we can always find a geodesic between them in both metrics $\bar{d}_1$ and $\bar{d}_2$. (It can be shown that for any metric $d$ on $\mathbb{X}$, the induced pseudometric $\bar{d}$ on $X$ is a metric.)
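To make equation (7) concrete, consider a toy preshape space $\mathbb{R}^2$ with the Euclidean metric, in which the two axes are glued by $(t, 0) \sim (0, t)$, as for a single branch that can sit either on the left or on the right of two sibling slots. The sketch below is an assumption-laden illustration: it only searches chains with at most one jump between the glued axes, so in general it returns an upper bound on $\bar{d}$. The cost of a one-jump chain is a sum of two convex functions of the gluing parameter $t$, which justifies a ternary search; the search interval and all names are hypothetical.

```python
import math


def d(p, q):
    """Euclidean metric on the toy preshape space R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def one_jump(x, y, lo=-100.0, hi=100.0, iters=200):
    """Cheapest chain x -> (t, 0) ~ (0, t) -> y, minimized over t.

    The cost is a sum of two convex functions of t, hence convex,
    so ternary search on [lo, hi] converges to the minimum.
    """
    def cost(t):
        return d(x, (t, 0.0)) + d((0.0, t), y)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cost(m1) <= cost(m2):
            hi = m2
        else:
            lo = m1
    return cost((lo + hi) / 2)


def quotient_distance_upper_bound(x, y):
    """Best of: no jump, or one jump between the glued axes in either direction."""
    return min(d(x, y), one_jump(x, y), one_jump(y, x))


print(quotient_distance_upper_bound((1.0, 0.0), (0.0, 1.0)))  # ~0: same shape
print(quotient_distance_upper_bound((2.0, 0.1), (0.1, 2.0)))  # about 0.2
```

In the second call the two preshapes are about 2.69 apart in $\mathbb{R}^2$, yet they represent nearly the same shape, and the chain through the glued axes has length 0.2.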
We may often want to restrict to a subset of the large tree-space.
Definition 10 (Restricted treeshape space)
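Consider a subset $\mathbb{Y} \subset \mathbb{X}$ which contains all representations, and only representations, of trees of certain restricted topologies, defined by collapsed subtrees of the maximal tree $T$. A collapsed subtree $T'$ of $T$ is characterized by the subset $E' \subset E$ of edges in the maximal tree which are not collapsed. Associated to it is the linear subspace $V_{E'} = \{ x \in \mathbb{X} : x_e = 0 \text{ for all } e \notin E' \}$ containing representations of all the trees of this particular topology. Including all representations of each considered tree topology, we obtain a restricted preshape space
$$\mathbb{Y} = \bigcup_{E' \in \mathcal{S}} V_{E'},$$
where $\mathcal{S}$ is the collection of edge subsets $E'$ representing the considered topologies, so that $\mathbb{Y}$ contains all the trees that have one of the considered topologies. The equivalence relation $\sim$ on $\mathbb{X}$ restricts to an equivalence relation on $\mathbb{Y}$, from which we obtain a restricted tree-shape space $Y = \mathbb{Y}/\!\sim$. The metric $d$ on $\mathbb{X}$ induces a metric on $\mathbb{Y}$, which induces a quotient pseudometric $\bar{d}$ on $Y$.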
Example 11
Denote by $\mathbb{Y}_k$ the space of all trees in $\mathbb{X}$ with $k$ leaves; then $Y_k = \mathbb{Y}_k/\!\sim$ is the space of tree-shapes with $k$ leaves.
Remark 12
Note that the space $Y_k$ of tree-shapes with $k$ leaves is different from the Billera-Holmes-Vogtmann (BHV) space [3] of trees with vector attributes, because in the BHV space, geodesics will always deform leaves onto leaves, whereas in $Y_k$, leaves can be transformed to non-leaf branches by a geodesic.
Even in the restricted treeshape space, we obtain an induced metric:
Corollary
The pseudometric $\bar{d}$ on $Y$ is a metric, and $(Y, \bar{d})$ is a contractible, complete, proper metric space.
Proof.
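First, we show that the pseudometric is a metric. The pseudometric in equation (7) defines the distance as the infimum of lengths of paths in $\mathbb{Y}$ connecting representatives of $\bar{x}$ and $\bar{y}$. Any path in $\mathbb{Y}$ is also a path in $\mathbb{X}$, so the distance $\bar{d}_Y$ in $Y$ is at least the distance $\bar{d}_X$ in $X$. Hence, if $\bar{d}_Y(\bar{x}, \bar{y}) = 0$, then $\bar{d}_X(\bar{x}, \bar{y}) = 0$ as well, so $\bar{x} = \bar{y}$ by theorem 2.3.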
2.4 From ordered to unordered trees
The world is not two-dimensional, and for most applications it is necessary to study embedded trees in $\mathbb{R}^3$. The main difference from the ordered case is that trees in $\mathbb{R}^3$ have no canonical edge order. The left-right order on children of planar trees gives an implicit preference for edge matchings, and hence reduces the number of possible matches. When we no longer have this preference, we consider all orderings of the same tree and choose orders which minimize the distance.
We define the space of (unordered) tree-like shapes in $\mathbb{R}^3$ as the quotient $X/G$, where $G$ is the group of reorderings of the maximal binary tree $T$. The metric $\bar{d}$ on $X$ induces a quotient pseudometric on $X/G$. Again, we can prove:
Theorem
For $\bar{d}$ induced by either $d_1$ or $d_2$, the quotient pseudometric induced by $\bar{d}$ on $X/G$ is a metric, and the space $X/G$ is a contractible, complete, proper geodesic space.
The same result holds for restricted tree-spaces with several restricted tree topologies: Let $Y \subset X$ be a subspace containing only trees of certain restricted topologies, as in definition 10, which is saturated with respect to the reordering group $G$. ($Y$ is saturated if, for each tree topology appearing in $Y$, all reorderings of the same tree topology also appear in $Y$. Equivalently, $Y = p^{-1}(p(Y))$, where $p\colon X \to X/G$ is the quotient projection.) For the corresponding restricted tree-space $Y/G$, the quotient pseudometric is a metric and the space $Y/G$ is a contractible, complete, proper geodesic space.
Note that $G$ is a finite group, which means that $X/G$ is locally well-behaved almost everywhere. In particular, off the fixed points of the action of reorderings on $X$, the projection $p\colon X \to X/G$ is a local isometry, i.e., it is distance preserving within a neighborhood. Hence, the geometry of $X$ is preserved off the fixed points. Geometrically, a fixed point in $X$ is an ordered tree-shape where a reordering of certain branches does not change the ordered tree-shape; that is, some pair of sibling edges must have the same shape attributes. In particular, the fixed points are non-generic because they belong to the lower-dimensional subset of $X$ where two sibling edges have identical shape. Theorem 2.4 can be proved using standard results on compact transformation groups along with similar techniques as for theorem 2.3.
While considering all possible orderings of the tree is easy from the point of view of geometric analysis, in practice this becomes computationally intractable when the size of the trees grows beyond a few generations, since each internal vertex doubles the number of orderings. In real applications we can, however, efficiently reduce complexity using heuristics and approximations, as discussed in section 4.
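The exponential growth is easy to see in a brute-force sketch. It assumes binary trees with scalar attributes, stored as nested (attribute, children) pairs where every internal edge has exactly two children; each combination of child swaps gives a reordering, so a tree with $k$ internal edges has $2^k$ of them. The sketch minimizes the Euclidean preshape distance over all reorderings of one tree. Because it ignores the collapse identifications, it only gives an upper bound on the unordered QED, not the metric itself; all names are hypothetical.

```python
import itertools
import math


def reorderings(node):
    """Yield every reordering of a binary tree given as (attr, children).

    Each internal edge may swap its two child subtrees, so a tree with k
    internal edges yields 2**k (not necessarily distinct) reorderings.
    """
    attr, children = node
    if not children:
        yield node
        return
    left, right = children
    for lt, rt in itertools.product(list(reorderings(left)), list(reorderings(right))):
        yield (attr, (lt, rt))
        yield (attr, (rt, lt))


def flatten(node):
    """Depth-first list of attributes, matching edges by tree position."""
    attr, children = node
    out = [attr]
    for child in children:
        out.extend(flatten(child))
    return out


def unordered_upper_bound(x, y):
    """min over reorderings g of the Euclidean preshape distance d2(x, g(y))."""
    fx = flatten(x)
    return min(math.dist(fx, flatten(g)) for g in reorderings(y))


def leaf(v):
    return (v, ())


x = (1.0, ((2.0, (leaf(3.0), leaf(4.0))), (5.0, (leaf(6.0), leaf(7.0)))))
y = (1.0, ((5.0, (leaf(7.0), leaf(6.0))), (2.0, (leaf(3.0), leaf(4.0)))))

print(sum(1 for _ in reorderings(y)))  # 8 = 2**3 for 3 internal edges
print(math.dist(flatten(x), flatten(y)), unordered_upper_bound(x, y))
```

The ordered distance between the two trees is positive, but some reordering of $y$ coincides with $x$, so the unordered bound is 0. Doubling the depth of the tree squares nothing but the patience of the user: the loop over $2^k$ reorderings is what makes exhaustive search infeasible.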
2.5 Geometric interpretation of the metrics
It follows from the definition that the metric $\bar{d}_1$ on $X$ and the metric it induces on $X/G$ coincide with the classical tree edit distance (TED) for ordered and unordered trees, respectively. In this way the abstract, geometric construction of tree-space gives a new way of viewing the intuitive TED algorithm.
The QED metrics, $\bar{d}_2$ on $X$ and the metric it induces on $X/G$, are descents of the Euclidean metric $d_2$ on $\mathbb{X}$, and geodesics in this metric are concatenations of straight lines in flat regions. In section 3.7 we compare the two metrics using examples.
2.6 Proof of theorem 2.3
We now pass to the proof of theorem 2.3. The rest of the article is independent of this section, and on a first read, the impatient reader may skip to section 3. However, while the proof is technical, theorem 2.3 is a fundamental building block for the shape space framework. We shall assume that the reader has a good knowledge of metric geometry or general topology [4, 7]. It is crucial for the proof that we only identify subspaces of the Euclidean space $\mathbb{X}$ which are spanned by Euclidean axes, and that these are finite in number. This induces a well-behaved projection $\pi\colon \mathbb{X} \to X$, which carries many properties from $\mathbb{X}$ to $X$.
2.6.1 Precise shape space definition
We say that ordered tree-shapes whose (collapsed) ordered structure is the same belong to the same combinatorial tree-shape type. For each combinatorial type $\tau$ of ordered tree-shape which can be represented by collapsing edges in the maximal tree $T$, there is a family $\{E_\tau^i\}_{i \in I_\tau}$ of subsets of $E$, which induce that particular type when we endow the edges in $E_\tau^i$ ($i \in I_\tau$) with nonzero attributes and leave all other edges with zero attributes. These subsets are characterized by the properties
a) the cardinality $|E_\tau^i| = |E_\tau^j|$ for all $i, j \in I_\tau$;
b) there is a depth-first order on each $E_\tau^i$ induced by the depth-first order on $E$, such that the ordered, combinatorial structure defined by any $E_\tau^i$ coincides with that defined by $\tau$.
That is, the subset $E_\tau^i$ for any $i \in I_\tau$ lists the set of edges in $T$ which have nonzero attributes for the representation of any shape of type $\tau$. Corresponding to each $E_\tau^i$ is the linear subspace $V_\tau^i$ of $\mathbb{X}$ given by
$$V_\tau^i = \{ x \in \mathbb{X} : x_e = 0 \text{ for all } e \in E \setminus E_\tau^i \}, \tag{13}$$
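and by condition b) we can define isometries $\Phi_\tau^{ij}\colon V_\tau^i \to V_\tau^j$ by forgetting the zero entries and keeping the depth-first coordinate order. We generate the equivalence relation $\sim$ on $\mathbb{X}$ by requiring that $x \sim \Phi_\tau^{ij}(x)$ whenever $x \in V_\tau^i$ for some $\tau$ and $i, j \in I_\tau$. We now define the space of ordered tree-like shapes as the quotient $X = \mathbb{X}/\!\sim$, and define the quotient map $\pi\colon \mathbb{X} \to X$.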
2.6.2 The pseudometric is a metric
It is clear from the definition that the distance function defined in (7) is symmetric and satisfies the triangle inequality, which makes it a pseudometric.
Proposition
Let $\bar{d}$ denote $\bar{d}_1$ or $\bar{d}_2$. The pseudometric $\bar{d}$ is a metric on $X$.
Proof.
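It suffices to show that $\bar{d}(\bar{x}, \bar{y}) = 0$ implies $\bar{x} = \bar{y}$ ($\bar{x}, \bar{y} \in X$). Moreover, it is also easy to show that $\bar{d}_2(\bar{x}, \bar{y}) \le \bar{d}_1(\bar{x}, \bar{y})$ for any $\bar{x}, \bar{y} \in X$, so it suffices to show that $\bar{d}_2(\bar{x}, \bar{y}) = 0$ implies $\bar{x} = \bar{y}$. Hence, from now on, write $\bar{d}$ for $\bar{d}_2$, and assume that $\bar{d}(\bar{x}, \bar{y}) = 0$ for two tree-shapes $\bar{x}$ and $\bar{y}$.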
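The comparison between the two quotient metrics used in this reduction can be spelled out from the definitions (7), (8) and (9), assuming $\|\cdot\|$ is the per-edge norm used there:

$$d_2(x,y)^2 = \sum_{e \in E} \|x_e - y_e\|^2 \;\le\; \Big( \sum_{e \in E} \|x_e - y_e\| \Big)^2 = d_1(x,y)^2,$$

since expanding the square on the right only adds the nonnegative cross terms $\|x_e - y_e\|\,\|x_{e'} - y_{e'}\|$ with $e \neq e'$. Hence $d_2 \le d_1$ on $\mathbb{X}$, so every chain in (7) satisfies

$$\sum_{i=1}^{k} d_2(x_i, y_i) \;\le\; \sum_{i=1}^{k} d_1(x_i, y_i),$$

and taking the infimum over all chains gives $\bar{d}_2(\bar{x}, \bar{y}) \le \bar{d}_1(\bar{x}, \bar{y})$. In particular, $\bar{d}_1(\bar{x}, \bar{y}) = 0$ implies $\bar{d}_2(\bar{x}, \bar{y}) = 0$.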
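Choose $\varepsilon > 0$ such that
$$\varepsilon < \min\big( \{ \|x_e\| : e \in E,\ x_e \neq 0 \} \cup \{ \|y_e\| : e \in E,\ y_e \neq 0 \} \big), \tag{14}$$
where $x \in \bar{x}$ and $y \in \bar{y}$ are representatives; that is, $\varepsilon$ is smaller than the size of any of the nonzero edges in $x$ and $y$.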
We may assume that since otherwise we may assume by symmetry that and .
Denote by $\bar{z}$ the image $\pi(z)$ of $z$ under the quotient projection $\pi$ for any $z \in \mathbb{X}$.
We may assume that and belong to the same identified subspace; that is, there exist such that
(15) 
since otherwise,
(16) 
Since is a finite set, and is a closed set, (16) implies
(17) 
In this case, the path will have to go through some which does not contain points equivalent to , and
(18) 
since in order to reach , we need to remove edge attributes which are nonzero in , and for all . Thus, eq. 15 holds, and in fact, it holds for all the intermediate path points from eq. 7.
But if the path points stay in , then the path consists of shifting and changing the nonzero edge attributes of the trees in question. This will only give a sum if the trees are identical and the path is constant.
2.6.3 Topology of the space of tree-like shapes
Here, we prove the rest of theorem 2.3, namely that the tree-shape space $X$ is a complete, proper geodesic space, and is contractible. First, we note that although $X$ is not a vector space, there is a well-defined notion of size for elements of $X$, induced by the norm $\|x\| = d(x, 0)$ on $\mathbb{X}$:
Lemma
Note that if $x \sim y$, we must have $\|x\| = \|y\|$; hence we can define $\|\bar{x}\| = \|x\|$.
Proof.
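The equivalence $\sim$ is generated recursively from the conditions $x \sim y$ whenever either $x = y$, giving $\|x\| = \|y\|$; or $y = \Phi_\tau^{ij}(x)$, giving $\|x\| = \|y\|$ since the maps $\Phi_\tau^{ij}$ are linear isometries and therefore fix $0$. Hence, the lemma holds by recursion.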
We will prove that $X$ is a proper geodesic space using the Hopf-Rinow theorem for metric spaces [4], which states that every complete, locally compact length space is a proper geodesic space. A length space is a metric space in which the distance between two points can always be realized as the infimum of lengths of paths joining the two points. Note that this is a weaker property than being a geodesic space, as a geodesic joining two points need not exist; it is enough to have paths that are arbitrarily close to being a geodesic. It follows from [4, chapter I lemma 5.20] that $X$ is a length space for any length metric $d$ on $\mathbb{X}$ (such as $d_1$ and $d_2$) for which $\bar{d}$ is a metric.
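To see that the tree-shape space $X$ is locally compact, note that the projection $\pi\colon \mathbb{X} \to X$ is finite-to-one, so any open subset $U \subset X$ has as preimage a finite union of open subsets of $\mathbb{X}$, such that $\pi(\pi^{-1}(U)) = U$, and the closure $\overline{U}$ is compact whenever $U$ is bounded.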
We also need to prove that is complete:
Proposition
Let $\bar{d}$ denote either of the metrics $\bar{d}_1$ and $\bar{d}_2$. The shape space $(X, \bar{d})$ is complete.
The proof needs a lemma from general topology:
Lemma
[7, chapter XIV, theorem 2.3] Let $(M, \rho)$ be a metric space and assume that the metric has the following property: There exists $r > 0$ such that for all $z \in M$ the closed ball $\{ w \in M : \rho(z, w) \le r \}$ is compact. Then $M$ is complete.
Using the projection $\pi$, we can prove:
Lemma
Bounded closed subsets of $X$ are compact.
Proof.
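Since lemma 2.6.3 defines a notion of size in $X$, any closed, bounded subset $C$ of $X$ is contained in a closed ball $\bar{B}(\bar{0}, R)$ in $X$ for some $R > 0$, where $\bar{0}$ is the image $\pi(0)$. Since $\|\bar{x}\| = \|x\|$ for all $x \in \mathbb{X}$, it follows that $\pi^{-1}(\bar{B}(\bar{0}, R)) = B(0, R) = \{ x \in \mathbb{X} : \|x\| \le R \}$, which is a closed and bounded ball in $\mathbb{X}$. Since closed, bounded subsets of $\mathbb{X}$ are compact, $B(0, R)$ is compact. By continuity of $\pi$, $\bar{B}(\bar{0}, R) = \pi(B(0, R))$ is compact. Then $C$, being a closed subset of a compact set, is compact too.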
It is now very easy to prove proposition 2.6.3:
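Proof (of proposition 2.6.3). By the previous lemma, every closed ball in $(X, \bar{d})$ is compact, so $X$ is complete by [7, chapter XIV, theorem 2.3].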
Using the Hopf-Rinow theorem [4, chapter I, proposition 3.7], we thus conclude that $X$ is a complete, proper geodesic space. It remains to prove contractibility:
Lemma
Let $W$ be a normed vector space and let $\sim$ be an equivalence relation on $W$ such that $x \sim y$ implies $tx \sim ty$ for all $t \in [0, 1]$. Then $W/\!\sim$ is contractible.
Proof.
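Define a map $H\colon W/\!\sim \times [0, 1] \to W/\!\sim$ by setting $H(\bar{x}, t) = \overline{(1 - t)x}$. Now $H$ is well defined because of the condition on $\sim$, and so $H$ is a homotopy from the identity map on $W/\!\sim$ to the constant zero map.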
3 Curvature in the tree-shape space
Having proved theorem 2.3, we may now pass to studying the geometry of the tree-shape space through its geodesics. Uniqueness of geodesics and means is closely connected to the geometric notion of curvature, a concept which fundamentally depends on the underlying metric. Using methods from metric geometry [4, 13] we shall investigate the curvature of the tree-shape space with the QED and TED metrics. Using curvature, we obtain well-posed statistical methods for QED.
The next theorem states that in the tree-shape space endowed with the QED metric, a randomly selected point almost surely has a neighborhood within which the tree-space has non-positive curvature. We shall use this fact to show that datasets within that same neighborhood have unique averages.
Theorem
i) Endow $X$ with the QED metric $\bar{d}_2$. A generic point $\bar{x} \in X$ has a neighborhood in which the curvature is non-positive. At non-generic points, the curvature of $(X, \bar{d}_2)$ is $\infty$, that is, unbounded from above.
ii) Endow $X$ with the TED metric $\bar{d}_1$. The metric space $(X, \bar{d}_1)$ does not have locally unique geodesics anywhere, and the curvature of $(X, \bar{d}_1)$ is $\infty$ everywhere.
Claims i) and ii) also hold in the subspace $Y$ containing only trees of certain restricted topologies, as defined in definition 10.
Here, local uniqueness is defined as uniqueness within a sufficiently small neighborhood.
3.1 Genericity
Genericity is a key concept in this paper. Many of our results, e.g., uniqueness of means, do not hold in general, but they do hold for a randomly chosen dataset with respect to natural probability measures.
Definition 19 (Generic property)
A generic property in a metric space $M$ is a property which holds on an open, dense subset of $M$.
In the tree-spaces studied here, one interpretation is that generic properties hold almost surely, or with probability one, with respect to natural probability measures. Thus, for a random tree-shape, we can safely assume that it satisfies generic properties, e.g., that the tree-shape is a binary tree. A non-generic property is a property whose "not happening" is generic. This is similarly interpreted as a property that, with probability one, does not hold for a randomly selected tree-shape. A detailed discussion of the relation between genericity and probability is found in Appendix A.
One common misconception is that the term “generic tree” refers to a particular class of trees with a particular generic property. It is important to note that many different properties, which do not necessarily hold on the same subsets of treespace, may all be generic at the same time. However, any finite collection of generic properties holds simultaneously on an open, dense subset, since a finite intersection of open, dense subsets is again open and dense.
Proposition
Treeshapes that are truly binary (i.e. their internal edges are not collapsed) are generic in the space of all treelike shapes.
Proof.
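Consider a treeshape, in either the ordered or the unordered treespace, which is not truly binary, and represent it by a maximal binary tree. By adding arbitrarily small noise to the zero attributes on the edges of this representative, we obtain truly binary treeshapes which are arbitrarily close to the original treeshape. Thus, the set of truly binary treeshapes is dense. It is also open, since any sufficiently small perturbation of a truly binary treeshape leaves its internal edge attributes nonzero. Hence, truly binary treeshapes are generic both in the ordered and in the unordered treespace.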
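The density step of this proof can be illustrated with a small sketch. It assumes, only for illustration, that each edge of the maximal binary tree carries a scalar attribute stored in a flat list (a hypothetical encoding), so that a collapsed internal edge is an attribute equal to zero. Replacing each zero by a small positive value gives a truly binary representative, and the Euclidean distance between the two representatives, which bounds the quotient distance from above, is at most eps times the square root of the number of collapsed edges.

```python
import math
import random


def perturb_collapsed_edges(attributes, eps, rng=None):
    """Replace every zero (collapsed) edge attribute by a value in (0, eps].

    attributes: scalar edge attributes of a maximal binary tree
    (hypothetical flat encoding). Nonzero attributes are left unchanged.
    """
    rng = rng or random.Random(0)
    # 1 - random() lies in (0, 1], so every new value is strictly positive.
    return [eps * (1.0 - rng.random()) if a == 0 else a for a in attributes]


original = [1.2, 0.0, 0.7, 0.0, 0.4]  # two collapsed internal edges
eps = 1e-3
perturbed = perturb_collapsed_edges(original, eps)
# Euclidean distance between the two representatives in the preshape space.
shift = math.dist(original, perturbed)
```

Shrinking `eps` moves the truly binary representative as close to the original treeshape as desired, which is exactly the density argument.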
The essence of the proposition is that binary treeshapes are generic, but this does not mean that nonbinary trees can be ignored! While nonbinary trees may not appear as randomly selected trees, they do appear in geodesics between randomly selected pairs of trees, as in fig. 6. Nonbinary treelike shapes also appear as samples in real-life applications, e.g., when studying airway trees. However, we interpret this as an artifact of limited resolution rather than as evidence of true higher-degree vertices. For instance, airway extraction algorithms record trifurcations when the lengths of internal edges are below certain threshold values.
3.2 Curvature in metric spaces
In order to understand and prove theorem 3, we need a definition of curvature in metric spaces. In spite of its simplicity and elegance, this concept from metric geometry is novel in computer vision. We shall spend a little time introducing it before proceeding to prove theorem 3 in section 3.3.
Since general metric spaces can have all kinds of anomalies, the concept of curvature in such spaces is defined through a comparison with spaces that are well understood. More precisely, the metric spaces are studied using geodesic triangles, which are compared with corresponding comparison triangles in model spaces with a fixed curvature κ. The model spaces are spheres (κ > 0), the plane (κ = 0) and hyperbolic spaces (κ < 0), and through comparison with these spaces, we can bound the curvature of the metric space from above by κ. In this paper we shall use comparison with planar triangles, which gives us curvature bounded from above by 0.
Given a geodesic metric space X, a geodesic triangle Δ in X consists of three points x, y, z ∈ X and geodesic segments [x, y], [y, z], [z, x] joining them. A planar comparison triangle Δ̄ for the triangle Δ consists of three points x̄, ȳ, z̄ in the plane ℝ², such that the lengths of the sides in Δ̄ are the same as the lengths of the sides in Δ, see fig. 7.
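A CAT(0) space is a geodesic metric space in which geodesic triangles are “thinner” than their comparison triangles in the plane. That is, for every geodesic triangle with vertices x, y, z, with planar comparison triangle x̄, ȳ, z̄, and for any point p on the edge [y, z], we have d(x, p) ≤ ‖x̄ − p̄‖, where p̄ is the unique point on the edge [ȳ, z̄] such that d(y, p) = ‖ȳ − p̄‖ and d(p, z) = ‖p̄ − z̄‖. If the planar comparison triangle is replaced by a comparison triangle in the sphere or hyperbolic space of fixed curvature κ, we get a CAT(κ) space.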
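The vertex-to-side form of the CAT(0) inequality can be tested numerically when the distance function of a space is known. The sketch below is an illustration only, not part of the paper's method: it builds a planar comparison triangle from three side lengths via the law of cosines and checks d(x, p) ≤ ‖x̄ − p̄‖ at sampled points p of the side [y, z]. The function names and the tripod test space (three legs glued at a centre, a simple metric tree) are hypothetical choices.

```python
import math


def comparison_triangle(a, b, c):
    """Planar comparison triangle for side lengths a = d(y, z), b = d(x, z), c = d(x, y).

    Returns (x_bar, y_bar, z_bar) with y_bar at the origin and z_bar on the positive
    x-axis. Assumes a > 0 and that a, b, c satisfy the triangle inequality.
    """
    y_bar, z_bar = (0.0, 0.0), (a, 0.0)
    # Law of cosines: pick x_bar with |x_bar - y_bar| = c and |x_bar - z_bar| = b.
    u = (a * a + c * c - b * b) / (2.0 * a)
    v = math.sqrt(max(c * c - u * u, 0.0))
    return (u, v), y_bar, z_bar


def cat0_vertex_side_check(dist, x, y, z, point_on_yz, samples=50, tol=1e-9):
    """Test d(x, p) <= |x_bar - p_bar| at sampled points p of the geodesic [y, z].

    dist(p, q) is the metric; point_on_yz(t) returns the point of [y, z] at
    distance t * d(y, z) from y. Its comparison point p_bar lies on [y_bar, z_bar]
    at the same distance t * d(y, z) from y_bar.
    """
    a, b, c = dist(y, z), dist(x, z), dist(x, y)
    x_bar, _, _ = comparison_triangle(a, b, c)
    for i in range(samples + 1):
        t = i / samples
        p_bar = (t * a, 0.0)
        if dist(x, point_on_yz(t)) > math.dist(x_bar, p_bar) + tol:
            return False
    return True


# Hypothetical test space: a tripod. A point is (leg, r), r >= 0 its distance to the centre.
def tripod_dist(p, q):
    (leg_p, r_p), (leg_q, r_q) = p, q
    return abs(r_p - r_q) if leg_p == leg_q else r_p + r_q


def tripod_point_on_yz(t):
    s = 2.0 * t  # arc length from y = (1, 1.0); the geodesic passes the centre at s = 1
    return (1, 1.0 - s) if s <= 1.0 else (2, s - 1.0)


tripod_ok = cat0_vertex_side_check(tripod_dist, (0, 1.0), (1, 1.0), (2, 1.0), tripod_point_on_yz)
```

In the plane the inequality holds with equality, in the tripod it holds strictly away from the vertices, and on a round sphere it fails, which matches the model-space picture above.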
A metric space has nonpositive curvature if it is locally CAT(0), i.e., if any point x has a radius r_x > 0 such that the ball B(x, r_x) is CAT(0). Similarly, a space has curvature bounded from above by κ if it is locally CAT(κ).
Example 20

The space obtained by gluing a family of Euclidean spaces together along isomorphic affine subspaces is a CAT(0) space. At any point which is not a glued point, the local curvature is 0, since the space is locally isometric to the corresponding Euclidean space. At any glued point, it can be shown that the local curvature is .

The GPCA construction by Vidal et al. [40] defines a space, giving a potential use of spaces and metric geometry in machine learning.

The space of phylogenetic trees is a CAT(0) space [3].

As we are about to see, the space of treelike shapes is locally a CAT(0) space almost everywhere.
One of the main reasons why CAT(κ) spaces (and CAT(0) spaces in particular) are attractive is the following result on existence and uniqueness of geodesics.
Proposition
[4, Proposition II 1.4] Let X be a CAT(κ) space. If κ ≤ 0, then all pairs of points have a unique geodesic joining them. For κ > 0, the same holds for pairs of points at a distance less than π/√κ.
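The distance threshold in this proposition is simple enough to state as a function. The sketch below, with hypothetical function names, returns the distance below which the proposition guarantees unique geodesics in a CAT(κ) space; a `False` from the second function only means that uniqueness is not guaranteed by the proposition.

```python
import math


def unique_geodesic_radius(kappa: float) -> float:
    """Distance below which geodesics in a CAT(kappa) space are guaranteed unique.

    Every pair of points is covered when kappa <= 0; for kappa > 0 the
    threshold is pi / sqrt(kappa).
    """
    if kappa <= 0:
        return math.inf
    return math.pi / math.sqrt(kappa)


def geodesic_is_unique(distance: float, kappa: float) -> bool:
    """Sufficient condition from the proposition; False means 'not guaranteed'."""
    return distance < unique_geodesic_radius(kappa)
```

On the unit sphere (κ = 1), antipodal points at distance π are joined by infinitely many geodesics, so the strict inequality in the threshold cannot be relaxed.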
More results on curvature in metric spaces can be found in the book by Bridson and Haefliger [4].
3.3 Curvature in the space of ordered treeshapes – proof of theorem 3
In this section we study the curvature of treeshape space using the theory of CAT(κ) spaces. We show that at a generic tree, the shape space has nonpositive curvature. The results rest on the following theorem:
Theorem
At a generic point, the shape space is locally CAT(0), and thus has locally unique geodesics in a neighborhood of that point.
Proof.
Recall from section 2.2 how the shape space was formed by identifying subspaces of the preshape space, defined in eq. 4 and 5. These identified subspaces correspond to different representations in the preshape space of the same shape. The points of the shape space can now be divided into three categories:

i) points which have only one representative in the preshape space, and therefore do not belong to the image of an identified subspace;

ii) points which have more than one representative in the preshape space and whose preimages belong to one single identified subspace. At these points the space is locally homeomorphic to an intersection of linear spaces, as in example 20 a);

iii) points which have more than one representative in the preshape space and whose preimages lie at the intersection of identified subspaces. An example of such points is the image of the origin in fig. 5. Arbitrarily close to the trees corresponding to these points, we can find pairs of trees between which geodesics are not unique.
These three categories of points correspond to local curvature 0, −∞ and ∞, respectively. That is, the space is locally CAT(0) at points in category i); at points in category ii) it is locally CAT(κ) for every κ, so its curvature is −∞; and at points in category iii) it is not locally CAT(κ) for any κ, hence its curvature is ∞. It thus suffices to show that the points in category iii) are nongeneric, which follows from the fact that they necessarily lie in a lower-dimensional subspace of the shape space.
The proof carries over to the subspace of trees with restricted topologies from definition 10.
Definition 21 (Injectivity neighborhood)
We call a CAT(0) neighborhood of a point an injectivity neighborhood of that point.
Based on the above, we are now ready to prove:
Proof (Proof of theorem 3).

The QED case: Since the shape space is locally CAT(0) at generic points, the curvature is nonpositive in a neighborhood of each generic point. At points in category iii), however, we will always find pairs of points arbitrarily close to the given point with two geodesics joining them, just as in fig. 5.

The TED case: Consider a treeshape , represented by a point . Induce a second treeshape represented by , where such that and have one nonzero coordinate, found in different edges, which are both nonzero edges in . The topology of is the same as of . For any , we can find TED geodesics from to , where can be decomposed as . Thus, there are infinitely many TED geodesics from to , and does not have locally unique geodesics anywhere. As a consequence, its curvature is unbounded everywhere [4, proposition II 1.4].
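The mechanism behind the TED case can be seen in two coordinates. The sketch below assumes, for illustration only, that within a fixed topology with scalar edge attributes TED reduces to the sum of absolute attribute changes (an ℓ1 distance) and QED to the Euclidean (ℓ2) distance. Changing two edge attributes in any interleaving, here parametrized by a hypothetical fraction `s`, gives paths of the same ℓ1 length, so each of them is a TED geodesic, while in ℓ2 only the straight segment is shortest.

```python
import math


def ted_like(p, q):
    """Assumed TED cost within a fixed topology: sum of absolute attribute changes (l1)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def qed_like(p, q):
    """Assumed QED cost within a fixed topology: Euclidean (l2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def path_length(path, metric):
    """Length of the piecewise-linear path through the given vertices."""
    return sum(metric(path[i], path[i + 1]) for i in range(len(path) - 1))


def staircase(start, end, s):
    """Change edge 1 by a fraction s of its total change, then edge 2, then finish edge 1.

    Different values of s in [0, 1] give different paths from start to end.
    """
    (x1, x2), (y1, y2) = start, end
    m1 = x1 + s * (y1 - x1)
    return [start, (m1, x2), (m1, y2), end]


start, end = (1.0, 1.0), (2.0, 3.0)
ted_lengths = [path_length(staircase(start, end, s), ted_like) for s in (0.0, 0.25, 0.5, 1.0)]
qed_staircase = path_length(staircase(start, end, 0.5), qed_like)
```

Every staircase has ℓ1 length 3, equal to the ℓ1 distance, so there is a continuum of geodesics; the same staircase has ℓ2 length 3, strictly more than the ℓ2 distance √5.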
The practical meaning of theorem 3 is that a) we can use techniques from metric geometry to search for QED averages, b) as we are about to see, for datasets contained in an injectivity neighborhood, there exist unique means, centroids and circumcenters for the QED metric, and c) the same techniques cannot be used to prove existence or uniqueness of prototypes for the TED metric, even if they were to exist. In fact, any geometric method which requires curvature bounded from above [16, 4, 3] will fail for the TED metric. This motivates our study of the QED metric.
3.4 Curvature in the space of unordered treeshapes
It is easy to prove that the same results also hold for unordered treeshapes:
Theorem
The space of unordered trees with the QED metric is generically nonpositively curved. With the TED metric , however, has everywhere unbounded curvature, geodesics are nowhere locally unique and neither are any of the types of average tree discussed in this paper. The same holds in the subspace , defined in theorem 2.4.
3.5 Means and related statistics for treelike shapes
In this section we use what we learned in the previous section to show that, given the QED metric on a space of treelike shapes, we can find various forms of average shape in the space of ordered treelike shapes, assuming that the data lie within an injectivity neighborhood.
There are many competing ways of defining central elements given a subset of a metric space. We discuss several: the circumcenter considered in [4], the centroid considered, among other places, in [3], and the mean [16].
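The problem of existence and uniqueness of averages can be attacked using convex functions. Recall that a function f: I → ℝ on an interval I ⊆ ℝ is convex if f((1 − t)x + ty) ≤ (1 − t)f(x) + tf(y) for all x, y ∈ I and t ∈ [0, 1]. If we can replace ≤ with < whenever x ≠ y and t ∈ (0, 1), then f is strictly convex. Convex functions need not attain a minimum, but when a strictly convex function does, its minimizer is unique. Hence, existence and uniqueness of averages can be proven by expressing them as minimizers of strictly convex functions that are known to attain their minimum, e.g., because they are coercive.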
We say that a function f: X → ℝ on a geodesic metric space X is (strictly) convex if for any two points x, y ∈ X and any geodesic γ: [0, 1] → X from x to y, the function f ∘ γ: [0, 1] → ℝ is (strictly) convex. We shall make use of the following standard properties of convex functions:
Lemma

If and are both convex, is monotonous and increasing, and is strictly convex, then is strictly convex.

If f and g are both convex, then f + g is also convex. If either f or g is strictly convex, then f + g is strictly convex as well.
The mean of a finite subset {x_1, …, x_n} in a metric space is defined as in eq. 1, and is called the Fréchet mean. Local minimizers of eq. 1 are called Karcher means.
The following result follows from a more general theorem by Sturm [34, Proposition 4.3]; we include the basic version of the proof here for completeness.
Theorem
Means exist and are unique in CAT(0) spaces.
Proof.
The function d(·, x): X → ℝ given by y ↦ d(y, x) is convex for any fixed x ∈ X by [4, Proposition II.2.2], so the function d(·, x)² is strictly convex by lemma 3.5 i). But then F(y) = Σ_{i=1}^{n} d(y, x_i)² is strictly convex by lemma 3.5 ii), and a mean is just a minimizer of the function F. The function F is coercive, so the minimizer exists. Since F is strictly convex, the minimizer is unique.
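Euclidean space is the simplest CAT(0) space, and there the function F from the proof can be minimized directly. The sketch below, with hypothetical function names, runs plain gradient descent on F(y) = Σ ‖y − x_i‖² and recovers the arithmetic mean, the unique Fréchet mean in ℝ^d. It does not carry over verbatim to the treeshape space, where straight-line steps would have to be replaced by geodesics.

```python
def frechet_objective(y, points):
    """F(y) = sum_i d(y, x_i)^2 with the Euclidean distance."""
    return sum(sum((a - b) ** 2 for a, b in zip(y, x)) for x in points)


def frechet_mean_gd(points, steps=200):
    """Minimize F by gradient descent in R^d.

    The gradient of F is 2 * n * (y - arithmetic mean), so the step size
    1 / (4n) halves the distance to the minimizer in every iteration.
    """
    n, dim = len(points), len(points[0])
    lr = 1.0 / (4.0 * n)
    y = list(points[0])
    for _ in range(steps):
        grad = [2.0 * sum(y[k] - x[k] for x in points) for k in range(dim)]
        y = [y[k] - lr * grad[k] for k in range(dim)]
    return tuple(y)


data = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
mean = frechet_mean_gd(data)
```

Because F is strictly convex and coercive in ℝ^d, the iteration converges to the same point from any starting point, mirroring the existence and uniqueness argument above.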
We also consider two other types of statistical “prototypes” for a dataset, namely circumcenters and centroids. These are both well known in the context of metric geometry and CAT(0) spaces.
Definition 22

Circumcenters. Consider a metric space X and a bounded subset Y ⊂ X. If there exists a unique smallest closed ball in X which contains Y, its center is called the circumcenter of Y.

The centroid of a finite set. Let X be a uniquely geodesic metric space (a metric space where any two points are joined by a unique geodesic). The centroid c(S) of a set S of k elements is defined recursively as a function of the centroids of subsets with k − 1 elements as follows. Denote the elements of S by x_1, …, x_k. If S contains two elements, S = {x_1, x_2}, the centroid of S is the midpoint of the geodesic joining x_1 and x_2. If k > 2, define S^1 = {c(S ∖ {x_1}), …, c(S ∖ {x_k})}, which is a set with k elements. Similarly, for larger j, S^{j+1} = (S^j)^1. All these sets have k elements. If the elements of S^j converge to a point c ∈ X as j → ∞, then we say that c is the centroid of S in X.
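Both prototypes can be sketched in Euclidean space, where geodesics are straight segments. The circumcenter is approximated here with the iteration of Bădoiu and Clarkson for the smallest enclosing ball, a technique swapped in for illustration and not taken from the text; the centroid follows the recursive definition above literally, with the hypothetical convergence tolerance `tol`. Its cost grows exponentially with the number of points, so it is only meant for tiny sets.

```python
import math


def circumcenter_approx(points, iterations=10000):
    """Approximate the center of the smallest enclosing ball of points in R^d.

    Badoiu-Clarkson iteration: move the current center towards the
    farthest point by a step of 1 / (i + 1).
    """
    c = tuple(points[0])
    for i in range(1, iterations + 1):
        far = max(points, key=lambda p: math.dist(c, p))
        c = tuple(ck + (fk - ck) / (i + 1) for ck, fk in zip(c, far))
    return c


def midpoint(p, q):
    """Midpoint of the Euclidean geodesic (straight segment) from p to q."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))


def centroid(points, tol=1e-12, max_iter=200):
    """Recursive centroid in R^d, following the definition of S^j above."""
    s = [tuple(p) for p in points]
    k = len(s)
    if k == 1:  # convenience case, not part of the definition
        return s[0]
    if k == 2:
        return midpoint(s[0], s[1])
    for _ in range(max_iter):
        # S^{j+1}: centroids of the k subsets of S^j with one element removed.
        s = [centroid(s[:i] + s[i + 1:], tol, max_iter) for i in range(k)]
        if max(math.dist(s[0], p) for p in s) < tol:
            break
    return s[0]


triangle = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
center = circumcenter_approx(triangle)
bary = centroid(triangle)
```

For this right triangle the two prototypes differ: the circumcenter is the midpoint (1, 1) of the hypotenuse, while in Euclidean space the recursive centroid coincides with the arithmetic mean (2/3, 2/3).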
Based on the theory of spaces and our results for means, we have for the set of treelike shapes:
Theorem
Endow the space of ordered treeshapes with the QED metric. A generic point has a neighborhood such that sets contained in this neighborhood have unique means, centroids and circumcenters.
The same statistical properties also hold for the QED metric on unordered treeshapes: Generic points in the space of unordered treeshapes with the QED metric have neighborhoods within which means, circumcenters and centroids exist and are unique.
For the TED metric, these are not unique.
Proof.
First consider the spaces of ordered and unordered treeshapes with the QED metric. By theorems 3 and 3.4, both are locally CAT(0) spaces, and by theorems 2.3 and 2.4 they are both complete metric spaces.
We have seen in theorem 3.5 that means exist and are unique in CAT(0) spaces, so the statement holds for means. By [4, proposition 2.7], any bounded subset of a complete CAT(0) space has a unique circumcenter. Hence, the statement holds for circumcenters. Similarly, by [3, theorem 4.1], finite subsets of CAT(0) spaces have centroids (unique by definition), so the statement holds for centroids.
We turn to the TED metric. By definition, for any two-point dataset, all these notions of mean reduce to finding the midpoint of a geodesic connecting the two points. We know that geodesics and midpoints are not unique in the TED metric. This ends the proof.
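The failure of uniqueness for two-point datasets can be made concrete. The sketch assumes, for illustration only, that within a fixed topology with scalar edge attributes TED behaves like the ℓ1 distance and QED like the ℓ2 distance. It tests candidate points with the metric midpoint condition d(x, m) = d(m, y) = d(x, y)/2; the helper names are hypothetical.

```python
import math


def ted_like(p, q):
    """Assumed TED cost for a fixed topology with scalar edge attributes (l1)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def qed_like(p, q):
    """Assumed QED cost for a fixed topology with scalar edge attributes (l2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def is_midpoint(m, x, y, metric, tol=1e-12):
    """m is a metric midpoint of x and y if d(x, m) = d(m, y) = d(x, y) / 2."""
    half = metric(x, y) / 2.0
    return abs(metric(x, m) - half) < tol and abs(metric(m, y) - half) < tol


x, y = (0.0, 0.0), (1.0, 1.0)
candidates = [(t, 1.0 - t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
ted_midpoints = [m for m in candidates if is_midpoint(m, x, y, ted_like)]
qed_midpoints = [m for m in candidates if is_midpoint(m, x, y, qed_like)]
```

Every point (t, 1 − t) with t ∈ [0, 1] is an ℓ1 midpoint of x and y, so the mean, centroid and circumcenter of {x, y} are all ambiguous, whereas (0.5, 0.5) is the only ℓ2 midpoint.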
3.6 The injectivity neighborhood
We have shown that the local curvature is nonpositive almost everywhere in the treeshape space, which makes it well suited for geometric definitions of statistical properties. However, our notion of “almost everywhere” is strongly tied to (maximal) dimensionality, which in turn is strongly tied to the topological structure of the maximal tree. One consequence is that the injectivity neighborhoods in the treeshape space are rather small, as we are about to see through examples. In this section we impose natural constraints on treespace that allow us to increase the size of the injectivity neighborhoods, and we state a conjecture on extending the results beyond the use of CAT(0) spaces.
Consider the following two examples; we thank the anonymous reviewer for the first!
Example 23

Consider the two treeshapes shown in fig. 8, left. These two treeshapes are joined by two geodesics, and thus they are not contained in the same injectivity neighborhood in the full treeshape space. However, in a suitably chosen subspace of restricted topologies, as in definition 10, they can be.

Consider the treeshape in the space of unordered treeshapes with attributes in , spanned by the maximal tree as in fig. 9, right. Arbitrarily close to we will find two trees and which are joined by two geodesics, as shown in the figure.
As illustrated by these examples, the injectivity neighborhoods in the shape space can be very small, mainly containing trees with the same topology as the point at their center. However, we can obtain much larger injectivity neighborhoods by restricting to the natural subspaces of ordered and unordered treeshapes with restricted topologies, as in definition 10. As shown in theorem 3.5, these subspaces have the same nice geometric properties as the full spaces, and they can be chosen so that the injectivity neighborhoods are bigger than in the full spaces, by avoiding situations such as those in the examples above. Radius is not a good measure for the size of an injectivity neighborhood, as a tree may contain both small branches, which do not have much room to vary, and large branches, which are allowed to move more within such a neighborhood. However, any convex neighborhood which does not contain points of curvature ∞ will be an injectivity neighborhood.
All the results from section 3.5 hold at generic points, within an injectivity neighborhood where the CAT(0) property holds. We have seen examples of points where the local CAT(0) property must fail, illustrating the limitations of CAT(0) techniques for analyzing general trees. However, most of the situations where the CAT(0) property fails are highly nongeneric, and we conjecture that more general results can be proven:
Conjecture
For a generic set of points in , , or , means exist and are unique.
3.7 Comparison of QED and TED
As we have seen in theorems 3 and 3.4, the QED metric gives locally nonpositive curvature at generic points, while the TED metric gives unbounded curvature everywhere on the spaces of ordered and unordered treeshapes. Correspondingly, geodesics are locally unique almost everywhere in the QED metric, while they are nowhere locally unique in the TED metric. As emphasized by theorem 3.5, this means that we cannot imitate the classical statistical procedures on shape spaces using the TED metric, while for the QED metric, we can.
Note, moreover, that the QED metric is the quotient metric induced from the Euclidean metric on the preshape space , making it the natural choice of metric seen from the shape space point of view.
From a computational point of view, the TED metric has nice local-to-global properties. If two trees are decomposed into pairs of corresponding subtrees as in fig. 10, such that the geodesic between the trees restricts to geodesics between the first pair of subtrees and between the second pair, then the TED distance between the trees is the sum of the TED distances between the two subtree pairs. Many dynamic programming algorithms for TED use this property; the same does not hold for the QED metric, due to the square root.
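The difference between the two metrics can be checked on a toy example. The sketch assumes, for illustration only, scalar edge attributes, a fixed topology, and a split of the edges into two subtrees with no edges moving between them; under these assumptions TED acts as an ℓ1 sum and QED as a Euclidean norm. The attribute values are hypothetical.

```python
import math


def ted_like(p, q):
    """Assumed TED cost for a fixed topology: sum of absolute attribute changes."""
    return sum(abs(a - b) for a, b in zip(p, q))


def qed_like(p, q):
    """Assumed QED cost for a fixed topology: Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical edge attributes: edges 0-1 form the first subtree, edges 2-3 the second.
x = (1.0, 2.0, 0.5, 1.5)
y = (2.0, 2.0, 1.5, 0.5)
x1, x2, y1, y2 = x[:2], x[2:], y[:2], y[2:]

ted_total = ted_like(x, y)
ted_sum = ted_like(x1, y1) + ted_like(x2, y2)          # additive: equals ted_total
qed_total = qed_like(x, y)
qed_sum = qed_like(x1, y1) + qed_like(x2, y2)          # not equal to qed_total
qed_combined = math.hypot(qed_like(x1, y1), qed_like(x2, y2))  # parts combine via a root
```

Here the TED-like distance is 3 both globally and as a sum over the parts, while the QED-like distance is √3 globally but 1 + √2 as a sum; the parts combine only through the square root, which breaks the simple additivity that dynamic programming relies on.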
For a qualitative comparison of QED and TED, we compare the geodesics defined by the TED and QED metrics between small, simple trees. Consider the two treepaths in fig. 10, where the edges are endowed with scalar attributes describing edge length. Computing the costs of the two different paths in both metrics, we find the shortest (geodesic) path.
Path indicates a matching left and right side edges and , while Path does not make the match. The cost of Path is in both metrics, while the cost of Path is in the QED metric and in the TED metric. Thus, TED will identify the and edges whenever , while QED makes the identification whenever . Thus, TED will be more prone to internal structural change than QED.
The same occurs in the comparison of TED and QED matchings in figs. 11(a) and 11(b). Although TED is more prone to matching trees with different tree-topological structures, the edge matching results are similar, as expected, since the metrics are closely related.
4 Computation and complexity
In general, computational complexity is a problem for both TED and QED. Computing TED distances between unordered trees is NP-complete [43], and we conjecture that computing the QED metric is NP-complete as well. We can, however, often use geometry and prior knowledge (e.g., anatomy) to find efficient approximations. Trees appearing in applications are often not completely unordered, but are semi-labeled.
Example 24 (Semi-labeling of the upper airway tree)
Most airway trees have similar, but not necessarily identical, topological structure, where several branches have names and can be identified by experts.
The top generations of the airway tree serve very clear anatomical purposes. The root edge is the trachea; the second generation edges are the left and right main bronchi; the third generation edges lead to the lung lobes. As these are easily identified, we obtain a semi-labeling of the airway tree, and use it to simplify computations of inter-airway geodesics in section 5.2.
4.1 QED approximation and implementation
Since the decomposition strategy used for dynamic programming in TED is not available for QED, an approximation of QED is used in our experiments. Many anatomical trees have a somewhat fixed overall structure. For these, it is safe to assume that the number of internal structural transitions found in a geodesic deformation is low, and that the geodesics pass through identified subspaces of low codimension. The latter assumption is equivalent to assuming that the trees appearing throughout the geodesic deformation have nodes of low degree. For instance, for the airway trees studied in section 5.2 below, we find empirically that it is enough to allow for one structural change in each lobar subtree, which has at most a trifurcation. Recall the definition of the metric from eq. 7; the approximation consists of imposing upper bounds on the number and on the degrees of internal vertices appearing in eq. 7, respectively, giving:
(25) 
where all and have vertex degrees at most . Geometrically, is the number of Euclidean segments concatenated to form the geodesic. Bounding is equivalent to bounding the number of internal topological transitions throughout the geodesic. In fact, the number of internal topological transitions in the geodesic.
All edges are translated to start at the origin, and are represented by a fixed number of landmark points (in our case ) evenly distributed along the edge, the first at the origin. The distance between two edge attributes is the Euclidean distance between their landmark vectors.
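A minimal sketch of this edge representation follows. It assumes that an edge is given as a polyline of 3D points (the input format, the function names and the parameter `n` for the number of landmarks are assumptions): the edge is translated to start at the origin, resampled at `n` landmarks evenly spaced by arc length, and two edges are compared by the Euclidean distance between their stacked landmark vectors.

```python
import math


def resample_edge(polyline, n):
    """n landmarks evenly spaced by arc length along an edge given as a polyline.

    The edge is first translated to start at the origin, so the first landmark is
    the origin. Assumes n >= 2 and a polyline of at least two points.
    """
    origin = polyline[0]
    pts = [tuple(c - o for c, o in zip(p, origin)) for p in polyline]
    seg = [math.dist(pts[i], pts[i + 1]) for i in range(len(pts) - 1)]
    total = sum(seg)
    landmarks = []
    for j in range(n):
        target = total * j / (n - 1)  # arc length at which landmark j sits
        acc = 0.0
        for i, length in enumerate(seg):
            # Use the segment containing the target; the last one absorbs rounding.
            if acc + length >= target or i == len(seg) - 1:
                t = 0.0 if length == 0 else (target - acc) / length
                t = min(max(t, 0.0), 1.0)
                landmarks.append(tuple(a + t * (b - a) for a, b in zip(pts[i], pts[i + 1])))
                break
            acc += length
    return landmarks


def edge_distance(edge_a, edge_b, n):
    """Euclidean distance between the stacked landmark vectors of two edges."""
    la, lb = resample_edge(edge_a, n), resample_edge(edge_b, n)
    return math.sqrt(sum((p - q) ** 2 for u, v in zip(la, lb) for p, q in zip(u, v)))
```

Because every edge is translated to the origin, two edges that differ only by position get distance zero; only their shape and length are compared.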
We approximate of the QED metric using Algorithm 1; see Appendix B in the supplemental material for details. The complexity of Algorithm 1 is
where is the number of internal vertices, is the bound on , is the bound on vertex degree and is the optimization in line . For the unordered airway trees in section 5.2, we combine Algorithm 1 with a complete search through the set of branch orderings. Computing QED distances through a complete search is not optimal, and finding more exact and efficient algorithms is a nontrivial research problem.
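To see why a complete search over branch orderings is expensive, the sketch below counts and enumerates the ordered versions of a rooted tree. It uses a hypothetical nested-list encoding (a vertex is the list of its child subtrees, a leaf is `[]`) and ignores symmetries, so identical subtrees can produce repeated orderings. A vertex with d children contributes a factor d!.

```python
import itertools
import math


def count_orderings(tree):
    """Number of child orderings of a rooted tree given as nested lists (leaf = [])."""
    return math.factorial(len(tree)) * math.prod(count_orderings(child) for child in tree)


def all_orderings(tree):
    """Yield every ordered version of the tree, permuting children at each vertex."""
    for perm in itertools.permutations(tree):
        for ordered_children in itertools.product(*(all_orderings(c) for c in perm)):
            yield list(ordered_children)


leaf = []
binary = [[leaf, leaf], [leaf, leaf]]  # three internal vertices, each with two children
trifurcation = [leaf, leaf, leaf]
```

For a binary tree with m internal vertices this gives 2^m orderings, so the search space doubles with every internal vertex; restricting the search with a semi-labeling, as for the airway trees, keeps it manageable.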