Polyhedral computational geometry for averaging metric phylogenetic trees

# Polyhedral computational geometry for averaging metric phylogenetic trees

## Abstract

This paper investigates the computational geometry relevant to calculations of the Fréchet mean and variance for probability distributions on the phylogenetic tree space of Billera, Holmes and Vogtmann, using the theory of probability measures on spaces of nonpositive curvature developed by Sturm. We show that the combinatorics of geodesics with a specified fixed endpoint in tree space are determined by the location of the varying endpoint in a certain polyhedral subdivision of tree space. The variance function associated to a finite subset of tree space has a fixed algebraic formula within each cell of the corresponding subdivision, and is continuously differentiable in the interior of each orthant of tree space. We use this subdivision to establish two iterative methods for producing sequences that converge to the Fréchet mean: one based on Sturm’s Law of Large Numbers, and another based on descent algorithms for finding optima of smooth functions on convex polyhedra. We present properties and biological applications of Fréchet means and extend our main results to more general globally nonpositively curved spaces composed of Euclidean orthants.

## Introduction

The development of statistical methods for studying phylogenetic trees, and in particular the search for meaningful notions of consensus tree for phylogenetic data, has been of considerable importance in biology for four decades. Starting with the problem as posed by Adams [1], a great deal of research has been done, and a myriad of definitions proposed, relating to consensus trees in phylogenetics; see [13] for an excellent overview. The problem has been confounded by the combinatorial nature of the trees themselves. According to Cranston and Rannala [18], “Phylogenetic inference has long been troubled by the difficulty of performing statistical analysis on tree topologies. The topologies are discrete, categorical, and non-nested hypotheses about the species relationships. They are not amenable to standard summary analyses such as the calculation of means and variances and cause difficulties for many traditional forms of hypothesis testing.” Other papers share concerns about issues such as these [9, 26].

The introduction by Billera, Holmes, and Vogtmann of phylogenetic tree space [12] opened statistical analysis of tree-like data to a wide and computationally tractable variety of techniques [27]. Tree space, with its geodesic distance, is a globally nonpositively curved (abbreviated to global NPC) space, and as a result it has convexity properties that imply uniqueness of means as well as other important statistical and geometric objects, while also giving a framework for effective computational methods to calculate these objects. One of the major uses of the convexity properties was the discovery by Owen and Provan [41] of a fast algorithm for computing geodesics in this space (see Section 1 for this algorithm as well as the background tree space geometry necessary to state it). Chakerian and Holmes [17] subsequently showed that phylogenetic tree space provides an excellent platform for implementing several distance-based statistical techniques, and Nye [37] has shown how this space can be used to perform principal component analysis on tree data.

Perhaps the two most fundamental concepts of interest in statistical analysis of data are that of sample mean (or average) and its associated variance. The basic goal of this paper is to demonstrate the computational effectiveness of certain notions of statistical mean and variance for probability distributions on tree space. The average that we use is the Fréchet mean, or barycenter: the point in tree space that minimizes its sum of squared geodesic distances to the sample points (Section 2). Our decision to use this definition is motivated by work of Sturm [48], who identified the Fréchet mean as a theoretically rich statistical object associated with sampling from a specified distribution on a global NPC space (see Theorem 2.4). Fréchet means in tree space and the algorithm for computing them that arises from Sturm’s work (Algorithm 2.5) have been independently developed by Bačák [8].

Our principal theoretical contribution lies in the discovery of polyhedral structure governing the variation of geodesics in tree space as one endpoint varies (Section 3). To be more precise, if is a fixed point in tree space, then in appropriate coordinates on tree space, the set of points whose geodesics to  share the same combinatorics comprise a convex polyhedral cone called a vistal cell (Theorem 3.25), and the vistal cells constitute a polyhedral subdivision of tree space (Theorem 3.30). This metric combinatorics also arises in single source shortest path queries (see [35] for a survey), and has direct roots in surprisingly similar statements for boundaries of convex polyhedra [34], the parallel being unexpected because boundaries of convex polyhedra are positively curved, in contrast to the negative curvature of tree space. However, polyhedrality of the subdivision is generally not encountered outside of the planar or positively curved cases, and thus is completely unexpected here; see Example 3.2 for a hint of the complexity that can occur even for global NPC cubical complexes.

Metric combinatorics of tree space, particularly its polyhedral nature, combines with generalities on nonlinear optimization in NPC spaces to give a second iterative method converging to the mean (Algorithm 4.4) via descent procedures. The crucial observations are that the variance function has a unique local minimum on tree space, is continuously differentiable on each Euclidean orthant in tree space, and has a simple algebraic formula within the interior of each vistal cell.

Means in tree space have subtle, sometimes peculiar properties that inform our particular motivations (Section 5), which come primarily from biological and medical applications, although we expect these observations to impact other fields where distributions of metric trees naturally appear. Evolutionary biology, for instance, considers actual phylogenetic trees, each representing a putative evolutionary history of a set species or genes (Example 5.5). In medical imaging, trees can represent blood vessels in human brain scans [47] or lung airway trees [31], for example.

Some of the theory in Sections 14 extends to arbitrary global NPC spaces, and all of it extends to global NPC orthant spaces (Section 6). For the first iterative procedure (Algorithm 2.5) and the rest of Section 2, as well as for the shortest path combinatorics in Section 1, this means working in arbitrary global NPC spaces (Sections 6.16.2). For the second iterative procedure (Algorithm 4.4) and the rest of Section 4, as well as for the metric combinatorics in Section 3, this means working in piecewise Euclidean global NPC spaces that are formed by gluing orthants together by rules similar to — but substantially more general than — those defining tree space (Section 6.3). The extensions suggest exciting new research in applying both statistical methods and numerical nonlinear programming techniques to a wide variety of problems. Important note: readers interested in the generality of abstract orthant spaces or arbitrary NPC spaces are urged to begin with Section 6, which sets up the notation and concepts in Sections 14 from that perspective. Hence such readers can avoid checking the proofs in the earlier sections twice.

Acknowledgements Our thanks go to Michael Turelli and Elen Oneal for help with references and discussions on biological applications, to Antonis Rokas for kindly providing the yeast data set, and to Dennis Barden for comments on a draft of the paper. EM had support from NSF grants DMS-0449102 = DMS-1014112 and DMS-1001437. MO was partially supported by a desJardins Postdoctoral Fellowship in Mathematical Biology at University of California Berkeley and by the U.S. National Science Foundation under grant DMS-0635449 to the Statistical and Applied Mathematical Sciences Institute (SAMSI). Much of this research was facilitated by and carried out at SAMSI as an outgrowth of the 2008–2009 program on Algebraic Methods in Systems Biology and Statistics.

## 1 Tree space and the geodesic algorithm

In this section, we describe the space of phylogenetic trees introduced by Billera, Holmes, and Vogtmann [12], as well as a distance and characterization of geodesics in this space.

### 1.1 Phylogenetic tree space

A phylogenetic -tree , or simply an -tree, is an acyclic graph with edge set whose leaves (degree 1 nodes) are labeled with index set , and whose interior vertices have degree at least 3. (The label 0 is often referred to as the root of , although that is not relevant in this paper.) The maximum number of edges in an -tree is . Each edge of is assigned a nonnegative length , or in case the ambient tree is clear. Removal of any edge from determines a unique partition of the leaves of into two subsets and ; the pair is called the split associated with . A key property of splits in trees is that the splits and of any pair of edges and are compatible, that is, one of the sets , , , or is empty. A set of splits is called compatible if every pair of splits in is compatible. It turns out [46, Theorem 3.1.4] that any compatible set of splits on corresponds to a unique tree, and so from now on we identify a tree by simply giving the splits and edge lengths for each edge in .

A tree can have an edge whose associated length is . This corresponds to the edge having been contracted in . Denoting the set of edges of  with nonzero length by  allows the identification between two trees and  whenever (i)  and (ii) their nonzero edge lengths are equal.

###### Example 1.1.

Two 5-trees are depicted in Figure 1. For simplicity, we only give the splits and edge lengths for the three internal edges in each tree. The six internal edges are distinct since they have different splits, and the splits within each tree are compatible. The only compatible pairs between the two trees, however, are , , and .

The tree space introduced by Billera, Holmes, and Vogtmann [12] is the space of all phylogenetic -trees. It is obtained by representing each tree on edge set by a vector in the Euclidean orthant , whose coordinate values are equal to the corresponding lengths of the edges of . As above, trees and are identified between orthants whenever the associated trees satisfy . This makes a union of -dimensional orthants —called maximal orthants —whose interiors are disjoint and which are identified along their boundaries through the equivalence given above. A path in is the image of a continuous map . The Euclidean length of a path in is the sum of the Euclidean lengths of its restrictions to the maximal orthants. This length endows with the metric  in which is the infimum of the Euclidean lengths of the paths from to . Note that , since the space is path-connected: any two points can be joined by straight line segments through the origin.

### 1.2 Geodesics in tree space

Billera, Holmes, and Vogtmann [12] show that tree space is globally non-positively curved (a global NPC space), equivalently known in this context as CAT(0). Among other things, this implies that shortest paths in tree space are unique, so they are unambiguously referred to as geodesics. This section summarizes the key results of [39] and [41], which investigate the structure of geodesics in tree space and provide an -algorithm — the GTP algorithm —to find shortest paths. For notation, if is a tree with edge set  and , then we write

 ∥A∥T=√∑e∈A|e|2T

and use if the tree is clear. This means that whenever .

We express a geodesic with endpoints and as a parameterized curve with , , and for all . If an edge lies in both and , then it lies in every tree on the path , with length uniformly changing between the two terminal values [12, Section 4.2]. We therefore focus first on the case when and have no internal edges in common, and ignore the lengths of the pendant edges (those containing leaves) in the distance computation.

Each geodesic in tree space is a sequence of straight line segments, called legs, because tree space is piecewise Euclidean. Each leg is contained within a single orthant , where and . The precise properties of the sets and making up these legs were determined in [39]. In particular, define the support of a geodesic to consist of a pair consisting of a partition of and a partition of such that the following property holds:

1. for each , the union is compatible.

The geodesic has legs in , where

 Ei =Ai+1∪⋯∪Ak and Fi =B1∪⋯∪Bi.

The individual pairs are the support pairs for the geodesic.

Whether the shortest piecewise-linear path having these legs actually forms the geodesic between and is determined by the following two properties for .

1. . This is called the ratio sequence for .

2. For all and partitions of and of such that is compatible, the inequality holds.

The properties (P1)–(P3) determine the geodesic between and , as well as the algebraic description of this geodesic given in Theorem 2.4 in [41].

The case where and have a nonempty set  of common edges was addressed in [41, Section 4]: remove the common edges between and from each tree, and then find the paths between the remaining disjoint forests, matching trees by their leaf sets. The common edges are then placed into the path with the length of each such edge being

 (1−λ)|e|X+λ|e|T. (1)

This also allows pendant edges to be taken into account.

To be able to work more easily with trees having common edges, we extend Theorem 2.4 in [41] to the case where and have common edges, and in the process simplify the description of the geodesic considerably. To do this, we use the following three important conventions.

1. An edge is never compatible with itself; thus the pairs of identical edges in and  must appear in the same support pair .

2. for any set of edges of in common with .

3. We extend the notation for support pair by adding the additional sets

 A0=B0=Ak+1=Bk+1=∅

and define and .

With these conventions we can restate the unified result.

###### Theorem 1.2.

Let and be any two trees in (not necessarily disjoint), and let be a support for and satisfying (P2) and (P3). The unique geodesic from to  has legs

 γi={γ(λ):∥Ai∥∥Bi∥≤λ1−λ<∥Ai+1∥∥Bi+1∥} for i=0,…,k, (2)

The points on each leg are associated with the tree having edge set

 B1∪⋯∪Bi∪Ai+1∪⋯∪Ak

and edge lengths

 |e|Ti=⎧⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪⎩(1−λ)∥Aj∥−λ∥Bj∥∥Aj∥|e|Xif e∈Ajλ∥Bj∥−(1−λ)∥Aj∥∥Bj∥|e|Tif e∈Bj. (3)

The length of is

 L(γ)=∥∥(∥A1∥+∥B1∥,…,∥Ak∥+∥Bk∥)∥∥. (4)
###### Proof.

The presentation in this theorem matches that of the original Theorem 2.4 in [41] except for the treatment of the common edges of and . Consider any edge common to and . The definition of a support ensures that lies in both and for some . Further, by convention the ratio is negative, so (P2) is never satisfied unless all of the common edges are placed at the front of the ratio sequence. This also means that for any , each common edge is contained in some for the computation of its edge length at that point along the geodesic. Furthermore, since the common edges are mutually compatible with each other, they are placed in different support pairs whenever the ratios differ. It follows that the common edges are always grouped in support pairs having for any in that support pair.

Now consider the length of a common edge in leg of the path. By (3),

 |e|Ti =λ∥Bj∥−(1−λ)∥Aj∥∥Bj∥|e|T=(λ−(1−λ)∥Aj∥∥Bj∥)|e|T =(λ+(1−λ)|e|X|e|T)|e|T=λ|e|T+(1−λ)|e|X,

which matches (1).

Next look at the term in (4) corresponding to a support pair of common edges:

 (∥Aj∥+∥Bj∥)2 =(∥Aj∥∥Bj∥+1)2∥Bj∥2=∑e∈Bj(1−|e|X|e|T)2|e|2T =∑e∈Bj(|e|T−|e|X)2.

Summing this over all such pairs yields

 ∑e∈C(|e|T−|e|X)2,

where is the set of common edges. This matches the expression given in [41, Section 4].

Finally, take the case where an edge lies in only one of the sets and , but is compatible with all edges in the other set. Intuitively, we can think of adding to the other set with length , and treatin these as common edges. Formally, if lies in , then it appears in a support pair with a set of edges compatible with all of ; and if lies in , then it appears in a support pair with a set of edges compatible with all of . Since the ratios of these pairs is either or , respectively (since ), these pairs appear before and after any nontrivial pairs, respectively. Further, the edge component values and path length are as indicated in (3) and (4), respectively. This completes the proof of the theorem. ∎

###### Example 1.3.

Figure 2 shows an example of the geodesic between the trees and in Figure 1 in Example 1.1. The associated support for has and , and the coordinates of seven equally spaced trees in are given in the table. The length of this path, as given by (4), is

 L(γ)=∥∥(||{e2,e3}||+||{e6}||,||{e1}||+||{e4,e5}||)∥∥=15√2.

We end the section by giving a canonical representation for any geodesic.

###### Lemma 1.4.

Any geodesic can be represented by unique support satisfying

 ∥A1∥∥B1∥<∥A2∥∥B2∥<⋯<∥Ak∥∥Bk∥. (5)

This support is called the minimal support.

###### Proof.

This is the content of the remark in [41, Section 2.3]. The basic argument is as follows. Any support of form (5) results in a different geodesic, since by (2) they have different legs. On the other hand, for any representation of having equalities in the ratio sequence, combine the respective sets in every equality subsequence. The resulting support continues to satisfy (P2), and hence there is a shortest piecewise linear path from to  through the prescribed orthants. Further, from (4) it follows that the length of this path equals that of , and hence defines the unique geodesic . ∎

###### Remark 1.5.

Theorem 1.2 positions the support pairs corresponding to edges compatible with both trees into (5) as follows.

1. The set of edges of that are not in but are compatible with all edges of is the set , with and ratio .

2. The set of edges of that are not in but are compatible with all edges of  is the set with , and so its ratio is .

3. Any edge that lies in both and (and hence has positive length in both sets) appears in both sets of some support pair , and so the ratio is .

4. All other support pairs have , so both sets in the support pair are nonempty.

The ordering of the support pairs in (i)–(iii) has no effect on the structure of the geodesic between and , so for the remainder of the paper we take the ratio sequence for a geodesic to represent only the positive ratios in the sequence.

## 2 The mean and variance in tree space

Given a finite point set of trees in , the mean of , alternatively known as the Fréchet mean or barycenter, is the tree that minimizes the sum of the squares of the distances from  to the points in . The variance of is . Since is constant throughout the following discussion, we abuse notation and henceforth refer to the variance as simply . The motivation for considering these notions of mean and variance as the appropriate statistical objects in tree space was given by Sturm [48], who established the mathematical foundations for probability theory on global NPC spaces. This section reviews the required basics of Sturm’s geometric methods in the context of tree spaces. (Readers interested in Fréchet means of more general distributions, and those in arbitrary global NPC spaces, should read Section 6 now to put the material below in these more general settings.)

### 2.1 The variance function

Let be a fixed tree, and consider the geodesics from to a variable tree . The tree can be thought of as a vector in , whose coordinates are expressed using the corresponding lower-case letter . If the geodesic from  to  has support pair as in Theorem 1.2, then the squared distance from to is expressed as the function

 ST(x)=k∑i=1(∥xAi∥+∥Bi∥)2 (6)

in which is the vector whose coordinates are restricted to edges in . It follows that for a set of trees, the variance function can be written

 S(x):=S(X,T)=r∑ℓ=1STℓ(x). (7)

Thus the mean can be thought of as the point that minimizes over .

To state the next result, a real-valued function on a metric space  is strictly convex if  is a strictly convex real-valued function on  for all geodesics ; that is, if

 f(γ(λ))<(1−λ)f(γ(0))+λf(γ(1)) whenever 0<λ<1.
###### Proposition 2.1.

The variance function is strictly convex as a function on . Consequently, the mean is the unique local minimum of  in .

###### Proof.

[48, Proposition 1.7]. See also Example 6.3. ∎

The differentiability of the variance function is critical to the construction of gradient-descent methods for minimizing . This in turn depends on the differentiability of the individual functions in (7). Identifying the geodesic from to  by its support , Eq. (6) yields the partial derivatives of with respect to each of the coordinates :

 ∂ST(X)∂xe =2(∥xAi∥+∥Bi∥)xe∥xAi∥ =2xe(1+∥Bi∥∥xAi∥), (8)

where is the set containing . This is well-defined whenever lies in the interior of its maximal orthant, although the functional form of (2.1) depends upon the combinatorial type of the geodesic, and in particular on . It turns out, however, that throughout the interior of any maximal orthant the function is continuously differentiable.

###### Theorem 2.2.

The variance function is continuously differentiable on the interior of every maximal orthant .

###### Proof.

From (7) it suffices to show that the function (2.1) is continuously differentiable on the interior of . By Lemma 1.4 the geodesic between and can be represented uniquely by support satisfying (5). Any other support for this geodesic consists of a sequence of sets partitioning the and into equality subsequences, with the ratios in the equality subsequence equal to the corresponding ratio of the sets from which they were partitioned. But this means that the ratio , and hence , is the same regardless of which representation we choose for the geodesic. It follows that the partial derivatives are continuous everywhere in the interior of . ∎

### 2.2 Sturm’s algorithm

The orthant structure of tree space  prevents the averaging of finite point sets using the standard Euclidean centroid. The following serves as an approximate replacement, introduced by Sturm [48, Definition 4.6] in the context of probability theory on arbitrary globally nonpositively curved spaces.

###### Definition 2.3.

For a set of points in and an index , the inductive mean value of is the point defined by setting and for , letting be the point that is along the geodesic from to .

Note that if all of the lie in the same orthant, then the inductive mean value of in fact equals the standard centroid of . For general points in , though, the inductive mean may not be the Fréchet mean; it may in fact give different points for different orderings of the points  (see Example 5.3). Sturm goes on to prove [48, Theorem 4.7] the following strong law of large numbers for the Fréchet mean.

###### Theorem 2.4.

Fix a set of trees. If is a sequence of points sampled uniformly and independently from , then with probability , the sequence of inductive mean values approaches the mean of .

To be precise, Sturm shows that if we take the inductive mean as a random variable dependent on the sampling of the points , then the distance from  to the true mean  has expected value bounded above by . This gives us a way of estimating the Fréchet mean through a sequence of inductive means obtained by randomly sampling trees from the set .

###### Algorithm 2.5 (Sturm’s algorithm).
• approximation of the mean tree

• or pairwise distances for are not all

• while-do

• , the approximation of the mean tree

###### Remark 2.6.

The choice of stopping criterion involves two parts.

1. Running the algorithm a specified initial number of iterations guarantees an upper bound of on the expected distance of the final tree to the mean . This is derived from the proof of Theorem 4.7 in [48].

2. Comparing the final sample means serves as a proxy for testing that the sample means act like a Cauchy sequence for steps.

Thus in principle, proper settings for , , and could be used to set confidence intervals on the distance by using Sturm’s result. This would involve a more sophisticated statistical analysis, which we did not undertake in this paper. In practice, we chose , , and to balance run-time with the desired precision, working under the rough assumption that if is chosen large enough, then the approximate mean will be within of the mean tree. For Example 5.5, we chose , , and .

###### Remark 2.7.

We have made software implementing this algorithm freely available [40].

## 3 The combinatorics of geodesics in tree space

This section investigates the combinatorial structure of geodesics in  and their relationship to the variance function. To be more precise, fix a source tree . The shortest path from an arbitrary tree to  has a “combinatorial type”, determined through Theorem 1.2 by the sequence of orthants that it passes through, or more specifically the support pair associated with the geodesic. This combinatorial type can change, even when has the same topology, depending on the precise values of the lengths of the edges in . We are interested in the partition of  — called1 the vistal subdivision of —into regions for which the geodesics to the fixed tree have the same combinatorial type.

We begin by describing a simple change of coordinates, the squaring map, and characterizing the faces of maximal dimension in the vistal subdivision (Section 3.1). In particular, Propositions 3.5 and 3.6 establish that after applying the squaring map the vistal facets are polyhedral regions that cover tree space but have disjoint interiors. Next we provide a simple description of the faces of lower dimension in these polyhedra (Section 3.2). Finally, we prove that the vistal facets constitute the maximal cells of a polyhedral subdivision of tree space, called the vistal polyhedral subdivision (Section 3.3).

###### Remark 3.1.

The idea of studying the paths taken by geodesics emanating from a source point has been studied in computational geometry, in the areas of single source shortest path queries [35] and polyhedral unfolding [34]. Recently Chepoi and Maftuleac studied the single source shortest path problem for CAT(0) rectangular complexes, where each cell is a 2D rectangle. When the underlying space is intrinsically 2D, these shortest path subdivisions are often polyhedral. However, in general, we can not expect this in higher dimensions, and indeed, without the squaring map, the vistal subdivisions are not polyhedral, as illustrated in Example 3.32. The squaring map is only possible in because all of the combinatorial complexity of the space happens about the origin.

###### Example 3.2.

Vistal subdivisions are in general far from polyhedral, even after changes of coordinates such as squaring. A prerequisite for a squaring map to produce polyhedrality would be that (every component of the) the bounding hypersurface has degree at most . However, global NPC cubical complexes can have vistal cells bounded by hypersurfaces of degree greater than . We conjecture that the bounding hypersurface can have components of arbitrarily high degree.

For a specific example, consider first the arrangement

of three -dimensional cubes in which the bottom two cubes are joined along a common face, the top cube meets one bottom cube along an edge, and all three cubes meet at a central vertex. Consider the set  of points whose shortest path back to a fixed starting point (the dot in the left bottom cube) passes through the interior of the edge joining the top cube to the bottom cubes, as opposed to passing through the central vertex. The set  is obtained by rotating the top shaded triangle about the shared cube edge. The boundary of  is a cone and hence is described as the vanishing locus (in the top cube) of a polynomial of degree .

To get the desired example, glue a -cube to the right-hand -face of the top cube in the figure. If it were merely a -cube glued on, then the cone  would simply continue to expand into the added cube, creating a frustum in the added -cube. But once the new facet is a -cube, the frustum rotates freely around the shared -face. The boundary of such a rotated frustum is the locus of zeros (in the -cube) of a polynomial, but any such polynomial  has degree at least . Indeed, intersecting the hypersurface in question with a generic hyperplane yields a union of two cones, each of which has degree . Therefore restricts to a polynomial of degree at least  on the hyperplane, whence the hypersurface itself has degree at least .

### 3.1 Vistal facets

###### Definition 3.3.

Given a source tree , a maximal orthant , and a support , let be the closure of the set of trees for which the geodesic joining to  has support satisfying (P2) and (P3) with strict inequalities. A previstal facet is any nonempty set of this form.

The description of becomes linear after a simple change of variables.

###### Definition 3.4.

The squaring map acts on by squaring coordinates:

 (xe∣e∈E)↦(ξe∣e∈E), where ξe=x2e.

Denote by the image of this map, and let denote the coordinate indexed by . The image of an orthant in is then the equivalent orthant in , and the image of a previstal facet in  is a vistal facet denoted . With this change of variables, for any set of splits .

The squaring map induces on the variance function a corresponding pullback function

 S2(ξ)=S(√ξ), where (√ξ)e=√ξe. (9)

Since the variance function is continuous on  with a uniquely attained minimum by Proposition 2.1, and continuously differentiable on the interior of each maximal orthant by Theorem 2.2, the same properties hold for . Thus we can apply steepest descent methods after squaring just as we would beforehand. This is further explored in Section 4.

Theorem 1.2 implies a nice description of the vistal facets of .

###### Proposition 3.5.

The vistal facet is a convex polyhedral cone in defined by the following inequalities on , where all norms are to be interpreted as .

1. ; that is, for all , and for , where .

2. for all .

3. for all and subsets , such that is compatible.

###### Proof.

For a vector to lie in , the tree must be in an orthant of and satisfy properties (P2) and (P3). The orthant condition immediately implies the nonnegativity conditions in (O). The inequalities corresponding to (P2) are

 ∥Ai∥∥Bi∥≤∥Ai+1∥∥Bi+1∥.

Squaring, cross-multiplying, and substituting  for yields the corresponding linear inequality in in . The inequalities for (P3) are obtained in the same manner. ∎

###### Proposition 3.6.

The vistal facets are of dimension , have pairwise disjoint interiors, and cover . A point lies interior to a vistal facet if and only if the inequalities in (O), (P2), and (P3) are strict.

###### Proof.

The vistal facets cover by definition: if the geodesic from to  has support . The second statement follows by definition and by standard properties of convex polyhedra presented as solutions to systems of linear inequalities. ∎

### 3.2 Vistal cells

Henceforth in this section we focus our attention on the squared tree space and its expression as a union of polyhedral vistal facets as given by Proposition 3.5. This subsection concerns faces of vistal facets, including compact characterizations thereof.

#### Signatures and vistal cells

###### Definition 3.7.

Fix a source tree , a (not necessarily maximal) orthant , and a support . A signature associated with the support is a length sequence of symbols . The previstal cell defined by , , , and is the set of points in for which the ratio sequence for at the point  has the following specific form:

 ∥A1∥∥B1∥σ1∥A2∥∥B2∥σ2⋯σk−2∥Ak−1∥∥Bk−1∥σk−1∥Ak∥∥Bk∥. (10)

The vistal cell is the image of under squaring.

###### Remark 3.8.

Vistal cells are convex polyhedra that need not be bounded, and as such they might not be topological cells. However, the interior of a convex polyhedron is a topological cell, so every vistal cell is the closure of a topological cell.

###### Lemma 3.9.

The dimension of the vistal cell is at most , where is the number of “=” components in . The vistal cell is full-dimensional if and only if there exists a point satisfying the following two properties.

1. For each , if is “=” and if is “”.

2. The inequalities in (P3) are satisfied strictly.

###### Proof.

This follows from standard polyhedral theory, as treated in [50], for instance. ∎

Proposition 3.5 implies that (i) vistal cells are faces of vistal facets, and that (ii) vistal facets are vistal cells for which is maximal and the signature contains only “” symbols. What we prove here is that all faces of vistal facets can be represented as vistal cells, and that under some simple conditions on , Definition 3.7 provides a canonical description of each vistal cell. We start by determining all supports and signatures associated with the geodesic from to a particular point . By Lemma 1.4, the geodesic  can be represented by a unique minimal support satisfying (5):

 ∥A1∥∥B1∥<∥A2∥∥B2∥<⋯<∥Ak∥∥Bk∥.

Any other support of corresponds to a ratio sequence in which at least one ratio is replaced by a ratio subsequence formed from a partition of and , with equalities between all terms. Any ratio subsequence for which continues to satisfy (P3) together with equalities between terms of the ratio subsequences constitutes a valid support for . We next give a specific method for determining all such support sequences.

#### Incompatibility graphs and equality subsequences

In [41] it was shown how condition (P3) for support pair can be rephrased in terms of conditions on a special node-weighted graph derived from the compatibility relations between and and their coordinate values. We summarize the technique here. Denote the coordinates of and by and , and let and be their squared coordinates.

###### Definition 3.10.

The incompatibility graph between and is the weighted bipartite graph with vertex set and an edge from to whenever and  are incompatible. The weight of each vertex is , and the weight of each vertex is . A (vertex) cover for is a set having the property that every edge of has at least one endpoint in . The weight of is the sum of the weights of its vertices.

###### Lemma 3.11 ([41, Section 3]).

Property (P3) holds for support pair if and only if every cover of has weight .∎

By Lemma 3.11, testing a support pair for property (P3) is equivalent to showing that the min weight cover for has weight 1. The problem of finding the minimum cover in in turn can be reduced to solving a max flow problem (see [2], Section 12.3) on a specially defined flow network . To construct , start with , attach a source  to the -vertices of and a sink  to the -vertices of , and direct all edges from toward . Set the capacity of each edge to , set the capacity of each edge to , and set the capacities of edges in to . The Max-Flow-Min-Cut Theorem implies that the value of the maximum -flow for is equal to the capacity of a minimum capacity of an -cut in , which in turn corresponds to a minimum weight cover for . Thus the condition in Lemma 3.11 for is equivalent to the property that the max flow in is . The precise relationship between max flows in and min covers in is crucial to determining the possible ratio subsequences that can replace a term in (5), and we clarify this relationship below.

###### Example 3.12.

Figure 3

demonstrates this for a hypothetical support pair with