Hierarchical Quasi-Clustering Methods for Asymmetric Networks
Abstract
This paper introduces hierarchical quasiclustering methods, a generalization of hierarchical clustering for asymmetric networks where the output structure preserves the asymmetry of the input data. We show that this output structure is equivalent to a finite quasiultrametric space and study admissibility with respect to two desirable properties. We prove that a modified version of single linkage is the only admissible quasiclustering method. Moreover, we show stability of the proposed method and we establish invariance properties fulfilled by it. Algorithms are further developed and the value of quasiclustering analysis is illustrated with a study of internal migration within the United States.
Gunnar Carlsson (gunnar@math.stanford.edu), Department of Mathematics, Stanford University
Facundo Mémoli (memoli@math.osu.edu), Department of Mathematics and Department of Computer Science and Engineering, Ohio State University
Alejandro Ribeiro (aribeiro@seas.upenn.edu) and Santiago Segarra (ssegarra@seas.upenn.edu), Department of Electrical and Systems Engineering, University of Pennsylvania
Keywords: clustering, asymmetric networks
1 Introduction
Given a network of interactions, hierarchical clustering methods determine a dendrogram, i.e. a family of nested partitions indexed by a resolution parameter. Clusters that arise at a given resolution correspond to sets of nodes that are more similar to each other than to the rest and, as such, can be used to study the formation of groups and communities Shi & Malik (2000); Newman & Girvan (2002, 2004); Von Luxburg (2007); Ng et al. (2002); Lance & Williams (1967); Jain & Dubes (1988). For asymmetric networks, in which the dissimilarity from node $x$ to node $x'$ may differ from the one from $x'$ to $x$ Saito & Yadohisa (2004), the determination of said clusters is not a straightforward generalization of the methods used to cluster symmetric datasets Hubert (1973); Slater (1976); Boyd (1980); Tarjan (1983); Slater (1984); Murtagh (1985); Pentney & Meila (2005); Meila & Pentney (2007); Zhao & Karypis (2005).
This difficulty motivates formal developments whereby hierarchical clustering methods are constructed as those that are admissible with respect to some reasonable properties Carlsson & Mémoli (2010, 2013); Carlsson et al. (2013). A fundamental distinction between symmetric and asymmetric networks is that while it is easy to obtain uniqueness results for the former Carlsson & Mémoli (2010), there are a variety of methods that are admissible for the latter Carlsson et al. (2013). Although one could conceive of imposing further restrictions to winnow the space of admissible methods for clustering asymmetric networks, it is actually reasonable that multiple methods should exist. Since dendrograms are symmetric structures one has to make a decision as to how to derive symmetry from an asymmetric dataset and there are different stages of the clustering process at which such symmetrization can be carried out Carlsson et al. (2013). In a sense, there is a fundamental mismatch between having a network of asymmetric relations as input and a symmetric dendrogram as output.
This paper develops a generalization of dendrograms and hierarchical clustering methods to allow for asymmetric output structures. We refer to these asymmetric structures as quasidendrograms and to the procedures that generate them as hierarchical quasiclustering methods. Since the symmetry in dendrograms can be traced back to the symmetry of equivalence relations we start by defining a quasiequivalence relation as one that is reflexive and transitive but not necessarily symmetric (Section 3). We then define a quasipartition as the structure induced by a quasiequivalence relation, a quasidendrogram as a nested collection of quasipartitions, and a hierarchical quasiclustering method as a map from the space of networks to the space of quasidendrograms (Section 3.1). Quasipartitions are similar to regular partitions in that they contain disjoint blocks of nodes but they also include an influence structure between the blocks derived from the asymmetry in the original network. This influence structure defines a partial order over the blocks Harzheim (2005).
We proceed to study admissibility of quasiclustering methods with respect to the directed axioms of value and transformation. The Directed Axiom of Value states that the quasiclustering of a network of two nodes is the network itself. The Directed Axiom of Transformation states that reducing dissimilarities cannot lead to looser quasiclusters. We show that there is a unique quasiclustering method admissible with respect to these axioms and that this method is an asymmetric version of the single linkage clustering method (Section 3.4). The analysis in this section hinges upon an equivalence between quasidendrograms and quasiultrametrics (Section 3.2) that generalizes the well-known equivalence between dendrograms and ultrametrics Jardine & Sibson (1971).
Exploiting the fact that quasidendrograms can be represented by quasiultrametrics, we propose a quantitative notion of stability of quasiclustering methods (Section 3.5). We prove that the unique method from Section 3.4 is stable in the sense that we propose. We also establish several invariance properties enjoyed by this method.
In order to apply the quasiclustering method to real data, we derive an algorithm based on matrix powers in a dioid algebra Gondran & Minoux (2008) (Section 3.6). As an example, we cluster a network that contains information about the internal migration between states of the United States for the year 2011 (Section 4). The quasiclustering output unveils that migration is dominated by geographical proximity. Moreover, by exploiting the asymmetric influence between clusters, one can show the migrational influence of California over the West Coast.
Proofs of results in this paper not contained in the main body can be found in the supplementary material.
2 Preliminaries
A network $N_X$ is a pair $(X, A_X)$ where $X$ is a finite set of points or nodes and $A_X : X \times X \to \mathbb{R}_+$ is a dissimilarity function. The value $A_X(x, x')$ is assumed to be nonnegative for all pairs $(x, x') \in X \times X$ and 0 if and only if $x = x'$. However, $A_X$ need not satisfy the triangle inequality and, more consequential for the problem considered here, may be asymmetric in that it is possible to have $A_X(x, x') \neq A_X(x', x)$ for some $x \neq x'$. We further define $\mathcal{N}$ as the set of all networks. Networks $N_X \in \mathcal{N}$ can have different node sets $X$ and different dissimilarities $A_X$.
A conventional non-hierarchical clustering of the set $X$ is a partition $P$, i.e., a collection of sets $P = \{B_1, \ldots, B_J\}$ which are pairwise disjoint, $B_i \cap B_j = \emptyset$ for $i \neq j$, and required to cover $X$, $\cup_{i=1}^{J} B_i = X$. The sets $B_1, \ldots, B_J$ are called the blocks of $P$ and represent clusters. A partition $P$ of $X$ induces and is induced by an equivalence relation $\sim$ on $X$ such that for all $x, x', x'' \in X$ we have that $x \sim x$, $x \sim x'$ if and only if $x' \sim x$, and $x \sim x'$ combined with $x' \sim x''$ implies $x \sim x''$. In hierarchical clustering methods the output is not a single partition $P$ but a nested collection $D_X$ of partitions $D_X(\delta)$ of $X$ indexed by a resolution parameter $\delta \geq 0$. For a given $D_X$, we say that two nodes $x$ and $x'$ are equivalent at resolution $\delta$ and write $x \sim_{D_X(\delta)} x'$ if and only if nodes $x$ and $x'$ are in the same cluster of $D_X(\delta)$. The nested collection $D_X$ is termed a dendrogram Jardine & Sibson (1971). The interpretation of a dendrogram is that of a structure which yields different clusterings at different resolutions. At resolution $\delta = 0$ each point is in a cluster of its own and as the resolution parameter $\delta$ increases, nodes start forming clusters. We denote by $[x]_\delta$ the equivalence class to which the node $x \in X$ belongs at resolution $\delta$, i.e. $[x]_\delta := \{x' \in X \mid x \sim_{D_X(\delta)} x'\}$.
In our development of hierarchical quasiclustering methods, the concepts of chain and chain cost are important. Given a network $(X, A_X)$ and $x, x' \in X$, a chain $C(x, x')$ is an ordered sequence of nodes in $X$,
(1) $C(x, x') = [x = x_0, x_1, \ldots, x_{l-1}, x_l = x'],$
which starts at $x$ and ends at $x'$. We say that $C(x, x')$ links or connects $x$ to $x'$. The links of a chain are the edges connecting consecutive nodes of the chain in the direction given by the chain. We define the cost of a chain (1) as the maximum dissimilarity $\max_{0 \leq i < l} A_X(x_i, x_{i+1})$ encountered when traversing its links in order.
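As an illustration of the chain cost just defined, the following minimal Python sketch evaluates the cost of a chain on a small asymmetric network. The nested-list encoding of the dissimilarities, the integer node labels, the numerical values, and the function name `chain_cost` are assumptions made for this example and do not come from the paper; assigning cost 0 to a chain with a single node is likewise a convention chosen here.

```python
def chain_cost(A, chain):
    """Return the maximum dissimilarity over the links of `chain`.

    A[i][j] is the dissimilarity from node i to node j (hypothetical
    nested-list encoding); `chain` is an ordered list of node indices.
    """
    if len(chain) < 2:
        return 0  # convention assumed here: a chain with no links costs 0
    # Links are consecutive pairs, traversed in the direction of the chain.
    return max(A[i][j] for i, j in zip(chain, chain[1:]))


# A small asymmetric network on nodes 0, 1, 2 (illustrative values).
A = [
    [0, 1, 4],
    [3, 0, 2],
    [5, 6, 0],
]

forward = chain_cost(A, [0, 1, 2])   # links 0->1 and 1->2
backward = chain_cost(A, [2, 1, 0])  # links 2->1 and 1->0
```

Because the dissimilarities are asymmetric, traversing the same nodes in the opposite direction generally yields a different cost, as the two chains in the sketch show.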
3 QuasiClustering methods
A partition $P$ of a set $X$ can be interpreted as a reduction in data complexity in which variations between elements of a group are neglected in favor of the larger dissimilarities between elements of different groups. This is natural when clustering datasets endowed with symmetric dissimilarities because the concepts of a node $x$ being close to another node $x'$ and $x'$ being close to $x$ are equivalent. In an asymmetric network these concepts are different and this difference motivates the definition of structures more general than partitions.
Considering that a partition $P$ of $X$ is induced by an equivalence relation $\sim$ on $X$ we search for the equivalent of an asymmetric partition by removing the symmetry property in the definition of the equivalence relation. Thus, we define a quasiequivalence $\rightsquigarrow$ as a binary relation that satisfies the reflexivity and transitivity properties but is not necessarily symmetric as stated next.
Definition 1
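A binary relation $\rightsquigarrow$ between elements of a set $X$ is a quasiequivalence if and only if the following properties hold true for all $x, x', x'' \in X$:
(i) Reflexivity. Points are quasiequivalent to themselves, $x \rightsquigarrow x$.
(ii) Transitivity. If $x \rightsquigarrow x'$ and $x' \rightsquigarrow x''$ then $x \rightsquigarrow x''$.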
Quasiequivalence relations are more often termed preorders or quasiorders in the literature Harzheim (2005). We choose the term quasiequivalence to emphasize that they are a modified version of an equivalence relation.
We define a quasipartition $\tilde{P} = (P, E)$ of the set $X$ as a directed, unweighted graph with no self-loops where the vertex set $P = \{B_1, \ldots, B_J\}$ is a partition of the space $X$ and the edge set $E \subseteq P \times P$ is such that the following properties are satisfied (see Fig. 1):
(QP1) Unidirectionality. For any given pair of distinct blocks $B_i, B_j \in P$, we have at most one edge between them. Thus, if for some $i \neq j$ we have $(B_i, B_j) \in E$ then $(B_j, B_i) \notin E$.
(QP2) Transitivity. If there are edges between blocks $B_i$ and $B_j$ and between blocks $B_j$ and $B_k$, then there is an edge between blocks $B_i$ and $B_k$.
The vertex set of a quasipartition represents sets of nodes that can influence each other, whereas the edges in capture the notion of directed influence from one group to the next. In the example in Fig. 1, nodes which are drawn together can exert influence on each other. This gives rise to the blocks which form the vertex set of the quasipartition. Additionally, some blocks have influence over others in only one direction. E.g., block can influence but not vice versa. This latter fact motivates keeping and as separate blocks in the partition whereas the former motivates the addition of the directed influence edge . Likewise, can influence , can influence and can influence but none of these influences are true in the opposite direction. Block need not be able to directly influence , but can influence it through , hence the edge from to , in accordance with (QP2). All other influence relations are not meaningful, justifying the lack of connections between the other blocks. Observe that there are no bidirectional edges as required by (QP1).
Requirements (QP1) and (QP2) in the definition of quasipartition represent the relational structure that emerges from quasiequivalence relations as we state in the following proposition.
Proposition 1
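Given a node set $X$ and a quasiequivalence relation $\rightsquigarrow$ on $X$ [cf. Definition 1] define the relation $\leftrightarrow$ on $X$ as
(2) $x \leftrightarrow x' \iff x \rightsquigarrow x' \text{ and } x' \rightsquigarrow x,$
for all $x, x' \in X$. Then, $\leftrightarrow$ is an equivalence relation. Let $P = \{B_1, \ldots, B_J\}$ be the partition of $X$ induced by $\leftrightarrow$. Define $E \subseteq P \times P$ such that for all distinct $B_i, B_j \in P$
(3) $(B_i, B_j) \in E \iff x_i \rightsquigarrow x_j,$
for some $x_i \in B_i$ and $x_j \in B_j$. Then, $\tilde{P} = (P, E)$ is a quasipartition of $X$. Conversely, given a quasipartition $\tilde{P} = (P, E)$ of $X$, define the binary relation $\rightsquigarrow$ on $X$ so that for all $x, x' \in X$
(4) $x \rightsquigarrow x' \iff [x] = [x'] \text{ or } ([x], [x']) \in E,$
where $[x] \in P$ is the block of the partition $P$ that contains the node $x$ and similarly for $[x']$. Then, $\rightsquigarrow$ is a quasiequivalence on $X$.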
Proof : See Theorem 4.9, Ch. 1.4 in Harzheim (2005).
In the same way that an equivalence relation induces and is induced by a partition on a given node set $X$, Proposition 1 shows that a quasiequivalence relation induces and is induced by a quasipartition on $X$. We can then adopt the construction of quasipartitions as the natural generalization of clustering problems when given asymmetric data. Further, observe that if the edge set $E$ contains no edges, $\tilde{P} = (P, E)$ is equivalent to the regular partition $P$ when ignoring the empty edge set. In this sense, partitions can be regarded as particular cases of quasipartitions having the generic form $\tilde{P} = (P, \emptyset)$. To allow generalizations of hierarchical clustering methods with asymmetric outputs we introduce the notion of quasidendrogram in the following section.
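The forward construction in Proposition 1 can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper: the encoding of the relation as a set of ordered pairs, the node labels, and the function name `quasi_partition` are all assumptions, and the input relation is assumed (not checked) to be reflexive and transitive.

```python
def quasi_partition(nodes, rel):
    """Build (blocks, edges) from a quasiequivalence given as a set of pairs.

    `rel` contains (x, y) whenever x is related to y; it is assumed to be
    reflexive and transitive (this is not verified here).
    """
    blocks = []
    for x in nodes:
        for block in blocks:
            rep = block[0]
            # x joins a block when it is related to its representative
            # in both directions (the symmetrized relation).
            if (x, rep) in rel and (rep, x) in rel:
                block.append(x)
                break
        else:
            blocks.append([x])
    blocks = [frozenset(b) for b in blocks]
    # An edge links two distinct blocks when some node of the first
    # is related to some node of the second.
    edges = {
        (b1, b2)
        for b1 in blocks
        for b2 in blocks
        if b1 != b2 and any((x, y) in rel for x in b1 for y in b2)
    }
    return blocks, edges


nodes = ["a", "b", "c"]
# Hypothetical relation: a and b are related both ways; both are related to c.
rel = {(x, x) for x in nodes} | {("a", "b"), ("b", "a"), ("a", "c"), ("b", "c")}
blocks, edges = quasi_partition(nodes, rel)
```

In this toy example the nodes `a` and `b` collapse into one block, and a single directed edge points from that block to the singleton block of `c`, so the output has no bidirectional edges.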
3.1 Quasidendrograms
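Given that a dendrogram is defined as a nested set of partitions, we define a quasidendrogram $\tilde{D}_X$ of the set $X$ as a collection of nested quasipartitions $\tilde{D}_X(\delta) = (D_X(\delta), E_X(\delta))$ indexed by a resolution parameter $\delta \geq 0$. Recall the definition of $[x]_\delta$ from Section 2. Formally, for $\tilde{D}_X$ to be a quasidendrogram we require the following conditions:

(D̃1) Boundary conditions. At resolution $\delta = 0$ all nodes are in separate clusters with no edges between them and for some sufficiently large $\delta_0$ all elements of $X$ are in a single cluster,
(5) $\tilde{D}_X(0) = \big(\{\{x\},\, x \in X\},\, \emptyset\big), \qquad \tilde{D}_X(\delta_0) = \big(\{X\},\, \emptyset\big) \text{ for some } \delta_0 > 0.$

(D̃2) Equivalence hierarchy. For any pair of points $x, x'$ for which $x \sim_{D_X(\delta)} x'$ at resolution $\delta$ we must have $x \sim_{D_X(\delta')} x'$ for all resolutions $\delta' > \delta$.

(D̃3) Influence hierarchy. If there is an edge $([x]_\delta, [x']_\delta) \in E_X(\delta)$ between the equivalence classes $[x]_\delta$ and $[x']_\delta$ of nodes $x$ and $x'$ at resolution $\delta$, at any resolution $\delta' > \delta$ we either have $([x]_{\delta'}, [x']_{\delta'}) \in E_X(\delta')$ or $[x]_{\delta'} = [x']_{\delta'}$.

(D̃4) Right continuity. For all $\delta \geq 0$ there exists $\epsilon > 0$ such that $\tilde{D}_X(\delta) = \tilde{D}_X(\delta')$ for all $\delta' \in [\delta, \delta + \epsilon]$.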
Requirement (D̃1) states that for resolution $\delta = 0$ there should be no influence between any pair of nodes and that, for a large enough resolution $\delta = \delta_0$, there should be enough influence between the nodes for all of them to belong to the same cluster. According to (D̃2), nodes become ever more clustered since once they join together in a cluster, they stay together in the same cluster for all larger resolutions. Condition (D̃3) states for the edge set the analogous requirement that (D̃2) states for the node set. If there is an edge present at a given resolution $\delta$, that edge should persist at coarser resolutions $\delta' > \delta$ except if the groups linked by the edge merge in a single cluster. Requirement (D̃4) is a technical condition that ensures the correct definition of a hierarchical structure [cf. (8) below].
Comparison of (D̃1), (D̃2), and (D̃4) with the three properties defining a dendrogram Carlsson & Mémoli (2010) implies that given a quasidendrogram $\tilde{D}_X = (D_X, E_X)$ on a node set $X$, the component $D_X$ is a dendrogram on $X$. I.e., the vertex sets $D_X(\delta)$ of the quasipartitions $(D_X(\delta), E_X(\delta))$ for varying $\delta$ form a nested set of partitions. Hence, if the edge set $E_X(\delta) = \emptyset$ for every resolution parameter, $\tilde{D}_X$ recovers the structure of the dendrogram $D_X$. Thus, quasidendrograms are a generalization of dendrograms, or, equivalently, dendrograms are particular cases of quasidendrograms with empty edge sets. Regarding dendrograms as quasidendrograms with empty edge sets, we have that the set of all dendrograms $\mathcal{D}$ is a subset of $\tilde{\mathcal{D}}$, the set of all quasidendrograms.
A hierarchical clustering method $\mathcal{H} : \mathcal{N} \to \mathcal{D}$ is defined as a map from the space of networks $\mathcal{N}$ to the space of dendrograms $\mathcal{D}$. This motivates the definition of a hierarchical quasiclustering method as follows.
Definition 2
A hierarchical quasiclustering method $\tilde{\mathcal{H}}$ is defined as a map from the space of networks $\mathcal{N}$ to the space of quasidendrograms $\tilde{\mathcal{D}}$,
(6) $\tilde{\mathcal{H}} : \mathcal{N} \to \tilde{\mathcal{D}}.$
Since $\mathcal{D} \subset \tilde{\mathcal{D}}$ we have that every clustering method is a quasiclustering method but not vice versa. Our goal here is to study quasiclustering methods satisfying desirable axioms that define the concept of admissibility. In order to facilitate this analysis, we introduce quasiultrametrics as asymmetric versions of ultrametrics and show their equivalence to quasidendrograms in the following section.
Remark 1
Unidirectionality (QP1) ensures that no cycles containing exactly two nodes can exist in any quasipartition $\tilde{P} = (P, E)$. If there were longer cycles, transitivity (QP2) would imply that every two distinct nodes in a longer cycle would have to form a two-node cycle, contradicting (QP1). Thus, conditions (QP1) and (QP2) imply that every quasipartition is a directed acyclic graph (DAG). The fact that a DAG represents a partial order shows that our construction of a quasipartition from a quasiequivalence relation is consistent with the known set-theoretic construction of a partial order on a partition of a set given a preorder on the set Harzheim (2005).
3.2 Quasiultrametrics
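Given a node set $X$, a quasiultrametric $\tilde{u}_X$ on $X$ is a function $\tilde{u}_X : X \times X \to \mathbb{R}_+$ satisfying the identity property and the strong triangle inequality as we formally define next.
Definition 3
Given a node set $X$, a quasiultrametric $\tilde{u}_X$ is a nonnegative function $\tilde{u}_X : X \times X \to \mathbb{R}_+$ satisfying the following properties for all $x, x', x'' \in X$:
(i) Identity. $\tilde{u}_X(x, x') = 0$ if and only if $x = x'$.
(ii) Strong triangle inequality. $\tilde{u}_X$ satisfies
(7) $\tilde{u}_X(x, x') \leq \max\big(\tilde{u}_X(x, x''),\, \tilde{u}_X(x'', x')\big).$
Quasiultrametrics may be regarded as ultrametrics where the symmetry property is not imposed. In particular, the space $\tilde{\mathcal{U}}$ of quasiultrametric networks, i.e. networks with quasiultrametrics as dissimilarity functions, is a superset of the space of ultrametric networks $\mathcal{U}$. See Gurvich & Vyalyi (2012) for a study of some structural properties of quasiultrametrics.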
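To make Definition 3 concrete, the sketch below checks the identity property and the strong triangle inequality on a matrix of values. The nested-list encoding, the example matrices `U` and `V`, and the function name `is_quasi_ultrametric` are assumptions introduced for illustration only.

```python
def is_quasi_ultrametric(u):
    """Check nonnegativity, identity, and the strong triangle inequality.

    u[i][j] is the value from node i to node j (hypothetical nested-list
    encoding). Symmetry is deliberately NOT required.
    """
    n = len(u)
    for i in range(n):
        for j in range(n):
            # Nonnegativity and identity: u[i][j] == 0 exactly when i == j.
            if u[i][j] < 0 or (u[i][j] == 0) != (i == j):
                return False
            # Strong triangle inequality through every intermediate node k.
            for k in range(n):
                if u[i][j] > max(u[i][k], u[k][j]):
                    return False
    return True


# Asymmetric example that satisfies both properties (illustrative values).
U = [
    [0, 1, 3],
    [2, 0, 3],
    [3, 3, 0],
]

# Counterexample: V[0][2] = 5 exceeds max(V[0][1], V[1][2]) = 1.
V = [
    [0, 1, 5],
    [1, 0, 1],
    [5, 1, 0],
]
```

Note that `U` is not symmetric (`U[0][1]` differs from `U[1][0]`), which is exactly what distinguishes a quasiultrametric from an ultrametric.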
The following constructions and theorem establish a structure preserving equivalence between quasidendrograms and quasiultrametrics.
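Consider the map $\Psi : \tilde{\mathcal{D}} \to \tilde{\mathcal{U}}$ defined as follows: for a given quasidendrogram $\tilde{D}_X = (D_X, E_X)$ over the set $X$ write $\Psi(\tilde{D}_X) = (X, \tilde{u}_X)$, where we define $\tilde{u}_X(x, x')$ for each $x, x' \in X$ as the smallest resolution $\delta$ at which either both nodes belong to the same equivalence class $[x]_\delta = [x']_\delta$, i.e. $x \sim_{D_X(\delta)} x'$, or there exists an edge in $E_X(\delta)$ from the equivalence class $[x]_\delta$ to the equivalence class $[x']_\delta$,
(8) $\tilde{u}_X(x, x') := \min\big\{\delta \geq 0 \;\big|\; [x]_\delta = [x']_\delta \text{ or } ([x]_\delta, [x']_\delta) \in E_X(\delta)\big\}.$
We also consider the map $\Upsilon : \tilde{\mathcal{U}} \to \tilde{\mathcal{D}}$ constructed as follows: for a given quasiultrametric $\tilde{u}_X$ on the set $X$ and each $\delta \geq 0$ define the relation $\sim_{\tilde{u}_X(\delta)}$ on $X$ as
(9) $x \sim_{\tilde{u}_X(\delta)} x' \iff \max\big(\tilde{u}_X(x, x'),\, \tilde{u}_X(x', x)\big) \leq \delta.$
Define further $D_X(\delta) := \{X \bmod \sim_{\tilde{u}_X(\delta)}\}$ and the edge set $E_X(\delta)$ for every $\delta \geq 0$ as follows: distinct blocks $B_1, B_2 \in D_X(\delta)$ are such that
(10) $(B_1, B_2) \in E_X(\delta) \iff \min_{x_1 \in B_1,\, x_2 \in B_2} \tilde{u}_X(x_1, x_2) \leq \delta.$
Finally, $\Upsilon(X, \tilde{u}_X) := \tilde{D}_X$, where $\tilde{D}_X := (D_X, E_X)$.
Theorem 1
The maps $\Psi : \tilde{\mathcal{D}} \to \tilde{\mathcal{U}}$ and $\Upsilon : \tilde{\mathcal{U}} \to \tilde{\mathcal{D}}$ are both well defined. Furthermore, $\Psi \circ \Upsilon$ is the identity on $\tilde{\mathcal{U}}$ and $\Upsilon \circ \Psi$ is the identity on $\tilde{\mathcal{D}}$.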
Theorem 1 implies that every quasidendrogram $\tilde{D}_X$ has an equivalent representation as a quasiultrametric network defined on the same underlying node set $X$. This result allows us to reinterpret hierarchical quasiclustering methods [cf. (6)] as maps
(11) $\tilde{\mathcal{H}} : \mathcal{N} \to \tilde{\mathcal{U}}$
from the space of networks to the space of quasiultrametric networks. Apart from the theoretical importance of Theorem 1, this equivalence result is of practical importance since quasiultrametrics are mathematically more convenient to handle than quasidendrograms. Indeed, the results in this paper are derived in terms of quasiultrametrics. However, quasidendrograms are more convenient for representing data as illustrated in Section 4.
Given a quasidendrogram $\tilde{D}_X = (D_X, E_X)$, the value $\tilde{u}_X(x, x')$ of the associated quasiultrametric for $x, x' \in X$ is given by the minimum resolution $\delta$ at which $x$ can influence $x'$. This may occur when $x$ and $x'$ belong to the same block of $D_X(\delta)$ or when they belong to different blocks $B, B' \in D_X(\delta)$, but there is an edge from the block containing $x$ to the block containing $x'$, i.e. $(B, B') \in E_X(\delta)$. Conversely, given a quasiultrametric network $(X, \tilde{u}_X)$, for a given resolution $\delta$ the graph $\tilde{D}_X(\delta)$ has as a vertex set the classes of nodes whose quasiultrametric is at most $\delta$ in both directions. Furthermore, $\tilde{D}_X(\delta)$ contains a directed edge between two distinct equivalence classes if the quasiultrametric from some node in the first class to some node in the second is not greater than $\delta$.
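The passage from a quasiultrametric to the quasipartition at a given resolution, as described in (9) and (10), can be sketched as follows. The nested-list encoding, the example matrix `U`, and the function name `quasi_partition_at` are assumptions made for this illustration; the input is assumed to be a valid quasiultrametric, so that grouping nodes by comparison with a single block representative is legitimate.

```python
def quasi_partition_at(u, delta):
    """Return (blocks, edges) of the quasipartition at resolution `delta`.

    Blocks group nodes whose quasiultrametric values are at most `delta`
    in both directions; an edge joins two distinct blocks when some value
    from the first block to the second is at most `delta`.
    """
    n = len(u)
    blocks = []
    for i in range(n):
        for block in blocks:
            rep = block[0]
            # Comparing with one representative suffices because the
            # relation is an equivalence when u is a quasiultrametric.
            if max(u[i][rep], u[rep][i]) <= delta:
                block.append(i)
                break
        else:
            blocks.append([i])
    blocks = [frozenset(b) for b in blocks]
    edges = {
        (b1, b2)
        for b1 in blocks
        for b2 in blocks
        if b1 != b2 and min(u[i][j] for i in b1 for j in b2) <= delta
    }
    return blocks, edges


# Illustrative asymmetric quasiultrametric on nodes 0, 1, 2.
U = [
    [0, 1, 3],
    [2, 0, 3],
    [3, 3, 0],
]

at_1 = quasi_partition_at(U, 1)  # singletons, one edge from {0} to {1}
at_2 = quasi_partition_at(U, 2)  # nodes 0 and 1 have merged
at_3 = quasi_partition_at(U, 3)  # a single block
```

In this toy example the edge from `{0}` to `{1}` appears at resolution 1, before the two nodes merge at resolution 2; this directed information is what the dendrogram component alone would not record.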
In Fig. 2 we present an example of the equivalence between quasidendrograms and quasiultrametric networks stated by Theorem 1. At the top left of the figure, we present a quasiultrametric defined on a threenode set . At the top right, we depict the dendrogram component of the quasidendrogram equivalent to as given by Theorem 1. At the bottom of the figure, we present graphs for a range of resolutions .
To obtain from , we first obtain the dendrogram component by symmetrizing to the maximum [cf. (9)], nodes and merge at resolution 2 and merges with at resolution 3. To see how the edges in are obtained, at resolutions , there are no edges since there is no quasiultrametric value between distinct nodes in this range [cf. (10)]. At resolution , we reach the first nonzero values of and hence the corresponding edges appear in . At resolution , nodes and merge and become the same vertex in graph . Finally, at resolution all the nodes belong to the same equivalence class and hence contains only one vertex. Conversely, to obtain from as depicted in the figure, note that at resolution two edges and appear in , thus the corresponding values of the quasiultrametric are fixed to be . At resolution , when and merge into the same vertex in , an edge is generated from to the equivalence class of at resolution which did not exist before, implying that . Moreover, we have that , hence . Finally, at there is only one equivalence class, thus the values of that have not been defined so far must equal 3.
3.3 Admissible quasiclustering methods
We encode desirable properties of quasiclustering methods into axioms which we use as a criterion for admissibility. The Directed Axiom of Value (Ã1) and the Directed Axiom of Transformation (Ã2) winnow the space of quasiclustering methods by imposing conditions on their output quasiultrametrics which, by Theorem 1, is equivalent to imposing conditions on the output quasidendrograms. Defining an arbitrary two-node network $\vec{\Delta}_2(\alpha, \beta) := (\{p, q\}, A_{p,q})$ with $A_{p,q}(p, q) = \alpha$ and $A_{p,q}(q, p) = \beta$ for some $\alpha, \beta > 0$,

(Ã1) Directed Axiom of Value. $\tilde{\mathcal{H}}(\vec{\Delta}_2(\alpha, \beta)) = \vec{\Delta}_2(\alpha, \beta)$ for every two-node network $\vec{\Delta}_2(\alpha, \beta)$.

(Ã2) Directed Axiom of Transformation. Consider two networks $N_X = (X, A_X)$ and $N_Y = (Y, A_Y)$ and a dissimilarity-reducing map $\phi : X \to Y$, i.e. a map $\phi$ such that for all $x, x' \in X$ it holds $A_X(x, x') \geq A_Y(\phi(x), \phi(x'))$. Then, for all $x, x' \in X$, the outputs $(X, \tilde{u}_X) = \tilde{\mathcal{H}}(N_X)$ and $(Y, \tilde{u}_Y) = \tilde{\mathcal{H}}(N_Y)$ satisfy
(12) $\tilde{u}_X(x, x') \geq \tilde{u}_Y(\phi(x), \phi(x')).$
The Directed Axiom of Transformation (Ã2) states that no influence relation can be weakened by a dissimilarity-reducing transformation. That is, if relations in the network are strengthened, the tendency of nodes to cluster cannot decrease. The Directed Axiom of Value (Ã1) simply recognizes that in any two-node network, the dissimilarity function is itself a quasiultrametric and that there is no valid justification to output a different quasiultrametric.
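The hypothesis of the Directed Axiom of Transformation is easy to test numerically. The sketch below checks whether a given map between node sets is dissimilarity reducing; the list encoding of the map, the example matrices, and the function name `is_dissimilarity_reducing` are assumptions made for illustration.

```python
def is_dissimilarity_reducing(A_X, A_Y, phi):
    """Check A_X[i][j] >= A_Y[phi[i]][phi[j]] for every ordered pair (i, j).

    `phi` is a list mapping node indices of X to node indices of Y
    (hypothetical encoding of the map between node sets).
    """
    n = len(A_X)
    return all(
        A_X[i][j] >= A_Y[phi[i]][phi[j]]
        for i in range(n)
        for j in range(n)
    )


# Illustrative networks: X has three nodes, Y has two.
A_X = [
    [0, 2, 4],
    [3, 0, 5],
    [6, 7, 0],
]
A_Y = [
    [0, 1],
    [2, 0],
]
phi = [0, 1, 1]  # nodes 1 and 2 of X are both sent to node 1 of Y

reducing = is_dissimilarity_reducing(A_X, A_Y, phi)

# The same map into a network with larger dissimilarities is not reducing.
A_Z = [
    [0, 9],
    [9, 0],
]
not_reducing = is_dissimilarity_reducing(A_X, A_Z, phi)
```

The map need not be injective: sending two nodes of $X$ to the same node of $Y$ is allowed, since the dissimilarity of a node of $Y$ to itself is 0.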
3.4 Existence and uniqueness of admissible quasiclustering methods: directed single linkage
We call a quasiclustering method $\tilde{\mathcal{H}}$ admissible if it satisfies axioms (Ã1) and (Ã2) and we want to find methods that are admissible with respect to these axioms. This is not difficult. Define the directed minimum chain cost $\tilde{u}^*_X(x, x')$ between nodes $x$ and $x'$ as the minimum chain cost among all chains connecting $x$ to $x'$. Formally, for all $x, x' \in X$,
(13) $\tilde{u}^*_X(x, x') := \min_{C(x, x')} \; \max_{0 \leq i < l} A_X(x_i, x_{i+1}),$
where the minimum is taken over all chains $C(x, x') = [x = x_0, \ldots, x_l = x']$ as in (1).
Define the directed single linkage (DSL) hierarchical quasiclustering method $\tilde{\mathcal{H}}^*$ as the one with output quasiultrametrics $(X, \tilde{u}^*_X) = \tilde{\mathcal{H}}^*(X, A_X)$ given by the directed minimum chain cost function $\tilde{u}^*_X$. The DSL method is valid and admissible as we show in the following proposition.
Proposition 2
The hierarchical quasiclustering method $\tilde{\mathcal{H}}^*$ is valid and admissible. I.e., $\tilde{u}^*_X$ defined by (13) is a quasiultrametric and $\tilde{\mathcal{H}}^*$ satisfies axioms (Ã1)-(Ã2).
We next ask which other methods satisfy (Ã1)-(Ã2) and what special properties DSL has. As it turns out, DSL is the unique quasiclustering method that is admissible with respect to (Ã1)-(Ã2) as we assert in the following theorem.
Theorem 2
Let $\tilde{\mathcal{H}}$ be a valid hierarchical quasiclustering method satisfying axioms (Ã1) and (Ã2). Then, $\tilde{\mathcal{H}} \equiv \tilde{\mathcal{H}}^*$ where $\tilde{\mathcal{H}}^*$ is the DSL method with output quasiultrametrics as in (13).
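For small networks, the directed minimum chain cost in (13) can be computed with a Floyd-Warshall-style relaxation in which sums are replaced by maxima. This is a standard technique for minimax (bottleneck) path problems and is used here only as an illustration; it is not the algorithm developed in this paper. The nested-list encoding, the example values, and the function name `directed_single_linkage` are assumptions.

```python
def directed_single_linkage(A):
    """Return the matrix of directed minimum chain costs of the network A.

    A[i][j] is the dissimilarity from node i to node j, with zeros on the
    diagonal (hypothetical nested-list encoding).
    """
    n = len(A)
    u = [row[:] for row in A]  # start from chains with a single link
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Keep the best chain found so far, or route through node k;
                # the cost of a concatenated chain is the larger of the two parts.
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


# Illustrative asymmetric network on three nodes.
A = [
    [0, 1, 4],
    [3, 0, 2],
    [5, 6, 0],
]
u_star = directed_single_linkage(A)

# On a two-node network the output coincides with the input,
# in line with the Directed Axiom of Value.
two_node = directed_single_linkage([[0, 2], [5, 0]])
```

In the three-node example the direct link from node 0 to node 2 costs 4, while the chain through node 1 costs only 2, so the output value from 0 to 2 is 2. The output remains asymmetric.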
In Carlsson & Mémoli (2010), it was shown that single linkage is the only admissible hierarchical clustering method for finite metric spaces. Admissibility was defined by three axioms, two of which are undirected versions of (Ã1) and (Ã2). In Carlsson et al. (2013), it is shown that when replacing metric spaces by more general asymmetric networks, the uniqueness result is lost and an infinite number of methods satisfy the admissibility axioms. In our paper, by considering the more general framework of quasiclustering methods, we recover the uniqueness result even for asymmetric networks. Moreover, Theorem 2 shows that the only admissible method is a directed version of single linkage. In this way, it becomes clear that the non-uniqueness result for asymmetric networks in Carlsson et al. (2013) originates in the symmetry mismatch between the input asymmetric network and the output symmetric dendrogram. When we allow the more general asymmetric quasidendrogram as output, the uniqueness result is recovered.
DSL was identified as a natural extension of single linkage hierarchical clustering to asymmetric networks in Boyd (1980). In our paper, by developing a framework to study hierarchical quasiclustering methods and leveraging the equivalence result in Theorem 1, we show that DSL is the unique admissible way of quasiclustering asymmetric networks. Furthermore, stability and invariance properties are established in the following section.
Remark 2 (Axiomatic strength and directed chaining effect)
DSL, having a strong resemblance to single linkage hierarchical clustering on finite metric spaces, is likely to be sensitive to a directed version of the so-called chaining effect Jain & Dubes (1988). By requiring a weaker version of (Ã2), the most stringent of our two axioms, the uniqueness result in Theorem 2 is lost and density-aware methods, which do not suffer from the chaining effect, become admissible. This direction, shown to be successful for finite metric spaces Carlsson & Mémoli (2013), appears to be an interesting research avenue.
3.5 Stability and invariance properties of DSL
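DSL is stable in the sense that if it is applied to similar networks then it outputs similar quasidendrograms. This notion has been used to study stability of clustering methods for finite metric spaces Carlsson & Mémoli (2010). In order to formalize this concept, we define a notion of distance between networks. We define an analogue to the Gromov-Hausdorff distance Gromov (2007) between metric spaces, which we denote $d_{\mathcal{N}}$ and which defines a legitimate metric on the space of networks $\mathcal{N}$ (see A.4 in supplementary material for details). Since we may regard DSL as a map $\tilde{\mathcal{H}}^* : \mathcal{N} \to \tilde{\mathcal{U}}$ and $\tilde{\mathcal{U}}$ is a subset of $\mathcal{N}$, we are in a position in which we can use $d_{\mathcal{N}}$ to express the stability of $\tilde{\mathcal{H}}^*$.
Theorem 3
For all $N_X, N_Y \in \mathcal{N}$, $d_{\mathcal{N}}\big(\tilde{\mathcal{H}}^*(N_X),\, \tilde{\mathcal{H}}^*(N_Y)\big) \leq d_{\mathcal{N}}(N_X, N_Y)$.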
Theorem 3 states that the distance between the output quasiultrametrics is upper bounded by the distance between the input networks. Thus, for DSL, nearby networks yield nearby quasiultrametrics. This is important when we consider noisy dissimilarity data. Theorem 3 ensures that noise has limited effect on output quasidendrograms. Furthermore, the theorem implies that DSL is permutation invariant; see A.7 in supplementary material.
For a nondecreasing function $\psi : [0, \infty) \to [0, \infty)$ such that $\psi(z) = 0$ if and only if $z = 0$, and $N_X = (X, A_X) \in \mathcal{N}$ we write $\psi(N_X)$ to denote the network $(X, \psi \circ A_X)$. Any such $\psi$ will be referred to as a change of scale function. Then, DSL is a scale invariant method as the following proposition asserts.
Proposition 3
For all $N_X \in \mathcal{N}$ and all change of scale functions $\psi$ one has $\tilde{\mathcal{H}}^*(\psi(N_X)) = \psi(\tilde{\mathcal{H}}^*(N_X))$.
Since Proposition 3 asserts that the quasiultrametric outcome is transformed by the same function that alters the dissimilarity function in the original network, DSL is invariant to change of units. More precisely, in terms of quasidendrograms, a transformation of dissimilarities through $\psi$ results in a transformed quasidendrogram where the order in which influences between nodes arise is the same as in the original one while the resolution at which they appear changes according to $\psi$. For further invariances of DSL, see A.7 in the supplementary materials.
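The scale invariance in Proposition 3 can be checked numerically on a toy network. The sketch below is only a sanity check under assumptions: the minimum chain costs are computed with a Floyd-Warshall-style relaxation (a standard bottleneck-path technique, not the algorithm of this paper), and the squaring map used as change of scale, the example values, and the helper names `dsl` and `rescale` are all hypothetical choices.

```python
def dsl(A):
    """Directed minimum chain costs via a minimax Floyd-Warshall relaxation."""
    n = len(A)
    u = [row[:] for row in A]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def rescale(M, psi):
    """Apply the change of scale function `psi` to every entry of M."""
    return [[psi(v) for v in row] for row in M]


def psi(z):
    # Nondecreasing on [0, inf) and zero only at zero (illustrative choice).
    return z ** 2


A = [
    [0, 1, 4],
    [3, 0, 2],
    [5, 6, 0],
]

lhs = dsl(rescale(A, psi))  # cluster the rescaled network
rhs = rescale(dsl(A), psi)  # rescale the clustered output
```

Both orders of operation give the same matrix because a nondecreasing function commutes with the minimum and maximum operations that define the chain costs.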
3.6 Algorithms
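In this section we interpret $A_X$ as a matrix of dissimilarities and $\tilde{u}^*_X$ as a matrix, asymmetric in general, with entries corresponding to the quasiultrametric values $\tilde{u}^*_X(x, x')$ for all $x, x' \in X$. By (13), DSL quasiclustering searches for directed chains of minimum infinity norm cost in $A_X$ to construct the matrix $\tilde{u}^*_X$. This operation can be performed algorithmically using matrix powers in the dioid algebra $(\mathbb{R}_+ \cup \{+\infty\}, \min, \max)$ Gondran & Minoux (2008).
In the dioid algebra $(\mathbb{R}_+ \cup \{+\infty\}, \min, \max)$ the regular sum is replaced by the minimization operator and the regular product by maximization. Using $\oplus$ and $\otimes$ to denote sum and product on this dioid algebra we have $a \oplus b := \min(a, b)$ and $a \otimes b := \max(a, b)$ for all $a, b \in \mathbb{R}_+ \cup \{+\infty\}$. The matrix product $A \otimes B$ is therefore given by the matrix with entries
(14) $[A \otimes B]_{ij} = \bigoplus_{k=1}^{n} \big(A_{ik} \otimes B_{kj}\big) = \min_{k \in \{1, \ldots, n\}} \max\big(A_{ik},\, B_{kj}\big).$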
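A direct transcription of the matrix product in (14) may help fix ideas. The sketch below assumes square matrices stored as nested lists; the function name `dioid_product` and the example values are illustrative and not taken from the paper.

```python
def dioid_product(A, B):
    """Min-max matrix product: entry (i, j) is min over k of max(A[i][k], B[k][j])."""
    n = len(A)
    return [
        [min(max(A[i][k], B[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Illustrative asymmetric dissimilarity matrix with zero diagonal.
A = [
    [0, 1, 4],
    [3, 0, 2],
    [5, 6, 0],
]

# Entry (i, j) of the product of A with itself is the smallest cost over
# chains from i to j with at most two links, because the zero diagonal
# lets a chain "stay in place" for one step.
A2 = dioid_product(A, A)
```

For instance, the entry of `A2` from node 0 to node 2 is 2, attained by the chain through node 1, whereas the direct link costs 4.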
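Dioid powers $A_X^{(k)} := A_X \otimes A_X^{(k-1)}$ with $A_X^{(1)} := A_X$ of a dissimilarity matrix are related to quasiultrametric matrices $\tilde{u}$. For instance, the elements of the dioid power $\tilde{u}^{(2)}$ of a given quasiultrametric matrix $\tilde{u}$ are given by
(15) $[\tilde{u}^{(2)}]_{ij} = \min_{k \in \{1, \ldots, n\}} \max\big([\tilde{u}]_{ik},\, [\tilde{u}]_{kj}\big).$
Since $\tilde{u}$ satisfies the strong triangle inequality we have that $[\tilde{u}]_{ij} \leq \max([\tilde{u}]_{ik}, [\tilde{u}]_{kj})$ for all $k$. And for $k = j$ in particular we further have that $\max([\tilde{u}]_{ij}, [\tilde{u}]_{jj}) = \max([\tilde{u}]_{ij}, 0) = [\tilde{u}]_{ij}$. Combining these two observations it follows that the result of the minimization in (15) is $[\tilde{u}^{(2)}]_{ij} = [\tilde{u}]_{ij}$ since none of its arguments is smaller than $[\tilde{u}]_{ij}$ and one of them is exactly $[\tilde{u}]_{ij}$. This being valid for all $i, j$ implies $\tilde{u}^{(2)} = \tilde{u}$. Furthermore, a matrix satisfying $\tilde{u}^{(2)} = \tilde{u}$ is such that $[\tilde{u}]_{ij} \leq \max([\tilde{u}]_{ik}, [\tilde{u}]_{kj})$ for all $i, j, k$, which is just a restatement of the strong triangle inequality. Therefore, a nonnegative matrix $\tilde{u}$ represents a finite quasiultrametric space if and only if $\tilde{u}^{(2)} = \tilde{u}$ and only the diagonal elements are null. Building on this fact, we state the following algorithm to compute the quasiultrametric output by the DSL method.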
Proposition 4
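For every network $N_X = (X, A_X)$ with $|X| = n$, the quasiultrametric $\tilde{u}^*_X$ is given by
(16) $\tilde{u}^*_X = A_X^{(n-1)},$
where the operation $(\cdot)^{(n-1)}$ denotes the $(n-1)$st matrix power in the dioid algebra $(\mathbb{R}_+ \cup \{+\infty\}, \min, \max)$ with matrix product as defined in (14).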
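Proposition 4 translates into a short routine based on repeated min-max products. The following is a naive sketch with cubic cost per product, not an optimized implementation; the nested-list encoding, the four-node example, and the function names `dioid_product` and `dsl_by_dioid_power` are assumptions introduced here.

```python
def dioid_product(A, B):
    """Min-max matrix product of two square nested-list matrices."""
    n = len(A)
    return [
        [min(max(A[i][k], B[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def dsl_by_dioid_power(A):
    """Return the (n-1)st dioid power of A, computed by repeated products."""
    n = len(A)
    power = A
    for _ in range(n - 2):  # going from the 1st to the (n-1)st power takes n-2 products
        power = dioid_product(power, A)
    return power


# Illustrative network: a directed cycle 0 -> 1 -> 2 -> 3 -> 0 with cheap
# links (cost 1); every other link is expensive (cost 9).
A = [
    [0, 1, 9, 9],
    [9, 0, 1, 9],
    [9, 9, 0, 1],
    [1, 9, 9, 0],
]

u_star = dsl_by_dioid_power(A)
second_power = dioid_product(A, A)
```

In this example the cheapest chain from node 0 to node 3 uses three links, so the second power still reports cost 9 for that pair while the third power reports 1. The output is also a fixed point of the product, in agreement with the characterization of quasiultrametric matrices derived from (15).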
Matrix powers in dioid algebras are tractable operations. Indeed, there exist subcubic dioid power algorithms Vassilevska et al. (2009); Duan & Pettie (2009). Thus, Proposition 4 shows computational tractability of the DSL quasiclustering method. There exist related methods with lower complexity for complete networks, for instance Tarjan's method Tarjan (1983), which takes as input an asymmetric network but, in contrast to our method, enforces symmetry in its output. It seems of interest to ascertain whether one might be able to modify his algorithm to suit our (asymmetric) output construction. In the following section we use (16) to quasicluster a real-world network.
4 Applications
The number of migrants from state to state is published yearly by the geographical mobility section of the U.S. Census Bureau United States Census Bureau (2011). We denote as $S$ the set containing every state plus the District of Columbia and as $A_S : S \times S \to \mathbb{R}_+$ a migrational dissimilarity such that $A_S(s, s) = 0$ for all $s \in S$ and $A_S(s, s')$ for all $s \neq s' \in S$ is a monotonically decreasing function of the fraction of immigrants to state $s'$ that come from $s$ (see A.9 in supplementary material for details). A small dissimilarity from state $s$ to state $s'$ implies that, among all the immigrants into $s'$, a high percentage comes from $s$. We then construct the asymmetric network $N_S = (S, A_S)$ with node set $S$ and dissimilarities $A_S$. The application of hierarchical clustering to migration data has been extensively investigated by Slater, see Slater (1976, 1984).
The outcome of applying DSL with output quasiultrametric $\tilde{u}^*_S$ defined in (13) to the migration network $N_S$ is computed via (16). By Theorem 1, the output quasiultrametric is equivalent to a quasidendrogram $\tilde{D}^*_S = (D^*_S, E^*_S)$. By analyzing the dendrogram component $D^*_S$ of the quasidendrogram $\tilde{D}^*_S$, the influence of geographical proximity in migrational preference is evident; see Fig. 4 in Section A.9 of the supplementary material.
To facilitate display and understanding, we do not present quasipartitions for all the nodes and resolutions. Instead, we restrict the quasiultrametric to a subset of states representing an extended West Coast including Arizona and Nevada. In Fig. 3, we depict quasipartitions at four relevant resolutions of the quasidendrogram equivalent to the restricted quasiultrametric. States represented with the same color in the maps in Fig. 3 are part of the same cluster at the given resolution and states in white form singleton clusters. Arrows between clusters for a given resolution represent the edge set which we interpret as a migrational influence relation between the blocks of states.
The DSL quasiclustering method captures not only the formation of clusters but also the asymmetric influence between them. E.g. the quasipartition in Fig. 3 for resolution is of little interest since every state forms a singleton cluster. The influence structure, however, reveals a highly asymmetric migration pattern. At this resolution California has migrational influence over every other state in the region as depicted by the four arrows leaving California and entering each of the other states. This influence can be explained by the fact that California contains the largest urban areas of the region such as Los Angeles. Hence, these urban areas attract immigrants from all over the country, reducing the proportional immigration into California from its neighbors and generating the asymmetric influence structure observed. Since this influence structure defines a partial order over the clusters, the quasipartition at resolution permits asserting the reasonable fact that California is the dominant migration force in the region.
At larger resolutions we can ascertain the relative importance of clusters. At one of them, we can say that California is more important than the cluster formed by Oregon and Washington, as well as more important than Arizona and Nevada. We can also see that Arizona precedes Nevada in the migration ordering at this resolution, while the remaining pairs of the ordering are undefined. At a still larger resolution there is an interesting pattern: the cluster formed by the three West Coast states precedes Arizona and Nevada in the partial order. At this resolution the partial order also happens to be a total order, since Arizona is seen to precede Nevada. This is not true in general, as we have already seen.
Hierarchical quasiclustering methods can also be used to study, e.g., the relations between sectors of an economy. Due to space restrictions, we include this second application in Section A.9 of the supplementary material.
5 Conclusion
When clustering asymmetric networks, requiring the output to be symmetric – as in hierarchical clustering – might be undesirable. Hence, we defined quasidendrograms, a generalization of dendrograms that admits asymmetric relations, and developed a theory for quasiclustering methods. We formalized the notion of admissibility by introducing two axioms. Under this framework, we showed that DSL is the unique admissible method. We pointed out that less stringent frameworks that give rise to new admissible methods can be explored by weakening the Directed Axiom of Transformation. Furthermore, we proved an equivalence between quasidendrograms and quasiultrametrics that generalizes the well-known equivalence between dendrograms and ultrametrics, and established the stability and invariance properties of the DSL method. Finally, we illustrated the application of DSL to a migration network.
Acknowledgments
Work in this paper is supported by NSF CCF0952867, AFOSR MURI FA95501010567, DARPA GRAPHS FA95501210416, AFOSR FA955009010531, AFOSR FA95500910643, NSF DMS 0905823, and NSF DMS0406992.
References
 Boyd (1980) Boyd, J.P. Asymmetric clusters of internal migration regions of France. IEEE Transactions on Systems, Man and Cybernetics, (2):101–104, 1980.
 Bureau of Economic Analysis (2011) Bureau of Economic Analysis. Input-output accounts: the use of commodities by industries before redefinitions. U.S. Department of Commerce, 2011. URL http://www.bea.gov/iTable/index_industry.cfm.
 Carlsson & Mémoli (2010) Carlsson, G. and Mémoli, F. Characterization, stability and convergence of hierarchical clustering methods. Journal of Machine Learning Research, 11:1425–1470, 2010.
 Carlsson & Mémoli (2013) Carlsson, G. and Mémoli, F. Classifying clustering schemes. Foundations of Computational Mathematics, 13(2):221–252, 2013.
 Carlsson et al. (2013) Carlsson, G., Mémoli, F., Ribeiro, A., and Segarra, S. Axiomatic construction of hierarchical clustering in asymmetric networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pp. 5219–5223, 2013.
 Duan & Pettie (2009) Duan, R. and Pettie, S. Fast algorithms for (max, min)-matrix multiplication and bottleneck shortest paths. Symposium on Discrete Algorithms, 2009.
 Gondran & Minoux (2008) Gondran, M. and Minoux, M. Graphs, dioids and semirings: New models and algorithms. Springer, 2008.
 Gromov (2007) Gromov, M. Metric structures for Riemannian and non-Riemannian spaces. Birkhäuser Boston Inc., Boston, MA, 2007. ISBN 9780817645823; 0817645829.
 Gurvich & Vyalyi (2012) Gurvich, V. and Vyalyi, M. Characterizing (quasi) ultrametric finite spaces in terms of (directed) graphs. Discrete Applied Mathematics, 160(12):1742–1756, 2012.
 Harzheim (2005) Harzheim, E. Ordered sets. Springer, 2005.
 Hubert (1973) Hubert, L. Min and max hierarchical clustering using asymmetric similarity measures. Psychometrika, 38(1):63–72, 1973.
 Jain & Dubes (1988) Jain, A.K. and Dubes, R. C. Algorithms for clustering data. Prentice Hall Advanced Reference Series. Prentice Hall Inc., 1988.
 Jardine & Sibson (1971) Jardine, N. and Sibson, R. Mathematical taxonomy. John Wiley & Sons Ltd., London, 1971. Wiley Series in Probability and Mathematical Statistics.
 Lance & Williams (1967) Lance, G. N. and Williams, W. T. A general theory of classificatory sorting strategies 1. Hierarchical systems. Computer Journal, 9(4):373–380, 1967.
 Meila & Pentney (2007) Meila, M. and Pentney, W. Clustering by weighted cuts in directed graphs. Proceedings of the 7th SIAM International Conference on Data Mining, 2007.
 Murtagh (1985) Murtagh, F. Multidimensional clustering algorithms. Compstat Lectures, Vienna: Physica-Verlag, 1, 1985.
 Newman & Girvan (2002) Newman, M. and Girvan, M. Community structure in social and biological networks. Proc. Natl. Acad. Sci., 99(12):7821–7826, 2002.
 Newman & Girvan (2004) Newman, M. and Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E, 69:026113, 2004.
 Ng et al. (2002) Ng, A., Jordan, M., and Weiss, Y. On spectral clustering: Analysis and an algorithm. In T.K. Leen, T.G. Dietterich and V. Tresp (Eds.), Advances in neural information processing systems 14, MIT Press, Cambridge, 2:849–856, 2002.
 Pentney & Meila (2005) Pentney, W. and Meila, M. Spectral clustering of biological sequence data. Proc. Natl. Conf. Artificial Intel., 2005.
 Saito & Yadohisa (2004) Saito, T. and Yadohisa, H. Data analysis of asymmetric structures: advanced approaches in computational statistics. CRC Press, 2004.
 Shi & Malik (2000) Shi, J. and Malik, J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
 Slater (1976) Slater, P.B. Hierarchical internal migration regions of France. IEEE Transactions on Systems, Man and Cybernetics, (4):321–324, 1976.
 Slater (1984) Slater, P.B. A partial hierarchical regionalization of 3140 US counties on the basis of 1965–1970 intercounty migration. Environment and Planning A, 16(4):545–550, 1984.
 Tarjan (1983) Tarjan, R. E. An improved algorithm for hierarchical clustering using strong components. Inf. Process. Lett., 17(1):37–41, 1983.
 United States Census Bureau (2011) United States Census Bureau. State-to-state migration flows. U.S. Department of Commerce, 2011. URL http://www.census.gov/hhes/migration/data/acs/statetostate.html.
 Vassilevska et al. (2009) Vassilevska, V., Williams, R., and Yuster, R. All pairs bottleneck paths and max-min matrix products in truly subcubic time. Theory of Computing, 5:173–189, 2009.
 Von Luxburg (2007) Von Luxburg, U. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
 Zhao & Karypis (2005) Zhao, Y. and Karypis, G. Hierarchical clustering algorithms for document datasets. Data Mining and Knowledge Discovery, 10:141–168, 2005.
Appendix A Supplementary Material
A.1 Proof of Theorem 1
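In order to show that $\Psi$ is a well-defined map, we must show that $\Psi(\tilde{D}_X)$ is a quasiultrametric network for every quasidendrogram $\tilde{D}_X$. Given an arbitrary quasidendrogram $\tilde{D}_X = (D_X, E_X)$, for a particular $\delta' \geq 0$ consider the quasipartition $\tilde{D}_X(\delta')$. Consider the range of resolutions $\delta$ associated with such quasipartition, i.e.,
(17) $\{\delta \geq 0 \mid \tilde{D}_X(\delta) = \tilde{D}_X(\delta')\}.$
Right continuity (D̃4) of $\tilde{D}_X$ ensures that the minimum of the set in (17) is well-defined and hence definition (8) is valid. To prove that $\tilde{u}_X$ in (8) is a quasiultrametric we need to show that it attains nonnegative values and satisfies the identity and strong triangle inequality properties. That $\tilde{u}_X$ attains nonnegative values is clear from the definition (8). The identity property is implied by the first boundary condition in (D̃1). Since $[x]_0 = [x]_0$ for all $x \in X$, we must have $\tilde{u}_X(x, x) = 0$. Conversely, since for all $x \neq x' \in X$ we have $([x]_0, [x']_0) \notin E_X(0)$ and $[x]_0 \neq [x']_0$, we must have that $\tilde{u}_X(x, x') > 0$ for $x \neq x'$ and the identity property is satisfied. To see that $\tilde{u}_X$ satisfies the strong triangle inequality in (7), consider nodes $x$, $x'$, and $x''$ such that the lowest resolution for which $[x]_\delta = [x']_\delta$ or $([x]_\delta, [x']_\delta) \in E_X(\delta)$ is $\delta_1$ and the lowest resolution for which $[x']_\delta = [x'']_\delta$ or $([x']_\delta, [x'']_\delta) \in E_X(\delta)$ is $\delta_2$. Right continuity (D̃4) ensures that these lowest resolutions are well-defined. According to (8) we then have
(18) $\tilde{u}_X(x, x') = \delta_1, \quad \tilde{u}_X(x', x'') = \delta_2.$
Let $\delta_0 := \max(\delta_1, \delta_2)$. From the equivalence hierarchy (D̃2) and influence hierarchy (D̃3) properties, it follows that $[x]_{\delta_0} = [x']_{\delta_0}$ or $([x]_{\delta_0}, [x']_{\delta_0}) \in E_X(\delta_0)$, and that $[x']_{\delta_0} = [x'']_{\delta_0}$ or $([x']_{\delta_0}, [x'']_{\delta_0}) \in E_X(\delta_0)$. Furthermore, from transitivity (QP2) of the quasipartition $\tilde{D}_X(\delta_0)$, it follows that $[x]_{\delta_0} = [x'']_{\delta_0}$ or $([x]_{\delta_0}, [x'']_{\delta_0}) \in E_X(\delta_0)$. Using the definition in (8) for $x$ and $x''$, we conclude that
(19) $\tilde{u}_X(x, x'') \leq \delta_0.$
By definition $\delta_0 = \max(\delta_1, \delta_2)$, hence we substitute this expression in (19) and compare with (18) to obtain
(20) $\tilde{u}_X(x, x'') \leq \max(\delta_1, \delta_2) = \max\big(\tilde{u}_X(x, x'), \tilde{u}_X(x', x'')\big).$
Consequently, $\tilde{u}_X$ satisfies the strong triangle inequality and is therefore a quasiultrametric, proving that the map $\Psi$ is well-defined.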
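The three properties verified in this part of the proof can be checked numerically on a finite node set. The sketch below is a minimal illustration in Python, assuming the candidate function is stored as a square list of lists `u`, with `u[i][j]` the value from node `i` to node `j`. The function name `is_quasiultrametric` and the example matrices are hypothetical.

```python
def is_quasiultrametric(u):
    """Check nonnegativity, identity and the strong triangle inequality.

    Symmetry is deliberately NOT required: u[i][j] may differ from u[j][i].
    """
    n = len(u)
    for i in range(n):
        for j in range(n):
            # Nonnegativity, and identity: u[i][j] == 0 exactly when i == j.
            if u[i][j] < 0 or (u[i][j] == 0) != (i == j):
                return False
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Strong triangle inequality through the intermediate node k.
                if u[i][j] > max(u[i][k], u[k][j]):
                    return False
    return True


# An asymmetric function that satisfies all three properties.
print(is_quasiultrametric([[0, 1, 1], [2, 0, 1], [2, 2, 0]]))  # True
# Fails the strong triangle inequality: 4 > max(1, 1) via node 1.
print(is_quasiultrametric([[0, 1, 4], [3, 0, 1], [2, 5, 0]]))  # False
```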
For the converse result, we need to show that $\Upsilon$ is a well-defined map. Given a quasiultrametric $\tilde{u}_X$ on a node set $X$ and a resolution $\delta \geq 0$, we first define the relation
(21) $x \rightsquigarrow_{\tilde{u}_X(\delta)} x' \iff \tilde{u}_X(x, x') \leq \delta,$
for all $x, x' \in X$. Notice that $\rightsquigarrow_{\tilde{u}_X(\delta)}$ is a quasiequivalence relation as defined in Definition 1 for all $\delta \geq 0$. The reflexivity property is implied by the identity property of the quasiultrametric $\tilde{u}_X$ and transitivity is implied by the fact that $\tilde{u}_X$ satisfies the strong triangle inequality. Furthermore, definitions (9) and (10) are just reformulations of (2) and (3), respectively, for the special case of the quasiequivalence defined in (21). Hence, Proposition 1 guarantees that $\tilde{D}_X(\delta) = (D_X(\delta), E_X(\delta))$ is a quasipartition for every resolution $\delta \geq 0$. In order to show that $\Upsilon$ is well-defined, we need to show that these quasipartitions are nested, i.e. that $\tilde{D}_X$ satisfies (D̃1)–(D̃4).
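To make the construction concrete, the following Python sketch builds a quasipartition from a quasiultrametric at a given resolution. It assumes that (9) groups two nodes in the same block when each reaches the other under the relation in (21), and that (10) places an edge from one block to a different block when some node of the first reaches some node of the second. The function name `quasipartition`, the block indexing, and the example matrix are hypothetical.

```python
def quasipartition(u, delta):
    """Blocks and directed edges between blocks at resolution delta.

    u is assumed to be a valid quasiultrametric given as a list of lists.
    Returns (blocks, edges): blocks is a list of lists of node indices,
    and edges is a set of (source_block_index, target_block_index) pairs.
    """
    n = len(u)
    # x reaches y at resolution delta when u[x][y] <= delta, as in (21).
    reach = [[u[x][y] <= delta for y in range(n)] for x in range(n)]
    blocks = []
    label = [None] * n
    for x in range(n):
        if label[x] is None:
            # Block of x: nodes that reach x and are reached by x.
            members = [y for y in range(n) if reach[x][y] and reach[y][x]]
            for y in members:
                label[y] = len(blocks)
            blocks.append(members)
    edges = set()
    for x in range(n):
        for y in range(n):
            # One-directional influence between two distinct blocks.
            if label[x] != label[y] and reach[x][y]:
                edges.add((label[x], label[y]))
    return blocks, edges


u = [[0, 1, 1], [2, 0, 1], [2, 2, 0]]
for delta in (0, 1, 2):
    print(delta, quasipartition(u, delta))
```

In this example, resolution 0 gives three singleton blocks with no edges, resolution 1 keeps the singletons but adds the edges (0, 1), (0, 2) and (1, 2), and resolution 2 merges all nodes into a single block with no edges.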
The first boundary condition in (D̃1) is implied by (9) and the identity property of $\tilde{u}_X$. The second boundary condition in (D̃1) is implied by the fact that $\tilde{u}_X$ takes finite real values on a finite domain, since the node set $X$ is finite. Hence, any $\delta_0$ satisfying
(22) $\delta_0 \geq \max_{x, x' \in X} \tilde{u}_X(x, x'),$
is a valid candidate to show fulfillment of (D̃1).