# Hierarchical Quasi-Clustering Methods for Asymmetric Networks

###### Abstract

This paper introduces hierarchical quasi-clustering methods, a generalization of hierarchical clustering for asymmetric networks where the output structure preserves the asymmetry of the input data. We show that this output structure is equivalent to a finite quasi-ultrametric space and study admissibility with respect to two desirable properties. We prove that a modified version of single linkage is the only admissible quasi-clustering method. Moreover, we show stability of the proposed method and we establish invariance properties fulfilled by it. Algorithms are further developed and the value of quasi-clustering analysis is illustrated with a study of internal migration within the United States.

Gunnar Carlsson (gunnar@math.stanford.edu), Department of Mathematics, Stanford University

Facundo Mémoli (memoli@math.osu.edu), Department of Mathematics and Department of Computer Science and Engineering, Ohio State University

Alejandro Ribeiro (aribeiro@seas.upenn.edu) and Santiago Segarra (ssegarra@seas.upenn.edu), Department of Electrical and Systems Engineering, University of Pennsylvania

Keywords: clustering, asymmetric networks

## 1 Introduction

Given a network of interactions, hierarchical clustering methods determine a dendrogram, i.e. a family of nested partitions indexed by a resolution parameter. Clusters that arise at a given resolution correspond to sets of nodes that are more similar to each other than to the rest and, as such, can be used to study the formation of groups and communities Shi & Malik (2000); Newman & Girvan (2002, 2004); Von Luxburg (2007); Ng et al. (2002); Lance & Williams (1967); Jain & Dubes (1988). For asymmetric networks, in which the dissimilarity from node $x$ to node $x'$ may differ from the one from $x'$ to $x$ Saito & Yadohisa (2004), the determination of said clusters is not a straightforward generalization of the methods used to cluster symmetric datasets Hubert (1973); Slater (1976); Boyd (1980); Tarjan (1983); Slater (1984); Murtagh (1985); Pentney & Meila (2005); Meila & Pentney (2007); Zhao & Karypis (2005).

This difficulty motivates formal developments whereby hierarchical clustering methods are constructed as those that are admissible with respect to some reasonable properties Carlsson & Mémoli (2010, 2013); Carlsson et al. (2013). A fundamental distinction between symmetric and asymmetric networks is that while it is easy to obtain uniqueness results for the former Carlsson & Mémoli (2010), there are a variety of methods that are admissible for the latter Carlsson et al. (2013). Although one could conceive of imposing further restrictions to winnow the space of admissible methods for clustering asymmetric networks, it is actually reasonable that multiple methods should exist. Since dendrograms are symmetric structures one has to make a decision as to how to derive symmetry from an asymmetric dataset and there are different stages of the clustering process at which such symmetrization can be carried out Carlsson et al. (2013). In a sense, there is a fundamental mismatch between having a network of asymmetric relations as input and a symmetric dendrogram as output.

This paper develops a generalization of dendrograms and hierarchical clustering methods to allow for asymmetric output structures. We refer to these asymmetric structures as quasi-dendrograms and to the procedures that generate them as hierarchical quasi-clustering methods. Since the symmetry in dendrograms can be traced back to the symmetry of equivalence relations we start by defining a quasi-equivalence relation as one that is reflexive and transitive but not necessarily symmetric (Section 3). We then define a quasi-partition as the structure induced by a quasi-equivalence relation, a quasi-dendrogram as a nested collection of quasi-partitions, and a hierarchical quasi-clustering method as a map from the space of networks to the space of quasi-dendrograms (Section 3.1). Quasi-partitions are similar to regular partitions in that they contain disjoint blocks of nodes but they also include an influence structure between the blocks derived from the asymmetry in the original network. This influence structure defines a partial order over the blocks Harzheim (2005).

We proceed to study admissibility of quasi-clustering methods with respect to the directed axioms of value and transformation. The Directed Axiom of Value states that the quasi-clustering of a network of two nodes is the network itself. The Directed Axiom of Transformation states that reducing dissimilarities cannot lead to looser quasi-clusters. We show that there is a unique quasi-clustering method admissible with respect to these axioms and that this method is an asymmetric version of the single linkage clustering method (Section 3.4). The analysis in this section hinges upon an equivalence between quasi-dendrograms and quasi-ultrametrics (Section 3.2) that generalizes the well-known equivalence between dendrograms and ultrametrics Jardine & Sibson (1971).

Exploiting the fact that quasi-dendrograms can be represented by quasi-ultrametrics, we propose a quantitative notion of stability of quasi-clustering methods (Section 3.5). We prove that the unique method from Section 3.4 is stable in the sense that we propose. We also establish several invariance properties enjoyed by this method.

In order to apply the quasi-clustering method to real data, we derive an algorithm based on matrix powers in a dioid algebra Gondran & Minoux (2008) (Section 3.6). As an example, we cluster a network that contains information about the internal migration between states of the United States for the year 2011 (Section 4). The quasi-clustering output unveils that migration is dominated by geographical proximity. Moreover, by exploiting the asymmetric influence between clusters, one can show the migrational influence of California over the West Coast.

Proofs of results in this paper not contained in the main body can be found in the supplementary material.

## 2 Preliminaries

A network $N_X$ is a pair $(X, A_X)$ where $X$ is a finite set of points or nodes and $A_X : X \times X \to \mathbb{R}_+$ is a dissimilarity function. The value $A_X(x, x')$ is assumed to be non-negative for all pairs $(x, x') \in X \times X$ and 0 if and only if $x = x'$. However, $A_X$ need not satisfy the triangle inequality and, more consequential for the problem considered here, may be asymmetric in that it is possible to have $A_X(x, x') \neq A_X(x', x)$ for some $x \neq x'$. We further define $\mathcal{N}$ as the set of all networks. Networks $N_X \in \mathcal{N}$ can have different node sets $X$ and different dissimilarities $A_X$.

A conventional non-hierarchical clustering of the set $X$ is a partition $P$, i.e., a collection of sets $P = \{B_1, \ldots, B_J\}$ which are pairwise disjoint, $B_i \cap B_j = \emptyset$ for $i \neq j$, and required to cover $X$, $\cup_{i=1}^{J} B_i = X$. The sets $B_1, \ldots, B_J$ are called the blocks of $P$ and represent clusters. A partition $P$ of $X$ induces and is induced by an equivalence relation $\sim$ on $X$ such that for all $x, x', x'' \in X$ we have that $x \sim x$, $x \sim x'$ if and only if $x' \sim x$, and $x \sim x'$ combined with $x' \sim x''$ implies $x \sim x''$. In hierarchical clustering methods the output is not a single partition $P$ but a nested collection $D_X$ of partitions $D_X(\delta)$ of $X$ indexed by a resolution parameter $\delta \geq 0$. For a given $D_X$, we say that two nodes $x$ and $x'$ are equivalent at resolution $\delta$ and write $x \sim_{D_X(\delta)} x'$ if and only if nodes $x$ and $x'$ are in the same cluster of $D_X(\delta)$. The nested collection $D_X$ is termed a dendrogram Jardine & Sibson (1971). The interpretation of a dendrogram is that of a structure which yields different clusterings at different resolutions. At resolution $\delta = 0$ each point is in a cluster of its own and as the resolution parameter $\delta$ increases, nodes start forming clusters. We denote by $[x]_\delta$ the equivalence class to which the node $x$ belongs at resolution $\delta$, i.e. $[x]_\delta := \{x' \in X \mid x \sim_{D_X(\delta)} x'\}$.

In our development of hierarchical quasi-clustering methods, the concepts of chain and chain cost are important. Given a network $(X, A_X)$ and $x, x' \in X$, a chain $C(x, x')$ is an ordered sequence of nodes in $X$,

$$C(x, x') = [x = x_0, x_1, \ldots, x_{l-1}, x_l = x'], \tag{1}$$

which starts at $x$ and ends at $x'$. We say that $C(x, x')$ links or connects $x$ to $x'$. The links of a chain are the edges connecting consecutive nodes of the chain in the direction given by the chain. We define the cost of a chain (1) as the maximum dissimilarity $\max_{i \mid x_i \in C(x, x')} A_X(x_i, x_{i+1})$ encountered when traversing its links in order.
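The chain cost can be illustrated with a minimal sketch. The function name `chain_cost`, the list-of-lists encoding of $A_X$, and the three-node dissimilarities below are assumptions made for illustration; they are not taken from the paper.

```python
def chain_cost(A, chain):
    """Cost of a chain: the largest dissimilarity over its consecutive links.

    A is a square list of lists with A[i][j] the dissimilarity from node i
    to node j; chain is a list of node indices [x_0, ..., x_l].
    """
    if len(chain) < 2:
        return 0.0  # a single-node chain has no links
    return max(A[a][b] for a, b in zip(chain, chain[1:]))


# Hypothetical asymmetric network on three nodes (illustrative values only).
A = [
    [0.0, 1.0, 5.0],
    [4.0, 0.0, 2.0],
    [3.0, 6.0, 0.0],
]

forward = chain_cost(A, [0, 1, 2])   # links 0->1 and 1->2
backward = chain_cost(A, [2, 1, 0])  # links 2->1 and 1->0
```

Because links are traversed in the direction of the chain, reversing a chain generally changes its cost: here the forward chain costs 2 while the reversed chain costs 6.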

## 3 Quasi-Clustering methods

A partition $P$ of a set $X$ can be interpreted as a reduction in data complexity in which variations between elements of a group are neglected in favor of the larger dissimilarities between elements of different groups. This is natural when clustering datasets endowed with symmetric dissimilarities because the concepts of a node $x$ being close to another node $x'$ and $x'$ being close to $x$ are equivalent. In an asymmetric network these concepts are different and this difference motivates the definition of structures more general than partitions.

Considering that a partition $P$ of $X$ is induced by an equivalence relation $\sim$ on $X$ we search for the equivalent of an asymmetric partition by removing the symmetry property in the definition of the equivalence relation. Thus, we define a quasi-equivalence $\rightsquigarrow$ as a binary relation that satisfies the reflexivity and transitivity properties but is not necessarily symmetric as stated next.

###### Definition 1

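A binary relation $\rightsquigarrow$ between elements of a set $X$ is a quasi-equivalence if and only if the following properties hold true for all $x, x', x'' \in X$:

• Reflexivity. Points are quasi-equivalent to themselves, $x \rightsquigarrow x$.

• Transitivity. If $x \rightsquigarrow x'$ and $x' \rightsquigarrow x''$ then $x \rightsquigarrow x''$.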

Quasi-equivalence relations are more often termed preorders or quasi-orders in the literature Harzheim (2005). We choose the term quasi-equivalence to emphasize that they are a modified version of an equivalence relation.
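As a minimal sketch of Definition 1, the check below tests reflexivity and transitivity of a relation stored as a set of ordered pairs. The name `is_quasi_equivalence` and the pair-set encoding are illustrative assumptions.

```python
from itertools import product


def is_quasi_equivalence(nodes, related):
    """Check Definition 1 for a relation given as a set of pairs.

    (x, y) in related is read as "x is quasi-equivalent to y".
    Symmetry is deliberately not required.
    """
    reflexive = all((x, x) in related for x in nodes)
    transitive = all(
        (x, z) in related
        for x, y, z in product(nodes, repeat=3)
        if (x, y) in related and (y, z) in related
    )
    return reflexive and transitive


nodes = ["a", "b", "c"]
diagonal = {(x, x) for x in nodes}

# Reflexive and transitive, but not symmetric: a preorder that is not an
# equivalence relation.
preorder = diagonal | {("a", "b"), ("b", "c"), ("a", "c")}

# Dropping ("a", "c") breaks transitivity.
broken = diagonal | {("a", "b"), ("b", "c")}
```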

We define a quasi-partition of the set $X$ as a directed, unweighted graph $\tilde{P} = (P, E)$ with no self-loops where the vertex set $P = \{B_1, \ldots, B_J\}$ is a partition of the space $X$ and the edge set $E \subseteq P \times P$ is such that the following properties are satisfied (see Fig. 1):

(QP1) Unidirectionality. For any given pair of distinct blocks $B_i, B_j \in P$, we have at most one edge between them. Thus, if for some $i \neq j$ we have $(B_i, B_j) \in E$ then $(B_j, B_i) \notin E$.

(QP2) Transitivity. If there are edges between blocks $B_i$ and $B_j$ and between blocks $B_j$ and $B_k$, then there is an edge between blocks $B_i$ and $B_k$, i.e., $(B_i, B_j) \in E$ and $(B_j, B_k) \in E$ imply $(B_i, B_k) \in E$.

The vertex set of a quasi-partition represents sets of nodes that can influence each other, whereas the edges in capture the notion of directed influence from one group to the next. In the example in Fig. 1, nodes which are drawn together can exert influence on each other. This gives rise to the blocks which form the vertex set of the quasi-partition. Additionally, some blocks have influence over others in only one direction. E.g., block can influence but not vice versa. This latter fact motivates keeping and as separate blocks in the partition whereas the former motivates the addition of the directed influence edge . Likewise, can influence , can influence and can influence but none of these influences are true in the opposite direction. Block need not be able to directly influence , but can influence it through , hence the edge from to , in accordance with (QP2). All other influence relations are not meaningful, justifying the lack of connections between the other blocks. Observe that there are no bidirectional edges as required by (QP1).

Requirements (QP1) and (QP2) in the definition of quasi-partition represent the relational structure that emerges from quasi-equivalence relations as we state in the following proposition.

###### Proposition 1

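Given a node set $X$ and a quasi-equivalence relation $\rightsquigarrow$ on $X$ [cf. Definition 1] define the relation $\leftrightarrow$ on $X$ as

$$x \leftrightarrow x' \iff x \rightsquigarrow x' \ \text{and} \ x' \rightsquigarrow x, \tag{2}$$

for all $x, x' \in X$. Then, $\leftrightarrow$ is an equivalence relation. Let $P = \{B_1, \ldots, B_J\}$ be the partition of $X$ induced by $\leftrightarrow$. Define $E \subseteq P \times P$ such that for all distinct $B_i, B_j \in P$

$$(B_i, B_j) \in E \iff x_i \rightsquigarrow x_j, \tag{3}$$

for some $x_i \in B_i$ and $x_j \in B_j$. Then, $\tilde{P} = (P, E)$ is a quasi-partition of $X$. Conversely, given a quasi-partition $\tilde{P} = (P, E)$ of $X$, define the binary relation $\rightsquigarrow$ on $X$ so that for all $x, x' \in X$

$$x \rightsquigarrow x' \iff [x] = [x'] \ \text{or} \ ([x], [x']) \in E, \tag{4}$$

where $[x] \in P$ is the block of the partition $P$ that contains the node $x$ and similarly for $[x']$. Then, $\rightsquigarrow$ is a quasi-equivalence on $X$.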

Proof : See Theorem 4.9, Ch. 1.4 in Harzheim (2005).
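The forward direction of Proposition 1 can be sketched in a few lines: blocks collect nodes related in both directions, as in (2), and an edge joins two distinct blocks when some node of the first is related to some node of the second, as in (3). The function name `quasi_partition` and the example relation are assumptions for illustration, and the sketch assumes its input really is a quasi-equivalence.

```python
def quasi_partition(nodes, related):
    """Build (blocks, edges) from a quasi-equivalence given as a set of pairs.

    Assumes `related` is reflexive and transitive, so comparing a node with
    one representative per block is enough to decide block membership.
    """
    blocks = []
    for x in nodes:
        for block in blocks:
            rep = block[0]
            # Relation (2): related in both directions.
            if (x, rep) in related and (rep, x) in related:
                block.append(x)
                break
        else:
            blocks.append([x])
    blocks = [frozenset(b) for b in blocks]

    # Relation (3): an edge between distinct blocks when some pair is related.
    edges = {
        (bi, bj)
        for bi in blocks
        for bj in blocks
        if bi != bj and any((x, y) in related for x in bi for y in bj)
    }
    return blocks, edges


nodes = ["a", "b", "c"]
related = {(x, x) for x in nodes} | {
    ("a", "b"), ("b", "a"),  # a and b influence each other
    ("a", "c"), ("b", "c"),  # both influence c, but not vice versa
}
blocks, edges = quasi_partition(nodes, related)
```

In this example the nodes a and b form one block, c forms another, and the only edge points from the first block to the second.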

In the same way that an equivalence relation induces and is induced by a partition on a given node set $X$, Proposition 1 shows that a quasi-equivalence relation induces and is induced by a quasi-partition on $X$. We can then adopt the construction of quasi-partitions as the natural generalization of clustering problems when given asymmetric data. Further, observe that if the edge set $E$ contains no edges, $\tilde{P} = (P, E)$ is equivalent to the regular partition $P$ when ignoring the empty edge set. In this sense, partitions can be regarded as particular cases of quasi-partitions having the generic form $\tilde{P} = (P, \emptyset)$. To allow generalizations of hierarchical clustering methods with asymmetric outputs we introduce the notion of quasi-dendrogram in the following section.

### 3.1 Quasi-dendrograms

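Given that a dendrogram is defined as a nested set of partitions, we define a quasi-dendrogram $\tilde{D}_X$ of the set $X$ as a collection of nested quasi-partitions $\tilde{D}_X(\delta) = (D_X(\delta), E_X(\delta))$ indexed by a resolution parameter $\delta \geq 0$. Recall the definition of $[x]_\delta$ from Section 2. Formally, for $\tilde{D}_X$ to be a quasi-dendrogram we require the following conditions:

• (D̃1) Boundary conditions. At resolution $\delta = 0$ all nodes are in separate clusters with no edges between them and for some sufficiently large $\delta_0$ all elements of $X$ are in a single cluster,

$$\tilde{D}_X(0) = \big(\{\{x\}, x \in X\}, \emptyset\big), \qquad \tilde{D}_X(\delta_0) = \big(\{X\}, \emptyset\big) \ \text{for some } \delta_0 \geq 0. \tag{5}$$

• (D̃2) Equivalence hierarchy. For any pair of points $x, x'$ for which $x \sim_{D_X(\delta)} x'$ at resolution $\delta$ we must have $x \sim_{D_X(\delta')} x'$ for all resolutions $\delta' > \delta$.

• (D̃3) Influence hierarchy. If there is an edge $([x]_\delta, [x']_\delta) \in E_X(\delta)$ between the equivalence classes $[x]_\delta$ and $[x']_\delta$ of nodes $x$ and $x'$ at resolution $\delta$, at any resolution $\delta' > \delta$ we either have $([x]_{\delta'}, [x']_{\delta'}) \in E_X(\delta')$ or $[x]_{\delta'} = [x']_{\delta'}$.

• (D̃4) Right continuity. For all $\delta \geq 0$ there exists $\epsilon > 0$ such that $\tilde{D}_X(\delta) = \tilde{D}_X(\delta')$ for all $\delta' \in [\delta, \delta + \epsilon]$.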

Requirement (D̃1) states that for resolution $\delta = 0$ there should be no influence between any pair of nodes and that, for a large enough resolution $\delta = \delta_0$, there should be enough influence between the nodes for all of them to belong to the same cluster. According to (D̃2), nodes become ever more clustered since once they join together in a cluster, they stay together in the same cluster for all larger resolutions. Condition (D̃3) states for the edge set the analogous requirement that (D̃2) states for the node set. If there is an edge present at a given resolution $\delta$, that edge should persist at coarser resolutions $\delta' > \delta$ except if the groups linked by the edge merge in a single cluster. Requirement (D̃4) is a technical condition that ensures the correct definition of a hierarchical structure [cf. (8) below].

Comparison of (D̃1), (D̃2), and (D̃4) with the three properties defining a dendrogram Carlsson & Mémoli (2010) implies that given a quasi-dendrogram $\tilde{D}_X = (D_X, E_X)$ on a node set $X$, the component $D_X$ is a dendrogram on $X$. I.e., the vertex sets $D_X(\delta)$ of the quasi-partitions $(D_X(\delta), E_X(\delta))$ for varying $\delta$ form a nested set of partitions. Hence, if the edge set $E_X(\delta) = \emptyset$ for every resolution parameter, $\tilde{D}_X$ recovers the structure of the dendrogram $D_X$. Thus, quasi-dendrograms are a generalization of dendrograms, or, equivalently, dendrograms are particular cases of quasi-dendrograms with empty edge sets. Regarding dendrograms as quasi-dendrograms with empty edge sets, we have that the set of all dendrograms $\mathcal{D}$ is a subset of $\tilde{\mathcal{D}}$, the set of all quasi-dendrograms.

A hierarchical clustering method $\mathcal{H} : \mathcal{N} \to \mathcal{D}$ is defined as a map from the space of networks $\mathcal{N}$ to the space of dendrograms $\mathcal{D}$. This motivates the definition of a hierarchical quasi-clustering method as follows.

###### Definition 2

A hierarchical quasi-clustering method $\tilde{\mathcal{H}}$ is defined as a map from the space of networks $\mathcal{N}$ to the space of quasi-dendrograms $\tilde{\mathcal{D}}$,

$$\tilde{\mathcal{H}} : \mathcal{N} \to \tilde{\mathcal{D}}. \tag{6}$$

Since $\mathcal{D} \subset \tilde{\mathcal{D}}$ we have that every clustering method is a quasi-clustering method but not vice versa. Our goal here is to study quasi-clustering methods satisfying desirable axioms that define the concept of admissibility. In order to facilitate this analysis, we introduce quasi-ultrametrics as asymmetric versions of ultrametrics and show their equivalence to quasi-dendrograms in the following section.

###### Remark 1

Unidirectionality (QP1) ensures that no cycles containing exactly two nodes can exist in any quasi-partition $\tilde{P} = (P, E)$. If there were longer cycles, transitivity (QP2) would imply that every two distinct nodes in a longer cycle would have to form a two-node cycle, contradicting (QP1). Thus, conditions (QP1) and (QP2) imply that every quasi-partition $\tilde{P} = (P, E)$ is a directed acyclic graph (DAG). The fact that a DAG represents a partial order shows that our construction of a quasi-partition from a quasi-equivalence relation is consistent with the known set theoretic construction of a partial order on a partition of a set given a preorder on the set Harzheim (2005).

### 3.2 Quasi-ultrametrics

Given a node set $X$, a quasi-ultrametric $\tilde{u}_X$ on $X$ is a function $\tilde{u}_X : X \times X \to \mathbb{R}_+$ satisfying the identity property and the strong triangle inequality as we formally define next.

###### Definition 3

Given a node set $X$, a quasi-ultrametric $\tilde{u}_X$ is a non-negative function $\tilde{u}_X : X \times X \to \mathbb{R}_+$ satisfying the following properties for all $x, x', x'' \in X$:

• Identity. $\tilde{u}_X(x, x') = 0$ if and only if $x = x'$.

• Strong triangle inequality. $\tilde{u}_X$ satisfies

$$\tilde{u}_X(x, x') \leq \max\big(\tilde{u}_X(x, x''), \tilde{u}_X(x'', x')\big). \tag{7}$$

Quasi-ultrametrics may be regarded as ultrametrics where the symmetry property is not imposed. In particular, the space $\tilde{\mathcal{U}}$ of quasi-ultrametric networks, i.e. networks with quasi-ultrametrics as dissimilarity functions, is a superset of the space of ultrametric networks $\mathcal{U}$. See Gurvich & Vyalyi (2012) for a study of some structural properties of quasi-ultrametrics.
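Definition 3 can be checked directly on a small matrix. The sketch below is a brute-force test over all triples; the name `is_quasi_ultrametric` and both example matrices are hypothetical.

```python
def is_quasi_ultrametric(u):
    """Check Definition 3 for a square list of lists u, with u[i][j] the
    value from node i to node j."""
    n = len(u)
    idx = range(n)
    non_negative = all(u[i][j] >= 0 for i in idx for j in idx)
    # Identity: zero exactly on the diagonal.
    identity = all((u[i][j] == 0) == (i == j) for i in idx for j in idx)
    # Strong triangle inequality (7) over every intermediate node k.
    strong_triangle = all(
        u[i][j] <= max(u[i][k], u[k][j])
        for i in idx for j in idx for k in idx
    )
    return non_negative and identity and strong_triangle


# Asymmetric, yet satisfies (7): a quasi-ultrametric that is not an ultrametric.
u = [
    [0.0, 1.0, 2.0],
    [3.0, 0.0, 2.0],
    [3.0, 3.0, 0.0],
]

# A generic dissimilarity matrix: entry (0, 2) exceeds max(A[0][1], A[1][2]).
A = [
    [0.0, 1.0, 5.0],
    [4.0, 0.0, 2.0],
    [3.0, 6.0, 0.0],
]
```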

The following constructions and theorem establish a structure preserving equivalence between quasi-dendrograms and quasi-ultrametrics.

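Consider the map $\Psi : \tilde{\mathcal{D}} \to \tilde{\mathcal{U}}$ defined as follows: for a given quasi-dendrogram $\tilde{D}_X = (D_X, E_X)$ over the set $X$ write $\Psi(\tilde{D}_X) = (X, \tilde{u}_X)$, where we define $\tilde{u}_X(x, x')$ for each $x, x' \in X$ as the smallest resolution $\delta$ at which either both nodes belong to the same equivalence class, i.e. $[x]_\delta = [x']_\delta$, or there exists an edge in $E_X(\delta)$ from the equivalence class $[x]_\delta$ to the equivalence class $[x']_\delta$,

$$\tilde{u}_X(x, x') := \min\big\{\delta \geq 0 \ \big|\ [x]_\delta = [x']_\delta \ \text{or} \ ([x]_\delta, [x']_\delta) \in E_X(\delta)\big\}. \tag{8}$$

We also consider the map $\Upsilon : \tilde{\mathcal{U}} \to \tilde{\mathcal{D}}$ constructed as follows: for a given quasi-ultrametric $\tilde{u}_X$ on the set $X$ and each $\delta \geq 0$ define the relation $\sim_{\tilde{u}_X(\delta)}$ on $X$ as

$$x \sim_{\tilde{u}_X(\delta)} x' \iff \max\big(\tilde{u}_X(x, x'), \tilde{u}_X(x', x)\big) \leq \delta. \tag{9}$$

Define further $D_X(\delta) := \{X \bmod \sim_{\tilde{u}_X(\delta)}\}$ and the edge set $E_X(\delta)$ for every $\delta \geq 0$ as follows: $B_1 \neq B_2 \in D_X(\delta)$ are such that

$$(B_1, B_2) \in E_X(\delta) \iff \min_{\substack{x_1 \in B_1 \\ x_2 \in B_2}} \tilde{u}_X(x_1, x_2) \leq \delta. \tag{10}$$

Finally, $\Upsilon(X, \tilde{u}_X) := \tilde{D}_X$, where $\tilde{D}_X := (D_X, E_X)$.

###### Theorem 1

The maps $\Psi$ and $\Upsilon$ are both well defined. Furthermore, $\Psi \circ \Upsilon$ is the identity on $\tilde{\mathcal{U}}$ and $\Upsilon \circ \Psi$ is the identity on $\tilde{\mathcal{D}}$.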

Theorem 1 implies that every quasi-dendrogram $\tilde{D}_X$ has an equivalent representation as a quasi-ultrametric network defined on the same underlying node set $X$. This result allows us to reinterpret hierarchical quasi-clustering methods [cf. (6)] as maps

$$\tilde{\mathcal{H}} : \mathcal{N} \to \tilde{\mathcal{U}}, \tag{11}$$

from the space of networks to the space of quasi-ultrametric networks. Apart from the theoretical importance of Theorem 1, this equivalence result is of practical importance since quasi-ultrametrics are mathematically more convenient to handle than quasi-dendrograms. Indeed, the results in this paper are derived in terms of quasi-ultrametrics. However, quasi-dendrograms are more convenient for representing data as illustrated in Section 4.

Given a quasi-dendrogram $\tilde{D}_X = (D_X, E_X)$, the value $\tilde{u}_X(x, x')$ of the associated quasi-ultrametric for $x, x' \in X$ is given by the minimum resolution $\delta$ at which $x$ can influence $x'$. This may occur when $x$ and $x'$ belong to the same block of $D_X(\delta)$ or when they belong to different blocks $B, B' \in D_X(\delta)$, but there is an edge from the block containing $x$ to the block containing $x'$, i.e. $(B, B') \in E_X(\delta)$. Conversely, given a quasi-ultrametric network $(X, \tilde{u}_X)$, for a given resolution $\delta$ the graph $\tilde{D}_X(\delta)$ has as a vertex set the classes of nodes whose quasi-ultrametric is not greater than $\delta$ in both directions [cf. (9)]. Furthermore, $\tilde{D}_X(\delta)$ contains a directed edge between two distinct equivalence classes if the quasi-ultrametric from some node in the first class to some node in the second is not greater than $\delta$ [cf. (10)].
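The passage from a quasi-ultrametric to the quasi-partition at one resolution can be sketched as follows, with blocks formed by thresholding the two-way maximum and edges formed by thresholding the block-to-block minimum. The function name `quasi_partition_at` and the three-node matrix are illustrative assumptions, and the input is assumed to be a valid quasi-ultrametric.

```python
def quasi_partition_at(u, delta):
    """Quasi-partition (blocks, edges) of a quasi-ultrametric u at resolution
    delta. Nodes are indices 0..n-1; blocks are tuples of indices."""
    n = len(u)
    blocks = []
    for x in range(n):
        for block in blocks:
            rep = block[0]
            # Same block when the two-way maximum is at most delta.
            if max(u[x][rep], u[rep][x]) <= delta:
                block.append(x)
                break
        else:
            blocks.append([x])
    blocks = [tuple(b) for b in blocks]

    # Directed edge when some node of b1 reaches some node of b2 within delta.
    edges = {
        (b1, b2)
        for b1 in blocks
        for b2 in blocks
        if b1 != b2 and min(u[x1][x2] for x1 in b1 for x2 in b2) <= delta
    }
    return blocks, edges


# Hypothetical quasi-ultrametric on three nodes.
u = [
    [0.0, 1.0, 2.0],
    [3.0, 0.0, 2.0],
    [3.0, 3.0, 0.0],
]

at_1 = quasi_partition_at(u, 1.0)
at_2 = quasi_partition_at(u, 2.0)
at_3 = quasi_partition_at(u, 3.0)
```

For this matrix the three nodes stay in separate blocks up to resolution 2, where the edges 0→1, 0→2 and 1→2 are all present, and they merge into a single block with no edges at resolution 3.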

In Fig. 2 we present an example of the equivalence between quasi-dendrograms and quasi-ultrametric networks stated by Theorem 1. At the top left of the figure, we present a quasi-ultrametric defined on a three-node set . At the top right, we depict the dendrogram component of the quasi-dendrogram equivalent to as given by Theorem 1. At the bottom of the figure, we present graphs for a range of resolutions .

To obtain from , we first obtain the dendrogram component by symmetrizing to the maximum [cf. (9)], nodes and merge at resolution 2 and merges with at resolution 3. To see how the edges in are obtained, at resolutions , there are no edges since there is no quasi-ultrametric value between distinct nodes in this range [cf. (10)]. At resolution , we reach the first non-zero values of and hence the corresponding edges appear in . At resolution , nodes and merge and become the same vertex in graph . Finally, at resolution all the nodes belong to the same equivalence class and hence contains only one vertex. Conversely, to obtain from as depicted in the figure, note that at resolution two edges and appear in , thus the corresponding values of the quasi-ultrametric are fixed to be . At resolution , when and merge into the same vertex in , an edge is generated from to the equivalence class of at resolution which did not exist before, implying that . Moreover, we have that , hence . Finally, at there is only one equivalence class, thus the values of that have not been defined so far must equal 3.

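We encode desirable properties of quasi-clustering methods into axioms which we use as a criterion for admissibility. The Directed Axiom of Value (Ã1) and the Directed Axiom of Transformation (Ã2) winnow the space of quasi-clustering methods by imposing conditions on their output quasi-ultrametrics which, by Theorem 1, is equivalent to imposing conditions on the output quasi-dendrograms. Defining an arbitrary two-node network $\vec{\Delta}_2(\alpha, \beta) := (\{p, q\}, A_{p,q})$ with $A_{p,q}(p, q) = \alpha$ and $A_{p,q}(q, p) = \beta$ for some $\alpha, \beta > 0$,

• (Ã1) Directed Axiom of Value. $\tilde{\mathcal{H}}(\vec{\Delta}_2(\alpha, \beta)) = \vec{\Delta}_2(\alpha, \beta)$ for every two-node network $\vec{\Delta}_2(\alpha, \beta)$.

• (Ã2) Directed Axiom of Transformation. Consider two networks $N_X = (X, A_X)$ and $N_Y = (Y, A_Y)$ and a dissimilarity-reducing map $\phi : X \to Y$, i.e. a map $\phi$ such that for all $x, x' \in X$ it holds $A_X(x, x') \geq A_Y(\phi(x), \phi(x'))$. Then, for all $x, x' \in X$, the outputs $(X, \tilde{u}_X) = \tilde{\mathcal{H}}(N_X)$ and $(Y, \tilde{u}_Y) = \tilde{\mathcal{H}}(N_Y)$ satisfy

$$\tilde{u}_X(x, x') \geq \tilde{u}_Y(\phi(x), \phi(x')). \tag{12}$$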

The Directed Axiom of Transformation (Ã2) states that no influence relation can be weakened by a dissimilarity reducing transformation. That is, if relations in the network are strengthened, the tendency of nodes to cluster cannot decrease. The Directed Axiom of Value (Ã1) simply recognizes that in any two-node network, the dissimilarity function is itself a quasi-ultrametric and that there is no valid justification to output a different quasi-ultrametric.

### 3.4 Existence and uniqueness of admissible quasi-clustering methods: directed single linkage

We call a quasi-clustering method $\tilde{\mathcal{H}}$ admissible if it satisfies axioms (Ã1) and (Ã2) and we want to find methods that are admissible with respect to these axioms. This is not difficult. Define the directed minimum chain cost $\tilde{u}^*_X(x, x')$ between nodes $x$ and $x'$ as the minimum chain cost among all chains connecting $x$ to $x'$. Formally, for all $x, x' \in X$,

$$\tilde{u}^*_X(x, x') = \min_{C(x, x')} \ \max_{i \mid x_i \in C(x, x')} A_X(x_i, x_{i+1}). \tag{13}$$

Define the directed single linkage (DSL) hierarchical quasi-clustering method $\tilde{\mathcal{H}}^*$ as the one with output quasi-ultrametrics $(X, \tilde{u}^*_X) = \tilde{\mathcal{H}}^*(X, A_X)$ given by the directed minimum chain cost function $\tilde{u}^*_X$. The DSL method is valid and admissible as we show in the following proposition.

###### Proposition 2

The hierarchical quasi-clustering method $\tilde{\mathcal{H}}^*$ is valid and admissible. I.e., $\tilde{u}^*_X$ defined by (13) is a quasi-ultrametric and $\tilde{\mathcal{H}}^*$ satisfies axioms (Ã1)-(Ã2).
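The directed minimum chain cost in (13) can be evaluated literally on small networks by enumerating chains. The sketch below is a brute-force illustration only, exponential in the number of nodes; it restricts attention to chains without repeated nodes, which is an assumption justified by the fact that revisiting a node cannot lower a maximum. The name `dsl_brute_force` and the example values are hypothetical.

```python
from itertools import permutations


def dsl_brute_force(A):
    """Directed minimum chain costs by exhaustive enumeration of chains."""
    n = len(A)
    u = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x == y:
                continue
            others = [k for k in range(n) if k not in (x, y)]
            best = float("inf")
            # Every ordered choice of intermediate nodes defines one chain.
            for r in range(len(others) + 1):
                for middle in permutations(others, r):
                    chain = (x, *middle, y)
                    cost = max(A[a][b] for a, b in zip(chain, chain[1:]))
                    best = min(best, cost)
            u[x][y] = best
    return u


# Hypothetical asymmetric network on three nodes.
A = [
    [0.0, 1.0, 5.0],
    [4.0, 0.0, 2.0],
    [3.0, 6.0, 0.0],
]
u_star = dsl_brute_force(A)

# On a two-node network the only chain is the direct link, so the output
# equals the input, in line with the Directed Axiom of Value.
two_node = [[0.0, 2.0], [7.0, 0.0]]
```

In the three-node example the direct dissimilarity from node 0 to node 2 is 5, but the chain through node 1 costs only 2, so the output value from 0 to 2 is 2.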

We next ask which other methods satisfy (Ã1)-(Ã2) and what special properties DSL has. As it turns out, DSL is the unique quasi-clustering method that is admissible with respect to (Ã1)-(Ã2) as we assert in the following theorem.

###### Theorem 2

Let $\tilde{\mathcal{H}}$ be a valid hierarchical quasi-clustering method satisfying axioms (Ã1) and (Ã2). Then, $\tilde{\mathcal{H}} \equiv \tilde{\mathcal{H}}^*$ where $\tilde{\mathcal{H}}^*$ is the DSL method with output quasi-ultrametrics as in (13).

In Carlsson & Mémoli (2010), it was shown that single linkage is the only admissible hierarchical clustering method for finite metric spaces. Admissibility was defined by three axioms, two of which are undirected versions of (Ã1) and (Ã2). Carlsson et al. (2013) show that when metric spaces are replaced by more general asymmetric networks, the uniqueness result is lost and an infinite number of methods satisfy the admissibility axioms. In our paper, by considering the more general framework of quasi-clustering methods, we recover the uniqueness result even for asymmetric networks. Moreover, Theorem 2 shows that the only admissible method is a directed version of single linkage. In this way, it becomes clear that the non-uniqueness result for asymmetric networks in Carlsson et al. (2013) originates in the symmetry mismatch between the input asymmetric network and the output symmetric dendrogram. When we allow the more general asymmetric quasi-dendrogram as output, the uniqueness result is recovered.

DSL was identified as a natural extension of single linkage hierarchical clustering to asymmetric networks in Boyd (1980). In our paper, by developing a framework to study hierarchical quasi-clustering methods and leveraging the equivalence result in Theorem 1, we show that DSL is the unique admissible way of quasi-clustering asymmetric networks. Furthermore, stability and invariance properties are established in the following section.

###### Remark 2 (Axiomatic strength and directed chaining effect)

DSL, having a strong resemblance to single linkage hierarchical clustering on finite metric spaces, is likely to be sensitive to a directed version of the so-called chaining effect Jain & Dubes (1988). By requiring a weaker version of (Ã2), the most stringent of our two axioms, the uniqueness result in Theorem 2 is lost and density-aware methods, which do not suffer from the chaining effect, become admissible. This direction, shown to be successful for finite metric spaces Carlsson & Mémoli (2013), appears to be an interesting research avenue.

### 3.5 Stability and invariance properties of DSL

DSL is stable in the sense that if it is applied to similar networks then it outputs similar quasi-dendrograms. This notion has been used to study stability of clustering methods for finite metric spaces Carlsson & Mémoli (2010). In order to formalize this concept, we define a notion of distance between networks. We define an analogue to the Gromov-Hausdorff distance Gromov (2007) between metric spaces, which we denote $d_{\mathcal{N}}$ and which defines a legitimate metric on $\mathcal{N}$ (see A.4 in supplementary material for details). Since we may regard DSL as a map $\tilde{\mathcal{H}}^* : \mathcal{N} \to \tilde{\mathcal{U}}$ and $\tilde{\mathcal{U}}$ is a subset of $\mathcal{N}$, we are in a position in which we can use $d_{\mathcal{N}}$ to express the stability of $\tilde{\mathcal{H}}^*$.

###### Theorem 3

For all $N_X, N_Y \in \mathcal{N}$

$$d_{\mathcal{N}}\big(\tilde{\mathcal{H}}^*(N_X), \tilde{\mathcal{H}}^*(N_Y)\big) \leq d_{\mathcal{N}}(N_X, N_Y).$$

Theorem 3 states that the distance between the output quasi-ultrametrics is upper bounded by the distance between the input networks. Thus, for DSL, nearby networks yield nearby quasi-ultrametrics. This is important when we consider noisy dissimilarity data. Theorem 3 ensures that noise has limited effect on output quasi-dendrograms. Furthermore, the theorem implies that DSL is permutation invariant; see A.7 in supplementary material.
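The effect of noise can be illustrated numerically in a restricted setting. The sketch below keeps the node set fixed and compares matrices entrywise, so the largest entrywise gap is only a stand-in for the network distance of Theorem 3, not that distance itself. The directed minimum chain costs are computed with a Floyd-Warshall-style min-max relaxation, a technique swapped in here for brevity; the names `dsl` and `sup_gap` and all numerical values are assumptions.

```python
def dsl(A):
    """Directed minimum chain costs via a Floyd-Warshall-style relaxation in
    which sums are replaced by max and comparisons keep the min."""
    n = len(A)
    u = [row[:] for row in A]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def sup_gap(M1, M2):
    """Largest entrywise absolute difference between two square matrices."""
    return max(abs(a - b) for r1, r2 in zip(M1, M2) for a, b in zip(r1, r2))


# A hypothetical network and a noisy copy on the same three nodes.
A_X = [
    [0.0, 1.0, 5.0],
    [4.0, 0.0, 2.0],
    [3.0, 6.0, 0.0],
]
A_Y = [
    [0.0, 1.5, 4.5],
    [4.0, 0.0, 2.5],
    [3.5, 6.0, 0.0],
]

input_gap = sup_gap(A_X, A_Y)
output_gap = sup_gap(dsl(A_X), dsl(A_Y))
```

In this setting the outputs differ by no more than the inputs do, which is the qualitative behavior that the stability result describes.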

For a non-decreasing function $\psi : \mathbb{R}_+ \to \mathbb{R}_+$ such that $\psi(z) = 0$ if and only if $z = 0$, and $N_X = (X, A_X) \in \mathcal{N}$ we write $\psi(N_X)$ to denote the network $(X, \psi \circ A_X)$. Any such $\psi$ will be referred to as a change of scale function. Then, DSL is a scale invariant method as the following proposition asserts.

###### Proposition 3

For all $N_X \in \mathcal{N}$ and all change of scale functions $\psi$ one has $\tilde{\mathcal{H}}^*(\psi(N_X)) = \psi(\tilde{\mathcal{H}}^*(N_X))$.

Since Proposition 3 asserts that the quasi-ultrametric outcome is transformed by the same function that alters the dissimilarity function in the original network, DSL is invariant to change of units. More precisely, in terms of quasi-dendrograms, a transformation of dissimilarities through $\psi$ results in a transformed quasi-dendrogram where the order in which influences between nodes arise is the same as in the original one while the resolution at which they appear changes according to $\psi$. For further invariances of DSL, see A.7 in the supplementary materials.
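Scale invariance can be checked on a small example. The sketch below applies a change of scale before and after computing directed minimum chain costs and compares the results. The costs are computed with a Floyd-Warshall-style min-max relaxation, which is a substitute chosen for brevity; the names `dsl`, `rescale` and `square`, and the example network, are assumptions.

```python
def dsl(A):
    """Directed minimum chain costs via a Floyd-Warshall-style min-max
    relaxation."""
    n = len(A)
    u = [row[:] for row in A]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def rescale(M, psi):
    """Apply a change of scale function entrywise."""
    return [[psi(v) for v in row] for row in M]


def square(z):
    # Non-decreasing on non-negative reals and zero only at zero.
    return z * z


A = [
    [0.0, 1.0, 5.0],
    [4.0, 0.0, 2.0],
    [3.0, 6.0, 0.0],
]

scaled_then_clustered = dsl(rescale(A, square))
clustered_then_scaled = rescale(dsl(A), square)
```

Both orders give the same matrix here, since a non-decreasing function commutes with the min and max operations that define the chain costs.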

### 3.6 Algorithms

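In this section we interpret $A_X$ as a matrix of dissimilarities and $\tilde{u}^*_X$ as a (not necessarily symmetric) matrix with entries corresponding to the quasi-ultrametric values $\tilde{u}^*_X(x, x')$ for all $x, x' \in X$. By (13), DSL quasi-clustering searches for directed chains of minimum infinity norm cost in $A_X$ to construct the matrix $\tilde{u}^*_X$. This operation can be performed algorithmically using matrix powers in the dioid algebra $(\mathbb{R}_+ \cup \{+\infty\}, \min, \max)$ Gondran & Minoux (2008).

In the dioid algebra $(\mathbb{R}_+ \cup \{+\infty\}, \min, \max)$ the regular sum is replaced by the minimization operator and the regular product by maximization. Using $\oplus$ and $\otimes$ to denote sum and product on this dioid algebra we have $a \oplus b := \min(a, b)$ and $a \otimes b := \max(a, b)$ for all $a, b \in \mathbb{R}_+ \cup \{+\infty\}$. The matrix product $A \otimes B$ is therefore given by the matrix with entries

$$[A \otimes B]_{ij} = \bigoplus_{k=1}^{n} \big(A_{ik} \otimes B_{kj}\big) = \min_{k \in [1, n]} \max\big(A_{ik}, B_{kj}\big). \tag{14}$$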
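The dioid matrix product in (14) is a matrix product in which min plays the role of addition and max the role of multiplication. A minimal sketch with plain lists follows; the function name `dioid_product` and the example matrix are assumptions.

```python
def dioid_product(A, B):
    """Min-max matrix product: entry (i, j) is min over k of
    max(A[i][k], B[k][j])."""
    n = len(A)
    return [
        [min(max(A[i][k], B[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical dissimilarity matrix on three nodes.
A = [
    [0.0, 1.0, 5.0],
    [4.0, 0.0, 2.0],
    [3.0, 6.0, 0.0],
]

A2 = dioid_product(A, A)
```

Entry (i, j) of the product of a matrix with itself is the cheapest cost of going from i to j in at most two links, because the zero diagonal lets a chain "stay put" for one step.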

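Dioid powers $A_X^{(k)} := A_X \otimes A_X^{(k-1)}$ with $A_X^{(1)} := A_X$ of a dissimilarity matrix are related to quasi-ultrametric matrices $\tilde{u}$. For instance, the elements of the dioid power $\tilde{u}^{(2)}$ of a given quasi-ultrametric matrix $\tilde{u}$ are given by

$$[\tilde{u}^{(2)}]_{ij} = \min_{k \in [1, n]} \max\big(\tilde{u}_{ik}, \tilde{u}_{kj}\big). \tag{15}$$

Since $\tilde{u}$ satisfies the strong triangle inequality we have that $\tilde{u}_{ij} \leq \max(\tilde{u}_{ik}, \tilde{u}_{kj})$ for all $k$. And for $k = j$ in particular we further have that $\max(\tilde{u}_{ij}, \tilde{u}_{jj}) = \max(\tilde{u}_{ij}, 0) = \tilde{u}_{ij}$. Combining these two observations it follows that the result of the minimization in (15) is $[\tilde{u}^{(2)}]_{ij} = \tilde{u}_{ij}$ since none of its arguments is smaller than $\tilde{u}_{ij}$ and one of them is exactly $\tilde{u}_{ij}$. This being valid for all $i, j$ implies $\tilde{u}^{(2)} = \tilde{u}$. Furthermore, a matrix satisfying $\tilde{u}^{(2)} = \tilde{u}$ is such that $\tilde{u}_{ij} = [\tilde{u}^{(2)}]_{ij} \leq \max(\tilde{u}_{ik}, \tilde{u}_{kj})$ for all $k$, which is just a restatement of the strong triangle inequality. Therefore, a non-negative matrix $\tilde{u}$ represents a finite quasi-ultrametric space if and only if $\tilde{u}^{(2)} = \tilde{u}$ and only the diagonal elements are null. Building on this fact, we state the following algorithm to compute the quasi-ultrametric output by the DSL method.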
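The characterization above suggests a compact test: a non-negative matrix with zeros exactly on the diagonal is a quasi-ultrametric when its dioid square equals itself. The sketch below implements that test; the function names and the two example matrices are hypothetical.

```python
def dioid_product(A, B):
    """Min-max matrix product as in (14)."""
    n = len(A)
    return [
        [min(max(A[i][k], B[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def is_quasi_ultrametric_matrix(u):
    """Test the fixed-point characterization: u squared equals u in the
    dioid algebra, entries are non-negative, zeros only on the diagonal."""
    n = len(u)
    non_negative = all(v >= 0 for row in u for v in row)
    zero_only_on_diagonal = all(
        (u[i][j] == 0) == (i == j) for i in range(n) for j in range(n)
    )
    return non_negative and zero_only_on_diagonal and dioid_product(u, u) == u


u = [
    [0.0, 1.0, 2.0],
    [3.0, 0.0, 2.0],
    [3.0, 3.0, 0.0],
]
A = [
    [0.0, 1.0, 5.0],
    [4.0, 0.0, 2.0],
    [3.0, 6.0, 0.0],
]
```

The matrix `u` is a fixed point of dioid squaring, whereas `A` is not, since squaring lowers its entry (0, 2) from 5 to 2.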

###### Proposition 4

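For every network $N_X = (X, A_X)$ with $|X| = n$, the quasi-ultrametric $\tilde{u}^*_X$ is given by

$$\tilde{u}^*_X = A_X^{(n-1)}, \tag{16}$$

where the operation $(\cdot)^{(n-1)}$ denotes the $(n-1)$st matrix power in the dioid algebra $(\mathbb{R}_+ \cup \{+\infty\}, \min, \max)$ with matrix product as defined in (14).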
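Proposition 4 translates into a short routine: multiply the dissimilarity matrix by itself in the min-max sense until the $(n-1)$st power is reached. The sketch below uses the naive cubic product and repeated multiplication, so it is a plain illustration rather than one of the faster algorithms; the function names and example networks are assumptions.

```python
def dioid_product(A, B):
    """Min-max matrix product as in (14)."""
    n = len(A)
    return [
        [min(max(A[i][k], B[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def dsl_by_dioid_power(A):
    """Compute the (n-1)st dioid power of A, following (16)."""
    n = len(A)
    power = [row[:] for row in A]  # first power
    for _ in range(n - 2):         # n - 2 further products reach power n - 1
        power = dioid_product(power, A)
    return power


# Hypothetical three-node network.
A = [
    [0.0, 1.0, 5.0],
    [4.0, 0.0, 2.0],
    [3.0, 6.0, 0.0],
]

# Hypothetical four-node directed cycle 0 -> 1 -> 2 -> 3 -> 0 with cheap
# links along the cycle and expensive links elsewhere. Reaching node 0 from
# node 1 cheaply takes three links, so the full (n-1)st power is needed.
C = [
    [0.0, 1.0, 9.0, 9.0],
    [9.0, 0.0, 1.0, 9.0],
    [9.0, 9.0, 0.0, 1.0],
    [1.0, 9.0, 9.0, 0.0],
]

u_A = dsl_by_dioid_power(A)
u_C = dsl_by_dioid_power(C)
```

On the cycle every off-diagonal output equals 1, because any node can reach any other by following cheap links around the cycle.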

Matrix powers in dioid algebras are tractable operations. Indeed, there exist subcubic dioid power algorithms Vassilevska et al. (2009); Duan & Pettie (2009). Thus, Proposition 4 shows computational tractability of the DSL quasi-clustering method. There exist related methods with lower complexity. For instance, Tarjan’s method Tarjan (1983), which takes as input an asymmetric network but in contrast to our method enforces symmetry in its output, has a lower running time for complete networks. It seems of interest to ascertain whether one might be able to modify his algorithm to suit our (asymmetric) output construction. In the following section we use (16) to quasi-cluster a real-world network.

## 4 Applications

The number of migrants from state to state is published yearly by the geographical mobility section of the U.S. Census Bureau United States Census Bureau (2011). We denote as $S$ the set containing every state plus the District of Columbia and as $A_S : S \times S \to \mathbb{R}_+$ a migrational dissimilarity such that $A_S(s, s) = 0$ for all $s \in S$ and $A_S(s, s')$ for all $s \neq s' \in S$ is a monotonically decreasing function of the fraction of immigrants to state $s'$ that come from $s$ (see A.9 in supplementary material for details). A small dissimilarity from state $s$ to state $s'$ implies that, among all the immigrants into $s'$, a high percentage comes from $s$. We then construct the asymmetric network $N_S = (S, A_S)$ with node set $S$ and dissimilarities $A_S$. The application of hierarchical clustering to migration data has been extensively investigated by Slater, see Slater (1976, 1984).

The outcome of applying DSL with output quasi-ultrametric $\tilde{u}^*_S$ defined in (13) to the migration network $N_S$ is computed via (16). By Theorem 1, the output quasi-ultrametric is equivalent to a quasi-dendrogram $\tilde{D}^*_S = (D^*_S, E^*_S)$. By analyzing the dendrogram component $D^*_S$ of the quasi-dendrogram $\tilde{D}^*_S$, the influence of geographical proximity in migrational preference is evident; see Fig. 4 in Section A.9 of the supplementary material.

To facilitate display and understanding, we do not present quasi-partitions for all the nodes and resolutions. Instead, we restrict the quasi-ultrametric to a subset of states representing an extended West Coast including Arizona and Nevada. In Fig. 3, we depict quasi-partitions at four relevant resolutions of the quasi-dendrogram equivalent to the restricted quasi-ultrametric. States represented with the same color in the maps in Fig. 3 are part of the same cluster at the given resolution and states in white form singleton clusters. Arrows between clusters for a given resolution represent the edge set which we interpret as a migrational influence relation between the blocks of states.

The DSL quasi-clustering method captures not only the formation of clusters but also the asymmetric influence between them. For example, the quasi-partition in Fig. 3 for resolution is of little interest since every state forms a singleton cluster. The influence structure, however, reveals a highly asymmetric migration pattern. At this resolution California has migrational influence over every other state in the region, as depicted by the four arrows leaving California and entering each of the other states. This influence can be explained by the fact that California contains the largest urban areas of the region, such as Los Angeles. These urban areas attract immigrants from all over the country, reducing the proportional immigration into California from its neighbors and generating the asymmetric influence structure observed. Since this influence structure defines a partial order over the clusters, the quasi-partition at resolution permits asserting the reasonable fact that California is the dominant migration force in the region.

At larger resolutions we can ascertain the relative importance of clusters. At resolution we can say that California is more important than the cluster formed by Oregon and Washington as well as more important than Arizona and Nevada. We can also see that Arizona precedes Nevada in the migration ordering at this resolution while the remaining pairs of the ordering are undefined. At resolution there is an interesting pattern as we can see the cluster formed by the three West Coast states preceding Arizona and Nevada in the partial order. At this resolution the partial order also happens to be a total order as Arizona is seen to precede Nevada. This is not true in general as we have already seen.

Hierarchical quasi-clustering methods can also be used to study, e.g., the relations between sectors of an economy. Due to space restrictions, we include this second application in A.9 in the supplementary material.

## 5 Conclusion

When clustering asymmetric networks, requiring the output to be symmetric – as in hierarchical clustering – might be undesirable. Hence, we defined quasi-dendrograms, a generalization of dendrograms that admits asymmetric relations, and developed a theory for quasi-clustering methods. We formalized the notion of admissibility by introducing two axioms. Under this framework, we showed that DSL is the unique admissible method. We pointed out that less stringent frameworks that give rise to new admissible methods can be explored by weakening the Directed Axiom of Transformation. Furthermore, we proved an equivalence between quasi-dendrograms and quasi-ultrametrics that generalizes the well-known equivalence between dendrograms and ultrametrics, and established the stability and invariance properties of the DSL method. Finally, we illustrated the application of DSL to a migration network.

## Acknowledgments

Work in this paper is supported by NSF CCF-0952867, AFOSR MURI FA9550-10-1-0567, DARPA GRAPHS FA9550-12-1-0416, AFOSR FA9550-09-0-1-0531, AFOSR FA9550-09-1-0643, NSF DMS 0905823, and NSF DMS-0406992.

## References

• Boyd (1980) Boyd, J.P. Asymmetric clusters of internal migration regions of France. IEEE Transactions on Systems, Man, and Cybernetics, (2):101–104, 1980.
• Bureau of Economic Analysis (2011) Bureau of Economic Analysis. Input-output accounts: the use of commodities by industries before redefinitions. U.S. Department of Commerce, 2011.
• Carlsson & Mémoli (2010) Carlsson, G. and Mémoli, F. Characterization, stability and convergence of hierarchical clustering methods. Journal of Machine Learning Research, 11:1425–1470, 2010.
• Carlsson & Mémoli (2013) Carlsson, G. and Mémoli, F. Classifying clustering schemes. Foundations of Computational Mathematics, 13(2):221–252, 2013.
• Carlsson et al. (2013) Carlsson, G., Mémoli, F., Ribeiro, A., and Segarra, S. Axiomatic construction of hierarchical clustering in asymmetric networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pp. 5219–5223, 2013.
• Duan & Pettie (2009) Duan, R. and Pettie, S. Fast algorithms for (max, min)-matrix multiplication and bottleneck shortest paths. Symposium on Discrete Algorithms, 2009.
• Gondran & Minoux (2008) Gondran, M. and Minoux, M. Graphs, dioids and semi rings: New models and algorithms. Springer, 2008.
• Gromov (2007) Gromov, M. Metric structures for Riemannian and non-Riemannian spaces. Birkhäuser Boston Inc., Boston, MA, 2007. ISBN 978-0-8176-4582-3; 0-8176-4582-9.
• Gurvich & Vyalyi (2012) Gurvich, V. and Vyalyi, M. Characterizing (quasi-) ultrametric finite spaces in terms of (directed) graphs. Discrete Applied Mathematics, 160(12):1742–1756, 2012.
• Harzheim (2005) Harzheim, E. Ordered sets. Springer, 2005.
• Hubert (1973) Hubert, L. Min and max hierarchical clustering using asymmetric similarity measures. Psychometrika, 38(1):63–72, 1973.
• Jain & Dubes (1988) Jain, A.K. and Dubes, R. C. Algorithms for clustering data. Prentice Hall Advanced Reference Series. Prentice Hall Inc., 1988.
• Jardine & Sibson (1971) Jardine, N. and Sibson, R. Mathematical taxonomy. John Wiley & Sons Ltd., London, 1971. Wiley Series in Probability and Mathematical Statistics.
• Lance & Williams (1967) Lance, G. N. and Williams, W. T. A general theory of classificatory sorting strategies 1. Hierarchical systems. Computer Journal, 9(4):373–380, 1967.
• Meila & Pentney (2007) Meila, M. and Pentney, W. Clustering by weighted cuts in directed graphs. Proceedings of the 7th SIAM International Conference on Data Mining, 2007.
• Murtagh (1985) Murtagh, F. Multidimensional clustering algorithms. Compstat Lectures, 1. Physica-Verlag, Vienna, 1985.
• Newman & Girvan (2002) Newman, M. and Girvan, M. Community structure in social and biological networks. Proc. Natl. Acad. Sci., 99(12):7821–7826, 2002.
• Newman & Girvan (2004) Newman, M. and Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E, 69, 026113, 2004.
• Ng et al. (2002) Ng, A., Jordan, M., and Weiss, Y. On spectral clustering: Analysis and an algorithm. In T.K. Leen, T.G. Dietterich and V. Tresp (Eds.), Advances in neural information processing systems 14, MIT Press, Cambridge, 2:849–856, 2002.
• Pentney & Meila (2005) Pentney, W. and Meila, M. Spectral clustering of biological sequence data. Proc. Natl. Conf. Artificial Intel., 2005.
• Saito & Yadohisa (2004) Saito, T. and Yadohisa, H. Data analysis of asymmetric structures: advanced approaches in computational statistics. CRC Press, 2004.
• Shi & Malik (2000) Shi, J. and Malik, J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
• Slater (1976) Slater, P.B. Hierarchical internal migration regions of France. Systems, Man and Cybernetics, IEEE Transactions on, (4):321–324, 1976.
• Slater (1984) Slater, P.B. A partial hierarchical regionalization of 3140 US counties on the basis of 1965–1970 intercounty migration. Environment and Planning A, 16(4):545–550, 1984.
• Tarjan (1983) Tarjan, R. E. An improved algorithm for hierarchical clustering using strong components. Inf. Process. Lett., 17(1):37–41, 1983.
• United States Census Bureau (2011) United States Census Bureau. State-to-state migration flows. U.S. Department of Commerce, 2011.
• Vassilevska et al. (2009) Vassilevska, V., Williams, R., and Yuster, R. All pairs bottleneck paths and max-min matrix products in truly subcubic time. Theory of Computing, 5:173–189, 2009.
• Von Luxburg (2007) Von Luxburg, U. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
• Zhao & Karypis (2005) Zhao, Y. and Karypis, G. Hierarchical clustering algorithms for document datasets. Data Mining and Knowledge Discovery, 10:141–168, 2005.

## Appendix A Supplementary Material

### A.1 Proof of Theorem 1

In order to show that is a well-defined map, we must show that is a quasi-ultrametric network for every quasi-dendrogram . Given an arbitrary quasi-dendrogram , for a particular $\delta' \geq 0$ consider the quasi-partition $\tilde{D}_X(\delta')$. Consider the range of resolutions $\delta$ associated with this quasi-partition, i.e.,

$$\big\{\delta \geq 0 \,\big|\, \tilde{D}_X(\delta) = \tilde{D}_X(\delta')\big\}. \tag{17}$$

Right continuity (D̃4) of ensures that the minimum of the set in (17) is well-defined and hence definition (8) is valid. To prove that in (8) is a quasi-ultrametric we need to show that it attains non-negative values as well as the identity and strong triangle inequality properties. That attains non-negative values is clear from the definition (8). The identity property is implied by the first boundary condition in (D̃1). Since for all , we must have . Conversely, since for all , and we must have that for and the identity property is satisfied. To see that satisfies the strong triangle inequality in (7), consider nodes , , and such that the lowest resolution for which or is and the lowest resolution for which or is . Right continuity (D̃4) ensures that these lowest resolutions are well-defined. According to (8) we then have

$$\tilde{u}_X(x, x'') = \delta_1, \qquad \tilde{u}_X(x'', x') = \delta_2. \tag{18}$$

Denote by $\delta_0 := \max(\delta_1, \delta_2)$. From the equivalence hierarchy (D̃2) and influence hierarchy (D̃3) properties, it follows that or and or . Furthermore, from transitivity (QP2) of the quasi-partition , it follows that or . Using the definition in (8) for , we conclude that

$$\tilde{u}_X(x, x') \leq \delta_0. \tag{19}$$

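By definition $\delta_0 = \max(\delta_1, \delta_2)$, hence we substitute this expression in (19) and compare with (18) to obtain

$$\tilde{u}_X(x, x') \leq \max(\delta_1, \delta_2) = \max\big(\tilde{u}_X(x, x''), \tilde{u}_X(x'', x')\big). \tag{20}$$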

Consequently, satisfies the strong triangle inequality and is therefore a quasi-ultrametric, proving that the map is well-defined.
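The three properties verified in this part of the proof can be checked numerically on a finite matrix. The following is a minimal sketch, assuming the quasi-ultrametric is stored as a square list of lists indexed by node; the function name `is_quasi_ultrametric` is hypothetical.

```python
def is_quasi_ultrametric(u):
    """Check non-negativity, identity, and the strong triangle inequality.

    Symmetry is deliberately NOT required, which is what distinguishes a
    quasi-ultrametric from an ultrametric.
    """
    n = len(u)
    for x in range(n):
        for y in range(n):
            if u[x][y] < 0:
                return False  # non-negativity fails
            if (u[x][y] == 0) != (x == y):
                return False  # identity: zero exactly on the diagonal
    # Strong triangle inequality: u(x, y) <= max(u(x, z), u(z, y)).
    return all(
        u[x][y] <= max(u[x][z], u[z][y])
        for x in range(n)
        for y in range(n)
        for z in range(n)
    )


# An asymmetric matrix that passes, and a raw dissimilarity that fails.
print(is_quasi_ultrametric([[0, 1, 2], [2, 0, 2], [1, 1, 0]]))  # True
print(is_quasi_ultrametric([[0, 1, 4], [3, 0, 2], [1, 5, 0]]))  # False
```

The second matrix fails because its entry from node 0 to node 2 is 4, which exceeds the maximum of the entries along the two-link chain through node 1.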

For the converse result, we need to show that is a well-defined map. Given a quasi-ultrametric on a node set and a resolution , we first define the relation

$$x \rightsquigarrow_{\tilde{u}_X(\delta)} x' \iff \tilde{u}_X(x, x') \leq \delta, \tag{21}$$

for all . Notice that is a quasi-equivalence relation as defined in Definition 1 for all . The reflexivity property is implied by the identity property of the quasi-ultrametric and transitivity is implied by the fact that satisfies the strong triangle inequality. Furthermore, definitions (9) and (10) are just reformulations of (2) and (3) respectively, for the special case of the quasi-equivalence defined in (21). Hence, Proposition 1 guarantees that is a quasi-partition for every resolution . In order to show that is well-defined, we need to show that these quasi-partitions are nested, i.e. that satisfies (D̃1)-(D̃4).
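To make the passage from a quasi-ultrametric to a quasi-partition concrete, the sketch below thresholds a matrix at a resolution as in (21). It assumes that (9) groups two nodes into the same block when the relation in (21) holds in both directions, and that (10) places an edge from one block to a different block when the relation holds from a node of the first to a node of the second. The name `quasi_partition` and the representation of blocks by integer indices are hypothetical.

```python
def quasi_partition(u, delta):
    """Blocks and inter-block edges induced by thresholding u at delta.

    Assumes u is a quasi-ultrametric, so that the two-way relation is an
    equivalence and one representative per block suffices.
    """
    n = len(u)
    label = [-1] * n
    blocks = []
    for x in range(n):
        if label[x] != -1:
            continue
        # Nodes related to x in both directions form the block of x.
        block = [y for y in range(n) if u[x][y] <= delta and u[y][x] <= delta]
        for y in block:
            label[y] = len(blocks)
        blocks.append(block)
    # Directed edge between distinct blocks when the one-way relation holds.
    edges = {
        (label[x], label[y])
        for x in range(n)
        for y in range(n)
        if label[x] != label[y] and u[x][y] <= delta
    }
    return blocks, edges


u = [[0, 1, 2], [2, 0, 2], [1, 1, 0]]
print(quasi_partition(u, 1))
print(quasi_partition(u, 2))
```

At resolution 1 every node is a singleton block but three directed edges are already present, mirroring the situation described for the migration network where the influence structure is informative before any clusters merge. At resolution 2 all nodes merge into one block and no edges remain.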

The first boundary condition in (D̃1) is implied by (9) and the identity property of . The second boundary condition in (D̃1) is implied by the fact that takes finite real values on a finite domain since the node set is finite. Hence, any satisfying

$$\delta_0 \geq \max_{x, x' \in X} \tilde{u}_X(x, x'), \tag{22}$$

is a valid candidate to show fulfillment of (D̃1).

To see that satisfies (D̃2) assume that for a resolution we have two nodes such that as in (9), then it follows that . Thus, if we pick any it is immediate that which by (9) implies that