Hypergraph Partitioning With Embeddings

Justin Sybrandt, Ruslan Shaydulin, and Ilya Safro. The authors are with the School of Computing, Clemson University, Clemson SC 29634 USA. Emails: {jsybran, rshaydu, isafro}@clemson.edu. Manuscript received 9 Sept. 2019.
Abstract

The problem of placing circuits on a chip or distributing sparse matrix operations can be modeled as the hypergraph partitioning problem. A hypergraph is a generalization of the traditional graph wherein each “hyperedge” may connect any number of nodes. Hypergraph partitioning, therefore, is the NP-Hard problem of dividing nodes into similarly sized disjoint sets while minimizing the number of hyperedges that span multiple partitions. Due to this problem’s complexity, many partitioners leverage the multilevel heuristic of iteratively “coarsening” their input to a smaller approximation until an otherwise inefficient algorithm becomes feasible. The initial solution is then propagated back to the original hypergraph, which produces a reasonably accurate result provided the coarse representation preserves structural properties of the original. Multilevel hypergraph partitioners are today considered state-of-the-art solvers, achieving an excellent quality/running-time trade-off on practical large-scale instances of different types. In order to improve the quality of multilevel hypergraph partitioners, we propose leveraging graph embeddings to better capture structural properties during the coarsening process. Our approach prioritizes dense subspaces found in the embedding, and contracts nodes according to both traditional and embedding-based similarity measures.
Reproducibility: All source code, plots and experimental data are available at https://sybrandt.com/2019/partition.


1 Introduction

In order to model problems that contain interconnected groups of items, such as the various data dependencies between processes found in large scientific applications, many practitioners leverage the formalism of hypergraphs. A hypergraph is similar to a traditional graph, with the added generalization that “hyperedges” may connect any number of nodes. Hypergraphs have been used in VLSI design [karypis1999multilevel], machine learning [zhou2007learning, hein2013total, zhang2017re], parallel algorithms [catalyurek1999hypergraph], combinatorial scientific computing [naumann2012combinatorial], and social network analysis [shepherd1990transient, zhang2010hypergraph].

The hypergraph partitioning problem is that of dividing the nodes of a hypergraph among similarly-sized disjoint sets. A good partitioning is one that minimizes the number of hyperedges spanning multiple partitions. In the context of combinatorial scientific computing and load balancing, this is the problem of dividing logical threads (nodes) across the various available machines (partitions) in order to reduce the amount of communication necessary between machines (cut hyperedges). Unfortunately, it is NP-Hard both to solve [lengauer2012combinatorial] and to accurately approximate [bui1992finding] a solution to this problem.

To manage the complexity of hypergraph partitioning, practitioners turn to heuristic algorithms [shhmss2016alenex], such as the multilevel paradigm [andre2018memetic, shaydulin2019relaxation, karypis2000multilevel, boman2009advances, devine2006parallel, cheval-mlpartcompar]. The multilevel approach consists of a V-cycle containing three phases, depicted in Figure 1. The V-cycle starts by iteratively coarsening the input hypergraph. Each iteration of the coarsening creates new coarse nodes by contracting groups of nodes in the current set. These contractions are determined through a matching process that is informed by some similarity measure, so that the resulting approximation retains the structural features of the original problem. This allows the coarse-level partition to be interpolated to the next-finer level without applying too many refinement steps that may substantially slow down the entire multilevel framework. Coarsening continues until the approximate hypergraph is small enough to partition directly, forming the initial solution. Multilevel partitioners then expand this approximate solution by iteratively uncoarsening to the original input. At each stage of the uncoarsening process, solvers interpolate the coarse solution and perform a local search or other methods to refine it. The resulting solution, once gradually refined to the finest level, becomes the final partitioning.
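To make the shape of this pipeline concrete, the following is a minimal Python sketch of a V-cycle driver. It assumes hypothetical coarsen, initial_partition, and refine callables and a hypergraph object with num_nodes() and nodes() methods; none of these correspond to the API of any partitioner discussed in this paper, and the coarsest-size threshold is purely illustrative.

```python
def multilevel_partition(h, k, coarsen, initial_partition, refine, small=150):
    """Skeleton of the multilevel V-cycle: coarsen, solve, uncoarsen + refine.

    h                 : hypergraph at the finest level (assumed interface)
    coarsen           : h -> (coarser hypergraph, dict fine node -> coarse node)
    initial_partition : (h, k) -> dict node -> part index
    refine            : (h, partition) -> locally improved partition
    small             : illustrative size threshold for the coarsest level
    """
    hierarchy = []
    while h.num_nodes() > small * k:                 # coarsening phase
        coarse_h, mapping = coarsen(h)
        hierarchy.append((h, mapping))
        h = coarse_h
    partition = initial_partition(h, k)              # solve the small problem directly
    for fine_h, mapping in reversed(hierarchy):      # uncoarsening phase
        # Interpolate: every fine node inherits its coarse node's part.
        partition = {v: partition[mapping[v]] for v in fine_h.nodes()}
        partition = refine(fine_h, partition)        # local search refinement
    return partition
```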

Because the initial solution to a multilevel algorithm propagates through the entire uncoarsening process, it is important to create a coarsened representation that shares structural properties with the original hypergraph. In order to improve coarsening, other solvers have exploited clustering and community detection techniques [hs2017sea], algebraic distance [shaydulin2019relaxation], and others. However, recent advances in graph embedding [sybrandt2019heterogeneous] indicate that the latent spaces found by unsupervised machine learning algorithms can better identify structural similarities between nodes.

1.1 Our Contribution

In this work we propose exploiting latent node representations gained through embeddings to better coarsen large hypergraphs for partitioning. First, we apply star-expansion [agarwal2006higher] to gain a bipartite representation of the input hypergraph. Then we learn latent structural features of this graph using a graph embedding method. Note that our algorithm is agnostic to the particular embedding. These dense real-valued embeddings inform our coarsening algorithm to prioritize more similar nodes at each level of coarsening. Then, we identify coarsening partners by comparing latent features in conjunction with traditional edge-wise features. After each iteration, we assign newly coarsened nodes an embedding equal to the centroid of their primal embeddings.

We implement our coarsening algorithm in both the $n$-level solver KaHyPar [shhmss2016alenex] and the $\mathcal{O}(\log|V|)$-level solver Zoltan [devine2006parallel]. In the case of KaHyPar, we evaluate our coarsening under its original uncoarsening strategy, as well as its recent flow-based refinement [heuer2018network]. We also compare our solution quality when using six different graph embeddings: Node2Vec [grover2016node2vec], Metapath2Vec++ [dong2017metapath2vec], Boolean and Algebraic Heterogeneous Bipartite Graph Embeddings [sybrandt2019heterogeneous], as well as two combination embeddings, also proposed in [sybrandt2019heterogeneous].

We evaluate our implementations against five state-of-the-art partitioners: hMetis [karypis1998hmetis], Zoltan [devine2006parallel], PaToH [ccatalyurek2011patoh], KaHyPar (with community-based coarsening [hs2017sea]), and KaHyPar Flow (with both community-based coarsening and flow-based refinement [heuer2018network]). For each method we additionally compare both the cut and $(\lambda - 1)$ optimization objectives (note that hMetis does not optimize for $(\lambda - 1)$; for this objective we only compare against the remaining four). Our evaluation spans partition counts ranging from 2 to 128, and 96 graphs from the SuiteSparse Matrix Collection [davis2011university]. For each combination of proposed implementation, baseline method, optimization metric, partition count, and hypergraph we perform twenty trials, each with a different random seed and relabeling of the input graph. This analysis consists of over half-a-million individual experiments.

We report summary statistics for the improvement of each proposed implementation when compared to each baseline method. Specifically, we consider the improvement relative to the minimum, maximum, and average observed objective value, as well as the standard deviation across trials. We additionally supply more detailed plots in the online appendix covering the improvements of all graphs across all method comparisons in order to highlight graph-wise differences. These plots display the averages and standard deviations of each graph per comparison, and include statistical significance values. Additionally, all experimental data for each individual trial, including parameter settings, is available as a publicly downloadable MongoDB database dump (https://sybrandt.com/2019/partition).

Using our proposed coarsening, we observe a significant improvement between each implementation and its directly comparable baseline (e.g., our modified Zoltan against the baseline Zoltan). We observe, however, that the improvement gradually vanishes as the number of parts increases, indicating a promising future research direction. In some specific cases, such as hypergraphs representing social networks, our coarsening can find partitioning solutions that are over 400% better than the existing solutions. Our coarsening also improves the standard deviation of results. Typical multilevel solvers visit nodes in a random order at each level of coarsening. Our approach replaces this with a prioritized visit order derived from embeddings. This change substantially decreases the standard deviation of results in almost all scenarios. All experimental code, data, visualization scripts, and results are publicly available at https://sybrandt.com/2019/partition.

Fig. 1: A standard V-cycle, consisting of coarsening, an initial partition, and uncoarsening. Node size corresponds to the weight of hypothetical coarse nodes. The dashed line demonstrates the initial partition and iterative local searches at each uncoarsening level. In this example, the multilevel hierarchy consists of three levels.

2 Background

A hypergraph is an ordered pair $H = (V, E)$, where $V$ is the set of nodes and $E$ is the set of hyperedges. Each hyperedge $e \in E$ is a non-empty subset of $V$. In hypergraph $k$-partitioning the goal is to split the set of nodes into $k$ disjoint subsets or parts $V_1, \dots, V_k$ such that $\bigcup_{i=1}^{k} V_i = V$, while minimizing an objective function over cut hyperedges subject to an imbalance constraint factor $\epsilon$. A hyperedge belongs to the cut if it contains nodes from at least two parts: $e \in \mathrm{cut}(H)$ iff there exist $i \neq j$ with $e \cap V_i \neq \emptyset$ and $e \cap V_j \neq \emptyset$. Both nodes and hyperedges can have weights, namely, $w(v)$ and $w(e)$, for each $v \in V$ and $e \in E$, respectively. In this paper we consider two objective functions: “cut” and “$(\lambda - 1)$”. The cut is the sum of weights of cut hyperedges: $\mathrm{cut}(H) = \sum_{e \in \mathrm{cut}(H)} w(e)$. The connectivity $\lambda(e)$ of a hyperedge is defined as the number of parts the hyperedge spans. The $(\lambda - 1)$ metric is then defined as $\sum_{e \in E} (\lambda(e) - 1)\, w(e)$. Note that for $k = 2$ these two metrics are equivalent. The imbalance factor $\epsilon$ ensures that for each part $V_i$ the following holds: $w(V_i) \leq (1 + \epsilon) \lceil w(V) / k \rceil$, where $w(V') = \sum_{v \in V'} w(v)$.
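For concreteness, the two objectives can be computed directly from a partition assignment as in the following sketch (our own illustration; hyperedges are assumed to be given as collections of node ids).

```python
def cut_and_km1(hyperedges, part_of, edge_weight=None):
    """Compute the 'cut' and '(lambda - 1)' objectives for a partition.

    hyperedges  : list of iterables of node ids (each a hyperedge)
    part_of     : dict mapping node id -> part index
    edge_weight : optional list of weights, defaults to 1 per hyperedge
    """
    cut = 0
    km1 = 0
    for idx, e in enumerate(hyperedges):
        w = 1 if edge_weight is None else edge_weight[idx]
        spanned = {part_of[v] for v in e}          # parts touched by e
        connectivity = len(spanned)                # lambda(e)
        if connectivity > 1:                       # e is cut
            cut += w
        km1 += w * (connectivity - 1)              # (lambda - 1) contribution
    return cut, km1

# Example: two hyperedges over four nodes, split into two parts.
edges = [{0, 1, 2}, {2, 3}]
parts = {0: 0, 1: 0, 2: 0, 3: 1}
print(cut_and_km1(edges, parts))  # -> (1, 1)
```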

Many partitioning algorithms assign weights to both nodes and hyperedges. Initially, weights are all set equally to 1. Once coarsened, the weight of a newly coarsened node is set equal to the sum of the weights of the contracted fine nodes. Coarse hyperedges are similarly weighted whenever two hyperedges are merged.

2.1 Multilevel Hypergraph Partitioning

Multilevel algorithms solve problems by constructing a hierarchy of sub-problems that approximate the original. These “coarsened” sub-problems contain fewer degrees of freedom and are therefore easier to solve. The multilevel approach captures the global structure of the problem by combining local information at different levels of coarseness. Originally introduced to speed up existing algorithms [barnard1994fast] and inspired by multigrid and multiscale optimization strategies [vlsicad], the multilevel method was quickly recognized to be a good way to improve the quality of partitioning [karypis1998fast] and is currently considered to be one of the state-of-the-art methods [bulucc2016recent] for this problem. In the context of hypergraphs, one constructs a multilevel hierarchy by merging nodes: multiple nodes at the finer level become a single node in the coarser level. Once reduced to a sufficiently small problem, a multilevel partitioner can solve the coarse partitioning problem using an algorithm that would be infeasible on large-scale instances. This initial solution is iteratively uncoarsened by first interpolating it onto a finer level and then refining it. The refinement is typically performed using local search or other methods. The coarsening-uncoarsening pipeline is commonly referred to as a V-cycle (see Fig. 1). Traditionally, at each level of the coarsening process all or almost all nodes have at least one merging partner, resulting in $\mathcal{O}(\log|V|)$ levels. This is the approach used by Mondriaan [vastenhouw2005two], hMetis2 [karypis1999multilevel], Zoltan [devine2006parallel], and PaToH [ccatalyurek2011patoh]. However, KaHyPar [shhmss2016alenex] implements an $n$-level approach where at each level only one pair of nodes is contracted. Over the years the multilevel method has become the gold standard in hypergraph partitioning, achieving an excellent time/quality trade-off in many practical cases, and is used by most state-of-the-art solvers, including all of the ones discussed in this paper. For an extensive review of (hyper)graph partitioning methods, the reader is referred to [bulucc2016recent, bichot2011graph].

When constructing coarser hypergraphs, state-of-the-art partitioners contract nodes according to some heuristic such that during the uncoarsening the solution can be interpolated from coarser levels without loss of quality. These methods typically make coarsening decisions based on some similarity measure that can be computed on node pairs. Most multilevel hypergraph partitioners, including Mondriaan [vastenhouw2005two], hMetis2 [karypis1999multilevel] and Zoltan [devine2006parallel], use the inner product or its variations, such as absorption (PaToH [ccatalyurek2011patoh]) and heavy edge (hMetis2 [karypis1999multilevel], Parkway [trifunovic2008parallel], PaToH [ccatalyurek2011patoh] and KaHyPar [hs2017sea]).

The inner product of two nodes is defined as the Euclidean inner product of the weighted hyperedge incidence vectors [devine2006parallel]. This similarity measure and its variations are simple and computationally inexpensive, but are limited due to only using local information. Recently, a number of advanced coarsening schemes were introduced to address this limitation. Shaydulin et al. introduce a relaxation-based similarity metric algebraic distance [shaydulin2019relaxation], extending a similar approach from graphs [chen2011algebraic]. In [shaydulin2018sea] this approach is extended and incorporated within an aggregative coarsening scheme, inspired by algebraic multigrid and stable matching approaches. An unfinished but promising attempt to generalize hypergraph coarsening using algebraic multigrid (AMG) on graphs [safro2009multilevel] was published in Sandia Labs Summer Reports [buluc-boman]. In AMG coarsening [safro2015advanced, Safro2006, ron2011relaxation], instead of being contracted, the nodes are split into fractions which form coarse aggregates. Heuer and Schlag introduce a community-aware coarsening that uses global clustering information to restrict matching between communities [hs2017sea].

During uncoarsening, nodes are uncontracted and the coarser-level partition is interpolated to the finer-level node set. Then the solution is iteratively refined using a local node-moving heuristic. A majority of hypergraph partitioners use a variation of Fiduccia-Mattheyses [fiduccia1988linear] or Kernighan-Lin [kernighan1970efficient] to perform these local searches [heuer2018network, vastenhouw2005two, karypis1999multilevel, devine2006parallel, ccatalyurek2011patoh, trifunovic2008parallel]. Using a local search heuristic at the uncoarsening stage allows these partitioners to locally improve the global solution interpolated from coarser levels. Recently, Heuer et al. introduced a flow-based refinement scheme for -way hypergraph partitioning [heuer2018network], extending similar approaches from graph partitioning [sanders2011engineering].

2.2 Hypergraph Embeddings

The Skip-Gram text embedding model presented by Mikolov et al. learns embeddings by discovering the relationship between each word and its typical context [mikolov2013efficient, mikolov2013distributed]. This model underpins many graph embedding models [perozzi2014deepwalk, grover2016node2vec, dong2017metapath2vec]. In order to efficiently handle large volumes of text, the Skip-Gram model samples “windows.” Each window is centered around a target word, and includes local context both leading and trailing the target. The Skip-Gram model learns to predict each window’s contents given the target word. The underlying assumption behind this approach is that “similar words share similar company,” and it has been shown to result in semantically rich latent features [tsvetkov2015evaluation, gladkova2016analogy].

Deepwalk, a pioneering graph embedding technique presented by Perozzi et al., reduces the graph structure to an analogous textual problem in order to also leverage the Skip-Gram approach [perozzi2014deepwalk]. Here, the underlying assumption is that “similar nodes share similar company”; however, graphs present additional challenges not found in text. Firstly, representing a node’s “company” is nontrivial. Simply taking all first-order neighbors of a target node may not be sufficient, or may produce more neighbors than fit in memory. To reconcile this, Perozzi et al. propose random-walk sampling. Pseudo-sentences form when traversing a graph, wherein each node is analogous to a word and each random walk is analogous to a sentence. These random walks serve as input to the hierarchical Skip-Gram model, similar to that proposed by Mikolov et al.

Extending this approach, Grover and Leskovec modify random-walk sampling to add a “return probability” parameter [grover2016node2vec]. They observe that typical depth-first random walks capture structural similarities, while breadth-first approaches (such as the LINE embedding method [tang2015line]) capture homophilic relationships. In order to improve overall embedding quality, Node2Vec random walks strike a balance between the two traversal strategies by probabilistically doubling-back on old neighborhoods, or forging onward to unseen areas.

The above-mentioned graph embedding techniques assume nothing is known about the considered graph’s structure. However, recent methods exploit known structural properties in ways that are applicable to hypergraphs. Metapath2Vec++, proposed by Dong et al., assumes each node is of a particular type, and that certain “metapaths,” path descriptions containing only type information, are known to be meaningful [dong2017metapath2vec]. In the case of hypergraphs, we can perform a star-expansion to convert each hyperedge to a new layer of nodes, which converts our input hypergraph into a traditional bipartite graph [agarwal2006higher]. This representation has two node types: original nodes, and those derived from hyperedges. Due to its bipartite structure, the only metapath is that of alternating types. However, due to the model architecture of Metapath2Vec++, we can learn type-specific latent features for each.
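A minimal sketch of the star-expansion step is shown below, assuming hyperedges are given as lists of node ids and using networkx as an illustrative graph container; this is not the preprocessing code of any cited embedding method.

```python
import networkx as nx

def star_expansion(hyperedges):
    """Convert a hypergraph into its bipartite star-expansion.

    Each hyperedge becomes a new node of type 'edge', connected to
    every original node ('node' type) it contains.
    """
    g = nx.Graph()
    for edge_idx, members in enumerate(hyperedges):
        edge_node = ("edge", edge_idx)
        g.add_node(edge_node, bipartite=1)
        for v in members:
            g.add_node(("node", v), bipartite=0)
            g.add_edge(("node", v), edge_node)
    return g

# A hyperedge {0, 1, 2} becomes a star centered on ("edge", 0).
g = star_expansion([[0, 1, 2], [2, 3]])
print(g.number_of_nodes(), g.number_of_edges())  # 6 nodes, 5 edges
```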

FOBE + HOBE Details: Further recent work addresses the bipartite case specifically. Sybrandt and Safro present multiple methods for embedding bipartite graphs [sybrandt2019heterogeneous]. These include First- and High-Order Bipartite Embeddings, as well as a combination approach to learn joint embeddings on bipartite graphs. These approaches are applicable to hypergraphs as represented through star-expansion. These methods model the two distinct types of nodes present in a bipartite network separately in order to better capture same-typed features. In the context of hypergraphs this is analogous to modeling nodes and hyperedges separately. For the purpose of this work however, we only consider the embedding of nodes present in the original hypergraph.

The first-order approach, FOBE, presented by Sybrandt and Safro, samples observed node similarities and then learns embeddings that encode those similarities via dot product. Nodes are deemed “similar” in this context if they share an edge, or a neighbor. Formally, if $G = (V, E)$ is an undirected bipartite graph with nodes $V$ and edges $E$, an edge between nodes $u$ and $v$ is indicated as $uv \in E$, and $\Gamma(u)$ indicates the neighborhood of $u$, then the similarity measured by FOBE is:

$\mathbb{S}(u, v) = \begin{cases} 1 & \text{if } uv \in E \text{ or } \Gamma(u) \cap \Gamma(v) \neq \emptyset \\ 0 & \text{otherwise} \end{cases}$   (1)

Note that for two nodes to be measured as similar by the above equation, they must either be of different bipartite types and share an edge, or of the same type and share a neighbor of the opposite type. FOBE then encodes the above similarities into node embeddings. However, the objective used to learn these embeddings is constructed to only explicitly compare nodes of the same type. If $A$ and $B$ are disjoint subsets of $V$ indicating the two types present in the bipartite graph, and $\epsilon(\cdot)$ is a function that relates each node to its embedding in $\mathbb{R}^d$, then the various encoded similarities are represented as the following:

$\widetilde{\mathbb{S}}_A(u, v) = \sigma\left(\epsilon(u)^\top \epsilon(v)\right)$ for $u, v \in A$   (2)
$\widetilde{\mathbb{S}}_B(u, v) = \sigma\left(\epsilon(u)^\top \epsilon(v)\right)$ for $u, v \in B$   (3)
$\widetilde{\mathbb{S}}_{AB}(u, v) = \max_{a \in \Gamma(v)} \widetilde{\mathbb{S}}_A(u, a) \cdot \max_{b \in \Gamma(u)} \widetilde{\mathbb{S}}_B(v, b)$ for $u \in A,\ v \in B$   (4)

Here, $\widetilde{\mathbb{S}}_A$ and $\widetilde{\mathbb{S}}_B$ indicate the similarity shared between nodes of the same type, and $\sigma$ denotes the sigmoid function. Then, $\widetilde{\mathbb{S}}_{AB}$ decomposes the similarity of cross-typed nodes into sets of same-typed comparisons. These predicted similarity measures derived from embeddings are fit to the observed samples above simultaneously.
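As a concrete reading of the sampling rule behind Eq. (1), the observed FOBE similarity can be sketched as a simple indicator over edges and shared neighbors; this is our own illustration, not the original implementation.

```python
def fobe_observed_similarity(u, v, edges, neighbors):
    """Return 1 if u and v share an edge or a common neighbor, else 0.

    edges     : set of frozensets {u, v}, one per bipartite edge
    neighbors : dict node -> set of adjacent nodes
    """
    if frozenset((u, v)) in edges:
        return 1
    if neighbors[u] & neighbors[v]:
        return 1
    return 0
```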

The high-order embedding method (HOBE) presented by Sybrandt and Safro extends FOBE by sampling neighbors-of-neighbors, and prioritizes these similarities through the local heuristic signal provided by algebraic distance on graphs [ron2011relaxation, chen2011algebraic, shaydulin2019relaxation, john2016single]. This approach begins with a fast iterative relaxation technique that places all bipartite nodes on the unit interval such that locally similar nodes are more likely to have similar values. Multiple trials with random initializations boost this signal by reducing the effect of incidental proximity observed between distant nodes in a single trial. Formally, this algebraic similarity measure is determined by first calculating algebraic coordinates $a_u^{(t, i)}$ for each node $u$. These coordinates are randomly initialized in $[0, 1]$, and are refined over iterations via the following:

$a_u^{(t, i+1)} = \lambda\, a_u^{(t, i)} + (1 - \lambda)\, \frac{\sum_{v \in \Gamma(u)} a_v^{(t, i)}}{|\Gamma(u)|}$   (5)

Here, $t$ indicates the trial ($1 \leq t \leq T$), $i$ indicates the iteration, and $\lambda$ is the damping factor ($\lambda = 0.5$ is suggested in [shaydulin2019relaxation]). These per-trial coordinates are then combined into a more robust similarity measure through the following:

$\alpha(u, v) = \max\left(0,\; 1 - \frac{1}{\sqrt{T}} \sqrt{\sum_{t=1}^{T} \left(a_u^{(t)} - a_v^{(t)}\right)^2}\right)$   (6)

where $a_u^{(t)}$ denotes the final coordinate of node $u$ after the last iteration of trial $t$.
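A minimal sketch of this relaxation is shown below. The adjacency list is assumed to come from the star-expanded bipartite graph, every node is assumed to have at least one neighbor, and the way trials are combined into a single similarity is a simplified stand-in for the exact normalization used by HOBE.

```python
import random

def algebraic_coordinates(adj, num_trials=10, num_iters=20, damping=0.5):
    """Compute per-trial algebraic coordinates for each node.

    adj : dict mapping node -> list of neighbor nodes (assumed non-empty)
    Returns a dict mapping node -> list of num_trials coordinates in [0, 1].
    """
    coords = {v: [] for v in adj}
    for _ in range(num_trials):
        x = {v: random.random() for v in adj}        # random init in [0, 1)
        for _ in range(num_iters):
            # Jacobi-style damped averaging over neighbors.
            x = {
                v: damping * x[v]
                   + (1 - damping) * sum(x[u] for u in adj[v]) / len(adj[v])
                for v in adj
            }
        for v in adj:
            coords[v].append(x[v])
    return coords

def algebraic_similarity(coords, u, v):
    """Combine trials into a similarity in [0, 1]: 1 minus normalized distance."""
    t = len(coords[u])
    dist = sum((a - b) ** 2 for a, b in zip(coords[u], coords[v])) ** 0.5
    return max(0.0, 1.0 - dist / t ** 0.5)
```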

Building from this heuristic signal, the HOBE similarity measures the presence of highly-similar shared neighbors through the following:

$\mathbb{S}(u, v) = \max_{x \in \Gamma(u) \cap \Gamma(v)} \alpha(u, x)\, \alpha(v, x)$ for same-typed $u$ and $v$   (7)
$\mathbb{S}(u, v) = \alpha(u, v)$ for cross-typed $u$ and $v$ with $uv \in E$   (8)

In a manner similar to FOBE, these three similarity measures are encoded into the dot product of embeddings through a combined objective function.

In this work we explore the solution quality of our coarsening algorithm using a number of different embedding methods. We select Node2Vec and Metapath2Vec++, as well as both FOBE and HOBE. In addition, we train two combination embeddings, one that merges all four methods, and another that combines only FOBE and HOBE. We do not attempt to demonstrate that any individual embedding is superior for hypergraph partitioning; on the contrary, we demonstrate in Section 5 that all embeddings improve the partitioning quality, showing that such embeddings are an excellent tool for advanced coarsening schemes, potentially beyond the partitioning problem alone. Node2Vec allows us to evaluate a generic embedding technique not designed with hypergraphs in mind, Metapath2Vec++ evaluates a method shown to transfer well to hypergraphs [sybrandt2019heterogeneous], while the Heterogeneous Bipartite approaches are designed to facilitate hypergraph learning.

3 Method

In order to improve the quality of multilevel hypergraph partitioning solvers, such as Zoltan [devine2006parallel] and KaHyPar [shhmss2016alenex], we take advantage of graph embedding techniques. These methods learn dense, real-valued representations in a fixed-sized vector space for each node. In the case of traditional graphs, Grover et al. demonstrate that these embeddings can capture both structural and homophilic latent relationships [grover2016node2vec]. Additional work from Sybrandt and Safro demonstrates that these methods extend to hypergraphs [sybrandt2019heterogeneous] through star expansion [agarwal2006higher].

Graph embedding methods typically encode observed similarities through some similarity measure. In the case of Algebraic and Boolean Heterogeneous Bipartite Embeddings, these similarities are explicitly modeled using the dot product [sybrandt2019heterogeneous]. The same similarity measure is also found in more traditional methods such as LINE [tang2015line]. Semantically, dot product implies that two nodes are similar if they share common prominent features. Unlike cosine similarity, the dot product is not normalized, and therefore does not significantly penalize nodes for being dissimilar, provided their dissimilar values are near zero. We observe that dot product also applies to other graph embedding techniques, such as the Skip-Gram-based methods used in Node2Vec [grover2016node2vec], Deepwalk [perozzi2014deepwalk], and by extension, MetaPath2Vec++ [dong2017metapath2vec]. While the specifics of each method are beyond the scope of this work, we note that dot product is a robust measure of similarity across embeddings.

We exploit graph embeddings to better match nodes during coarsening. The typical matching process, in both $n$-level and $\mathcal{O}(\log|V|)$-level coarsening, identifies pairs of similar nodes $u$ and $v$, called coarsening partners, to merge in the next-coarsest representation. The resulting coarsened node becomes a member of all hyperedges incident to both $u$ and $v$. As a result, the overall partitioning solution can be drastically altered by the quality of these node pairs, as demonstrated below in Section 5.

One common node similarity measure for finding coarsening partners is an inner product of edge features. In KaHyPar [shhmss2016alenex], this measure is a ratio between edge weight and size, as reproduced in (9). Here, $w(e)$ corresponds to the weight of a coarsened hyperedge, which indicates the number of original hyperedges containing the same coarsened node set, and $I(u)$ denotes the set of hyperedges incident to node $u$.

$S_{\text{edge}}(u, v) = \sum_{e \in I(u) \cap I(v)} \frac{w(e)}{|e| - 1}$   (9)

This measure prioritizes nodes sharing many “tight” hyperedges, those with fewer members, as these tend to be more meaningful in real-world applications. For instance, members of a selective club or shoppers buying a niche ingredient are likely more self-similar than those buying bread or belonging to a massive organization. However, this model equally prioritizes all hyperedges of similar size, even if they contain a random assortment of nodes. To improve this coarsening measure, we introduce a term derived from a pretrained graph embedding.

Hypergraph embeddings, typically derived from the bipartite representation, project nodes into a fixed-dimensionality vector space [sybrandt2019heterogeneous]. While the dimensionality of this space is a hyperparameter to an embedding model, typical values range from 100 to 1,000 and are robust to small changes. As a result, many methods capture similarities mathematically through the inner product of embeddings [grover2016node2vec, sybrandt2019heterogeneous, dong2017metapath2vec]. Formally, we represent the pretrained embedding as a function $\epsilon(\cdot)$ mapping each node to a $d$-dimensional vector. We represent the embedding similarity between two nodes as

$S_{\text{emb}}(u, v) = \epsilon(u)^\top \epsilon(v)$   (10)

These embeddings can capture both structural and homophilic latent properties [grover2016node2vec]. Structural properties include hubs, bridges, and leaves, while homophilic properties include clusters and common neighbors. Different embedding techniques prioritize different latent features, and we explore six different embedding schemes to underpin our coarsening. These methods are outlined in detail in Section 2.2. However, we observe that all six considered embeddings improve overall coarsening results (see Figure 4 as well as the online appendix).

We combine both hyperedge-wise and embedding-wise similarities into a single measure for each node pair. As a result, two nodes will be selected as coarsening partners if they share both many hyperedges and many latent features. This formulation provides a mechanism to lessen the impact of hyperedges without self-similar content, because the similarity conveyed by a tight hyperedge will be lessened by the dissimilarity conveyed in the embedding. In addition, we add a regularization term to maintain balance between node weights. The weight $w(u)$ of a coarsened node is simply the number of original nodes that have been merged together in the coarsened representation. Without this penalty, dense subregions of the hypergraph could be coarsened entirely before anything else (in the $n$-level case), resulting in an imbalanced solution. Our resulting score is formally

$S(u, v) = \frac{\epsilon(u)^\top \epsilon(v)}{w(u)\, w(v)} \sum_{e \in I(u) \cap I(v)} \frac{w(e)}{|e| - 1}$   (11)

Note that to receive a high score given our proposed method, two nodes must share hyperedges, have similar latent features, and be of reasonably small weights. By including the edge-wise inner product, our method cannot coarsen disparate regions of the network that happen to share similar latent features, which can arise from some embedding techniques. For instance, disconnected subgraphs may be embedded in overlapping subspaces, and a simpler embedding-only similarity measure would then conjoin the disconnected components.
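The following sketch combines the pieces of Eqs. (9)-(11) into a single scoring function. The data structures and names are our own illustration, and the exact normalization inside each partitioner's implementation may differ.

```python
import numpy as np

def combined_score(u, v, incident, edges, edge_weight, node_weight, emb):
    """Score a candidate pair (u, v) for contraction.

    incident    : dict node -> set of hyperedge ids containing that node
    edges       : dict hyperedge id -> set of member nodes
    edge_weight : dict hyperedge id -> weight
    node_weight : dict node -> weight
    emb         : dict node -> numpy embedding vector

    Combines the heavy-edge rating (Eq. 9) with the embedding dot
    product (Eq. 10), penalized by the product of node weights, in the
    spirit of Eq. 11.
    """
    shared = incident[u] & incident[v]
    if not shared:
        return 0.0
    # Heavy-edge rating: prioritize many small, heavy shared hyperedges.
    edge_term = sum(
        edge_weight[e] / (len(edges[e]) - 1)
        for e in shared
        if len(edges[e]) > 1
    )
    # Embedding similarity: shared latent features raise the score.
    emb_term = float(np.dot(emb[u], emb[v]))
    # Weight penalty: discourage contracting already-heavy nodes.
    return edge_term * emb_term / (node_weight[u] * node_weight[v])
```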

We additionally apply the latent information present in embeddings to order nodes when identifying coarsening partners. Our goal is to match the pairs with the highest similarity first, so that the resulting coarsened nodes are more likely to share the same higher-order structural feature, such as a cluster or role. We sort nodes by their nearest neighbor in the embedding space, and penalize this similarity again by weights. We restrict the nearest-neighbor search to those nodes actually sharing hyperedges, so as to match the scores calculated above. Formally, the sorting criterion we propose is as follows

$O(u) = \max_{v \in \Gamma(u)} \frac{\epsilon(u)^\top \epsilon(v)}{w(u)\, w(v)}$   (12)

where $\Gamma(u)$ represents the neighborhood of node $u$, namely, $\Gamma(u) = \{v \neq u : \exists e \in E \text{ such that } \{u, v\} \subseteq e\}$.

We present our overall matching algorithm in Procedure 1. All nodes begin unmatched, as indicated by $\mu$, a Boolean characteristic vector of (un)matched nodes. We then visit each node in sorted order, according to the above criterion. Provided a visited node is unmatched, we iterate over its neighborhood and consider any unmatched neighbor that would not result in a coarse node above the weight tolerance. Out of these considered nodes, we select whichever has the highest score according to Eq. 11.

After coarsening, newly contracted nodes are assigned an embedding equal to the centroid of their primal nodes. In this context, a primal node is a fully uncoarsened node specified at the finest level of the problem. For instance, if at a given level of coarsening we match $u$ and $v$, the resulting coarse node would have the following properties. Here $u'$ represents the newly coarsened node, $E'$ represents the modified edge set, and $P(x)$ represents the set of primal nodes corresponding to node $x$. At the finest level, $P(x) = \{x\}$.

$V' = \left(V \setminus \{u, v\}\right) \cup \{u'\}$   (13)
$P(u') = P(u) \cup P(v)$   (14)
$w(u') = w(u) + w(v)$   (15)
$I(u') = I(u) \cup I(v)$   (16)
$E' = \left\{ (e \setminus \{u, v\}) \cup \{u'\} : e \in I(u') \right\} \cup \left(E \setminus I(u')\right)$   (17)
$w(e') = \sum_{e \in E:\, e \text{ maps to } e'} w(e)$   (18)
$\epsilon(u') = \frac{1}{|P(u')|} \sum_{p \in P(u')} \epsilon(p)$   (19)
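A small sketch of the bookkeeping applied when a pair is contracted, following the notation above (summed weights, merged primal sets, and the centroid embedding); hyperedge rewriting and de-duplication inside the actual solvers are omitted.

```python
import numpy as np

def contract(u, v, node_weight, primal, emb, primal_emb):
    """Merge nodes u and v into a new coarse node and return its id.

    node_weight : dict node -> weight (updated in place)
    primal      : dict node -> set of primal (finest-level) nodes
    emb         : dict node -> embedding of the current-level node
    primal_emb  : dict primal node -> its original embedding
    """
    coarse = ("coarse", u, v)                      # hypothetical new node id
    node_weight[coarse] = node_weight[u] + node_weight[v]
    primal[coarse] = primal[u] | primal[v]
    # Centroid of all primal embeddings represented by the coarse node.
    emb[coarse] = np.mean([primal_emb[p] for p in primal[coarse]], axis=0)
    return coarse
```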
0:  Input: hypergraph $H = (V, E)$, node and hyperedge weights $w(v)$ and $w(e)$, pretrained embedding $\epsilon$, and node weight tolerance $\tau$.
0:  Output: set of matches $M$ to be further coarsened.
1:  $\mu[v] \leftarrow \text{false}$ for all $v \in V$;  $M \leftarrow \emptyset$
2:  Sort $V$ with respect to $O(u)$ (Equation 12) in decreasing order.
3:  for $u \in V$ in sorted order do
4:     if not $\mu[u]$ then
5:        $v^* \leftarrow \emptyset$,  $s^* \leftarrow 0$
6:        for $v \in \Gamma(u)$ do
7:           if not $\mu[v]$ and $v \neq u$ and $w(u) + w(v) \leq \tau$ then
8:              $s \leftarrow S(u, v)$ (Equation 11)
9:              if $s > s^*$ then
10:                 $v^* \leftarrow v$,  $s^* \leftarrow s$
11:       if $v^* \neq \emptyset$ then
12:          Match $u$ with $v^*$ for coarsening:  $M \leftarrow M \cup \{\{u, v^*\}\}$
13:          $\mu[u] \leftarrow \text{true}$,  $\mu[v^*] \leftarrow \text{true}$
Procedure 1: Match nodes for coarsening.
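For readers who prefer code to pseudocode, the matching loop of Procedure 1 can be sketched as follows, assuming the score of Eq. (11) and the ordering key of Eq. (12) are supplied as callables; the names and data structures are illustrative only.

```python
def match_for_coarsening(nodes, neighbors, score, order_key, node_weight, tol):
    """Greedy matching following Procedure 1 (illustrative sketch).

    nodes       : iterable of node ids
    neighbors   : dict node -> set of nodes sharing at least one hyperedge
    score       : callable (u, v) -> combined score (Eq. 11)
    order_key   : callable u -> sort key (Eq. 12), highest visited first
    node_weight : dict node -> weight
    tol         : maximum allowed weight of a contracted pair
    """
    matched = set()
    matches = []
    for u in sorted(nodes, key=order_key, reverse=True):
        if u in matched:
            continue
        best, best_score = None, 0.0
        for v in neighbors[u]:
            if v in matched or v == u:
                continue
            if node_weight[u] + node_weight[v] > tol:
                continue
            s = score(u, v)
            if s > best_score:
                best, best_score = v, s
        if best is not None:
            matches.append((u, best))
            matched.update((u, best))
    return matches
```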

4 Experimental Design

In order to evaluate the partitioning quality of our proposed coarsening method, we implement our matching algorithm in both KaHyPar [shhmss2016alenex] and Zoltan [devine2006parallel]. Our KaHyPar implementation adds a new coarsening class to replace the existing community-based structure, and maintains other KaHyPar features such as its direct $k$-way initial solution. We evaluate this implementation with both traditional vertex-swapping refinement as well as more recent flow-based refinement [heuer2018network]. In the case of Zoltan we introduce a new function to evaluate nodes during matching. Our implementation also requires minor modifications elsewhere in the software package in order to address re-indexing during recursive bisection. These changes do not affect the actual coarsening algorithm, as each call to recursive bisection begins with a subset of nodes and hyperedges from the original hypergraph.

In order to quantify the improvement in quality gained by embedding-based coarsening, we compute a number of partitions under a variety of scenarios. This begins with a set of embeddings. Due to resource constraints, we only embed each graph once for each considered technique and reuse this embedding in different runs. This compromise is necessary because graph embedding can be more expensive than the considered multilevel hypergraph partitioners by orders of magnitude, often determined by the efficiency of the embedding software. Furthermore, we note that the problem of embedding coarsened hypergraphs is nontrivial. We observe a significant decrease in overall solution quality when attempting to recompute embeddings at intermediate coarse levels, as the considered methods were not intended to capture small weighted structures. Ultimately we find that this challenge lies outside the scope of this work.

The set of embedding techniques we explore consists of Node2Vec [grover2016node2vec], Metapath2Vec++ [dong2017metapath2vec], FOBE, and HOBE [sybrandt2019heterogeneous], as well as two combination embeddings (also presented in [sybrandt2019heterogeneous]). The first combination merges only FOBE and HOBE, while the second combination merges all four previously stated embeddings. All considered embeddings share the same fixed dimensionality. While higher-dimensional embeddings have the ability to capture more complex latent structure, this complexity can also lead to poorer convergence while training. We performed an initial experiment comparing 100- to 500-dimensional embeddings of our hypergraph set, and observed no significant difference in solution quality. In addition, we do not claim to extensively test our coarsening against all state-of-the-art embeddings, only that our proposed technique is robust to different embedding algorithms.

Each of the six input embeddings combines with each of the three proposed implementations, KaHyPar, KaHyPar Flow, and Zoltan, to create a set of eighteen proposed partitioners with embedding-based coarsening. We add to this five baseline methods: hMetis [karypis1998hmetis], Zoltan [devine2006parallel], PaToH [ccatalyurek2011patoh], KaHyPar (with community-based coarsening [hs2017sea]), and KaHyPar Flow (with both community-based coarsening and flow-based refinement [heuer2018network]). This results in 23 considered partitioners. For each of the partitioners, we run separate trials optimizing for cut and $(\lambda - 1)$, respectively. The differences between these objectives are defined in detail in Section 2.

For each combination of partitioner and objective we additionally compare across a range of $k$ values. Many solvers identify a larger number of partitions through recursive bisection (all considered except KaHyPar), which iteratively partitions the input hypergraph into two parts until reaching the desired number of partitions. For this reason we compare different numbers of partitions corresponding to the powers of two from 2 to 128. For each of these scenarios, we apply an overall imbalance tolerance of 3%. Then, for each combination of partitioner, objective, and $k$ value, we compare across a benchmark of hypergraphs.

Our benchmark consists of 86 sparse matrices selected from the SuiteSparse Matrix Collection [davis2011university]. These matrices span a range of domains including social networks, power grids, and linear systems. We interpret each matrix as the incidence matrix of a hypergraph. In doing so, we consider each row to represent a node, each column to represent a hyperedge, and a nonzero value in entry $(i, j)$ to indicate that node $i$ participates in hyperedge $j$.
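The row/column interpretation above can be sketched with scipy as follows (an illustrative helper, not the benchmark tooling itself).

```python
import scipy.sparse as sp

def matrix_to_hypergraph(m):
    """Interpret a (sparse) matrix m as a hypergraph incidence matrix.

    Rows are nodes, columns are hyperedges; a nonzero at (i, j) means
    node i participates in hyperedge j. Returns a list of hyperedges,
    each a list of node ids, dropping empty columns.
    """
    csc = sp.csc_matrix(m)                 # column-major for fast column slicing
    hyperedges = []
    for j in range(csc.shape[1]):
        members = csc.indices[csc.indptr[j]:csc.indptr[j + 1]].tolist()
        if members:
            hyperedges.append(members)
    return hyperedges
```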

We additionally include ten synthetic hypergraphs that were designed to test the robustness of the coarsening process, extending a similar approach from graphs [safro2015advanced]. These hypergraphs are a mixture of graphs that are weakly connected to each other, with only a small fraction of edges connecting different graphs in the mixture. In a multilevel setting, this can cause the coarsening process to incorrectly contract edges between different graphs in the mixture, resulting in uneven coarsening, overloaded refinement, and worse quality of the final solution. This structure can be found in many real-world graphs, including multi-mode networks [tang2008community] and logistics multi-stage system networks [Stock2006]. We introduce additional complexity by adding random edges (denoted in the online appendix as “W/ Noise”). Full graphs, as well as the scripts used to generate them, are available in the online appendix.

Our overall benchmark suite of 96 graphs is explored in detail in the online appendix, wherein we present node and hyperedge distributions for all graphs. All names provided, except for our newly generated synthetic graphs, correspond to those found in the Sparse Matrix Collection.

For each combination of partitioner, objective, $k$ value, and graph, we compute twenty trials, resulting in a total of over half-a-million trials. For each trial we generate a new random seed and randomly relabel the node and hyperedge indices.

In order to quantify the difference in quality between two compared methods, we rely on summary statistics such as the macro-average of improvement. We define the “improvement” as a ratio between the partitioning quality of some baseline method and some comparison method. Note that if the comparison method achieves a cut or $(\lambda - 1)$ value that is lower than the baseline, the improvement will be greater than 1. We compute four different improvement statistics between two methods: average, minimum, maximum, and standard deviation. In this way we compare the expected, worst-case, and best-case observed partitioning quality, as well as the variance of results. In the following equations, $I_f(A, B, H, k, \eta)$ indicates the improvement of comparison algorithm $B$ over baseline algorithm $A$ for the $k$-partition problem on hypergraph $H$ with respect to metric $\eta$. Let $A_i$ represent the $i$-th trial of algorithm $A$. For our experiments we run $n = 20$ of these trials.

$I_f(A, B, H, k, \eta) = \frac{f\left(\eta(A_1), \ldots, \eta(A_n)\right)}{f\left(\eta(B_1), \ldots, \eta(B_n)\right)}$   (21)
$I_{\text{avg}} = I_f$ with $f = $ mean   (22)
$I_{\min} = I_f$ with $f = \min$   (23)
$I_{\max} = I_f$ with $f = \max$   (24)
$I_{\text{std}} = I_f$ with $f = $ standard deviation   (25)

We can then reduce the overall comparison of two methods with respect to a particular optimization metric into the macro-average of improvement across all graphs in the benchmark. Here, $\mathcal{G}$ represents the set of considered benchmark hypergraphs.

$\text{MacroAvg}(A, B, k, \eta) = \frac{1}{|\mathcal{G}|} \sum_{H \in \mathcal{G}} I_{\text{avg}}(A, B, H, k, \eta)$   (26)
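These statistics can be computed directly from the per-trial objective values, as in the following sketch (lower objective values are better, so ratios above 1 favor the comparison method; the standard-deviation ratio assumes a nonzero denominator).

```python
import statistics

def improvement_stats(baseline_vals, comparison_vals):
    """Ratios of baseline to comparison objective values over repeated trials.

    Values are cut or (lambda - 1) objectives, so lower is better and a
    ratio greater than 1 means the comparison method improved on the baseline.
    """
    return {
        "avg": statistics.mean(baseline_vals) / statistics.mean(comparison_vals),
        "min": min(baseline_vals) / min(comparison_vals),
        "max": max(baseline_vals) / max(comparison_vals),
        "std": statistics.stdev(baseline_vals) / statistics.stdev(comparison_vals),
    }

def macro_average(per_graph_avg_improvements):
    """Macro-average (Eq. 26): mean of the per-graph average improvements."""
    return statistics.mean(per_graph_avg_improvements)
```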

5 Results

Following the experimental procedure described in Section 4, we evaluate the partitioning quality of our proposed coarsening algorithm. Due to the volume of experimental trials performed for our evaluation, we can only present summary statistics for representative experiments in the body of this work. Full results are available online.

(a) Marginal avg. $(\lambda - 1)$ improvement.

# Parts ($k$)     2     4     8    16    32    64   128
KaHyPar          8%   13%   10%    6%    4%    3%    1%
KaHyPar (flow)   9%   11%    4%    2%    3%    2%    0%
Zoltan          48%   28%   15%   14%    9%    5%    3%

(b) Marginal avg. cut improvement.

# Parts ($k$)     2     4     8    16    32    64   128
KaHyPar          8%   16%    9%    1%    3%    1%    0%
KaHyPar (flow)  10%   11%    3%    1%    1%    1%   -1%
Zoltan          51%   45%   51%   41%   31%   14%    8%

TABLE III: The above tables summarize the average increase in quality that can be gained per metric and per method when utilizing embedding-based coarsening. Each method is compared against its corresponding baseline, such as comparing KaHyPar (flow) with and without embedding-based coarsening. All quality numbers come from an average of all trials using the FOBE embedding and the corresponding metric.

We present the macro-average improvement (Eq. 26) gained by embedding-based coarsening for each implementation across all partition counts in Table III (a and b) for the $(\lambda - 1)$ and cut metrics. These tables compare each implementation against its corresponding baseline without embedding-based coarsening. For instance, KaHyPar without flow-based refinement [hs2017sea] is compared to the same KaHyPar without flow-based refinement but with embedding-based coarsening.

Figure 2 depicts the improvement of representative methods relative to KaHyPar with flow-based refinement, the top performing baseline [heuer2018network]. Note that in these plots we use “EC” to refer to embedding-based coarsening. Additionally, in these tables and figures we select the FOBE embedding for both the KaHyPar and Zoltan implementations to represent overall embedding quality, as all considered embeddings perform similarly. Furthermore, Figure 3 depicts the relative improvement gained by embedding-based coarsening per graph for the 2-partition problem. Our analysis focuses on insights that can be observed from these representative results. Overall we observe that embedding-based coarsening increases average quality across almost all considered comparisons. This improvement is typically greater for low partition counts.

In the case of KaHyPar, an improvement is the result of replacing the existing community-based coarsening [hs2017sea]. The community-based coarsening, which is further discussed in Section 6, restricts coarsening partners to only nodes that share a community in the finest level of the input hypergraph. Our embedding-based coarsening is similar in the sense that node communities are likely to share similar embeddings. However, by introducing node embeddings we relax this constraint. As a result our approach gains in solution quality by occasionally merging across communities, which is particularly important when merging hubs or bridges that may border multiple communities.

When comparing the partitioning quality across all KaHyPar trials, we observe that KaHyPar with embedding-based coarsening but without flow-based refinement can find better partitioning solutions than KaHyPar with flow-based refinement. Specifically, we observe an average improvement in the $(\lambda - 1)$ metric for small numbers of partitions, as demonstrated in Figure 2. Furthermore, some graphs partitioned with embedding-based coarsening in Zoltan outperform even the latest KaHyPar version. Applications attempting to partition particularly large hypergraphs may benefit from this result, as embedding-based coarsening and $\mathcal{O}(\log|V|)$-level partitioners expose more parallelism than the $n$-level KaHyPar design.

For small $k$ this flexibility appears to be the most valuable, as these latent embedding spaces may only detect a handful of relevant ground-truth clusters. For higher $k$, the number of partitions appears to exceed the number of natural divisions in our embedding space. Our KaHyPar implementation particularly struggles here, which we observe is likely a result of the direct $k$-way initial solution this method identifies [ahss2017alenex]. In contrast, Zoltan, which uses recursive bisection to recursively 2-partition the input hypergraph, is more resilient for larger $k$. Recursive bisection has the effect of splitting the input embedding into two subspaces. When combined with embedding-based coarsening, these subspaces divide the key axes of variance within the embedding space. Then, the next iteration need only consider locally relevant differences within each respective subspace, which retains more locally-relevant information. This effect is what keeps Zoltan’s partitioning quality competitive with KaHyPar Flow for larger $k$, as seen in Figure 2.

Examining the standard deviation results shown in Figure 2, we observe that embedding-based coarsening decreases the standard deviation of possible results for a given hypergraph. The figures corresponding to the standard deviation of both the $(\lambda - 1)$ and cut metrics demonstrate that the macro-average improvement of standard deviation is often substantial, and occasionally over an order of magnitude. This result comes from replacing the typically random node-visit order with a sorted ordering dependent on each node's nearest neighbor. In addition, the figures corresponding to the minimum and maximum $(\lambda - 1)$ and cut observed per trial demonstrate that embedding-based coarsening consistently improves both the expected worst-case and the expected best-case quality. Many applications run multiple partitioning trials and select the top-performing result [trifunovic2006parallel]; by reducing the variance of results to an improved range, our coarsening approach could improve overall application efficiency.

Looking into the graph-wise results, shown in Figure 3, we observe that there is a class of hypergraphs that is best aided by embedding-based coarsening. We observe that embedding-based coarsening can identify partitions with 200-400% improvement in graphs with rich latent structure, such as the communication networks corresponding to the Enron or European Union email networks (as found in [davis2011university]). Additionally, some synthetic graphs, those constructed through a star-shaped merge of multiple real-world networks and designed to complicate the coarsening process, are similarly improved. The improvements on this class of graphs are also highly statistically significant. These highly-improved graphs have rich latent global structure that may not be accurately captured through hyperedge-wise features. For example, the departmental structure within Enron is lost when individually considering emails between particular employees. While not every graph can be exploited to that magnitude, we do observe that a significant portion of our benchmark is significantly improved, and a further portion is merely unchanged. We do, however, observe a subset of graphs that are partitioned worse with embedding-based coarsening. Nemsemm2, a sparse matrix corresponding to a linear program, is partitioned almost three times worse using embedding-based coarsening. The incidence matrix of this hypergraph is nearly block-diagonal, which results in significant hyperedge-wise features that are not translated into an embedding, as disjoint graph regions are often embedded in overlapping spaces. In contrast, Nemswrld is another linear-program sparse matrix published by the same group, but is less block-diagonal and receives a statistically significant average improvement. Each of the above results refers to a 2-partition performed by KaHyPar (Figure 4).

Fig. 2: Above depicts the macro-average improvement (Eq. 26) using KaHyPar Flow as the baseline for representative considered metrics.
Fig. 3: Above depicts the improvement of the $(\lambda - 1)$ metric in KaHyPar when embedding-based coarsening is applied to the 2-partition problem. Each bar represents a comparison of 20 baseline and 20 embedding-based trials for a single graph. The color of each bar represents the statistical significance between the sets of trial results. The small black lines represent the standard deviation of the embedding-based method, and the absence of a bar indicates a standard deviation near zero. Note that the graphs with the most improvement are primarily social networks. Bar heights correspond to the average-improvement statistic ($I_{\text{avg}}$).

6 Related Work

Our proposed embedding-based coarsening is similar to the community-based coarsening proposed by Heuer et al. and used in our KaHyPar baseline [hs2017sea]. Their approach begins with a “community detection phase” wherein traditional community detection algorithms cluster the nodes contained in the bipartite star-expansion of the original graph. From there, the coarsening process is restricted to only contract nodes within a community. This approach is intended to maintain global community structure from the original hypergraph in the final coarsest representation. While both methods leverage the bipartite representation to find initial node features, embedding-based coarsening improves upon community-based coarsening by relaxing the requirement that nodes can only be coarsened within a community. Nodes within a modularity maximizing community are internally dense and externally sparse [newman2010networks]. As a result nodes sharing a community are more likely to co-occur in any local sampling strategy employed by a graph embedding algorithm. Therefore, it is likely that the natural clusters within our considered graph embeddings are similar to the communities found by KaHyPar. However, these embeddings inform additional global relationships between clusters that are lost when each community is coarsened independently. For instance, nodes on the boundary of two communities will likely receive embeddings spatially located between two clusters. This distinction is able to remain in the coarsest representation of the hypergraph, and may be lost in community-based coarsening when nodes are initially split due to community assignments.

Memetic partitioning, also proposed for KaHyPar, uses the principles of genetic algorithms to discover improved partitioning solutions [andre2018memetic]. This approach creates high quality partitions by iterating through different “generations” of solutions, starting with an initial generation produced by KaHyPar run multiple times with different seeds. From the initial set, multiple combination operators “breed” new solutions by combining some number of “parents” to form new solutions. Each iteration is designed to improve the population’s average objective value. Combination operators are specifically posed such that offspring solutions perform at least as well as their corresponding parents. While this approach is demonstrated to improve overall hypergraph partitioning quality, it does so by adding a meta process on top of the set of initial hypergraph solutions. We anticipate that adding embedding-based coarsening as a method for generating a high quality initial solution population may be a complementary way to improve the overall process.

The proposed embedding-based coarsening extends the relaxation-based coarsening developed by Shaydulin et al. [shaydulin2019relaxation] in Zoltan. This work introduces algebraic distance for hypergraphs, which in turn extends a similar measure designed for traditional graphs [chen2011algebraic]. Algebraic distance is a similarity measure that takes into account distant neighborhoods of vertices, enabling the coarsening process to exploit the global structure of highly irregular hypergraphs. Algebraic distance is computed by an iterative process that is shown to stabilize quickly [shaydulin2019relaxation], requiring only tens of iterations to obtain rich latent features. As such, this measure is additionally found within the Algebraic Heterogeneous Bipartite Embeddings we consider (AHBE) [sybrandt2019heterogeneous]. However, as uncovered in that work, neural graph embeddings can learn additional latent features not often captured by algebraic distance alone.

Aggregative coarsening [shaydulin2018sea] uses ideas from algebraic multigrid, extending an unfinished attempt published in Sandia Summer Reports [buluc-boman]. At each step of the coarsening process a set of seed vertices is selected. Each seed then becomes the center of an aggregate, with non-seeds assigned to seeds using different aggregation rules. An aggregate at the finer level forms a vertex at the coarser level. Two aggregation rules, based on inner product matching and stable matching, were explored. Our embedding-based coarsening could be used within the aggregative coarsening to inform the aggregation rules.

Fig. 4: Macro-average improvement of the $(\lambda - 1)$ metric across all considered graphs and methods. We performed 20 partitions per graph per method using different seeds. Additional result matrices are available online.

7 Conclusion

In this work we propose embedding-based coarsening, an approach that uses latent features present in a pretrained hypergraph embedding to better solve the hypergraph partitioning problem. We do so by prioritizing nodes that share many latent features during the coarsening process, and then leveraging a combination of traditional and embedding-derived features when determining coarsening partners. We evaluate this approach over multiple trials per combination of 96 graphs, 7 partition counts, 6 pretrained embedding methods, 5 baseline partitioners, 3 implementations, and 2 objective functions. We observe a significant increase in quality gained from embedding-based coarsening for small values of $k$ (from 2 until about 16). For higher values of $k$ we observe overall quality that returns to the state-of-the-art baseline. All experiments, plots and code are available in our online appendix at sybrandt.com/2019/partitioning.

An important future research direction is related to embedding-based coarsening for large $k$, as the improvement we observe there is less significant. One potential explanation is that our fixed-size embeddings only contain a relatively small number of latent clusters. This would imply that beyond a certain small $k$, most coarsening comparisons will occur within a single cluster, wherein all nodes are similar. However, we demonstrate that using the proposed embedding-based coarsening one can improve the solution quality of existing hypergraph partitioners for small $k$, and substantially so on particular graphs with rich latent structure. For example, this method increases the quality of Zoltan above that of KaHyPar with flow-based refinement in some cases, which is particularly important as the $\mathcal{O}(\log|V|)$-level paradigm implemented in Zoltan exposes substantially more parallelism than the $n$-level counterpart. We also note that our algorithm is embedding-agnostic and is ready to incorporate other types of embeddings that can potentially work better for specific types of instances.

8 Acknowledgements

We would like to thank Sebastian Schlag from the Karlsruhe Institute of Technology for helping us to understand KaHyPar. This work was supported by NSF awards MRI #1725573, DMS #1522751, and NRT #1633608.

References
