Hypergraph Partitioning With Embeddings
Abstract
The problem of placing circuits on a chip or distributing sparse matrix operations
can be modeled as the
hypergraph partitioning problem. A hypergraph is a
generalization of the traditional graph wherein each “hyperedge” may connect
any number of nodes. Hypergraph partitioning, therefore, is the NP-Hard
problem of dividing nodes into similarly sized disjoint sets while
minimizing the number of hyperedges that span multiple partitions. Due to this
problem’s complexity, many partitioners leverage the multilevel heuristic of
iteratively “coarsening” their input to a smaller approximation until an
inefficient algorithm becomes feasible. The initial solution is then propagated
back to the original hypergraph, which produces a reasonably accurate result
provided the coarse representation preserves structural properties of the original.
Multilevel hypergraph partitioners are today considered state-of-the-art solvers, achieving an excellent quality/running-time trade-off on practical large-scale instances of different types.
In order to improve the quality of multilevel hypergraph partitioners,
we propose leveraging graph embeddings to better capture structural properties
during the coarsening process. Our approach prioritizes dense subspaces found in
the embedding, and contracts nodes according to both traditional and embedding-based
similarity measures.
Reproducibility: All source code, plots and experimental data are available at https://sybrandt.com/2019/partition.
1 Introduction
In order to model problems that contain interconnected groups of items, such as the various data dependencies between processes found in large scientific applications, many leverage the formalism of hypergraphs. A hypergraph is similar to a traditional graph, with the added generalization that the “hyperedges” may connect any number of nodes. Hypergraphs have been used in VLSI design [karypis1999multilevel], machine learning [zhou2007learning, hein2013total, zhang2017re], parallel algorithms [catalyurek1999hypergraph], combinatorial scientific computing [naumann2012combinatorial], and social network analysis [shepherd1990transient, zhang2010hypergraph].
The hypergraph partitioning problem is that of dividing the nodes of a hypergraph among similarly-sized disjoint sets. A good partitioning is one that minimizes the number of hyperedges spanning multiple partitions. In the context of combinatorial scientific computing and load balancing, this is the problem of dividing logical threads (nodes) across the various available machines (partitions) in order to reduce the amount of communication necessary between machines (cut hyperedges). Unfortunately, it is NP-Hard both to solve [lengauer2012combinatorial] and to accurately approximate [bui1992finding] a solution to this problem.
To manage the complexity of hypergraph partitioning, practitioners turn to heuristic algorithms [shhmss2016alenex], such as the multilevel paradigm [andre2018memetic, shaydulin2019relaxation, karypis2000multilevel, boman2009advances, devine2006parallel, chevalmlpartcompar]. The multilevel approach consists of a V-cycle containing three phases, depicted in Figure 1. The V-cycle starts by iteratively coarsening the input hypergraph. Each iteration of the coarsening creates new coarse nodes by contracting groups of nodes in the current set. These contractions are determined through a matching process that is informed by some similarity measure so that the resulting approximation retains the structural features of the original problem. This allows the coarse-level partition to be interpolated to the next-finer level without applying too many refinement steps that may substantially slow down the entire multilevel framework. Coarsening continues until the approximate hypergraph is small enough to partition directly, forming the initial solution. Multilevel partitioners then expand this approximate solution by iteratively uncoarsening to the original input. At each stage of the uncoarsening process, solvers interpolate the coarse solution and perform a local search or other methods to refine it. The resulting solution, once gradually refined to the highest level, becomes the final partitioning.
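The V-cycle described above can be sketched as a simple recursion. The function and argument names below are illustrative only, not the API of any particular solver; the `coarsen`, `initial_partition`, and `refine` callbacks stand in for the phase-specific algorithms.

```python
def multilevel_partition(hypergraph, k, coarsen, initial_partition, refine,
                         small_enough=100):
    """Sketch of one V-cycle: coarsen until small, solve, then uncoarsen and refine.

    hypergraph: dict with "nodes" (list) and "edges" (list of node collections).
    coarsen: returns (coarser hypergraph, mapping from fine node -> coarse node).
    """
    if len(hypergraph["nodes"]) <= small_enough:
        return initial_partition(hypergraph, k)       # direct solve at coarsest level
    coarse, mapping = coarsen(hypergraph)             # contract matched node groups
    coarse_parts = multilevel_partition(coarse, k, coarsen,
                                        initial_partition, refine, small_enough)
    # interpolate: each fine node inherits the part of its coarse representative
    parts = {v: coarse_parts[mapping[v]] for v in hypergraph["nodes"]}
    return refine(hypergraph, parts)                  # local improvement at this level
```

Each level of the recursion corresponds to one layer of the V-cycle: the descent performs coarsening, the base case produces the initial solution, and the ascent interpolates and refines.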
Because the initial solution to a multilevel algorithm propagates through the entire uncoarsening process, it is important to create a coarsened representation that shares structural properties with the original hypergraph. In order to improve coarsening, other solvers have exploited clustering and community detection techniques [hs2017sea], algebraic distance [shaydulin2019relaxation], and others. However, recent advances in graph embedding [sybrandt2019heterogeneous] indicate that the latent spaces found by unsupervised machine learning algorithms can better identify structural similarities between nodes.
1.1 Our Contribution
In this work we propose exploiting latent node representations gained through embeddings to better coarsen large hypergraphs for partitioning. First, we apply star-expansion [agarwal2006higher] to gain a bipartite representation of the input hypergraph. Then we learn latent structural features of this graph using a graph embedding method. Note that our algorithm is agnostic to the particular embedding. These dense real-valued embeddings inform our coarsening algorithm to prioritize more similar nodes at each level of coarsening. Then, we identify coarsening partners by comparing latent features in conjunction with traditional edge-wise features. After each iteration, we assign newly coarsened nodes an embedding equal to the centroid of their primal embeddings.
We implement our coarsening algorithm in both the $n$-level solver KaHyPar [shhmss2016alenex], as well as the $\log n$-level solver Zoltan [devine2006parallel]. In the case of KaHyPar, we evaluate our coarsening under its original uncoarsening strategy, as well as its recent flow-based refinement [heuer2018network]. We also compare our solution quality when using six different graph embeddings: Node2Vec [grover2016node2vec], Metapath2Vec++ [dong2017metapath2vec], Boolean and Algebraic Heterogeneous Bipartite Graph Embeddings [sybrandt2019heterogeneous], as well as two combination embeddings, also proposed in [sybrandt2019heterogeneous].
We evaluate our implementations against five state-of-the-art partitioners: hMetis [karypis1998hmetis], Zoltan [devine2006parallel], PaToH [ccatalyurek2011patoh], KaHyPar (with community-based coarsening [hs2017sea]), and KaHyPar Flow (with both community-based coarsening and flow-based refinement [heuer2018network]). For each method we additionally compare both the cut and $(\lambda-1)$ optimization objectives (note that hMetis does not optimize for $(\lambda-1)$; for this objective we only compare against the remaining four). Our evaluation spans a range of the number of partitions from 2 to 128, and 96 graphs from the SuiteSparse Matrix Collection [davis2011university]. For each combination of proposed implementation, baseline method, optimization metric, partition count, and hypergraph we perform twenty trials, each with a different random seed and relabeling of the input graph. This analysis consists of over half a million individual experiments.
We report summary statistics for the improvement of each proposed implementation when compared to each baseline method. Specifically, we consider the improvement relative to the minimum, maximum, and average observed objective value, as well as the standard deviation of trials. We additionally supply more detailed plots in the online appendix for the improvements of all graphs across all method comparisons in order to highlight graph-wise differences. These plots display the averages and standard deviations of each graph per comparison, and include statistical significance values. Additionally, all experimental data for each individual trial, including parameter settings, is available as a publicly downloadable MongoDB database dump at https://sybrandt.com/2019/partition.
Using our proposed coarsening, we observe a significant improvement between each implementation and its directly comparable baseline (e.g., our modified Zoltan against the baseline Zoltan). We observe, however, that the improvement gradually vanishes as the number of parts increases, indicating a promising future research direction. In some specific cases, such as hypergraphs representing social networks, our coarsening can find partitioning solutions that are over 400% better than the existing solutions. Our coarsening also improves the standard deviation of results. Typical multilevel solvers visit nodes in a random order for each level of coarsening. Our approach replaces this with a prioritized visit order derived from embeddings. This change decreases the standard deviation in almost all scenarios, often substantially. All experimental code, data, visualization scripts, and results are publicly available at https://sybrandt.com/2019/partition.
2 Background
A hypergraph is an ordered pair $H = (V, E)$, where $V$ is the set of nodes and $E$ is the set of hyperedges. Each hyperedge $e \in E$ is a nonempty subset of $V$. In hypergraph partitioning the goal is to split the set of nodes into $k$ disjoint subsets or parts $V_1, \dots, V_k$ such that $\bigcup_{i=1}^{k} V_i = V$ while minimizing an objective function over cut hyperedges subject to an imbalance constraint factor $\alpha$. A hyperedge belongs to the cut if it contains nodes from at least two parts: $e \in E_{\mathrm{cut}}$ iff $\exists\, i \ne j$ such that $e \cap V_i \ne \emptyset$ and $e \cap V_j \ne \emptyset$. Both nodes and hyperedges can have weights, namely, $w(v)$ and $w(e)$, for each $v \in V$ and $e \in E$, respectively. In this paper we consider two objective functions: "cut" and "$(\lambda - 1)$". The cut is the sum of weights of cut hyperedges: $\mathrm{cut}(H) = \sum_{e \in E_{\mathrm{cut}}} w(e)$. Connectivity $\lambda(e)$ of an edge $e$ is defined as the number of parts the edge spans. The $(\lambda - 1)$ metric is then defined as $\sum_{e \in E} (\lambda(e) - 1)\, w(e)$. Note that for $k = 2$ these two metrics are equivalent. The imbalance factor $\alpha$ ensures that for each part $V_i$ the following holds: $\sum_{v \in V_i} w(v) \le (1 + \alpha) \left\lceil \frac{\sum_{v \in V} w(v)}{k} \right\rceil$.
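As a worked illustration of the two objectives, the following sketch computes both the cut and the connectivity-based metric for a given partition (the function and argument names are ours, chosen for clarity):

```python
def cut_and_km1(edges, parts, edge_weight=None):
    """Compute the "cut" and "(lambda - 1)" objectives for a partition.

    edges: iterable of node collections; parts: dict mapping node -> part id.
    edge_weight: optional function edge -> weight; defaults to unit weights.
    """
    w = edge_weight or (lambda e: 1)
    cut = km1 = 0
    for e in edges:
        spanned = len({parts[v] for v in e})   # connectivity lambda(e)
        if spanned > 1:                        # edge crosses at least two parts
            cut += w(e)
        km1 += (spanned - 1) * w(e)            # zero for uncut edges
    return cut, km1
```

For a 2-way partition the two values always coincide, since a cut edge spans exactly two parts.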
Many partitioning algorithms assign weights to both nodes and hyperedges. Initially, weights are all set equally to 1. Once coarsened, the weight of a newly coarsened node is set equal to the sum of the weights of the contracted fine nodes. Coarse hyperedges are similarly weighted whenever two hyperedges are merged.
2.1 Multilevel Hypergraph Partitioning
Multilevel algorithms solve problems by constructing a hierarchy of subproblems that approximate the original. These "coarsened" subproblems contain fewer degrees of freedom and are therefore easier to solve. The multilevel approach captures the global structure of the problem by combining local information at different levels of coarseness. Originally introduced to speed up existing algorithms [barnard1994fast] and inspired by multigrid and multiscale optimization strategies [vlsicad], the multilevel method was quickly recognized to be a good way to improve the quality of partitioning [karypis1998fast] and is currently considered to be one of the state-of-the-art methods [bulucc2016recent] for this problem. In the context of hypergraphs, one constructs a multilevel hierarchy by merging nodes: multiple nodes at the finer level become a single node in the coarser level. Once reduced to a sufficiently small problem, a multilevel partitioner can solve the coarse partitioning problem using an algorithm that would be infeasible on large-scale instances. This initial solution is iteratively uncoarsened by first interpolating it onto a finer level and then refining it. The refinement is typically performed using local search or other methods. The coarsening-uncoarsening pipeline is commonly referred to as a V-cycle (see Fig. 1). Traditionally, at each level of the coarsening process all or almost all nodes have at least one merging partner, resulting in $O(\log |V|)$ levels. This is the approach used by Mondriaan [vastenhouw2005two], hMetis2 [karypis1999multilevel], Zoltan [devine2006parallel], and PaToH [ccatalyurek2011patoh]. However, KaHyPar [shhmss2016alenex] implements an $n$-level approach where at each level only one pair of nodes is contracted.
Over the years the multilevel method has become the gold standard in hypergraph partitioning, achieving an excellent time/quality trade-off in many practical cases, and is used by most state-of-the-art solvers, including all of the ones discussed in this paper. For an extensive review of (hyper)graph partitioning methods, the reader is referred to [bulucc2016recent, bichot2011graph].
When constructing coarser hypergraphs, state-of-the-art partitioners contract nodes according to some heuristic such that during the uncoarsening the solution can be interpolated from coarser levels without the loss of quality. These methods typically make coarsening decisions based on some similarity measure that can be computed on node pairs. Most multilevel hypergraph partitioners, including Mondriaan [vastenhouw2005two], hMetis2 [karypis1999multilevel] and Zoltan [devine2006parallel], measure the inner product or its variations, such as absorption (PaToH [ccatalyurek2011patoh]) and heavy edge (hMetis2 [karypis1999multilevel], Parkway [trifunovic2008parallel], PaToH [ccatalyurek2011patoh] and KaHyPar [hs2017sea]).
The inner product of two nodes is defined as the Euclidean inner product of the weighted hyperedge incidence vectors [devine2006parallel]. This similarity measure and its variations are simple and computationally inexpensive, but are limited due to only using local information. Recently, a number of advanced coarsening schemes were introduced to address this limitation. Shaydulin et al. introduce a relaxation-based similarity metric, algebraic distance [shaydulin2019relaxation], extending a similar approach from graphs [chen2011algebraic]. In [shaydulin2018sea] this approach is extended and incorporated within an aggregative coarsening scheme, inspired by algebraic multigrid and stable matching approaches. An unfinished but promising attempt to generalize hypergraph coarsening using algebraic multigrid (AMG) on graphs [safro2009multilevel] was published in Sandia Labs Summer Reports [bulucboman]. In AMG coarsening [safro2015advanced, Safro2006, ron2011relaxation], instead of being contracted, the nodes are split into fractions which form coarse aggregates. Heuer and Schlag introduce a community-aware coarsening that uses global clustering information to restrict matching between communities [hs2017sea].
During uncoarsening, nodes are uncontracted and the coarser-level partition is interpolated to the finer-level node set. Then the solution is iteratively refined using a local node-moving heuristic. A majority of hypergraph partitioners use a variation of Fiduccia-Mattheyses [fiduccia1988linear] or Kernighan-Lin [kernighan1970efficient] to perform these local searches [heuer2018network, vastenhouw2005two, karypis1999multilevel, devine2006parallel, ccatalyurek2011patoh, trifunovic2008parallel]. Using a local search heuristic at the uncoarsening stage allows these partitioners to locally improve the global solution interpolated from coarser levels. Recently, Heuer et al. introduced a flow-based refinement scheme for $k$-way hypergraph partitioning [heuer2018network], extending similar approaches from graph partitioning [sanders2011engineering].
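As a toy illustration of the local-search idea behind Fiduccia-Mattheyses-style refinement, the sketch below greedily moves nodes between two parts whenever a move reduces the number of cut hyperedges. This is a deliberate simplification: real FM passes use gain buckets, accept temporarily negative-gain moves with a rollback to the best prefix, and enforce balance constraints, none of which are modeled here.

```python
def greedy_refine_pass(edges, parts, nodes):
    """One simplified node-moving pass for a 2-way partition.

    edges: iterable of node collections; parts: dict node -> part id in {0, 1}.
    Each node is tentatively moved once and the move is kept only if it
    strictly reduces the cut. Balance is intentionally ignored in this sketch.
    """
    def cut(p):
        return sum(1 for e in edges if len({p[v] for v in e}) > 1)

    for v in nodes:
        before = cut(parts)
        parts[v] ^= 1              # tentatively move v to the other part
        if cut(parts) >= before:
            parts[v] ^= 1          # revert: no improvement
    return parts
```

Recomputing the full cut per move is quadratic and only acceptable for illustration; production refiners maintain incremental gain values instead.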
2.2 Hypergraph Embeddings
The SkipGram text embedding model presented by Mikolov et al. learns embeddings by discovering the relationship between each word and its typical context [mikolov2013efficient, mikolov2013distributed]. This model underpins many graph embedding models [perozzi2014deepwalk, grover2016node2vec, dong2017metapath2vec]. In order to efficiently handle large volumes of text, the SkipGram model samples "windows." Each window is centered around a target word, and includes local context both leading and trailing the target. The SkipGram model learns to predict each window's contents given the target word. The underlying assumption behind this approach is that "similar words share similar company," and it has been shown to result in semantically rich latent features [tsvetkov2015evaluation, gladkova2016analogy].
Deepwalk, a pioneering graph embedding technique presented by Perozzi et al., reduces the graph structure to an analogous textual problem in order to also leverage the SkipGram approach [perozzi2014deepwalk]. Here, the underlying assumption is "similar nodes share similar company"; however, graphs present additional challenges not found in text. Firstly, representing a node's "company" is nontrivial. Simply taking all first-order neighbors of a target node may not be sufficient, or may produce more neighbors than fit in memory. To reconcile this, Perozzi et al. propose random-walk sampling. Pseudo-sentences form when traversing a graph, wherein each node is analogous to a word and each random walk is analogous to a sentence. These random walks serve as input to the hierarchical SkipGram model, similar to that proposed by Mikolov et al.
Extending this approach, Grover and Leskovec modify random-walk sampling to add a "return probability" parameter [grover2016node2vec]. They observe that typical depth-first random walks capture structural similarities, while breadth-first approaches (such as the LINE embedding method [tang2015line]) capture homophilic relationships. In order to improve overall embedding quality, Node2Vec random walks strike a balance between the two traversal strategies by probabilistically doubling back on old neighborhoods, or forging onward to unseen areas.
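A minimal sketch of random-walk corpus generation with a crude return-probability knob follows. This is a simplification of Node2Vec's p/q-biased second-order walks, intended only to show the trade-off between doubling back (breadth-first flavor) and moving onward (depth-first flavor); names and defaults are illustrative.

```python
import random

def random_walks(adj, walks_per_node=10, walk_length=5, p_return=0.25, seed=0):
    """Generate a pseudo-sentence corpus of random walks.

    adj: dict node -> non-empty list of neighbors (undirected graph).
    With probability p_return the walk steps back to the previous node;
    otherwise it moves to a uniformly random neighbor of the current node.
    """
    rng = random.Random(seed)
    corpus = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                prev = walk[-2] if len(walk) > 1 else None
                if prev is not None and rng.random() < p_return:
                    walk.append(prev)                       # double back
                else:
                    walk.append(rng.choice(adj[walk[-1]]))  # forge onward
            corpus.append(walk)
    return corpus
```

The resulting walks would then be fed to a SkipGram trainer as sentences, with nodes playing the role of words.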
The above-mentioned graph embedding techniques assume nothing is known about the considered graph's structure. However, recent methods exploit known structural properties of particular graph classes, some of which are applicable to hypergraphs. Metapath2Vec++, proposed by Dong et al., assumes each node is of a particular type, and that certain "metapaths," path descriptions containing only type information, are known to be meaningful [dong2017metapath2vec]. In the case of hypergraphs, we can perform a star-expansion to convert each hyperedge to a new layer of nodes, which converts our input hypergraph into a traditional bipartite graph [agarwal2006higher]. This representation has two node types: original nodes, and those derived from hyperedges. Due to its bipartite structure the only metapath is that of alternating types. However, due to the model architecture of Metapath2Vec++, we can learn some type-specific latent features for each.
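The star-expansion step can be sketched as follows: each hyperedge becomes an auxiliary node adjacent to all of its members. The tuple-based node typing used here is our own convention for keeping the two node classes distinct.

```python
def star_expansion(hyperedges):
    """Convert a hypergraph into its bipartite star-expansion.

    hyperedges: iterable of node collections. Each hyperedge i becomes a new
    node ("edge", i) connected to every ("node", v) it contains. Returns an
    adjacency dict over the two-typed node set.
    """
    adj = {}
    for i, e in enumerate(hyperedges):
        enode = ("edge", i)
        adj.setdefault(enode, [])
        for v in e:
            vnode = ("node", v)
            adj.setdefault(vnode, []).append(enode)
            adj[enode].append(vnode)
    return adj
```

Every path in this bipartite graph necessarily alternates between the two types, which is why the alternating metapath is the only one available.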
Further recent work addresses the bipartite case specifically. Sybrandt and Safro present multiple methods for embedding bipartite graphs [sybrandt2019heterogeneous]. These include First- and High-Order Bipartite Embeddings (FOBE and HOBE), as well as a combination approach to learn joint embeddings on bipartite graphs. These approaches are applicable to hypergraphs as represented through star-expansion. These methods model the two distinct types of nodes present in a bipartite network separately in order to better capture same-typed features. In the context of hypergraphs this is analogous to modeling nodes and hyperedges separately. For the purpose of this work, however, we only consider the embedding of nodes present in the original hypergraph.
The first-order approach, FOBE, presented by Sybrandt and Safro samples observed node similarities and then learns embeddings to encode those similarities via dot product. Nodes are deemed "similar" in this context if they share an edge, or a neighbor. Formally, if $G = (V, E)$ is an undirected bipartite graph with nodes $V$ and edges $E$, an edge between nodes $u$ and $v$ is indicated as $uv \in E$, and $\Gamma(u)$ indicates the neighborhood of $u$, then the similarity measured by FOBE is:
$$\mathbb{S}(u, v) = \begin{cases} 1 & \text{if } uv \in E \text{ or } \Gamma(u) \cap \Gamma(v) \ne \emptyset \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
Note that for two nodes to be measured as similar by the above equation, they must either be of different bipartite types and share an edge, or of the same type and share a neighbor of the opposite type. FOBE then encodes the above similarities into node embeddings. However, the objective used to learn these embeddings is constructed to only explicitly compare nodes of the same type. If $A$ and $B$ are disjoint subsets of $V$ indicating the two types present in the bipartite graph, and $\epsilon$ is a function that relates each node to its embedding in $\mathbb{R}^d$, then the various encoded similarities are represented as the following:
$$\tilde{\mathbb{S}}_A(u, u') = \sigma\left(\epsilon(u)^\top \epsilon(u')\right), \quad u, u' \in A \qquad (2)$$
$$\tilde{\mathbb{S}}_B(v, v') = \sigma\left(\epsilon(v)^\top \epsilon(v')\right), \quad v, v' \in B \qquad (3)$$
$$\tilde{\mathbb{S}}_{AB}(u, v) = \sigma\left(\max_{u' \in \Gamma(v)} \epsilon(u)^\top \epsilon(u') + \max_{v' \in \Gamma(u)} \epsilon(v)^\top \epsilon(v')\right), \quad u \in A,\; v \in B \qquad (4)$$
where $\sigma$ denotes the sigmoid function.
Here, $\tilde{\mathbb{S}}_A$ and $\tilde{\mathbb{S}}_B$ indicate the similarity shared between nodes of the same type. Then, $\tilde{\mathbb{S}}_{AB}$ decomposes the similarity of cross-typed nodes into sets of same-typed comparisons. These predicted similarity measures derived from embeddings are simultaneously fit to the observed samples above.
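The first-order "share an edge or share a neighbor" similarity sampled by FOBE can be sketched directly (assuming a set-valued adjacency structure; the function name is ours):

```python
def first_order_similarity(u, v, adj):
    """FOBE-style observed similarity on a bipartite graph.

    adj: dict node -> set of neighbors.
    Returns 1 if u and v share an edge or share at least one neighbor, else 0.
    """
    if v in adj[u]:
        return 1          # cross-typed: directly connected
    if adj[u] & adj[v]:
        return 1          # same-typed: common neighbor of the opposite type
    return 0
```

In a bipartite graph, the first branch can only fire for cross-typed pairs and the second only for same-typed pairs, matching the observation in the text.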
The high-order embedding method (HOBE) presented by Sybrandt and Safro extends FOBE by sampling neighbors-of-neighbors, and prioritizes these similarities through the local heuristic signal provided by algebraic distance on graphs [ron2011relaxation, chen2011algebraic, shaydulin2019relaxation, john2016single]. This approach begins with a fast iterative relaxation technique that places all bipartite nodes on the interval $[0, 1]$ such that locally similar nodes are more likely to have similar values. Multiple trials with random initializations boost this signal by reducing the effect of incidental proximity observed between distant nodes in a single trial. Formally, this algebraic similarity measure is determined by first calculating algebraic coordinates $x_v^{(r)}$ for each node $v$. These coordinates are randomly initialized in $[0, 1]$, and are refined over $T$ iterations via the following:
$$x_v^{(r,\, t+1)} = \lambda\, x_v^{(r,\, t)} + (1 - \lambda)\, \frac{\sum_{u \in \Gamma(v)} x_u^{(r,\, t)}}{|\Gamma(v)|} \qquad (5)$$
Here, $r$ indicates the trial ($1 \le r \le R$), $t$ indicates the iteration, and $\lambda$ is the damping factor ($\lambda = 0.5$ is suggested in [shaydulin2019relaxation]). These per-trial coordinates are then combined into a more robust similarity measure through the following:
$$a(u, v) = 1 - \frac{1}{\sqrt{R}} \sqrt{\sum_{r=1}^{R} \left(x_u^{(r)} - x_v^{(r)}\right)^2} \qquad (6)$$
where $x_v^{(r)}$ denotes the coordinate of node $v$ in trial $r$ after the final iteration.
Building from this heuristic signal, the HOBE similarity measures the presence of highly similar shared neighbors through the following:
$$\mathbb{S}_A(u, u') = \max_{v \in \Gamma(u) \cap \Gamma(u')} \min\left(a(u, v),\, a(u', v)\right), \quad u, u' \in A \text{ (and symmetrically for } B\text{)} \qquad (7)$$
$$\mathbb{S}_{AB}(u, v) = \begin{cases} a(u, v) & \text{if } uv \in E \\ 0 & \text{otherwise} \end{cases} \qquad (8)$$
In a manner similar to FOBE, these three similarity measures are encoded into the dot product of embeddings through a combined objective function.
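The damped neighbor-averaging relaxation that produces the algebraic coordinates can be sketched in a few lines. The trial and iteration counts, and the dict-based graph representation, are illustrative choices of ours rather than the settings used in the cited implementations.

```python
import random

def algebraic_coordinates(adj, trials=5, iters=20, damping=0.5, seed=0):
    """Iterative relaxation placing nodes on [0, 1].

    adj: dict node -> non-empty list of neighbors. Each trial starts from a
    random placement; each iteration mixes a node's coordinate with the mean
    of its neighbors' coordinates. Returns coords[node] = one value per trial.
    """
    rng = random.Random(seed)
    coords = {v: [] for v in adj}
    for _ in range(trials):
        x = {v: rng.random() for v in adj}          # random initialization
        for _ in range(iters):
            # Jacobi-style update: damped average over the neighborhood
            x = {v: damping * x[v]
                    + (1 - damping) * sum(x[u] for u in adj[v]) / len(adj[v])
                 for v in adj}
        for v in adj:
            coords[v].append(x[v])
    return coords
```

Because each update is a convex combination of values in $[0, 1]$, the coordinates remain in $[0, 1]$, and tightly connected nodes are pulled toward each other within every trial.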
In this work we explore the solution quality of our coarsening algorithm using a number of different embedding methods. We select Node2Vec and Metapath2Vec++, as well as both FOBE and HOBE. In addition, we train two combination embeddings: one that merges all four methods, and another that combines only FOBE and HOBE. We do not attempt to demonstrate that any individual embedding is superior for hypergraph partitioning; on the contrary, we demonstrate in Section 5 that all embeddings improve the partitioning quality, showing that such embeddings are an excellent tool for advanced coarsening schemes, potentially not only for the partitioning problem. Node2Vec allows us to evaluate a generic embedding technique not designed with hypergraphs in mind, Metapath2Vec++ evaluates a method shown to transfer well to hypergraphs [sybrandt2019heterogeneous], while the Heterogeneous Bipartite approaches are designed to facilitate hypergraph learning.
3 Method
In order to improve the quality of multilevel hypergraph partitioning solvers, such as Zoltan [devine2006parallel] and KaHyPar [shhmss2016alenex], we take advantage of graph embedding techniques. These methods learn dense, real-valued representations in a fixed-size vector space for each node. In the case of traditional graphs, Grover and Leskovec demonstrate that these embeddings can capture both structural and homophilic latent relationships [grover2016node2vec]. Additional work from Sybrandt and Safro demonstrates that these methods extend to hypergraphs [sybrandt2019heterogeneous] through star-expansion [agarwal2006higher].
Graph embedding methods typically encode observed similarities through some similarity measure. In the case of Algebraic and Boolean Heterogeneous Bipartite Embeddings, these similarities are explicitly modeled using the dot product [sybrandt2019heterogeneous]. The same similarity measure is also found in more traditional methods such as LINE [tang2015line]. Semantically, dot product implies that two nodes are similar if they share common prominent features. Unlike cosine similarity, the dot product is not normalized, and therefore does not significantly penalize nodes for being dissimilar, provided their dissimilar values are near zero. We observe that dot product also applies to other graph embedding techniques, such as the SkipGram-based methods used in Node2Vec [grover2016node2vec], Deepwalk [perozzi2014deepwalk], and by extension, MetaPath2Vec++ [dong2017metapath2vec]. While the specifics of each method are beyond the scope of this work, we note that dot product is a robust measure of similarity across embeddings.
We exploit graph embeddings to better match nodes during coarsening. The typical matching process, in both $n$-level and $\log n$-level coarsening, identifies pairs of similar nodes $(u, v)$, called coarsening partners, to merge in the next-coarsest representation. The resulting coarsened node becomes a member of all hyperedges incident to both $u$ and $v$. As a result, the overall partitioning solution can be drastically altered by the quality of these node pairs, as demonstrated below in Section 5.
One common node similarity measure for finding coarsening partners is an inner product of edge features. In KaHyPar [shhmss2016alenex], this measure is a ratio between edge weight and size, as reproduced in (9). Here, $I(v)$ denotes the set of hyperedges incident to node $v$, and $w(e)$ corresponds to the weight of a coarsened hyperedge, which indicates the number of original hyperedges containing the same coarsened node set.
$$r(u, v) = \sum_{e \in I(u) \cap I(v)} \frac{w(e)}{|e| - 1} \qquad (9)$$
This measure prioritizes nodes sharing many “tight” hyperedges, those with fewer members, as these tend to be more meaningful in realworld applications. For instance, members of a selective club or shoppers buying a niche ingredient are likely more selfsimilar than those buying bread or belonging to a massive organization. However, this model equally prioritizes all hyperedges of similar size, even if they contain a random assortment of nodes. To improve this coarsening measure, we introduce a term derived from a pretrained graph embedding.
Hypergraph embeddings, typically derived from the bipartite representation, project nodes into a fixed-dimensionality vector space [sybrandt2019heterogeneous]. While the dimensionality of this space is a hyperparameter to an embedding model, typical values range from 100 to 1,000 and are robust to small changes. As a result, many methods capture similarities mathematically through the inner product of embeddings [grover2016node2vec, sybrandt2019heterogeneous, dong2017metapath2vec]. Formally, we represent the pretrained embedding as a function $\epsilon: V \to \mathbb{R}^d$ mapping each node to a $d$-dimensional vector. We represent the embedding similarity between two nodes as
$$\mathbb{E}(u, v) = \epsilon(u)^\top \epsilon(v) \qquad (10)$$
These embeddings can capture both structural and homophilic latent properties [grover2016node2vec]. Structural properties include hubs, bridges, and leaves, while homophilic properties include clusters and common neighbors. Different embedding techniques prioritize different latent features, and we explore six different embedding schemes to underpin our coarsening. These methods are outlined in detail in Section 2.2. However, we observe that all six considered embeddings improve overall coarsening results (see Figure 4 as well as the online appendix).
We combine both hyperedge-wise and embedding-wise similarities into a single measure for each node pair. As a result, two nodes will be selected as coarsening partners if they share both many hyperedges as well as many latent features. This formulation provides a mechanism to lessen the impact of hyperedges without self-similar content, because the similarity conveyed by a tight hyperedge will be lessened by the dissimilarity conveyed in the embedding. In addition, we add a regularization term to maintain balance between node weights. The weight $w(v)$ of a coarsened node is simply the number of original nodes that have been merged together in the coarsened representation. Without this penalty, dense subregions of the hypergraph could be coarsened entirely before anything else (in the $n$-level case), resulting in an imbalanced solution. Our resulting score is formally
$$S(u, v) = \frac{\epsilon(u)^\top \epsilon(v)}{w(u)\, w(v)} \sum_{e \in I(u) \cap I(v)} \frac{w(e)}{|e| - 1} \qquad (11)$$
Note that to receive a high score given our proposed method, two nodes must share hyperedges, have similar latent features, and be of reasonably small weights. By including the edgewise inner product, our method cannot coarsen disparate regions of the network that happen to share similar latent features, which can arise from some embedding techniques. For instance, disconnected subgraphs may be embedded in overlapping subspaces, and a simpler embeddingonly similarity measure would then conjoin the disconnected components.
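A sketch consistent with the combined measure described above: the edge-wise inner product scaled by the embedding dot product and penalized by the product of node weights. The exact functional form used inside the modified solvers may differ; the function and argument names here are ours.

```python
def coarsening_score(u, v, incident, edge_weight, embedding, node_weight):
    """Combined rating for a candidate coarsening pair (u, v).

    incident: dict node -> set of hyperedges (frozensets) containing it.
    edge_weight / node_weight: dicts of weights; embedding: dict node -> vector.
    """
    shared = incident[u] & incident[v]
    if not shared:
        return 0.0                       # non-adjacent nodes are never merged
    # heavy-edge term: tight shared hyperedges contribute the most
    heavy = sum(edge_weight[e] / (len(e) - 1) for e in shared)
    # embedding term: dot product of latent features
    dot = sum(a * b for a, b in zip(embedding[u], embedding[v]))
    # weight penalty discourages repeatedly growing the same coarse node
    return heavy * dot / (node_weight[u] * node_weight[v])
```

Note how a tight shared hyperedge is discounted to zero when the embeddings are orthogonal, which is exactly the mechanism for suppressing hyperedges without self-similar content.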
We additionally apply the latent information present in embeddings to order nodes when identifying coarsening partners. Our goal is to match the pairs with the highest similarity first, so that the resulting coarsened nodes are more likely to share the same higher-order structural feature, such as a cluster or role. We sort nodes by their nearest neighbor in the embedding space, and penalize this similarity again by weights. We restrict the nearest-neighbor search to those nodes actually sharing hyperedges, so as to match the scores calculated above. Formally, the sorting criterion we propose is as follows
$$O(v) = \max_{u \in \Gamma(v)} \frac{\epsilon(u)^\top \epsilon(v)}{w(u)\, w(v)} \qquad (12)$$
where $\Gamma(v)$ represents the neighborhood of node $v$, namely, $\Gamma(v) = \{u \in V : \exists\, e \in E \text{ such that } \{u, v\} \subseteq e\}$.
We present our overall matching algorithm in Procedure 1. All nodes begin unmatched, as indicated by a Boolean characteristic vector of (un)matched nodes. We then visit each node in sorted order, according to the above criterion. Provided a visited node is unmatched, we iterate over its neighborhood and consider any unmatched neighbor that would not result in a coarse node above the weight tolerance. Out of these considered nodes, we select whichever has the highest score according to Eq. 11.
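The matching pass of Procedure 1 can be sketched as a greedy loop. This is a simplification: the actual solver integrations handle tie-breaking, tolerances, and data structures differently, and the `score` and `sort_key` callbacks stand in for the combined rating and embedding-based ordering described above.

```python
def match_nodes(nodes, neighbors, score, sort_key, node_weight, max_weight):
    """Greedy matching: visit nodes in sorted order and pair each unmatched
    node with its best-scoring unmatched neighbor, subject to the coarse-node
    weight tolerance. Returns the list of matched pairs."""
    matched = {v: False for v in nodes}
    pairs = []
    for v in sorted(nodes, key=sort_key, reverse=True):   # highest priority first
        if matched[v]:
            continue
        candidates = [u for u in neighbors[v]
                      if not matched[u] and u != v
                      and node_weight[v] + node_weight[u] <= max_weight]
        if candidates:
            best = max(candidates, key=lambda u: score(v, u))
            matched[v] = matched[best] = True
            pairs.append((v, best))
    return pairs
```

Nodes left unmatched at the end of the pass (those whose neighbors were all taken or too heavy) simply carry over to the next level unmerged.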
After coarsening, newly contracted nodes are assigned an embedding equal to the centroid of their primal nodes. In this context, a primal node is a fully uncoarsened node specified at the finest level of the problem. For instance, if at a given level of coarsening we match $u$ and $v$, the resulting coarse node would have the following properties. Here $v'$ represents the newly coarsened node, $E'$ represents the modified edge set, and $P(v)$ represents the set of primal nodes corresponding to node $v$. At the finest level, $P(v) = \{v\}$.
$$V' = \left(V \setminus \{u, v\}\right) \cup \{v'\} \qquad (13)$$
$$I(v') = I(u) \cup I(v) \qquad (14)$$
$$e' = \left(e \setminus \{u, v\}\right) \cup \{v'\} \quad \text{for each } e \in I(u) \cup I(v) \qquad (15)$$
$$E' = \left(E \setminus \left(I(u) \cup I(v)\right)\right) \cup \left\{e' : e \in I(u) \cup I(v)\right\} \qquad (16)$$
$$w(v') = w(u) + w(v) \qquad (17)$$
$$P(v') = P(u) \cup P(v) \qquad (18)$$
$$\epsilon(v') = \frac{1}{|P(v')|} \sum_{p \in P(v')} \epsilon(p) \qquad (19)$$
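The contraction bookkeeping above can be sketched as follows, with the coarse embedding computed as the centroid of all primal embeddings. The dictionary-based representation is our own convention for illustration.

```python
def contract(u, v, primal, embedding, node_weight):
    """Merge u and v into a coarse node: weights add, primal sets union,
    and the new embedding is the centroid of all primal embeddings.

    primal: dict node -> set of primal (finest-level) nodes it represents.
    embedding: dict keyed by primal nodes -> vector (list of floats).
    """
    new_primal = primal[u] | primal[v]
    dim = len(next(iter(embedding.values())))
    centroid = [sum(embedding[p][i] for p in new_primal) / len(new_primal)
                for i in range(dim)]
    return {"weight": node_weight[u] + node_weight[v],
            "primal": new_primal,
            "embedding": centroid}
```

Averaging over primal embeddings, rather than over the two merged nodes, keeps every original node contributing equally to the coarse representation regardless of how many levels of contraction it has passed through.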
4 Experimental Design
In order to evaluate the partitioning quality of our proposed coarsening method, we implement our matching algorithm in both KaHyPar [shhmss2016alenex] and Zoltan [devine2006parallel]. Our KaHyPar implementation adds a new coarsening class to replace the existing community-based structure, and maintains other KaHyPar features such as its direct $k$-way initial solution. We evaluate this implementation with both traditional vertex-swapping refinement as well as the more recent flow-based refinement [heuer2018network]. In the case of Zoltan we introduce a new function to evaluate nodes during matching. Our implementation also requires minor modifications elsewhere in the software package in order to address reindexing during recursive bisection. These changes do not affect the actual coarsening algorithm, as each call to recursive bisection begins with a subset of nodes and hyperedges from the original hypergraph.
In order to quantify the improvement in quality gained by embedding-based coarsening, we compute a number of partitions under a variety of scenarios. This begins with a set of embeddings. Due to resource constraints, we only embed each graph once for each considered technique and reuse this embedding in different runs. This compromise is necessary because graph embedding can be orders of magnitude more expensive than the considered multilevel hypergraph partitioners, determined often by the efficiency of the embedding software. Furthermore, we note that the problem of embedding coarsened hypergraphs is nontrivial. We observe a significant decrease in overall solution quality when attempting to recompute embeddings at intermediate coarse levels, as the considered methods were not intended to capture small weighted structures. Ultimately we find that this challenge lies outside the scope of this work.
The set of embedding techniques we explore consists of Node2Vec [grover2016node2vec], Metapath2Vec++ [dong2017metapath2vec], FOBE, and HOBE [sybrandt2019heterogeneous], as well as two combination embeddings (also presented in [sybrandt2019heterogeneous]). The first combination merges only FOBE and HOBE, while the second merges all four previously stated embeddings. All considered embeddings are 100-dimensional. While higher-dimensional embeddings have the ability to capture more complex latent structure, this complexity can also lead to poorer convergence while training. We performed an initial experiment comparing 100- to 500-dimensional embeddings of our hypergraph set, and observed no significant difference in solution quality. In addition, we do not claim to extensively test our coarsening against all state-of-the-art embeddings, only that our proposed technique is robust to different embedding algorithms.
Each of the six input embeddings combines with each of the three proposed implementations, KaHyPar, KaHyPar Flow, and Zoltan, to create a set of eighteen proposed partitioners with embedding-based coarsening. We add to this five baseline methods: hMetis [karypis1998hmetis], Zoltan [devine2006parallel], PaToH [ccatalyurek2011patoh], KaHyPar (with community-based coarsening [hs2017sea]), and KaHyPar Flow (with both community-based coarsening and flow-based refinement [heuer2018network]). This results in 23 considered partitioners. For each of the partitioners, we run separate trials optimizing for the cut and $(k-1)$ objectives, respectively. The difference between these objectives is defined in detail in Section 2.
For each combination of partitioner and objective, we additionally compare across a range of partition counts $k$. Many solvers (all considered except KaHyPar) identify a larger number of partitions through recursive bisection, which iteratively partitions the input hypergraph into two parts until reaching the desired number of partitions. For this reason we compare numbers of partitions corresponding to the powers of two from 2 to 128. For each of these scenarios, we apply an overall imbalance tolerance of 3%. Then, for each combination of partitioner, objective, and $k$ value, we compare across a benchmark of hypergraphs.
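The recursive-bisection strategy described above can be sketched as follows, where `bisect` stands in for any black-box 2-way partitioner (a hypothetical placeholder, not the actual Zoltan interface):

```python
def recursive_bisection(nodes, k, bisect):
    """Split `nodes` into k parts by repeated 2-way partitioning.

    `bisect` is any black-box 2-way partitioner returning (left, right);
    here it is a placeholder for the real multilevel solver.
    """
    if k == 1:
        return [nodes]
    left, right = bisect(nodes)
    # Each half receives half of the remaining partition budget.
    return (recursive_bisection(left, k // 2, bisect)
            + recursive_bisection(right, k - k // 2, bisect))
```

With `k` a power of two, each level of recursion doubles the number of parts, so 7 levels suffice for the largest setting of 128 partitions.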
Our benchmark consists of 86 sparse matrices selected from the SuiteSparse Matrix Collection [davis2011university]. These matrices span a range of domains including social networks, power grids, and linear systems. We interpret each matrix as the incidence matrix of a hypergraph. In doing so, we consider each row $i$ to represent a node, each column $j$ to be a hyperedge, and a nonzero value in cell $(i, j)$ to indicate that node $i$ participates in hyperedge $j$.
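This row/column interpretation can be sketched in a few lines of Python; the function name and the dense list-of-lists input are illustrative, as real instances would use sparse storage:

```python
def hypergraph_from_incidence(matrix):
    """Interpret a (dense) incidence matrix as a hypergraph.

    Row i is node i, column j is hyperedge j; a nonzero entry (i, j)
    places node i in hyperedge j.  Returns hyperedges as node sets.
    """
    if not matrix:
        return []
    n_edges = len(matrix[0])
    hyperedges = [set() for _ in range(n_edges)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value != 0:
                hyperedges[j].add(i)
    return hyperedges
```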
We additionally include ten synthetic hypergraphs that were designed to test the robustness of the coarsening process, extending a similar approach from graphs [safro2015advanced]. These hypergraphs are mixtures of graphs that are weakly connected to one another, with only a small fraction of edges connecting different graphs in the mixture. In the multilevel setting, this can cause the coarsening process to incorrectly contract edges between different graphs in the mixture, resulting in uneven coarsening, overloaded refinement, and worse final solution quality. This structure can be found in many real-world graphs, including multi-mode networks [tang2008community] and logistics multi-stage system networks [Stock2006]. We introduce additional complexity by adding random edges (denoted in the online appendix as "W/ Noise"). The full graphs, as well as the scripts used to generate them, are available in the online appendix.
Our overall benchmark suite of 96 graphs is explored in detail in the online appendix, wherein we present node and hyperedge distributions for all graphs. All names provided, except for those of our newly generated synthetic graphs, correspond to those found in the SuiteSparse Matrix Collection.
For each combination of partitioner, objective, $k$ value, and graph, we compute twenty trials, for a total of over half a million trials. For each trial we generate a new random seed and randomly relabel the node and hyperedge indices.
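A trial's random relabeling step might look like the following sketch, where the function and its arguments are illustrative rather than our exact experiment harness:

```python
import random

def relabel(hyperedges, n_nodes, seed):
    """Randomly permute node labels for one experimental trial.

    `hyperedges` is a list of node-index sets; a fresh seed yields a
    fresh permutation, so repeated trials see differently labeled but
    isomorphic hypergraphs.
    """
    rng = random.Random(seed)
    perm = list(range(n_nodes))
    rng.shuffle(perm)
    relabeled = [{perm[v] for v in edge} for edge in hyperedges]
    rng.shuffle(relabeled)  # also permute the hyperedge order
    return relabeled
```

Because only labels change, any sensitivity of a partitioner to input order shows up as variance across trials rather than as a systematic bias.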
In order to quantify the difference in quality between two compared methods, we rely on summary statistics such as the macro-average of improvement. We define the "improvement" as a ratio between the partitioning quality of some baseline method and that of some comparison method. Note that if the comparison method achieves a cut or $(k-1)$ value that is lower than the baseline, the improvement will be greater than 1. We compute four different improvement statistics between two methods: average, minimum, maximum, and standard deviation. In this way we compare the expected, worst-case, and best-case observed partitioning quality, as well as the variance of results. In the following equations, $I(A, B, H, k, m)$ indicates the improvement between a baseline algorithm $A$ and a comparison algorithm $B$ for the $k$-partition problem on hypergraph $H$ with respect to metric $m$. Let $A_i$ represent the $i$-th trial of algorithm $A$. For our experiments we run $T = 20$ of these trials.
$I(A, B, H, k, m) = \frac{m(A)}{m(B)}$  (21)
$I_{\text{avg}} = \frac{\frac{1}{T} \sum_{i=1}^{T} m(A_i)}{\frac{1}{T} \sum_{i=1}^{T} m(B_i)}$  (22)
$I_{\min} = \frac{\min_i m(A_i)}{\min_i m(B_i)}$  (23)
$I_{\max} = \frac{\max_i m(A_i)}{\max_i m(B_i)}$  (24)
$I_{\sigma} = \frac{\sigma(\{m(A_i)\}_{i=1}^{T})}{\sigma(\{m(B_i)\}_{i=1}^{T})}$  (25)
We can then reduce the overall comparison of two methods with respect to a particular optimization metric into the macro-average of improvement across all graphs in the benchmark. Here, $\mathcal{H}$ represents the set of considered benchmark hypergraphs.

$I_{\text{macro}}(A, B, k, m) = \frac{1}{|\mathcal{H}|} \sum_{H \in \mathcal{H}} I_{\text{avg}}(A, B, H, k, m)$  (26)
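The improvement statistics and their macro-average can be computed as in the sketch below; the exact ratio forms are one plausible reading of the definitions above, and all names are illustrative:

```python
from statistics import mean, stdev

def improvement_stats(baseline, comparison):
    """Improvement ratios of `comparison` over `baseline`.

    Both arguments are lists of metric values (cut or connectivity)
    from repeated trials; lower is better, so ratios above 1 favor
    the comparison method.
    """
    return {
        "avg": mean(baseline) / mean(comparison),
        "min": min(baseline) / min(comparison),  # best observed vs best observed
        "max": max(baseline) / max(comparison),  # worst observed vs worst observed
        "std": stdev(baseline) / stdev(comparison),
    }

def macro_average(per_graph_stats, key="avg"):
    """Macro-average one improvement statistic across benchmark graphs."""
    return mean(stats[key] for stats in per_graph_stats)
```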
5 Results
Following the experimental procedure described in Section 4, we evaluate the partitioning quality of our proposed coarsening algorithm. Due to the volume of experimental trials performed for our evaluation, we can only present summary statistics for representative experiments in the body of this work. Full results are available online.


We present the macro-average improvement gained by embedding-based coarsening for each implementation across all partition counts in Table III (a and b) for the $(k-1)$ and cut metrics. These tables compare each implementation against its corresponding baseline without embedding-based coarsening. For instance, KaHyPar without flow-based refinement [hs2017sea] is compared to the same KaHyPar without flow-based refinement but with embedding-based coarsening.
Figure 2 depicts the improvement of representative methods relative to KaHyPar with flow-based refinement, the top-performing baseline [heuer2018network]. Note that in these plots we use "EC" to refer to embedding-based coarsening. Additionally, in these tables and figures we select the FOBE embedding for both the KaHyPar and Zoltan implementations to represent overall embedding quality, as all considered embeddings perform similarly. Furthermore, Figure 3 depicts the relative improvement gained by embedding-based coarsening per-graph for the 2-partition problem. Our analysis focuses on insights that can be observed from these representative results. Overall, we observe that embedding-based coarsening increases average quality across almost all considered comparisons. This improvement is typically greater for low partition counts.
In the case of KaHyPar, an improvement is the result of replacing the existing community-based coarsening [hs2017sea]. Community-based coarsening, which is further discussed in Section 6, restricts coarsening partners to only nodes that share a community in the finest level of the input hypergraph. Our embedding-based coarsening is similar in the sense that node communities are likely to share similar embeddings. However, by introducing node embeddings we relax this constraint. As a result, our approach gains in solution quality by occasionally merging across communities, which is particularly important when merging hubs or bridges that may border multiple communities.
When comparing the partitioning quality across all KaHyPar trials, we observe that KaHyPar with embedding-based coarsening but without flow-based refinement can find better partitioning solutions than KaHyPar with flow-based refinement. Specifically, we observe an average improvement in the $(k-1)$ metric for small partition counts, as demonstrated in Figure 2. Furthermore, some graphs partitioned with embedding-based coarsening in Zoltan outperform even the latest KaHyPar version. Applications attempting to partition particularly large hypergraphs may benefit from this result, as embedding-based coarsening and $(\log n)$-level partitioners expose more parallelism than the $n$-level KaHyPar design.
For small $k$ this flexibility appears to be the most valuable, as these latent embedding spaces may only detect a handful of relevant ground-truth clusters. For higher $k$, the number of partitions appears to exceed the number of natural divisions in our embedding space. Our KaHyPar implementation particularly struggles here, which we observe is likely to result from the direct $k$-way initial solution this method identifies [ahss2017alenex]. In contrast, Zoltan, which uses recursive bisection to repeatedly 2-partition the input hypergraph, is more resilient for larger $k$. Recursive bisection has the effect of splitting the input embedding into two subspaces. When combined with embedding-based coarsening, these subspaces divide the key axes of variance within the embedding space. Then, the next iteration need only consider locally relevant differences within each respective subspace, which retains more locally relevant information. This effect is what keeps Zoltan's partitioning quality competitive with KaHyPar Flow for larger $k$, as seen in Figure 2.
Examining the standard deviation results shown in Figure 2, we observe that embedding-based coarsening decreases the standard deviation of possible results for a given hypergraph. The figures corresponding to the standard deviation of both the $(k-1)$ and cut metrics demonstrate that the macro-average improvement of standard deviation is often substantial, and occasionally over an order of magnitude. This result comes from replacing the typically random node-visit order with a sorted ordering dependent on each node's nearest neighbor. In addition, the figures corresponding to the minimum and maximum $(k-1)$ and cut observed per-trial all demonstrate that embedding-based coarsening consistently improves the expected worst-case and average best-case quality. Many applications run multiple partitioning trials and select the top-performing result [trifunovic2006parallel]; by reducing the variance of results to an improved range, our coarsening approach could improve overall application efficiency.
Looking into the graph-wise results, shown in Figure 3, we observe that there is a class of hypergraphs that are best aided by embedding-based coarsening. We observe that embedding-based coarsening can identify partitions with 200-400% improvement in graphs with rich latent structure, such as the communication networks corresponding to the Enron or European Union email networks (as found in [davis2011university]). Additionally, some synthetic graphs, constructed through a star-shaped merge of multiple real-world networks and designed to complicate the coarsening process, are similarly improved. The improvements on this class of graphs are also highly statistically significant. These highly-improved graphs have rich latent global structure that may not be accurately captured through hyperedge-wise features. For example, the departmental structure within Enron is lost when individually considering emails between particular employees. While not every graph can be exploited to that magnitude, we do observe that a significant portion of our benchmark is substantially improved, and a further portion is merely unchanged. We do, however, observe a subset of graphs that are partitioned worse with embedding-based coarsening. Nemsemm2, a sparse matrix corresponding to a linear program, is partitioned almost three times worse using embedding-based coarsening. The incidence matrix of this hypergraph is nearly block-diagonal, which results in significant hyperedge-wise features that are not translated into an embedding, as disjoint graph regions are often embedded in overlapping spaces. In contrast, Nemswrld is another linear-program sparse matrix published by the same group, but it is less block-diagonal and receives a statistically significant average improvement. Each of the above results refers to a partition performed by KaHyPar (Figure 4).
6 Related Work
Our proposed embedding-based coarsening is similar to the community-based coarsening proposed by Heuer et al. and used in our KaHyPar baseline [hs2017sea]. Their approach begins with a "community detection phase" wherein traditional community detection algorithms cluster the nodes contained in the bipartite star-expansion of the original graph. From there, the coarsening process is restricted to only contract nodes within a community. This approach is intended to maintain global community structure from the original hypergraph in the final coarsest representation. While both methods leverage the bipartite representation to find initial node features, embedding-based coarsening improves upon community-based coarsening by relaxing the requirement that nodes can only be coarsened within a community. Nodes within a modularity-maximizing community are internally dense and externally sparse [newman2010networks]. As a result, nodes sharing a community are more likely to co-occur in any local sampling strategy employed by a graph embedding algorithm. Therefore, it is likely that the natural clusters within our considered graph embeddings are similar to the communities found by KaHyPar. However, these embeddings inform additional global relationships between clusters that are lost when each community is coarsened independently. For instance, nodes on the boundary of two communities will likely receive embeddings spatially located between two clusters. This distinction remains in the coarsest representation of the hypergraph, and may be lost in community-based coarsening when nodes are initially split due to community assignments.
Memetic partitioning, also proposed for KaHyPar, uses the principles of genetic algorithms to discover improved partitioning solutions [andre2018memetic]. This approach creates high-quality partitions by iterating through different "generations" of solutions, starting with an initial generation produced by running KaHyPar multiple times with different seeds. From the initial set, multiple combination operators "breed" new solutions by combining some number of "parents" to form new solutions. Each iteration is designed to improve the population's average objective value. Combination operators are specifically posed such that offspring solutions perform at least as well as their corresponding parents. While this approach is demonstrated to improve overall hypergraph partitioning quality, it does so by adding a meta-process to the set of initial hypergraph solutions. We anticipate that adding embedding-based coarsening as a method for generating a high-quality initial solution population may be a complementary way to improve the overall process.
The proposed embedding-based coarsening extends the relaxation-based coarsening developed by Shaydulin et al. [shaydulin2019relaxation] in Zoltan. That work introduces algebraic distance for hypergraphs, which in turn extends a similar measure designed for traditional graphs [chen2011algebraic]. Algebraic distance is a similarity measure that takes into account distant neighborhoods of vertices, enabling the coarsening process to exploit the global structure of highly irregular hypergraphs. Algebraic distance is computed by an iterative process that is shown to stabilize quickly [shaydulin2019relaxation], requiring only tens of iterations to obtain rich latent features. As such, this measure is additionally found within the Algebraic Heterogeneous Bipartite Embeddings (AHBE) we consider [sybrandt2019heterogeneous]. However, as uncovered in that work, neural graph embeddings can learn additional latent features not often captured by algebraic distance alone.
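For intuition, a simplified single-test-vector version of algebraic distance on a plain graph (a sketch in the spirit of the graph measure of [chen2011algebraic], not the hypergraph variant used in Zoltan, which averages over several test vectors) can be written as:

```python
import random

def algebraic_distances(adjacency, iterations=20, seed=0):
    """Approximate algebraic distances between neighboring nodes.

    `adjacency` maps each node to its neighbor list.  A random test
    vector is smoothed by Jacobi-style relaxation; endpoints that
    remain far apart after smoothing are weakly connected.
    """
    rng = random.Random(seed)
    x = {v: rng.random() for v in adjacency}
    for _ in range(iterations):
        # Lazy averaging: keep half of a node's value, take half
        # from the mean of its neighbors.
        x = {
            v: 0.5 * x[v] + 0.5 * sum(x[u] for u in nbrs) / len(nbrs)
            for v, nbrs in adjacency.items()
        }
    # Distance along each undirected edge (v < u avoids duplicates).
    return {
        (v, u): abs(x[v] - x[u])
        for v, nbrs in adjacency.items() for u in nbrs if v < u
    }
```

On a graph with two dense clusters joined by a single bridge, the bridge edge retains a visibly larger distance than intra-cluster edges after smoothing, which is exactly the signal coarsening uses to avoid contracting across weak connections.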
Aggregative coarsening [shaydulin2018sea] uses ideas from algebraic multigrid, extending an unfinished attempt published in Sandia Summer Reports [bulucboman]. At each step of the coarsening process, a set of seed vertices is selected. Each seed then becomes the center of an aggregate, with non-seeds assigned to seeds using different aggregation rules. An aggregate at a finer level forms a vertex at the coarser level. Two aggregation rules, based on inner-product matching and stable matching, were explored. Our embedding-based coarsening could be used within aggregative coarsening to inform the aggregation rules.
7 Conclusion
In this work we propose embedding-based coarsening, an approach that uses latent features present in a pre-trained hypergraph embedding to better solve the hypergraph partitioning problem. We do so by prioritizing nodes that share many latent features during the coarsening process, and then leveraging a combination of traditional and embedding-derived features when determining coarsening partners. We evaluate this approach over multiple trials per combination of 96 graphs, 7 partition counts, 6 pre-trained embedding methods, 5 baseline partitioners, 3 implementations, and 2 objective functions. We observe a significant increase in quality gained from embedding-based coarsening for small values of $k$ (from 2 until about 16). For higher values of $k$ we observe overall quality that returns to the state-of-the-art baseline. All experiments, plots, and code are available in our online appendix at sybrandt.com/2019/partitioning.
An important future research direction relates to embedding-based coarsening for large $k$, as the improvement we observe there is less significant. One potential explanation is that our fixed-size embeddings only contain a relatively small number of latent clusters. This would imply that beyond a certain small $k$, most coarsening comparisons will occur within a single cluster, wherein all nodes are similar. However, we demonstrate that the proposed embedding-based coarsening can improve the solution quality of existing hypergraph partitioners for small $k$, with the largest gains on particular graphs with rich latent structure. For example, this method increases the quality of Zoltan above that of KaHyPar with flow-based refinement in some cases, which is particularly important as the $(\log n)$-level paradigm implemented in Zoltan exposes substantially more parallelism than the $n$-level counterpart. We also note that our algorithm is embedding-agnostic and is ready to incorporate other types of embeddings that can potentially work better for specific types of instances.
8 Acknowledgements
We would like to thank Sebastian Schlag from the Karlsruhe Institute of Technology for helping us to understand KaHyPar. This work was supported by NSF awards MRI #1725573, DMS #1522751, and NRT #1633608.