
Random Walks on Hypergraphs with Edge-Dependent Vertex Weights

Uthsav Chitra (uchitra@cs.princeton.edu) and Benjamin J. Raphael (braphael@cs.princeton.edu)
Department of Computer Science, Princeton University
July 2, 2019
Abstract

Hypergraphs are used in machine learning to model higher-order relationships in data. While spectral methods for graphs are well-established, spectral theory for hypergraphs remains an active area of research. In this paper, we use random walks to develop a spectral theory for hypergraphs with edge-dependent vertex weights: hypergraphs where every vertex $v$ has a weight $\gamma_e(v)$ for each incident hyperedge $e$ that describes the contribution of $v$ to the hyperedge $e$. We derive a random walk-based hypergraph Laplacian, and bound the mixing time of random walks on such hypergraphs. Moreover, we give conditions under which random walks on such hypergraphs are equivalent to random walks on graphs. As a corollary, we show that current machine learning methods that rely on Laplacians derived from random walks on hypergraphs with edge-independent vertex weights do not utilize higher-order relationships in the data. Finally, we demonstrate the advantages of hypergraphs with edge-dependent vertex weights on ranking applications using real-world datasets.

1 Introduction

Graphs are ubiquitous in machine learning, where they are used to represent pairwise relationships between objects. For example, social networks, protein-protein interaction (PPI) networks, and the internet are modeled with graphs. One limitation of graph models, however, is that they do not encode higher-order relationships between objects. A social network can represent a community of users (e.g. a friend group) as a collection of edges between pairs of users, but this pairwise representation loses information about the overall group structure [38]. In biology, protein interactions occur not only between pairs of proteins, but also between groups of proteins in protein complexes [32, 33].

Such higher-order interactions can be modeled using a hypergraph: a generalization of a graph containing hyperedges that can be incident to more than two nodes. A hypergraph representation of a social network can model a community of friends with a single hyperedge. In contrast, the corresponding representation of a community in a graph requires many edges that connect pairs of individuals within the community; moreover, it may not be clear which collection of edges in a graph represents a community (e.g. a clique, an edge-dense subnetwork, etc.). Hypergraphs have been used in a variety of machine learning tasks, including clustering [1, 43, 27, 28], ranking keywords in a collection of documents [5], predicting customer behavior in e-commerce [26], object classification [42, 41], and image segmentation [24].

A common approach to incorporate graph information in a machine learning algorithm is to utilize properties of random walks or diffusion processes on the graph. For example, random walks on graphs underlie algorithms for recommendation systems [21], clustering [18, 31], information retrieval [6], and other applications. In many machine learning applications, the graph is represented through the graph Laplacian. Spectral theory includes many key results regarding the eigenvalues and eigenvectors of the graph Laplacian, and these results form the foundation of spectral learning algorithms.

Spectral theory on hypergraphs is much less developed than on graphs. In seminal work, Zhou et al. [43] developed learning algorithms on hypergraphs based on random walks on hypergraphs. However, at nearly the same time, Agarwal et al. [2] showed that the hypergraph Laplacian matrix used by Zhou et al. is equal to the Laplacian matrix of a closely related graph, the star graph. A consequence of this equivalence is that the methods introduced by Zhou et al. utilize only pairwise relationships between objects, rather than the higher-order relationships encoded in the hypergraph. More recently, Chan et al. [7] and Li and Milenkovic [27, 28] developed nonlinear Laplacian operators for hypergraphs that partially address this issue. However, all existing constructions of linear Laplacian operators utilize only pairwise relationships between vertices, as shown by Agarwal et al. [2].

In this paper, we develop a spectral theory for hypergraphs with edge-dependent vertex weights. In such a hypergraph, each hyperedge $e$ has an edge weight $\omega(e)$, and each vertex $v$ has a collection of vertex weights, with one weight $\gamma_e(v)$ for each hyperedge $e$ incident to $v$. The edge-dependent vertex weight $\gamma_e(v)$ models the contribution of vertex $v$ to hyperedge $e$. Edge-dependent vertex weights have previously been used in several applications including: image segmentation, where the weights represent the probability of an image pixel (vertex) belonging to a segment (hyperedge) [11]; e-commerce, where the weights model the quantity of a product (hyperedge) in a user’s shopping basket (vertex) [26]; and text ranking, where the weights represent the importance of a keyword (vertex) to a document (hyperedge) [5]. Hypergraphs with edge-dependent vertex weights have also been used in image search [40, 20] and 3D object classification [42], where the weights represent contributions of vertices in a k-nearest-neighbors hypergraph.

Unfortunately, because of the lack of a spectral theory for hypergraphs with edge-dependent vertex weights, many of the papers that use these hypergraphs rely on incorrect or theoretically unsound assumptions. For example, Zhang et al. [42] and Ding and Yilmaz [11] use a hypergraph Laplacian with no spectral guarantees, while Li et al. [26] derive an incorrect stationary distribution for a random walk on such a hypergraph (see Supplement for additional details). Such issues arise because existing spectral methods were developed for hypergraphs with edge-independent vertex weights, i.e. hypergraphs where the weights $\gamma_e(v)$ are identical across all hyperedges $e$ incident to $v$.

In this paper, we derive several results for hypergraphs with edge-dependent vertex weights. First, we show that random walks on hypergraphs with edge-independent vertex weights are always equivalent to random walks on the clique graph (Figure 1). This generalizes the results of Agarwal et al. [2] and gives the underlying reason why existing constructions of hypergraph Laplacian matrices [34, 43] do not utilize the higher-order relations of the hypergraph.

Motivated by this result, we derive a random walk-based Laplacian matrix for hypergraphs with edge-dependent vertex weights that utilizes the higher-order relations expressed in the hypergraph structure. This Laplacian matrix satisfies the typical properties one would expect of a Laplacian matrix, including being positive semi-definite and satisfying a Cheeger inequality. We also derive a formula for the stationary distribution of a random walk on a hypergraph with edge-dependent vertex weights, and give a bound on the mixing time of the random walk.

Our paper is organized as follows. In Section 2, we define our notation, and introduce hypergraphs with edge-dependent vertex weights. In Section 3, we formally define random walks on hypergraphs with edge-dependent vertex weights, and show that when the vertex weights are edge-independent, a random walk on a hypergraph has the same transition matrix as a random walk on its clique graph. In Section 4, we derive a formula for the stationary distribution of a random walk, and use it to bound the mixing time. In Section 5, we derive a random-walk based Laplacian matrix for hypergraphs with edge-dependent vertex weights and show some basic properties of the matrix. Finally, in Section 6, we demonstrate two applications of hypergraphs with edge-dependent vertex weights: ranking authors in a citation network and ranking players in a video game. All proofs are in the Supplementary Material.

2 Graphs, Hypergraphs, and Random Walks

Let $G = (V, E, w)$ be a graph with vertex set $V$, edge set $E$, and edge weights $w(e)$ for each edge $e \in E$. For a vertex $v$, let $N(v)$ denote the set of neighbors of $v$. The adjacency matrix of a graph is the $|V| \times |V|$ matrix $A$ with $A_{vw} = w(\{v, w\})$ if $\{v, w\} \in E$ and $A_{vw} = 0$ otherwise.

Let $H = (V, E, \omega)$ be a hypergraph with vertex set $V$; hyperedge set $E$; and hyperedge weights $\omega(e)$ for each hyperedge $e \in E$. A graph is a special case of a hypergraph, where each hyperedge has size 2. For hypergraphs, the terms “hyperedge” and “edge” are used interchangeably. A random walk on a hypergraph is typically defined as follows [43, 12, 9, 4]. At time $t$, a “random walker” at vertex $v$ will:

  1. Select a hyperedge $e$ containing $v$, with probability proportional to $\omega(e)$.

  2. Select a vertex $w$ from $e$, uniformly at random.

  3. Move to vertex $w$ at time $t + 1$.

A natural extension is to modify Step 2: instead of choosing $w$ uniformly at random from $e$, we pick $w$ according to a fixed probability distribution on the vertices in $e$. This motivates the following definition of a hypergraph with edge-dependent vertex weights.

{defn}

A hypergraph with edge-dependent vertex weights consists of a set of vertices $V$; a set of hyperedges $E$; a weight $\omega(e)$ for every hyperedge $e \in E$; and a weight $\gamma_e(v)$ for every hyperedge $e$ and every vertex $v$ incident to $e$. We emphasize that a vertex $v$ in a hypergraph with edge-dependent vertex weights has multiple weights: one weight $\gamma_e(v)$ for each hyperedge $e$ that contains $v$. Intuitively, $\gamma_e(v)$ measures the contribution of vertex $v$ to hyperedge $e$. In a random walk on a hypergraph with edge-dependent vertex weights, the random walker will pick a vertex $v$ from hyperedge $e$ with probability proportional to $\gamma_e(v)$. Note that we set $\gamma_e(v) = 0$ if $v \notin e$. We show an example of a hypergraph with edge-dependent vertex weights in Figure 1.

If each vertex $v$ has the same contribution to all incident hyperedges, i.e. $\gamma_e(v) = \gamma_{e'}(v)$ for all hyperedges $e$ and $e'$ incident to $v$, then we say that the hypergraph has edge-independent vertex weights, and we use $\gamma(v)$ to refer to the vertex weight of $v$. If $\gamma_e(v) = 1$ for all vertices $v$ and incident hyperedges $e$, we say the vertex weights are trivial.

We define $E(v)$ to be the set of hyperedges incident to a vertex $v$, and $E(u, v)$ to be the set of hyperedges incident to both vertices $u$ and $v$. Let $d(v) = \sum_{e \in E(v)} \omega(e)$ denote the degree of vertex $v$, and let $\delta(e) = \sum_{v \in e} \gamma_e(v)$ denote the degree of hyperedge $e$. The vertex-weight matrix of a hypergraph with edge-dependent vertex weights is the $|E| \times |V|$ matrix $R$ with entries $R(e, v) = \gamma_e(v)$, and the hyperedge-weight matrix is the $|V| \times |E|$ matrix $W$ with $W(v, e) = \omega(e)$ if $v \in e$, and $W(v, e) = 0$ otherwise. The vertex-degree matrix $D_V$ is the $|V| \times |V|$ diagonal matrix with entries $D_V(v, v) = d(v)$, and the hyperedge-degree matrix $D_E$ is the $|E| \times |E|$ diagonal matrix with entries $D_E(e, e) = \delta(e)$.
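To make the bookkeeping concrete, the following minimal sketch (in Python, with a made-up toy hypergraph; not from the paper) builds the matrices $R$, $W$, $D_V$, and $D_E$ and the degrees $d(v)$ and $\delta(e)$ defined above.

    # A minimal sketch (not from the paper): building R, W, D_V, D_E for a toy
    # hypergraph with edge-dependent vertex weights. All weights below are made up.
    import numpy as np

    # Vertices are 0..3; each hyperedge maps its member vertices to gamma_e(v).
    gamma = {"e1": {0: 1.0, 1: 2.0, 2: 1.0},
             "e2": {1: 1.0, 2: 3.0, 3: 1.0}}
    omega = {"e1": 1.0, "e2": 2.0}          # hyperedge weights omega(e)
    n_vertices, n_edges = 4, len(gamma)

    R = np.zeros((n_edges, n_vertices))     # R[e, v] = gamma_e(v), 0 if v not in e
    W = np.zeros((n_vertices, n_edges))     # W[v, e] = omega(e) if v in e, else 0
    for j, (e, members) in enumerate(gamma.items()):
        for v, g in members.items():
            R[j, v] = g
            W[v, j] = omega[e]

    d = W.sum(axis=1)       # vertex degrees d(v) = sum of omega(e) over e in E(v)
    delta = R.sum(axis=1)   # hyperedge degrees delta(e) = sum of gamma_e(v), v in e
    D_V, D_E = np.diag(d), np.diag(delta)
    print("d(v):", d, "delta(e):", delta)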

Given $H$, the clique graph of $H$, denoted $G(H)$, is the unweighted graph with vertex set $V$ and an edge $\{u, v\}$ whenever $E(u, v) \neq \emptyset$. In other words, $G(H)$ turns all hyperedges into cliques.

We say a hypergraph is connected if its clique graph is connected. In this paper, we assume all hypergraphs are connected.

For a Markov chain with state space $V$ and transition probabilities $p_{u,v}$, we use $p_{u,v}$ to denote the probability of going from state $u$ to state $v$ in one step.

3 Random Walks on Hypergraphs with Edge-Dependent Vertex Weights

Let $H$ be a hypergraph with edge-dependent vertex weights. We first define a random walk on $H$. At time $t$, a random walker at vertex $v$ will do the following:

Figure 1: Example illustrating Theorem 3. A hypergraph $H$ with edge-independent vertex weights (left) and a corresponding edge-weighted clique graph $G(H)$ (right) such that random walks on $H$ and $G(H)$ are equivalent. Note that if one changes the vertex weights of $H$ to suitable edge-dependent vertex weights, then it is not possible to choose edge weights on $G(H)$ such that random walks on $H$ and $G(H)$ are equivalent.
  1. Pick a hyperedge $e$ containing $v$, with probability $\omega(e)/d(v)$.

  2. Pick a vertex $w$ from $e$, with probability $\gamma_e(w)/\delta(e)$.

  3. Move to vertex $w$, at time $t + 1$.

Formally, we define a random walk on by writing out the transition probabilities according to the above steps.

{defn}

A random walk on a hypergraph $H$ with edge-dependent vertex weights is a Markov chain on $V$ with transition probabilities

$$p_{v,w} = \sum_{e \in E(v)} \frac{\omega(e)}{d(v)} \cdot \frac{\gamma_e(w)}{\delta(e)}. \qquad (1)$$

The probability transition matrix $P$ of a random walk on $H$ is the $|V| \times |V|$ matrix with entries $P_{v,w} = p_{v,w}$, and can be written in matrix form as $P = D_V^{-1} W D_E^{-1} R$. (We use the convention that probability transition matrices have row sums equal to $1$.) Using the probability transition matrix $P$, we can also define a random walk with restart on $H$ [36]. The random walk with restart is useful when it is unknown whether the random walk is irreducible.
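As an illustration, the following sketch (continuing the made-up toy hypergraph above) computes $P$ both entrywise from Equation (1) and via the factorization $P = D_V^{-1} W D_E^{-1} R$, and checks that the two agree and that each row sums to $1$.

    import numpy as np

    gamma = {"e1": {0: 1.0, 1: 2.0, 2: 1.0},
             "e2": {1: 1.0, 2: 3.0, 3: 1.0}}
    omega = {"e1": 1.0, "e2": 2.0}
    n = 4

    R = np.zeros((len(gamma), n))
    W = np.zeros((n, len(gamma)))
    for j, (e, members) in enumerate(gamma.items()):
        for v, g in members.items():
            R[j, v] = g
            W[v, j] = omega[e]
    d, delta = W.sum(axis=1), R.sum(axis=1)

    # Matrix form: P = D_V^{-1} W D_E^{-1} R ; rows sum to 1
    P = np.diag(1 / d) @ W @ np.diag(1 / delta) @ R
    assert np.allclose(P.sum(axis=1), 1.0)

    # Entrywise form, Equation (1): p_{v,w} = sum_{e in E(v)} omega(e)/d(v) * gamma_e(w)/delta(e)
    P_check = np.zeros((n, n))
    for j, (e, members) in enumerate(gamma.items()):
        for v in members:
            for w, g_w in members.items():
                P_check[v, w] += (omega[e] / d[v]) * (g_w / delta[j])
    assert np.allclose(P, P_check)
    print(np.round(P, 3))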

Note that our definition allows self-loops, i.e. $p_{v,v} > 0$, and thus the random walk is lazy. While one can define a non-lazy random walk (i.e. $p_{v,v} = 0$ for all $v$), the analysis of such walks is significantly more difficult, as the probability transition matrix cannot be factored as easily. In the Supplement, we show that a weaker version of Theorem 3 below holds for non-lazy random walks. Cooper et al. [9] also study the cover time of a non-lazy random walk on a hypergraph with edge-independent vertex weights.

Next, we define what it means for two random walks to be equivalent. Because random walks are Markov chains, we define equivalence in terms of Markov chains. {defn} Let $M_1$ and $M_2$ be Markov chains with the same (countable) state space, and let $P^{(1)}$ and $P^{(2)}$ be their respective probability transition matrices. We say that $M_1$ and $M_2$ are equivalent if

$$p^{(1)}_{u,v} = p^{(2)}_{u,v}$$

for all states $u$ and $v$.

Using this definition, we state our first main theorem: a random walk on a hypergraph with edge-independent vertex weights is equivalent to a random walk on its clique graph, for some choice of weights on the clique graph. {theorem} Let $H$ be a hypergraph with edge-independent vertex weights. There exist edge weights on the clique graph $G(H)$ such that a random walk on $G(H)$ is equivalent to a random walk on $H$.

Theorem 3 generalizes the result of Agarwal et al. [2], who showed that the two hypergraph Laplacian matrices constructed in Zhou et al. [43] and Rodriguez-Velazquez [34] are equal to the Laplacian matrix of either the clique graph or the star graph, another graph constructed from a hypergraph. Agarwal et al. [2] also showed that the Laplacians of the clique graph and the star graph are equal when the hypergraph is $k$-uniform (i.e. when all hyperedges have size $k$), and are very close otherwise. Since the Laplacian matrices in Zhou et al. [43] and Rodriguez-Velazquez [34] are derived from random walks on hypergraphs with edge-independent vertex weights, Theorem 3 implies that both Laplacians are equal to the Laplacian of the clique graph – even when the hypergraph is not $k$-uniform – thus strengthening the result in Agarwal et al. [2].

The proof of Theorem 3 relies on the fact that a random walk on $H$ satisfies a property known as time-reversibility: $\pi_u p_{u,v} = \pi_v p_{v,u}$ for all vertices $u$ and $v$, where $\pi$ is the stationary distribution of the random walk [3]. It is well-known that a Markov chain can be represented as a random walk on a graph if and only if it is time-reversible. Moreover, time-reversibility allows us to derive a formula for the weights on $G(H)$. Let $\gamma(v)$ be the edge-independent weight for vertex $v$. Then, up to a constant factor,

$$w(u, v) = \gamma(u)\,\gamma(v) \sum_{e \in E(u,v)} \frac{\omega(e)}{\delta(e)}. \qquad (2)$$
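The following sketch (a toy example with made-up weights) checks this equivalence numerically: the clique-graph weights of Equation (2) as written above, including self-loop weights $w(v, v)$ since the hypergraph walk is lazy, reproduce the hypergraph transition matrix exactly.

    # Numerical check of Theorem 3 / Equation (2) on a toy hypergraph with
    # edge-independent vertex weights gamma(v). All weights are made up.
    import numpy as np

    edges = {"e1": [0, 1, 2], "e2": [1, 2, 3]}      # toy hyperedges
    omega = {"e1": 1.0, "e2": 2.0}                  # hyperedge weights
    gamma = np.array([1.0, 2.0, 3.0, 1.0])          # edge-independent vertex weights
    n = 4

    d = np.zeros(n)
    delta = {e: sum(gamma[v] for v in mem) for e, mem in edges.items()}
    for e, mem in edges.items():
        for v in mem:
            d[v] += omega[e]

    # Hypergraph random walk, Equation (1)
    P_hyp = np.zeros((n, n))
    for e, mem in edges.items():
        for v in mem:
            for w in mem:
                P_hyp[v, w] += (omega[e] / d[v]) * (gamma[w] / delta[e])

    # Clique-graph random walk with the weights of Equation (2)
    Wg = np.zeros((n, n))
    for e, mem in edges.items():
        for u in mem:
            for v in mem:
                Wg[u, v] += gamma[u] * gamma[v] * omega[e] / delta[e]
    P_clique = Wg / Wg.sum(axis=1, keepdims=True)

    assert np.allclose(P_hyp, P_clique)
    print("hypergraph and clique-graph random walks agree")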

Conversely, the caption of Figure 1 describes a simple example of a hypergraph with edge-dependent vertex weights whose random walk is not time-reversible. This proves the following result.

{theorem}

There exists a hypergraph $H$ with edge-dependent vertex weights such that a random walk on $H$ is not equivalent to a random walk on its clique graph $G(H)$, for any choice of edge weights on $G(H)$.

Anecdotally, we find in simulations that most random walks on hypergraphs with edge-dependent vertex weights are not time-reversible, and therefore are not equivalent to random walks on their clique graphs. However, it is not clear how to formalize this observation.

Theorem 3 says that random walks on graphs with vertex set $V$ are a strict subset of Markov chains on $V$. A natural follow-up question is whether all Markov chains on $V$ can be described as a random walk on some hypergraph with vertex set $V$ and edge-dependent vertex weights. In the Supplement, we show that the answer to this question is no and provide a counterexample.

In addition, we show in the Supplement that hypergraphs with edge-dependent vertex weights create a rich hierarchy of Markov chains, beyond the division between time-reversible and time-irreversible Markov chains. In particular, we show that random walks on hypergraphs with edge-dependent vertex weights and at least one hyperedge of cardinality $k$ cannot in general be reduced to random walks on hypergraphs whose hyperedges have cardinality at most $k - 1$.

Finally, note that our definition of equivalent random walks (Definition 3) requires the probability transition matrices to be equal. Thus, another natural question is: given $H$, do there exist weights on the clique graph $G(H)$ such that random walks on $H$ and $G(H)$ are “close”? We provide a partial answer to this question in Section 5, where we show that, for a specific choice of weights on $G(H)$, the second-smallest eigenvalues of the Laplacian matrices of $H$ and $G(H)$ are close.

4 Stationary Distribution and Mixing Time

4.1 Stationary Distribution

Recall the formula for the stationary distribution of a random walk on a graph. If $G = (V, E, w)$ is a graph, then the stationary distribution $\pi$ of a random walk on $G$ is

$$\pi_v = c \sum_{e \in E(v)} w(e), \qquad (3)$$

where $c = \big( \sum_{u \in V} \sum_{e \in E(u)} w(e) \big)^{-1}$ is a normalizing constant. We derive a formula for the stationary distribution of a random walk on a hypergraph with edge-dependent vertex weights; the formula is analogous to equation (3) above, with two important changes: first, the proportionality constant depends on the hyperedge, and second, each term in the sum is multiplied by the vertex weight $\gamma_e(v)$. {theorem} Let $H$ be a hypergraph with edge-dependent vertex weights. There exist positive constants $\rho_e$, one for each hyperedge $e \in E$, such that the stationary distribution $\pi$ of a random walk on $H$ is

$$\pi_v = \sum_{e \in E(v)} \rho_e \, \omega(e) \, \gamma_e(v). \qquad (4)$$

Moreover, $\pi$ can be computed in polynomial time.

Note that while the vertex weights $\gamma_e(v)$ within a hyperedge can be scaled arbitrarily without affecting the properties of the random walk, Theorem 4.1 suggests that the constants $\rho_e$ provide the “correct” per-hyperedge scaling.

When the hypergraph has edge-independent vertex weights (i.e. $\gamma_e(v) = \gamma(v)$ for all incident hyperedges $e$), the constants $\rho_e$ can be taken to be equal, leading to the following formula for the stationary distribution:

$$\pi_v \propto \gamma(v) \sum_{e \in E(v)} \omega(e) = \gamma(v)\, d(v). \qquad (5)$$

Furthermore, if the vertex weights are trivial (i.e. $\gamma(v) = 1$), then $\pi_v \propto d(v)$, recovering the formula derived in Zhou et al. [43] for the stationary distribution of a random walk on a hypergraph with trivial vertex weights.
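A quick numerical illustration (toy example, made-up weights): the stationary distribution can be obtained as the left eigenvector of $P$ with eigenvalue $1$, and in the edge-independent case it matches Equation (5).

    # Sketch: stationary distribution of a hypergraph random walk, and a check
    # of the edge-independent special case pi(v) ~ gamma(v) * d(v).
    import numpy as np

    edges = {"e1": [0, 1, 2], "e2": [1, 2, 3]}
    omega = {"e1": 1.0, "e2": 2.0}
    gamma = np.array([1.0, 2.0, 3.0, 1.0])    # edge-independent vertex weights
    n = 4

    d = np.zeros(n)
    delta = {e: sum(gamma[v] for v in mem) for e, mem in edges.items()}
    for e, mem in edges.items():
        for v in mem:
            d[v] += omega[e]

    P = np.zeros((n, n))
    for e, mem in edges.items():
        for v in mem:
            for w in mem:
                P[v, w] += (omega[e] / d[v]) * (gamma[w] / delta[e])

    # Left eigenvector of P with eigenvalue 1, normalized to sum to 1
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()

    pi_formula = gamma * d / (gamma * d).sum()   # Equation (5)
    assert np.allclose(pi, pi_formula)
    print(pi)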

4.2 Mixing Time

In this section, we derive a bound on the mixing time of a random walk on . First, we recall the definition of the mixing time of a Markov chain.

{defn}

Let $M$ be a Markov chain with state space $V$, probability transition matrix $P$, and stationary distribution $\pi$. The mixing time of $M$ is

$$t_{\mathrm{mix}} = \min\Big\{ t : \max_{v \in V} \big\| P^t(v, \cdot) - \pi \big\|_{TV} \leq 1/4 \Big\},$$

where $\| \cdot \|_{TV}$ is the total variation distance.
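For intuition, the following sketch measures the mixing time of a small Markov chain directly from this definition (toy transition matrix; the threshold $1/4$ is the usual convention and an assumption here).

    import numpy as np

    P = np.array([[0.5, 0.5, 0.0],
                  [0.25, 0.5, 0.25],
                  [0.0, 0.5, 0.5]])          # toy lazy, irreducible chain

    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()

    def mixing_time(P, pi, eps=0.25, t_max=10_000):
        Pt = np.eye(len(pi))
        for t in range(1, t_max + 1):
            Pt = Pt @ P
            # worst-case (over starting states) total variation distance to pi
            tv = 0.5 * np.abs(Pt - pi).sum(axis=1).max()
            if tv <= eps:
                return t
        return None

    print("t_mix =", mixing_time(P, pi))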

We derive the following bound on the mixing time for a random walk on a hypergraph with edge-dependent vertex weights.

{theorem}

Let $H$ be a hypergraph with edge-dependent vertex weights. Without loss of generality, assume the vertex weights within each hyperedge are normalized (i.e. by multiplying the vertex weights in hyperedge $e$ by a suitable constant). Then,

(6)

where

  • $\Phi$ is the Cheeger constant of a random walk on $H$ [30, 22],

  • $d_{\min}$ is the minimum degree of a vertex in $H$, i.e. $d_{\min} = \min_{v \in V} d(v)$,

  • ,

  • .

This bound on the mixing time of the hypergraph random walk has a similar form to the mixing-time bound for a random walk on a graph [22]. For a graph with suitably normalized edge weights, we have

(7)

Note that both bounds have the same dependence on the Cheeger constant $\Phi$ and on the minimum degree $d_{\min}$. Intuitively, the additional dependence of the hypergraph bound on the remaining quantities arises because small values of these quantities correspond to the hypergraph having vertices that are hard to reach, and the presence of such vertices increases the mixing time.

5 Hypergraph Laplacian

Let $H$ be a hypergraph with edge-dependent vertex weights. Since a random walk on $H$ is a Markov chain, we can model the transition probabilities of the random walk using a weighted directed graph on the same vertex set $V$: include a directed edge $(v, w)$ whenever $p_{v,w} > 0$, with edge weight $p_{v,w}$. Extending the definition of the Laplacian matrix for directed graphs [8], we define a Laplacian matrix for the hypergraph as follows.

{defn}

[Random walk-based hypergraph Laplacian] Let $H$ be a hypergraph with edge-dependent vertex weights. Let $P$ be the probability transition matrix of a random walk on $H$ with stationary distribution $\pi$. Let $\Pi$ be the diagonal matrix with $\Pi_{v,v} = \pi_v$. Then, the random walk-based hypergraph Laplacian matrix is

$$L_H = I - \frac{1}{2}\Big( \Pi^{1/2} P \,\Pi^{-1/2} + \Pi^{-1/2} P^{\top} \Pi^{1/2} \Big). \qquad (8)$$
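As a sanity check, the following sketch (reusing the made-up toy hypergraph from earlier snippets) builds $L_H$ from Equation (8) and verifies that it is symmetric and positive semi-definite.

    import numpy as np

    gamma = {"e1": {0: 1.0, 1: 2.0, 2: 1.0},
             "e2": {1: 1.0, 2: 3.0, 3: 1.0}}
    omega = {"e1": 1.0, "e2": 2.0}
    n = 4

    d = np.zeros(n)
    delta = {e: sum(m.values()) for e, m in gamma.items()}
    for e, m in gamma.items():
        for v in m:
            d[v] += omega[e]

    P = np.zeros((n, n))
    for e, m in gamma.items():
        for v in m:
            for w, g_w in m.items():
                P[v, w] += (omega[e] / d[v]) * (g_w / delta[e])

    evals, evecs = np.linalg.eig(P.T)             # stationary distribution pi
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()

    Pi_half = np.diag(np.sqrt(pi))
    Pi_inv_half = np.diag(1.0 / np.sqrt(pi))
    L = np.eye(n) - 0.5 * (Pi_half @ P @ Pi_inv_half + Pi_inv_half @ P.T @ Pi_half)

    assert np.allclose(L, L.T)                        # symmetric
    assert np.all(np.linalg.eigvalsh(L) >= -1e-10)    # positive semi-definite
    print(np.round(np.linalg.eigvalsh(L), 4))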

At first glance, one might hypothesize that the hypergraph Laplacian defined above does not model higher-order relations between vertices, since $L_H$ is defined using a directed graph containing edges only between pairs of vertices. Indeed, if $H$ has edge-independent vertex weights, then it is true that $L_H$ does not model higher-order relations between vertices. This is because the transition probabilities are completely determined by the edge weights of the undirected clique graph $G(H)$ (Theorem 3). Thus, for each pair of vertices $v, w$ in $H$, only a single quantity $w(v, w)$, which encodes a pairwise relation between $v$ and $w$, is required to define the random walk. As such, the Laplacian matrix defined in Equation (8) is equal to the Laplacian matrix of an undirected graph, showing that $L_H$ only encodes pairwise relationships between vertices.

In contrast, when $H$ has edge-dependent vertex weights, the transition probabilities generally cannot be computed from a single quantity defined for each pair of vertices (Theorem 3). The absence of such a reduction implies that the transition probabilities $p_{v,w}$, which are the edge weights of the directed graph above, encode higher-order relations between vertices. Thus, the Laplacian matrix $L_H$ also encodes these higher-order relations.

From Chung [8], the hypergraph Laplacian matrix $L_H$ given in equation (8) is positive semi-definite and has a Rayleigh quotient for computing its eigenvalues. $L_H$ can be used in developing spectral learning algorithms for hypergraphs with edge-dependent vertex weights, or to study the properties of random walks on such hypergraphs. For example, the following Cheeger inequality for hypergraphs follows directly from the Cheeger inequality for directed graphs [8].

{theorem}

[Cheeger inequality for hypergraphs] Let $H$ be a hypergraph with edge-dependent vertex weights. Let $L_H$ be the Laplacian matrix given in equation (8), and let $\Phi$ be the Cheeger constant of a random walk on $H$. Let $\lambda_2$ be the smallest non-zero eigenvalue of $L_H$. We have

$$\frac{\Phi^2}{2} \leq \lambda_2 \leq 2\,\Phi. \qquad (9)$$
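To make the quantities concrete, the sketch below brute-forces the Cheeger constant of a small chain, following our reading of Chung [8] ($\Phi = \min_S F(S, \bar S) / \min(\pi(S), \pi(\bar S))$ with circulation $F(u, v) = \pi_u p_{u,v}$), and checks the inequality numerically; the toy chain is made up.

    import itertools
    import numpy as np

    P = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.3, 0.6]])          # toy lazy chain

    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()
    n = len(pi)

    F = pi[:, None] * P                      # circulation F[u, v] = pi(u) p(u, v)

    Phi = np.inf
    for r in range(1, n):
        for S in itertools.combinations(range(n), r):
            S = list(S)
            Sc = [v for v in range(n) if v not in S]
            boundary = F[np.ix_(S, Sc)].sum()
            Phi = min(Phi, boundary / min(pi[S].sum(), pi[Sc].sum()))

    Pi_half, Pi_inv_half = np.diag(np.sqrt(pi)), np.diag(1.0 / np.sqrt(pi))
    L = np.eye(n) - 0.5 * (Pi_half @ P @ Pi_inv_half + Pi_inv_half @ P.T @ Pi_half)
    lam2 = np.sort(np.linalg.eigvalsh(L))[1]     # smallest non-zero eigenvalue

    print("Phi =", Phi, " lambda_2 =", lam2)
    assert Phi ** 2 / 2 <= lam2 + 1e-9 and lam2 <= 2 * Phi + 1e-9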

5.1 Approximating the Hypergraph Laplacian with a Graph Laplacian

In Section 3, we posed the following question: given a hypergraph $H$ with edge-dependent vertex weights, can we find weights on the clique graph $G(H)$ such that the random walks of $H$ and $G(H)$ are close? We prove the following result. {theorem} Let $H$ be a hypergraph, with the edge-dependent vertex weights normalized appropriately within each hyperedge $e$. Let $G(H)$ be the clique graph of $H$, with edge weights

(10)

Let $L_H$ and $L_G$ be the Laplacians of $H$ and $G(H)$, respectively, and let $\lambda_2^H$ and $\lambda_2^G$ be the second-smallest eigenvalues of $L_H$ and $L_G$, respectively. Then

(11)

where the constant factor is determined by the vertex weights. This theorem says that there exist edge weights on $G(H)$ such that the second-smallest eigenvalues of the Laplacians of $H$ and $G(H)$ are within a constant factor of each other. We do not know whether the edge weights in Equation (10) give the tightest bound, or whether another choice of edge weights on $G(H)$ yields a Laplacian that is “closer” to the hypergraph Laplacian $L_H$.

Interestingly, Zhang et al. [42] use a variant of the clique-graph Laplacian $L_G$ as the Laplacian matrix of a hypergraph with edge-dependent vertex weights, and obtain state-of-the-art results on an object classification task. Theorem 5.1 provides some theoretical evidence for why Zhang et al. [42] are able to obtain good results, even with the “wrong” Laplacian.

6 Experiments

We demonstrate the utility of hypergraphs with edge-dependent vertex weights in two different ranking applications: ranking authors in an academic citation network, and ranking players in a video game.

6.1 Citation Network

We construct a citation network of all machine learning papers from NIPS, ICML, KDD, IJCAI, UAI, ICLR, and COLT published on or before 10/27/2017, and extracted from the ArnetMiner database [35]. We represent the network as a hypergraph whose vertices are authors and whose hyperedges are papers, such that each hyperedge connects the authors of a paper. The hypergraph has vertices and hyperedges.

We consider two vertex-weighted hypergraphs: $H_1$ has trivial vertex weights, with $\gamma_e(v) = 1$ for all vertices $v$ and incident hyperedges $e$, while $H_2$ has edge-dependent vertex weights that give larger weight to the first and last authors of each paper.

The edge-dependent vertex weights model unequal contributions by different authors. For papers whose authors are listed in alphabetical order (as is common in theory papers), we set equal vertex weights for all authors. We use the same hyperedge weights $\omega(e)$ in both hypergraphs.

We calculate the stationary distribution of a random walk with restart on both $H_1$ and $H_2$ (using the same restart parameter), and rank authors in each hypergraph by their value in the stationary distribution. This yields two different rankings of authors: one with edge-independent vertex weights, and one with edge-dependent vertex weights.
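A minimal sketch of this ranking step (the restart parameter, update rule, and toy transition matrix below are assumptions, not the paper's exact choices): iterate the random walk with restart to its stationary distribution and sort the vertices by probability mass.

    import numpy as np

    def rwr_stationary(P, beta=0.4, tol=1e-12, max_iter=100_000):
        """Stationary distribution of a random walk with restart:
        pi = (1 - beta) * pi @ P + beta * u, with uniform restart u."""
        n = P.shape[0]
        u = np.full(n, 1.0 / n)
        pi = u.copy()
        for _ in range(max_iter):
            new = (1 - beta) * (pi @ P) + beta * u
            if np.abs(new - pi).sum() < tol:
                return new
            pi = new
        return pi

    P = np.array([[0.5, 0.5, 0.0],
                  [0.25, 0.5, 0.25],
                  [0.0, 0.5, 0.5]])          # toy transition matrix
    pi = rwr_stationary(P)
    ranking = np.argsort(-pi)                # highest stationary mass ranked first
    print(pi, ranking)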

The two rankings have a Kendall correlation coefficient [23] indicating modest similarity. Examining individual authors, we typically see that authors who are first or last authors on their most cited papers have higher rankings in $H_2$ compared to $H_1$, e.g. Ian Goodfellow [17]. In contrast, authors who are middle authors on their most cited papers have lower rankings in $H_2$ relative to their rankings in $H_1$. Table 1 shows the authors who are ranked highly in at least one of the two hypergraphs and have the largest gain in rank in $H_2$ relative to $H_1$.

Name                   Rank in $H_1$   Rank in $H_2$
Richard Socher              687             382
Zhongzhi Shi                543             304
Daniel Rueckert             619             391
Lars Schmidt-Thieme         673             454
Tat-Seng Chua               650             435
Ian J. Goodfellow           612             413
Table 1: Highly ranked authors with the largest increase in rank when edge-dependent vertex weights are used in the hypergraph citation network.

We emphasize that this example is intended to illustrate how a straightforward application of vertex weights leads to alternative author rankings. We do not anticipate that our simple scheme for choosing edge-dependent vertex weights will always yield the best results in practice. For example, Christopher Manning drops in rank when edge-dependent vertex weights are added, but this is because he is the second-to-last, and co-corresponding, author on his most cited papers in the database. A more robust vertex weighting scheme would include knowledge of such equal-contribution authors, and would also incorporate different relative contributions of first, middle, and corresponding authors.

6.2 Rank Aggregation

We illustrate the use of hypergraphs with edge-dependent vertex weights on the rank aggregation problem, which aims to combine many partial rankings into one complete ranking. Formally, given a universe $U$ of items and a collection of partial rankings $\sigma_1, \dots, \sigma_k$ (e.g. a partial ranking $\sigma_i$ might express item $a \succ$ item $b \succ$ item $c$), a rank aggregation algorithm should find a permutation of $U$ that is “close” to the partial rankings $\sigma_1, \dots, \sigma_k$.

We consider a particular application of rank aggregation: ranking players in a multiplayer game. Here, the outcome of a game/match gives a partial ranking of the players participating in the match. In addition to the ranking, one may also have additional information such as the scores of each player in the match. The latter setting has been extensively studied; classic ranking methods include the Elo [14] and Glicko [16] systems used to rank chess players. More recently, online multiplayer games such as Halo have led to the development of alternative ranking systems such as Microsoft’s TrueSkill [19] and TrueSkill 2 [29].

We develop a rank aggregation algorithm that uses random walks on hypergraphs with edge-dependent vertex weights, and evaluate the performance of this algorithm on a real-world dataset of Halo 2 games. In the Supplement, we also include results of experiments with synthetic data.

Data. We analyze the Halo 2 dataset from the TrueSkill paper [19]. This dataset contains two kinds of matches: free-for-all matches involving multiple players, and 1-v-1 matches. Using the free-for-all matches as partial rankings, we construct rankings of all players in the dataset, and evaluate those rankings on the 1-v-1 matches.

Methods. A well-known class of rank aggregation algorithms is the class of Markov chain-based algorithms, first developed by Dwork et al. [13]. Markov chain-based algorithms create a Markov chain $M$ whose states are the players and whose transition probabilities depend in some way on the partial rankings. The final ranking of players is determined by sorting the values in the stationary distribution of $M$. In our experiments, we use a random walk with restart instead of a plain random walk, so that the stationary distribution always exists [36].

Using the free-for-all matches, we construct rankings of the players using four algorithms. The first three algorithms use Markov chains: a random walk on a hypergraph with edge-dependent vertex weights; a random walk on a clique graph; and MC3, a Markov chain-based rank aggregation algorithm designed by Dwork et al. [13]. The fourth algorithm is TrueSkill [19].

First, we derive a rank aggregation algorithm using a random walk on a hypergraph with edge-dependent vertex weights. The vertices are the players, and the hyperedges correspond to the free-for-all matches. We set the hyperedge weight of each match to be the variance of the players’ scores in that match, and the vertex weight $\gamma_e(v)$ to be the exponential of the score of player $v$ in match $e$.

This choice of hyperedge weights is inspired by Ding and Yilmaz [11], who also use variance to define the hyperedge weights of their hypergraph. For vertex weights, we use exponentiated scores instead of raw scores for two reasons: first, scores in Halo can be negative, but vertex weights must be positive, and second, exponentiating the score gives more importance to the winner of a match. We chose relatively simple formulas for the hyperedge and vertex weights in order to evaluate the potential benefits of utilizing edge-dependent vertex weights; further optimization of the vertex and edge weights may yield better performance.
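The sketch below shows one way such weights could be built from per-match score data (our reading of the construction above; the match data, and the use of a plain population variance, are assumptions).

    import math

    # match id -> {player: score}; scores may be negative
    matches = {
        "m1": {"p1": 10, "p2": -2, "p3": 4},
        "m2": {"p2": 7, "p3": 5, "p4": 1},
    }

    omega, gamma = {}, {}
    for m, scores in matches.items():
        vals = list(scores.values())
        mean = sum(vals) / len(vals)
        # hyperedge weight: variance of the scores in the match
        omega[m] = sum((s - mean) ** 2 for s in vals) / len(vals)
        # vertex weights: exponentiated scores (always positive)
        gamma[m] = {p: math.exp(s) for p, s in scores.items()}

    print(omega)
    print(gamma["m1"])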

Second, we derive a rank aggregation algorithm using a random walk on the clique graph $G(H)$ of the hypergraph $H$ described above, with the edge weights of $G(H)$ given by Equation (10). Specifically, if $H$ is the hypergraph defined above, then $G(H)$ is a graph whose vertices are the players and whose edge weights are defined by

(12)

In contrast to Equation (10), here we do not use the normalization of the vertex weights required by Theorem 5.1, since computing that normalization is computationally infeasible on our large dataset. Instead, we use a simpler normalization of the vertex weights within each hyperedge.

Third, we use MC3, a Markov chain-based rank aggregation algorithm designed by Dwork et al. [13]. MC3 uses only the partial rankings from each match; it does not use the score information. MC3 is very similar to a random walk on a hypergraph with edge-independent vertex weights. We convert the scores of the players in each match into a partial ranking of the players, and use these partial rankings as input to MC3.

Fourth, we use TrueSkill [19]. TrueSkill models each player’s skill with a normal distribution. We rank players according to the mean of this distribution. We also implemented the probabilistic decision procedure for ranking players from the TrueSkill paper, and found no difference in performance between ranking by the mean of the distribution and the probabilistic decision procedure.

Evaluation and Results: We evaluate the rankings produced by each algorithm by using them to predict the outcomes of the 1-v-1 matches. Specifically, given a ranking of players, we predict that the winner of a match between two players is the player with the higher ranking. Table 2 shows the fraction of 1-v-1 matches correctly predicted by each of the four algorithms. Random walks on the hypergraph with edge-dependent vertex weights perform significantly better than both MC3 and random walks on the clique graph $G(H)$, and comparably to TrueSkill. Moreover, on a nontrivial fraction of the 1-v-1 matches, the hypergraph method correctly predicts the outcome of the match while TrueSkill predicts the outcome incorrectly, suggesting that the hypergraph model captures some information about the players that TrueSkill misses. Unfortunately, we are unable to identify any specific pattern in the matches where the hypergraph method predicted the outcome correctly and TrueSkill predicted incorrectly.
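A small sketch of this evaluation (player scores and match outcomes below are made up): predict that the higher-ranked player wins each 1-v-1 match and report the fraction of correct predictions.

    def prediction_accuracy(rank_score, matches):
        """rank_score: player -> ranking score (higher is better);
        matches: list of (winner, loser) pairs from 1-v-1 games."""
        correct = sum(1 for winner, loser in matches
                      if rank_score[winner] > rank_score[loser])
        return correct / len(matches)

    rank_score = {"p1": 0.31, "p2": 0.27, "p3": 0.22, "p4": 0.20}
    one_v_one = [("p1", "p2"), ("p3", "p2"), ("p1", "p4")]
    print(prediction_accuracy(rank_score, one_v_one))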


Method           Correctly Predicted
TrueSkill               73.4%
Hypergraph              71.1%
Clique Graph            61.1%
MC3                     52.3%
Table 2: Results of ranking players on the Halo 2 dataset.

7 Conclusion

In this paper, we use random walks to develop a spectral theory for hypergraphs with edge-dependent vertex weights. We demonstrate both theoretically and experimentally how edge-dependent vertex weights model higher-order information in hypergraphs and improve the performance of hypergraph-based algorithms. At the same time, we show that random walks on hypergraphs with edge-independent vertex weights are equivalent to random walks on graphs, generalizing earlier results that showed this equivalence in special cases [2].

There are numerous directions for future work. It would be desirable to evaluate additional applications where hypergraphs with edge-dependent vertex weights have previously been used (e.g. [42, 26]), replacing the Laplacian used in some of these works with the hypergraph Laplacian introduced in Section 5. Sharper bounds on the approximation of the hypergraph Laplacian by a graph Laplacian are also desirable. Another direction is to examine the relationship between the linear hypergraph Laplacian matrix introduced here and the nonlinear Laplacian operators that were recently introduced in the case of trivial vertex weights [7] or submodular vertex weights [27, 28].

Another interesting direction is in extending graph convolutional neural networks (GCNs) to hypergraphs. Recent approaches to GCNs implement the graph convolution operator as a non-linear function of the graph Laplacian [25, 10]. GCNs have also been generalized to hypergraph convolutional neural networks (HGCNs), where the convolution layer operates on a hypergraph with edge-independent vertex weights instead of a graph [37, 15]. The hypergraph Laplacian matrix introduced in this paper would allow one to extend HGCNs to hypergraphs with edge-dependent vertex weights.

References

  • Agarwal et al. [2005] S. Agarwal, Jongwoo Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Belongie. Beyond pairwise clustering. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 2, pages 838–845 vol. 2, June 2005.
  • Agarwal et al. [2006] Sameer Agarwal, Kristin Branson, and Serge Belongie. Higher order learning with graphs. In Proceedings of the 23rd International Conference on Machine Learning, ICML ’06, pages 17–24, New York, NY, USA, 2006. ACM. ISBN 1-59593-383-2. doi: 10.1145/1143844.1143847.
  • Aldous and Fill [2002] David Aldous and James Allen Fill. Reversible Markov Chains and Random Walks on Graphs. 2002.
  • Avin et al. [2014] Chen Avin, Yuval Lando, and Zvi Lotker. Radio cover time in hyper-graphs. Ad Hoc Networks, 12:278 – 290, 2014. ISSN 1570-8705. doi: http://doi.org/10.1016/j.adhoc.2012.08.010.
  • Bellaachia and Al-Dhelaan [2013] Abdelghani Bellaachia and Mohammed Al-Dhelaan. Random walks in hypergraph. In Proceedings of the 2013 International Conference on Applied Mathematics and Computational Methods, Venice Italy, pages 187–194, 2013.
  • Brin and Page [1998] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Seventh International World-Wide Web Conference (WWW 1998), 1998.
  • Chan et al. [2018] T.-H. Hubert Chan, Anand Louis, Zhihao Gavin Tang, and Chenzi Zhang. Spectral properties of hypergraph laplacian and approximation algorithms. J. ACM, 65(3):15:1–15:48, March 2018. ISSN 0004-5411. doi: 10.1145/3178123.
  • Chung [2005] Fan Chung. Laplacians and the cheeger inequality for directed graphs. Annals of Combinatorics, 9(1):1–19, Apr 2005. ISSN 0219-3094. doi: 10.1007/s00026-005-0237-z.
  • Cooper et al. [2013] Colin Cooper, Alan Frieze, and Tomasz Radzik. The cover times of random walks on random uniform hypergraphs. Theoretical Computer Science, 509:51 – 69, 2013. ISSN 0304-3975. doi: http://dx.doi.org/10.1016/j.tcs.2013.01.020.
  • Defferrard et al. [2016] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. CoRR, abs/1606.09375, 2016.
  • Ding and Yilmaz [2010] Lei Ding and Alper Yilmaz. Interactive image segmentation using probabilistic hypergraphs. Pattern Recognition, 43(5):1863 – 1873, 2010. ISSN 0031-3203.
  • Ducournau and Bretto [2014] Aurélien Ducournau and Alain Bretto. Random walks in directed hypergraphs and application to semi-supervised image segmentation. Comput. Vis. Image Underst., 120:91–102, March 2014. ISSN 1077-3142. doi: 10.1016/j.cviu.2013.10.012.
  • Dwork et al. [2001] Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. Rank aggregation methods for the web. In Proceedings of the 10th International Conference on World Wide Web, WWW ’01, pages 613–622, New York, NY, USA, 2001. ACM. ISBN 1-58113-348-0. doi: 10.1145/371920.372165.
  • Elo [1978] Arpad E. Elo. The rating of chessplayers, past and present. Arco Pub., New York, 1978. ISBN 0668047216 9780668047210.
  • Feng et al. [2018] Yifan Feng, Haoxuan You, Zizhao Zhang, Rongrong Ji, and Yue Gao. Hypergraph neural networks. CoRR, abs/1809.09401, 2018.
  • Glickman [1995] Mark E Glickman. The glicko system. Boston University, 1995.
  • Goodfellow et al. [2014] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
  • Harel and Koren [2001] David Harel and Yehuda Koren. On clustering using random walks. In Proceedings of the 21st Conference on Foundations of Software Technology and Theoretical Computer Science, FST TCS ’01, pages 18–41, Berlin, Heidelberg, 2001. Springer-Verlag. ISBN 3-540-43002-4.
  • Herbrich et al. [2006] Ralf Herbrich, Tom Minka, and Thore Graepel. Trueskill™: A bayesian skill rating system. In Proceedings of the 19th International Conference on Neural Information Processing Systems, NIPS’06, pages 569–576, Cambridge, MA, USA, 2006. MIT Press.
  • Huang et al. [2010] Y. Huang, Q. Liu, S. Zhang, and D. N. Metaxas. Image retrieval via probabilistic hypergraph ranking. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3376–3383, June 2010. doi: 10.1109/CVPR.2010.5540012.
  • Jamali and Ester [2009] Mohsen Jamali and Martin Ester. Trustwalker: A random walk model for combining trust-based and item-based recommendation. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’09, pages 397–406, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-495-9. doi: 10.1145/1557019.1557067.
  • Jerison [2013] Daniel Jerison. General mixing time bounds for finite markov chains via the absolute spectral gap, October 2013.
  • Kendall [1938] M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938. ISSN 00063444.
  • Kim et al. [2011] Sungwoong Kim, Sebastian Nowozin, Pushmeet Kohli, and Chang D. Yoo. Higher-order correlation clustering for image segmentation. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 1530–1538. Curran Associates, Inc., 2011.
  • Kipf and Welling [2016] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. CoRR, abs/1609.02907, 2016.
  • Li et al. [2018] Jianbo Li, Jingrui He, and Yada Zhu. E-tail product return prediction via hypergraph-based local graph cut. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’18, pages 519–527, New York, NY, USA, 2018. ACM. ISBN 978-1-4503-5552-0.
  • Li and Milenkovic [2017] Pan Li and Olgica Milenkovic. Inhomogeneous hypergraph clustering with applications. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 2308–2318. Curran Associates, Inc., 2017.
  • Li and Milenkovic [2018] Pan Li and Olgica Milenkovic. Submodular hypergraphs: p-laplacians, Cheeger inequalities and spectral clustering. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3014–3023, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018. PMLR.
  • Minka et al. [2018] Tom Minka, Ryan Cleven, and Yordan Zaykov. Trueskill 2: An improved bayesian skill rating system. March 2018.
  • Montenegro and Tetali [2006] R. Montenegro and P. Tetali. Mathematical aspects of mixing times in markov chains. Found. Trends Theor. Comput. Sci., 1(3):237–354, May 2006. ISSN 1551-305X. doi: 10.1561/0400000003.
  • Ng et al. [2001] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, NIPS’01, pages 849–856, Cambridge, MA, USA, 2001. MIT Press.
  • Ramadan et al. [2004] E. Ramadan, A. Tarafdar, and A. Pothen. A hypergraph model for the yeast protein complex network. In 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings., pages 189–, April 2004. doi: 10.1109/IPDPS.2004.1303205.
  • Ritz et al. [2014] Anna Ritz, Allison N. Tegge, Hyunju Kim, Christopher L. Poirel, and T.M. Murali. Signaling hypergraphs. Trends in Biotechnology, 32(7):356 – 362, 2014. ISSN 0167-7799. doi: http://doi.org/10.1016/j.tibtech.2014.04.007.
  • Rodriguez-Velazquez [2002] Juan Alberto Rodriguez-Velazquez. On the laplacian eigenvalues and metric parameters of hypergraphs. Linear and Multilinear Algebra, 50:1–14, 03 2002.
  • Tang et al. [2008] Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. Arnetminer: Extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’08, pages 990–998, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-193-4. doi: 10.1145/1401890.1402008.
  • Tong et al. [2006] H. Tong, C. Faloutsos, and J. Pan. Fast random walk with restart and its applications. In Sixth International Conference on Data Mining (ICDM’06), pages 613–622, Dec 2006. doi: 10.1109/ICDM.2006.70.
  • Yadati et al. [2018] Naganand Yadati, Madhav Nimishakavi, Prateek Yadav, Anand Louis, and Partha Talukdar. Hypergcn: Hypergraph convolutional networks for semi-supervised classification. CoRR, abs/1809.02589, 2018.
  • Yang et al. [2017] Wenyin Yang, Guojun Wang, Md Zakirul Alam Bhuiyan, and Kim-Kwang Raymond Choo. Hypergraph partitioning for social networks based on information entropy modularity. Journal of Network and Computer Applications, 86:59 – 71, 2017. ISSN 1084-8045. Special Issue on Pervasive Social Networking.
  • Yilmaz et al. [2008] Emine Yilmaz, Javed A. Aslam, and Stephen Robertson. A new rank correlation coefficient for information retrieval. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’08, pages 587–594, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-164-4. doi: 10.1145/1390334.1390435.
  • Zeng et al. [2016] Kaiman Zeng, Nansong Wu, Arman Sargolzaei, and Kang Yen. Learn to rank images: A unified probabilistic hypergraph model for visual search. Mathematical Problems in Engineering, 2016:1–7, 01 2016. doi: 10.1155/2016/7916450.
  • Zhang et al. [2018a] Z. Zhang, H. Lin, X. Zhao, R. Ji, and Y. Gao. Inductive multi-hypergraph learning and its application on view-based 3d object classification. IEEE Transactions on Image Processing, 27(12):5957–5968, Dec 2018a.
  • Zhang et al. [2018b] Zizhao Zhang, Haojie Lin, and Yue Gao. Dynamic hypergraph structure learning. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 3162–3169. International Joint Conferences on Artificial Intelligence Organization, 7 2018b.
  • Zhou et al. [2006] Dengyong Zhou, Jiayuan Huang, and Bernhard Schölkopf. Learning with hypergraphs: Clustering, classification, and embedding. In Proceedings of the 19th International Conference on Neural Information Processing Systems, NIPS’06, pages 1601–1608, Cambridge, MA, USA, 2006. MIT Press.

Appendix A Incorrect Stationary Distribution in Earlier Work

Li et al. [26] claim in their Equation 4 that the stationary distribution of a random walk on a hypergraph with edge-dependent vertex weights is

(13)

where $d(v)$ is the sum of the edge weights of the hyperedges incident to $v$. Curiously, the stationary distribution given by this formula does not depend on the vertex weights. A counterexample to this formula is the hypergraph shown in Figure 1 of the main text, with the edge-dependent vertex weights described in the caption. Computing the stationary distribution of a random walk on this hypergraph directly yields a different distribution than the one given by Equation (13).

Appendix B Proof of Theorem 3

First we need the following definition and lemma.

{defn}

Let $M$ be a Markov chain with state space $V$ and transition probabilities $p_{u,v}$, for $u, v \in V$. We say $M$ is reversible if there exists a probability distribution $\pi$ over $V$ such that

$$\pi_u \, p_{u,v} = \pi_v \, p_{v,u} \quad \text{for all } u, v \in V. \qquad (14)$$
{lem}

Let $M$ be an irreducible Markov chain with finite state space $V$ and transition probabilities $p_{u,v}$ for $u, v \in V$. $M$ is reversible if and only if there exists a weighted, undirected graph $G$ with vertex set $V$ such that a random walk on $G$ and $M$ are equivalent.

Proof of Lemma.

First, suppose $M$ is reversible. Since $M$ is irreducible, let $\pi$ be the stationary distribution of $M$. Note that, because $M$ is irreducible, $\pi_u > 0$ for all states $u$.

Let $G$ be the graph with vertices $V$ and edge weights $w_{uv} = \pi_u p_{u,v}$. By reversibility, $w_{uv} = w_{vu}$, so $G$ is well-defined. In a random walk on $G$, the probability of going from $u$ to $v$ in one time-step is

$$\frac{w_{uv}}{\sum_{x} w_{ux}} = \frac{\pi_u \, p_{u,v}}{\pi_u} = p_{u,v},$$

since $\sum_{x} w_{ux} = \pi_u \sum_{x} p_{u,x} = \pi_u$.

Thus, if $M$ is reversible, the stated claim holds. The other direction follows from the fact that a random walk on an undirected graph is always reversible [3]. ∎

Theorem 3.

Let $H$ be a hypergraph with edge-independent vertex weights. Then, there exist edge weights on the clique graph $G(H)$ such that a random walk on $G(H)$ is equivalent to a random walk on $H$.

Proof of Theorem 3.

Let $\gamma(v) = \gamma_e(v)$ denote the edge-independent weight of vertex $v$, for all incident hyperedges $e$. We first show that a random walk on $H$ is reversible. By Kolmogorov’s criterion, reversibility is equivalent to

$$p_{v_1, v_2} \, p_{v_2, v_3} \cdots p_{v_{k-1}, v_k} \, p_{v_k, v_1} \;=\; p_{v_1, v_k} \, p_{v_k, v_{k-1}} \cdots p_{v_2, v_1} \qquad (15)$$

for any finite sequence of vertices $v_1, v_2, \dots, v_k$.

Since the transition probabilities for any two vertices $u$ and $v$ are

$$p_{u,v} = \sum_{e \in E(u,v)} \frac{\omega(e)}{d(u)} \cdot \frac{\gamma(v)}{\delta(e)} = \frac{\gamma(v)}{d(u)} \sum_{e \in E(u,v)} \frac{\omega(e)}{\delta(e)}, \qquad (16)$$

we have

$$\prod_{i=1}^{k} p_{v_i, v_{i+1}} = \prod_{i=1}^{k} \frac{\gamma(v_{i+1})}{d(v_i)} \sum_{e \in E(v_i, v_{i+1})} \frac{\omega(e)}{\delta(e)} = \prod_{i=1}^{k} \frac{\gamma(v_i)}{d(v_i)} \prod_{i=1}^{k} \sum_{e \in E(v_i, v_{i+1})} \frac{\omega(e)}{\delta(e)} = \prod_{i=1}^{k} p_{v_{i+1}, v_i}, \qquad (17)$$

where indices are taken modulo $k$ (so $v_{k+1} = v_1$). So by Kolmogorov’s criterion, a random walk on $H$ is reversible.

Furthermore, because $H$ is connected, random walks on $H$ are irreducible. Thus, by Lemma B, there exists a graph $G$ with vertex set $V$ and edge weights $w_{uv}$ such that random walks on $G$ and $H$ are equivalent. The equivalence of the random walks implies that $w_{uv} > 0$ if and only if $E(u, v) \neq \emptyset$, so it follows that $G$ is the clique graph of $H$. ∎

Appendix C Non-Lazy Random Walks on Hypergraphs

First we generalize the random walk framework of Cooper et al. [9] to random walks on hypergraphs with edge-dependent vertex weights. Informally, in a non-lazy random walk, a random walker at vertex $v$ will do the following:

  1. pick a hyperedge $e$ containing $v$, with probability $\omega(e)/d(v)$,

  2. pick a vertex $w \neq v$ from $e$, with probability $\gamma_e(w) / (\delta(e) - \gamma_e(v))$, and

  3. move to vertex $w$.

Formally, we have the following. {defn} A non-lazy random walk on a hypergraph $H$ with edge-dependent vertex weights is a Markov chain on $V$ with transition probabilities

$$p_{v,w} = \sum_{e \in E(v)} \frac{\omega(e)}{d(v)} \cdot \frac{\gamma_e(w)}{\delta(e) - \gamma_e(v)} \qquad (18)$$

for all states $w \neq v$, and $p_{v,v} = 0$.
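A quick sketch of this non-lazy transition rule (note that the renormalization by $\delta(e) - \gamma_e(v)$ follows our reconstruction above, so it is an assumption), checking on a made-up toy hypergraph that rows sum to $1$ and the diagonal is zero:

    import numpy as np

    gamma = {"e1": {0: 1.0, 1: 2.0, 2: 1.0},
             "e2": {1: 1.0, 2: 3.0, 3: 1.0}}
    omega = {"e1": 1.0, "e2": 2.0}
    n = 4

    d = np.zeros(n)
    delta = {e: sum(m.values()) for e, m in gamma.items()}
    for e, m in gamma.items():
        for v in m:
            d[v] += omega[e]

    P = np.zeros((n, n))
    for e, m in gamma.items():
        for v, g_v in m.items():
            for w, g_w in m.items():
                if w != v:
                    P[v, w] += (omega[e] / d[v]) * (g_w / (delta[e] - g_v))

    assert np.allclose(P.sum(axis=1), 1.0)    # each row is a distribution
    assert np.allclose(np.diag(P), 0.0)       # non-lazy: no self-loops
    print(np.round(P, 3))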

It is also useful to define a modified version of the clique graph without self-loops.

{defn}

Let $H$ be a hypergraph with edge-dependent vertex weights. The clique graph of $H$ without self-loops is a weighted, undirected graph with vertex set $V$, and edges defined by

(19)

In contrast to the lazy random walk, a non-lazy random walk on a hypergraph with edge-independent vertex weights is not guaranteed to satisfy reversibility. However, if $H$ has trivial vertex weights, then reversibility holds, and we get the following result.

{theorem}

Let $H$ be a hypergraph with trivial vertex weights, i.e. $\gamma_e(v) = 1$ for all vertices $v$ and incident hyperedges $e$. Then, there exist weights on the clique graph of $H$ without self-loops such that a non-lazy random walk on $H$ is equivalent to a random walk on that graph.

Proof.

Again, we first show that a non-lazy random walk on $H$ is reversible. Define the probability mass function $\pi_v = d(v)/Z$, for normalizing constant $Z = \sum_{u \in V} d(u)$. Let $q_{v,w}$ be the probability of going from $v$ to $w$ in a non-lazy random walk on $H$, where $v \neq w$. Then,

$$\pi_v \, q_{v,w} = \frac{d(v)}{Z} \sum_{e \in E(v,w)} \frac{\omega(e)}{d(v)} \cdot \frac{1}{|e| - 1} = \frac{1}{Z} \sum_{e \in E(v,w)} \frac{\omega(e)}{|e| - 1}.$$

By symmetry, $\pi_v q_{v,w} = \pi_w q_{w,v}$, so a non-lazy random walk on $H$ is reversible. Thus, by Lemma B, there exists a graph $G$ with vertex set $V$ and edge weights such that a random walk on $G$ and a non-lazy random walk on $H$ are equivalent. The equivalence of the random walks implies that $\{u, v\}$ is an edge of $G$ if and only if $u \neq v$ and $E(u,v) \neq \emptyset$, so it follows that $G$ is the clique graph of $H$ without self-loops. ∎

Appendix D Relationships between Random Walks on Hypergraphs and Markov Chains on Vertex Set

In the main text, we show that there are hypergraphs with edge-dependent vertex weights whose random walks are not equivalent to a random walk on a graph. A natural follow-up question is to ask whether all Markov chains on a vertex set $V$ can be represented as a random walk on some hypergraph with the same vertex set and edge-dependent vertex weights. Below, we show that the answer is no. Since random walks on hypergraphs with edge-dependent vertex weights are lazy, in the sense that $p_{v,v} > 0$ for all vertices $v$, we restrict our attention to lazy Markov chains with strictly positive self-transition probabilities.

{claim}

There exists a lazy Markov chain $M$ with state space $V$ such that $M$ is not equivalent to a random walk on any hypergraph with vertex set $V$ and edge-dependent vertex weights.

Proof.

Suppose for the sake of contradiction that any lazy Markov chain with is equivalent to a random walk on some hypergraph with vertex set . Let be a lazy Markov chain with states and transition probabilities , with the following property. For some states , let

(20)

By assumption, let be a hypergraph with vertex set and edge-dependent vertex weights, such that a random walk on is equivalent to . Let be the transition probabilities of a random walk on . We have

(21)

Plugging in Equations (20) to the above yields , or .

By similar reasoning, we also have , and plugging in Equations (20) gives us , or .

Combining both of these inequalities, we obtain

(22)

Since the vertex degree , we obtain a contradiction. ∎

Next, for any $k \geq 2$, define a $k$-hypergraph to be a hypergraph with edge-dependent vertex weights whose hyperedges have cardinality at most $k$. We show that, for any $k \geq 3$, there exists a $k$-hypergraph with vertex set $V$ whose random walk is not equivalent to the random walk of any $(k-1)$-hypergraph with vertex set $V$. We first prove the result for $k = 3$.

{lem}

There exists a $3$-hypergraph $H$ with vertex set $V$ whose random walk is not equivalent to a random walk on any $2$-hypergraph with vertex set $V$.

Proof.

Let be a -hypergraph with four vertices, , and two hyperedges and . Let the hyperedge weights be and the vertex weights be , and for all other such that .

Figure 2: Pictured above is .

For the sake of contradiction, suppose a random walk on is equivalent to a random walk on , where is a -hypergraph with vertex set . Let be the transition probabilities of for ; by assumption, .

must have the following edges: , , , , and . WLOG let for each . Moreover, while we do not depict these edges in the figure below, also has edges for , though it may be the case that .

For shorthand, we write for , for , and for where .

Figure 3: Pictured above is . For illustrative purposes, we do not draw out singleton edges.

By definition, we have

(23)

Thus, .

By similar analysis of , and using that , we also have . Thus, adding together the bounds on and

(24)

Note that, to get the bound in Equation (24), we summed for . If we follow the same steps but replace with , we get the following bounds, respectively:

(25)
(26)

Now, solving for in Equation (24) yields

(27)

Next, using that , we bound Equation (25):

(28)

Solving for yields . Combining with Equation (27):

(29)

Bounding Equation (26) in a similar way to Equation (28) gives us:

(30)

Solving for gives us

(31)

Finally, putting together Equations (29) and (31):

(32)

which yields a contradiction, as . ∎

We prove the result for general $k$ by extending the above proof.

{theorem}

Let $k \geq 3$. Then, there exists a $k$-hypergraph with vertex set $V$ whose random walk is not equivalent to a random walk on any $(k-1)$-hypergraph with vertex set $V$.

Proof.

For simplicity, assume is even (our argument can be adapted to odd ). Write . For the sake of contradiction, suppose all -hypergraphs have random walks equivalent to the random walk of some -hypergraph.

Let be a -hypergraph with vertices , and hyperedges and . The edge weights are , and the edge-dependent vertex weights are , and for all other with .

Figure 4: Pictured above is .

By assumption, let be a