Line Hypergraph Convolution Network: Applying Graph Convolution for Hypergraphs
Network representation learning and node classification in graphs have received significant attention due to the invention of different types of graph neural networks. The graph convolution network (GCN) is a popular semi-supervised technique which aggregates attributes within the neighborhood of each node. Conventional GCNs can be applied to simple graphs where each edge connects only two nodes. But many modern applications need to model high-order relationships in a graph. Hypergraphs are effective data types to handle such complex relationships. In this paper, we propose a novel technique to apply graph convolution on hypergraphs with variable hyperedge sizes. We use the classical concept of the line graph of a hypergraph for the first time in the hypergraph learning literature. Then we propose to use graph convolution on the line graph of a hypergraph. Experimental analysis on multiple real-world network datasets shows the merit of our approach compared to state-of-the-art methods.
Keywords: Hypergraph, Graph Convolution Network, Line Graph, Node Classification and Representation.
1 Introduction
Graph representation learning has received remarkable attention in the last few years. Performance on semi-supervised tasks such as node classification in a graph has improved significantly due to the invention of different types of graph neural networks (GNNs) [23, 28]. The graph convolution network (GCN) is one of the most popular and efficient neural network structures; it aggregates the transformed attributes over the neighborhood of a node. The transformation matrix is learned in a semi-supervised way by minimizing the cross entropy loss of predicting the labels of a subset of nodes. GCN is a fast approximation of spectral graph convolutions [13]. There are different variants of graph convolution in the literature, such as an inductive version of GCN with a different aggregation function [11], primal-dual GCN [22], FastGCN [8], etc.
Most of the existing graph convolution approaches are suited for simple graphs, i.e., when the relationships between the nodes are pairwise. In such a graph, each edge connects only two vertices (or nodes). However, real-life interactions among entities are more complex in nature and relationships can be of high order (beyond pairwise connections). For example, when four authors write a research paper together, it is not necessary that any two of them are connected directly [14]. They are still coauthors because of some other coauthor of the paper who is strongly connected to both of them. Thus, representing such a co-authorship network by a simple graph may not be suitable. Hypergraphs [4] were introduced to model such complex relationships among real-world entities in a graph structure. In a hypergraph, each edge may connect more than two vertices. So an edge is essentially denoted by a subset of nodes, rather than just a pair. Computations on hypergraphs are more expensive and more complicated. But due to their power to capture real-world interactions, it is important to design learning algorithms for hypergraph representation.
The network analysis community has shown interest in different applications of hypergraph mining [7, 33].
There exist different ways to transform a hypergraph into a simple graph, such as clique expansion and star expansion [34, 1, 24]. Clique expansion creates cliques by connecting any pair of nodes belonging to the same hyperedge in the transformed simple graph. Star expansion creates a bipartite graph by creating a new node for each hyperedge. Typically, the analysis of a hypergraph is done by performing the downstream mining tasks on the transformed simple graph.
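To make the two expansions concrete, the following minimal Python sketch builds both from a list of hyperedges. The function names `clique_expansion` and `star_expansion` and the toy hyperedges are illustrative assumptions, not taken from any cited implementation; edge weights, which some variants of these expansions add, are omitted.

```python
from itertools import combinations


def clique_expansion(hyperedges):
    """Connect every pair of nodes that share a hyperedge (unweighted)."""
    edges = set()
    for hyperedge in hyperedges:
        for u, v in combinations(sorted(hyperedge), 2):
            edges.add((u, v))
    return edges


def star_expansion(hyperedges):
    """Create one new node ("e", idx) per hyperedge and link it to its members."""
    return {(v, ("e", idx))
            for idx, hyperedge in enumerate(hyperedges)
            for v in hyperedge}


# Toy hypergraph (hypothetical): two hyperedges sharing node "c".
hyperedges = [{"a", "b", "c"}, {"c", "d"}]
clique_edges = clique_expansion(hyperedges)
star_edges = star_expansion(hyperedges)
```

In this toy case the clique expansion yields four pairwise edges, while the star expansion yields one bipartite edge per (node, hyperedge) membership.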
There are also tensor based approaches available to deal with hypergraphs, but they typically assume the size of all the hyperedges in the hypergraph to be the same (i.e., a uniform hypergraph).
We propose a novel approach of applying graph convolution on hypergraphs. We refer to our proposed algorithm as LHCN (Line Hypergraph Convolution Network). To the best of our knowledge, we use the classical concept of the line graph of a hypergraph for the first time in the hypergraph learning literature. We map the hypergraph to a weighted and attributed line graph and learn the graph convolution on this line graph. We also propose a reverse mapping to get the labels of the nodes in the hypergraph. The proposed algorithm can work with any hypergraph, including non-uniform ones.
We conduct thorough experimentation on popular node classification datasets. Experimental results show that the performance of LHCN is on par with, and often improves upon, the state of the art in hypergraph neural networks. We make the source code publicly available at https://bit.ly/2qNmbRn to ease the reproducibility of the results.
2 Related Work
To give a brief but comprehensive idea about the existing literature, we discuss some of the prominent works in the following three domains.
Network Embedding and Graph Neural Networks: A detailed survey on network representation learning and graph neural networks can be found in [31, 12]. Different types of semi-supervised graph neural networks exist in the literature for node representation and classification [23, 8]. [18] proposes a version of the graph convolution network (GCN) which learns a weighted mean of neighbor node attributes to find the embedding of a node by minimizing the cross entropy loss for node classification. [11] proposes GraphSAGE, which employs different types of aggregation methods for GCN and extends it to inductive node classification. [28] proposes GAT, which uses an attention mechanism within the graph convolution framework to learn the importance of a node in determining the label of another node in its neighborhood. Recently, a GCN based unsupervised approach (DGI) was proposed in [29], which maximizes the mutual information between patch representations and high-level summaries of a graph.
Learning on Hypergraphs: As mentioned in Section 1, many existing analyses of hypergraphs first transform the hypergraph into a simple graph by clique expansion or star expansion [1, 34, 24], and then do the mining on the simple graph. Conventional analysis of simple graphs is done on the adjacency matrix, as it captures the graph structure well. Similarly, in the hypergraph domain, higher order matrices, called tensors [3, 19], are used for multiple computations. A non-negative tensor factorization approach is proposed in [26] for clustering a dataset having complex relations (beyond pairwise) in the form of a hypergraph. A hypergraph clustering approach that casts the problem as a non-cooperative multi-player game is proposed in [5]. The main disadvantage of tensor based approaches is that they mostly assume the hypergraph to be uniform, i.e., all the hyperedges are of equal size. Link prediction in hypergraphs is also studied in the literature [7, 25]. Submodular hypergraphs, which arise in clustering applications in which higher-order structures carry relevant information, are introduced in [20]. Recently, a hypergraph based active learning scheme was proposed in [9], which allows one to ask both pointwise and pairwise queries.
Hypergraph Neural Networks: Application of graph neural networks to hypergraphs is still a new area of research. To the best of our knowledge, there are only three prominent works in this area. A hypergraph neural network (HGNN) is proposed in [10] which applies convolution on the hypergraph Laplacian. From Eq. 10 of [10], the framework boils down to the application of graph convolution (as proposed in [18]) on a weighted clique expansion of the hypergraph, where the weights of the edges of a clique are determined by the weight and degree of the corresponding hyperedge. Assuming the initial hypergraph structure is weak, the dynamic hypergraph neural network [15] extends the idea of HGNN by adding a dynamic hypergraph construction module that updates the hypergraph structure on each layer. Very recently, HyperGCN was proposed in [32], where the authors use the maximum distance of two nodes (in the embedding space) in a hyperedge as a regularizer. They use the hypergraph Laplacian proposed in [6] to transform a hypergraph into a simple graph where each hyperedge is represented by a simple edge and the edge weight is proportional to the maximum distance between any pair of nodes in the hyperedge. Then they perform GCN on this simple graph structure.
Our proposed approach belongs to the class of hypergraph neural networks, where we propose a novel method to apply graph convolution on hypergraphs.
3 Problem Statement and Notations Used
We consider an undirected hypergraph $G = (V, E)$. Here $V$ is the set of (hyper)nodes with $|V| = n$ and $E$ is the set of hyperedges, with $|E| = m$. A hyperedge $e \in E$ connects a subset of nodes. For example, if the hyperedge $e$ connects $v_i$, $v_j$ and $v_k$, it is denoted as $e = \{v_i, v_j, v_k\}$. A hypergraph is often represented by an incidence matrix $H \in \{0, 1\}^{n \times m}$, where $H_{v,e} = 1$ if the hyperedge $e$ contains the node $v$, and $H_{v,e} = 0$ otherwise. We also assume the hypergraph to be attributed, i.e., each node $v \in V$ is associated with a $d$ dimensional feature vector $x_v \in \mathbb{R}^d$ and this forms a feature matrix $X \in \mathbb{R}^{n \times d}$. We aim to propose a novel algorithm to apply graph convolution to the hypergraph, which can be used to classify the nodes of the hypergraph. So we assume to have a training set $V_l \subseteq V$ where for each node $v \in V_l$, we know its label $y_v \in \mathcal{L}$. Here, $\mathcal{L}$ is the set of labels. For example, $|\mathcal{L}| = 2$ for a binary classification. Our goal is to learn a function $f : V \rightarrow \mathcal{L}$ which can output the label of each unlabelled node $v \in V \setminus V_l$. The desired algorithm to learn such a function should be able to use both the hypergraph structure and the node attributes.
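As an illustration of the incidence matrix representation, a small NumPy sketch is given below. The helper `incidence_matrix` and the toy hypergraph with five nodes and three hyperedges are hypothetical. Row sums give the number of hyperedges each node belongs to, and column sums give the hyperedge sizes, which differ here because the hypergraph is non-uniform.

```python
import numpy as np


def incidence_matrix(num_nodes, hyperedges):
    """Binary matrix with entry (i, j) = 1 iff hyperedge j contains node i."""
    H = np.zeros((num_nodes, len(hyperedges)), dtype=int)
    for j, hyperedge in enumerate(hyperedges):
        for i in hyperedge:
            H[i, j] = 1
    return H


# Toy non-uniform hypergraph (hypothetical): hyperedge sizes 3, 2 and 3.
hyperedges = [[0, 1, 2], [2, 3], [0, 3, 4]]
H = incidence_matrix(5, hyperedges)

node_degrees = H.sum(axis=1)  # number of hyperedges containing each node
edge_sizes = H.sum(axis=0)    # number of nodes in each hyperedge
```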
4 Solution Approach: LHCN
In this section, we discuss each stage of the proposed algorithm LHCN. We also give the necessary background on line graph and graph convolution to make the paper self-contained.
4.1 Line Graph and Extension to Hypergraphs
First, we define the line graph of a simple graph. Given a simple undirected graph $G = (V, E)$, the line graph $L(G)$ is the graph such that each node of $L(G)$ is an edge in $G$ and two nodes of $L(G)$ are neighbors if and only if their corresponding edges in $G$ share a common endpoint vertex [30]. Formally, $L(G) = (V_L, E_L)$ where $V_L = E$ and $E_L = \{(e_p, e_q) : e_p, e_q \in E,\ e_p \neq e_q,\ e_p \cap e_q \neq \emptyset\}$. Figure 1 shows how to convert a graph into the line graph.
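The definition can be checked on a small example. The sketch below, with the hypothetical helper `line_graph`, builds the line graph of a path on four vertices; it compares all pairs of edges directly, which is adequate for illustration but not tuned for large graphs.

```python
from itertools import combinations


def line_graph(edges):
    """Adjacency sets of the line graph: one node per edge of the input graph,
    two nodes adjacent iff the corresponding edges share an endpoint."""
    nodes = [tuple(sorted(e)) for e in edges]
    adjacency = {e: set() for e in nodes}
    for e1, e2 in combinations(nodes, 2):
        if set(e1) & set(e2):
            adjacency[e1].add(e2)
            adjacency[e2].add(e1)
    return adjacency


# Path 1 - 2 - 3 - 4: its line graph is a path on three nodes.
path_edges = [(1, 2), (2, 3), (3, 4)]
L = line_graph(path_edges)
```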
Now, we discuss the process of transforming an attributed hypergraph $G = (V, E)$ (as discussed in Section 3) to an attributed weighted line graph $L(G)$. Similar to the case of a simple graph, we create a node in the line graph for each hyperedge in the hypergraph. We connect two nodes in the line graph by an edge if the corresponding two hyperedges in the hypergraph share at least one common node. More formally, $L(G) = (V_L, E_L)$ with $V_L = \{v_e : e \in E\}$ and $E_L = \{(v_{e_p}, v_{e_q}) : e_p \neq e_q,\ |e_p \cap e_q| \geq 1\}$. Next, we assign a weight to each edge of the line graph, as given in Equation 1.
An example of the construction of a line graph from a small hypergraph is depicted in Figure 2. In the original hypergraph, each node $v$ has an attribute vector $x_v$. We follow a simple strategy to assign attributes to each node of the corresponding line graph. For a node $v_e$ of the line graph, we check the corresponding hyperedge $e$ of the given hypergraph. We consider each node of the hypergraph which belongs to this hyperedge, and we assign the attribute of $v_e$ as the average attribute of all these nodes. So, $x_{v_e} = \frac{1}{|e|} \sum_{v \in e} x_v$.
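The construction described so far (one line graph node per hyperedge, an edge between overlapping hyperedges, averaged attributes) can be sketched as follows. The edge weight formula is not reproduced in this text, so the sketch uses the Jaccard overlap of the two hyperedges purely as an assumed stand-in; the helper name `build_line_graph` and the toy data are likewise hypothetical.

```python
from itertools import combinations


def build_line_graph(hyperedges, node_features):
    """Return (edge_weights, line_features) for the line graph of a hypergraph.

    hyperedges: list of lists of node ids; hyperedge p becomes line graph node p.
    node_features: dict mapping node id -> list of floats.
    """
    sets = [set(e) for e in hyperedges]

    # Connect two line graph nodes when their hyperedges share at least one node.
    edge_weights = {}
    for p, q in combinations(range(len(sets)), 2):
        shared = sets[p] & sets[q]
        if shared:
            # ASSUMPTION: Jaccard overlap stands in for the paper's edge weight.
            edge_weights[(p, q)] = len(shared) / len(sets[p] | sets[q])

    # Attribute of a line graph node = mean attribute of the hyperedge's members.
    line_features = []
    for members in sets:
        vectors = [node_features[v] for v in members]
        dim = len(vectors[0])
        line_features.append(
            [sum(x[k] for x in vectors) / len(vectors) for k in range(dim)]
        )
    return edge_weights, line_features


hyperedges = [["A", "B", "C"], ["C", "D"], ["E", "F"]]
node_features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0],
                 "D": [3.0, 1.0], "E": [0.0, 0.0], "F": [2.0, 4.0]}
edge_weights, line_features = build_line_graph(hyperedges, node_features)
```

Only the first two hyperedges overlap (they share node C), so the toy line graph has a single edge; the third hyperedge becomes an isolated line graph node.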
Finally, as discussed in Section 3, we are interested in the semi-supervised node classification of the given hypergraph. We have a subset of nodes in the hypergraph for which the labels are available. We again follow a simple strategy to assign labels to a subset of nodes in the line graph. For each node $v_e$ of the line graph, we put it in the set of labelled nodes if the corresponding hyperedge $e$ contains at least one labelled (hyper)node. For each such node $v_e$, its label is deduced as the majority of the labels of the labelled nodes in $e$. For example, in Figure 2, if nodes E and F of the hypergraph are labelled as 1, G is labelled as 2 and C is unlabelled, then the node in the line graph corresponding to their hyperedge would be labelled as 1.
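The labelling rule can be sketched in a few lines of Python. The helper `hyperedge_labels` is hypothetical, and the tie-breaking behaviour (the first label encountered among equally frequent ones wins) is an assumption, since the text does not specify how ties are resolved.

```python
from collections import Counter


def hyperedge_labels(hyperedges, node_labels):
    """Majority label of the labelled members of each hyperedge.

    Hyperedges without any labelled member are left out (they stay unlabelled).
    ASSUMPTION: ties are broken in favour of the label seen first.
    """
    labels = {}
    for idx, hyperedge in enumerate(hyperedges):
        known = [node_labels[v] for v in hyperedge if v in node_labels]
        if known:
            labels[idx] = Counter(known).most_common(1)[0][0]
    return labels


# Mirrors the example in the text: E and F have label 1, G has label 2,
# C is unlabelled. The second hyperedge has no labelled member.
hyperedges = [["C", "E", "F", "G"], ["A", "B"]]
node_labels = {"E": 1, "F": 1, "G": 2}
line_labels = hyperedge_labels(hyperedges, node_labels)
```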
4.2 Convolution Network on the Line Graph
In this section, we aim to apply the graph convolution for the hypergraphs. Please note, the proposed line graph of a hypergraph is completely different from the concept of the dual graph of a hypergraph [2]. Dual graphs are formed just by interchanging the vertices and edges of a hypergraph. The dual graph of a hypergraph is still a hypergraph. In contrast, from the last section, one can see that even though the input is a hypergraph, the resulting line graph is a simple graph (i.e., each edge connects only two nodes). Also, we derive attributes for each node of the line graph, and there is a positive weight associated with each edge. Hence, this enables us to apply graph convolution on the generated line graph, as explained below.
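For the deduced line graph $L(G)$, say $A$ is its adjacency matrix, where $A_{p,q}$ is the weight of the edge between the nodes $v_{e_p}$ and $v_{e_q}$ (as mentioned in Equation 1). To apply the graph convolution, we add self loops to each node of the line graph and normalize it. Say, $\tilde{A} = A + I$, where $I$ is an identity matrix, and $\tilde{D}$ is a diagonal matrix with $\tilde{D}_{p,p} = \sum_q \tilde{A}_{p,q}$. For all our experiments, we use the following 2-layered graph convolution on the line graph:

$$Z = \hat{A}\, \sigma\left(\hat{A} X_L W^{(0)}\right) W^{(1)} \qquad (2)$$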
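Here, $\hat{A}$ is the symmetric graph normalization of the adjacency matrix with self loops, and $\sigma$ is a nonlinear activation function, for which we use Leaky ReLU. $W^{(0)}$ and $W^{(1)}$ are the learnable weight parameters of GCN which transform the input feature matrix $X_L$ of the line graph and the intermediate hidden representations respectively. $Z$ is the final hidden representation of the nodes of the line graph, which is fed to a softmax layer, and we use the following cross entropy loss for node classification:

$$\mathcal{J} = - \sum_{v \in V_L^{\text{lab}}} \sum_{c \in \mathcal{L}} \mathbb{1}_{v,c} \ln P(v, c)$$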
Here, $\mathbb{1}_{v,c}$ is an indicator function which is 1 if the actual label of a node $v$ in the labelled set $V_L^{\text{lab}}$ of the line graph is $c$, as derived in Section 4.1, and $P(v, c)$ is the probability with which the node $v$ has the label $c$, obtained from the output of the softmax function. We use the back-propagation algorithm with the ADAM optimization technique [16] to learn the parameters of the graph convolution by minimizing the cross entropy loss. After the training is completed, we obtain the labels of all the nodes in the line graph.
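For illustration, the forward pass and the loss described in this section can be sketched in NumPy as below. This is a sketch under stated assumptions rather than the released implementation: the Leaky ReLU slope of 0.01, the absence of bias terms, the summed (not averaged) loss, and the random untrained weights are all our choices, and the softmax is applied directly to the output of the second layer.

```python
import numpy as np


def normalize_adjacency(A):
    """Add self loops, then apply symmetric normalization D^-1/2 (A + I) D^-1/2."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def leaky_relu(x, slope=0.01):  # ASSUMPTION: slope value is not given in the text
    return np.where(x > 0, x, slope * x)


def gcn_forward(A, X, W0, W1):
    """Two-layer graph convolution on a weighted graph (no bias terms)."""
    A_hat = normalize_adjacency(A)
    hidden = leaky_relu(A_hat @ X @ W0)
    return A_hat @ hidden @ W1


def masked_cross_entropy(Z, labels, labelled_idx):
    """Summed cross entropy of softmax(Z) over the labelled nodes only."""
    shifted = Z - Z.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[labelled_idx, labels[labelled_idx]].sum()


# Toy weighted line graph with three nodes (hypothetical numbers).
rng = np.random.default_rng(0)
A = np.array([[0.0, 0.25, 0.0],
              [0.25, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
X = rng.normal(size=(3, 4))    # line graph node attributes
W0 = rng.normal(size=(4, 8))   # untrained weights, for shape checking only
W1 = rng.normal(size=(8, 2))
Z = gcn_forward(A, X, W0, W1)

line_labels = np.array([0, 1, 0])
labelled_idx = np.array([0, 2])  # node 1 is treated as unlabelled
loss = masked_cross_entropy(Z, line_labels, labelled_idx)
```

In an actual training loop the weights would be updated by back-propagation on this loss; the sketch stops at the forward computation.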
4.3 Information Propagation to the Hypergraph
Here, we move the learned node labels and the representations back from the line graph to the given hypergraph. Each node in the line graph corresponds to a hyperedge of the hypergraph. So after running the graph convolution in Section 4.2, we obtain the label and the representation of each hyperedge. We again opt for a simple strategy to get them for the nodes of the hypergraph. For an unlabelled node $v$ of the hypergraph, we take the majority of the labels of the hyperedges it belongs to. For example, if the node $v$ is part of three hyperedges $e_1$, $e_2$ and $e_3$, and if their labels are 1, 2 and 2, then the label of the node $v$ is assigned as 2. Similarly, to get the vector representation of a node in the hypergraph, we take the average of the representations of all the hyperedges it belongs to.
Though the above scheme to generate hypernode representations from the node representations (as in Equation 2) of the line graph is simple in nature, it completely adheres to the hypergraph Laplacian assumption that any two nodes connected by a hyperedge tend to have similar embeddings and labels. This also satisfies the well-known concept of homophily [21] in social networks. The different stages of LHCN are summarized in the block diagram in Figure 3.
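The reverse mapping can be sketched as follows. The helper `propagate_to_nodes` is hypothetical; for brevity it computes a label for every node, whereas in the described procedure only the unlabelled nodes need one. Ties in the majority vote are broken by first occurrence, which is an assumption.

```python
from collections import Counter


def propagate_to_nodes(hyperedges, edge_labels, edge_embeddings):
    """Majority label and mean embedding over the hyperedges of each node."""
    memberships = {}
    for idx, hyperedge in enumerate(hyperedges):
        for v in hyperedge:
            memberships.setdefault(v, []).append(idx)

    node_labels, node_embeddings = {}, {}
    for v, edge_ids in memberships.items():
        votes = Counter(edge_labels[i] for i in edge_ids)
        node_labels[v] = votes.most_common(1)[0][0]
        dim = len(edge_embeddings[edge_ids[0]])
        node_embeddings[v] = [
            sum(edge_embeddings[i][k] for i in edge_ids) / len(edge_ids)
            for k in range(dim)
        ]
    return node_labels, node_embeddings


# Node "u" lies in three hyperedges with labels 1, 2 and 2, as in the text.
hyperedges = [["u", "a"], ["u", "b"], ["u", "c"]]
edge_labels = [1, 2, 2]
edge_embeddings = [[0.0, 3.0], [3.0, 0.0], [3.0, 3.0]]
node_labels, node_embeddings = propagate_to_nodes(
    hyperedges, edge_labels, edge_embeddings
)
```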
Time Complexity of LHCN: As discussed in Section 3, the given hypergraph has number of (hyper)nodes and hyperedges. We assume feature dimension and embedding dimension are constants. On average, if a node belongs to number of hyperedges, all the corresponding hyperedges will be connected to each other in the line graph. Hence, the complexity of generating the line graph is . The runtime of graph convolution in the line graph would take time. Propagating the information back to the hypergraph requires time. Hence, the total runtime of LHCN is . For real world sparse graphs, is a constant. Hence the effective runtime of LHCN is .
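The observation that all hyperedges containing a given node become mutually adjacent suggests building the line graph by iterating over nodes instead of comparing every pair of hyperedges. The sketch below, with the hypothetical helper `line_graph_edges_via_nodes`, does this and counts the number of hyperedge pairs visited, which is the quantity that grows with the number of hyperedges per node. It is an illustration of the argument, not a statement of the exact bound.

```python
from itertools import combinations


def line_graph_edges_via_nodes(hyperedges):
    """Build line graph edges by pairing the hyperedges that contain each node.

    Returns the edge set and the number of hyperedge pairs visited.
    """
    memberships = {}
    for idx, hyperedge in enumerate(hyperedges):
        for v in hyperedge:
            memberships.setdefault(v, []).append(idx)

    edges = set()
    pair_visits = 0
    for edge_ids in memberships.values():
        # A node in k hyperedges contributes k * (k - 1) / 2 pairs.
        for p, q in combinations(edge_ids, 2):
            edges.add((p, q))
            pair_visits += 1
    return edges, pair_visits


# Hyperedges 0 and 1 share two nodes, so their pair is visited twice.
hyperedges = [[0, 1, 2], [1, 2, 3], [3, 4]]
edges, pair_visits = line_graph_edges_via_nodes(hyperedges)
```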
5 Experimental Evaluation
We experiment on three publicly available popular network datasets for node classification and representation and compare the results with state-of-the-art hypergraph neural networks. Please note, as the inherent objective of graph convolution is semi-supervised in nature, we do not conduct experiments on unsupervised tasks such as hypernode clustering and hyperedge prediction. This is consistent with the hypergraph neural network literature [32, 10].
5.1 Datasets and Baseline Algorithms Used
We use three citation network datasets: Cora, Citeseer and Pubmed. We construct hypergraphs from them, as done in recent hypergraph neural network papers [32, 10]. We keep all the nodes of the original network in the hypergraph. If a research paper ‘a’ cites the papers ‘b’, ‘c’ and ‘d’, then we create a hyperedge in the hypergraph. Each node in these datasets is associated with an attribute vector. Attributes represent the occurrence of a word in the research paper through bag-of-words models. For Cora and Citeseer, the attributes are binary vectors, and for Pubmed, they are tf-idf vectors. A high-level summary of these datasets is presented in Table 1.

Table 1: Summary of the datasets.

| | Cora | Citeseer | Pubmed |
|---|---|---|---|
| Number of hypernodes | 2708 | 3312 | 19717 |
| Number of hyperedges | 1579 | 1079 | 7963 |
| Number of features | 1433 | 3703 | 500 |
| Number of classes | 7 | 6 | 3 |
We have selected the following set of diverse state-of-the-art algorithms to compare with the results of the proposed algorithm LHCN.
- Confidence interval based method (CI) [33]: The authors propose a semi-supervised learning approach on hypergraphs and design an algorithm for solving the convex program based on the subgradient method.
- Multi-layer perceptron (MLP): This is a classical MLP which only uses the node attributes for the node classification problem. The hypergraph structure is completely ignored here.
- MLP + explicit hypergraph Laplacian regularisation (MLP + HLR): Here the MLP is used along with an added component to capture the hypergraph Laplacian regularizer [32].
- Hypergraph neural networks: Here we use recently proposed state-of-the-art hypergraph neural networks. First, HGNN [10] is considered, as it uses graph convolution on the weighted clique expansion of a hypergraph. Second, we consider different variants of HyperGCN as explained in [32]. HyperGCN approximates the hypergraph Laplacian by a simple graph and then uses graph convolution on it to classify the nodes of the hypergraph.
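Table 2: Average node classification accuracy (%) ± standard deviation.

| Method | Cora | Citeseer | Pubmed |
|---|---|---|---|
| CI | 35.60 ± 0.8 | 29.63 ± 0.3 | 47.04 ± 0.8 |
| MLP | 57.86 ± 1.8 | 58.88 ± 1.7 | 69.30 ± 1.6 |
| MLP+HLR | 63.02 ± 1.8 | 62.25 ± 1.6 | 69.82 ± 1.5 |
| HGNN | 67.59 ± 1.8 | 62.60 ± 1.6 | 70.59 ± 1.5 |
| 1-HyperGCN | 65.55 ± 2.1 | 61.13 ± 1.9 | 69.92 ± 1.5 |
| FastHyperGCN | 67.57 ± 1.8 | 62.58 ± 1.7 | 70.52 ± 1.6 |
| HyperGCN | 67.63 ± 1.7 | 62.65 ± 1.6 | 74.44 ± 1.6 |
| LHCN (ours) | 73.34 ± 1.7 | 63.19 ± 2.23 | 70.76 ± 2.36 |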
5.2 Experimental Setup
For all our experiments, we split the (hyper)nodes into an 80-20% random train-test split and then train LHCN (with a two-layer GCN on the generated line graph). We decrease the learning rate of the ADAM optimizer after every 100 epochs by a factor of 2. For the Cora and Citeseer datasets, the dimensions of the first and second hidden layers of GCN are 32 and 16 respectively, and we train the GCN model for 200 epochs. Pubmed being a larger dataset, the dimensions of the hidden layers are set to 128 and 64 respectively, and we train it for 1700 epochs. We perform the experiments on a shared server with Intel(R) Xeon(R) Gold 6142 processors, which contains 64 processors with 16 cores each. The overall runtime (including all the stages) of LHCN on the Cora, Citeseer and Pubmed datasets is approximately 16 sec., 21 sec. and 867 sec. respectively.
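The learning rate schedule described above (halving after every 100 epochs) can be written as a small helper. The function name `learning_rate` and the initial value of 0.01 are hypothetical, since the initial learning rate is not stated in the text.

```python
def learning_rate(epoch, initial_lr, step=100, factor=2.0):
    """Learning rate after dividing by `factor` once every `step` epochs."""
    return initial_lr / (factor ** (epoch // step))


# ASSUMPTION: an initial learning rate of 0.01, chosen only for illustration.
schedule = [learning_rate(epoch, 0.01) for epoch in (0, 99, 100, 250)]
```

With a 200-epoch run, as used for Cora and Citeseer, this schedule applies exactly one halving during training.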
5.3 Results on Node Classification
We run LHCN (on 80-20% random train-test splits) 10 times on each dataset and report the average node classification accuracy and standard deviation in Table 2. Reported numbers for the baselines are taken from [32]. First, it can be observed that graph neural network based approaches perform better than the rest. Our proposed LHCN improves the performance on the Cora and Citeseer datasets. On the Cora dataset, LHCN outperforms the closest baseline HyperGCN by roughly 8.4% in relative terms (73.34 vs. 67.63). The very recently proposed HyperGCN turns out to be the best performer on Pubmed, while many of the baseline algorithms, including ours, achieve an accuracy of around 70%.
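The improvement on Cora can be verified from the accuracies in Table 2. The short computation below shows that the figure of roughly 8.4% is a relative gain over HyperGCN, corresponding to an absolute gain of about 5.7 accuracy points; the helper name `relative_gain` is ours.

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


lhcn_cora, hypergcn_cora = 73.34, 67.63  # accuracies from Table 2
absolute_points = lhcn_cora - hypergcn_cora
relative_percent = relative_gain(lhcn_cora, hypergcn_cora)
```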
5.4 Hypernode Visualization
LHCN produces vector representations of the (hyper)nodes as well. To assess the visual quality of those representations, we use t-SNE [27], which maps the vectors to a two-dimensional space that can be plotted easily. Figure 4 shows the visualization of hypernode representations of all the datasets using LHCN. Different colors represent different node classes. We can observe that the classes are separated to a good extent for all the datasets. We also tried to run HyperGCN to generate the hypernode representations, but the visual quality of those representations was not good. So, we present the results only for our algorithm.
5.5 Further Analysis on LHCN
We conduct a few more experiments to get more insight into LHCN. First, we observe the convergence of the training loss over the epochs of LHCN in Figure 4(a). Due to the use of ADAM and our strategy of decreasing the learning rate after every 100 epochs, the training loss decreases fast at the beginning and converges at the end. In Figure 4(b), we vary the training size from 50% to 90% in increments of 10%. Finally, we include the case where the training size is 95% and the remainder is the test set. We can see that the node classification performance of LHCN improves significantly with an increasing training set. This implies that the algorithm is able to use the labelled nodes properly to improve the results.
6 Conclusion and Future Work
In this work, we propose a novel approach of applying graph convolution to a hypergraph via a transformation to a weighted and attributed line graph. Experimental results are promising and improve upon the state of the art in this recently developed area of graph neural networks for hypergraphs. There is ample scope for future work in this direction. One practical issue is the suitable construction of the hypergraph so that algorithms can extract meaningful information easily. In our work, we adopt a rule to transform a hypergraph into a line graph. It would be interesting to study if this rule can be learned from the data.
- A k-uniform hypergraph is a hypergraph such that all its hyperedges have size k.
References
[1] (2006) Higher order learning with graphs. In Proceedings of the 23rd International Conference on Machine Learning, pp. 17–24.
[2] (1973) Graphs and hypergraphs.
[3] (1968) Vector and tensor analysis with applications. Courier Corporation.
[4] (2013) Hypergraph theory. An introduction. Mathematical Engineering. Cham: Springer.
[5] (2009) A game-theoretic approach to hypergraph clustering. In Advances in Neural Information Processing Systems, pp. 1571–1579.
[6] (2018) Spectral properties of hypergraph Laplacian and approximation algorithms. Journal of the ACM (JACM) 65 (3), pp. 15.
[7] (2015) Multimodal hypergraph learning for microblog sentiment prediction. In 2015 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6.
[8] (2018) FastGCN: fast learning with graph convolutional networks via importance sampling. In International Conference on Learning Representations.
[9] (2019) HS: active learning over hypergraphs with pointwise and pairwise queries. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 2466–2475.
[10] (2019) Hypergraph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 3558–3565.
[11] (2017) Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1025–1035.
[12] (2017) Representation learning on graphs: methods and applications. arXiv preprint arXiv:1709.05584.
[13] (2011) Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis 30 (2), pp. 129–150.
[14] (2009) Understanding importance of collaborations in co-authorship networks: a supportiveness analysis approach. In Proceedings of the 2009 SIAM International Conference on Data Mining, pp. 1112–1123.
[15] (2019) Dynamic hypergraph neural networks. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pp. 2635–2641.
[16] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[17] (2016) Variational graph auto-encoders. arXiv preprint arXiv:1611.07308.
[18] (2017) Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations.
[19] (2009) Tensor decompositions and applications. SIAM Review 51 (3), pp. 455–500.
[20] (2018) Submodular hypergraphs: p-Laplacians, Cheeger inequalities and spectral clustering. arXiv preprint arXiv:1803.03833.
[21] (2001) Birds of a feather: homophily in social networks. Annual Review of Sociology 27 (1), pp. 415–444.
[22] (2018) Dual-primal graph convolutional networks. arXiv preprint arXiv:1806.00770.
[23] (2016) Learning convolutional neural networks for graphs. In International Conference on Machine Learning, pp. 2014–2023.
[24] (2012) Hypergraph learning with hyperedge expansion. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 410–425.
[25] (2014) Predicting multi-actor collaborations using hypergraphs. arXiv preprint arXiv:1401.6404.
[26] (2006) Multi-way clustering using super-symmetric non-negative tensor factorization. In European Conference on Computer Vision, pp. 595–608.
[27] (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9, pp. 2579–2605.
[28] (2018) Graph attention networks. In International Conference on Learning Representations.
[29] (2019) Deep graph infomax. In International Conference on Learning Representations.
[30] (1932) Congruent graphs and the connectivity of graphs. American Journal of Mathematics 54 (1), pp. 150–168.
[31] (2019) A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596.
[32] (2019) HyperGCN: a new method for training graph convolutional networks on hypergraphs. In Advances in Neural Information Processing Systems, pp. 1509–1520.
[33] (2017) Re-revisiting learning on hypergraphs: confidence interval and subgradient method. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 4026–4034.
[34] (2007) Learning with hypergraphs: clustering, classification, and embedding. In Advances in Neural Information Processing Systems, pp. 1601–1608.