Hebbian Graph Embeddings

Abstract

Representation learning has recently been used successfully to create vector representations of entities in language learning, recommender systems and similarity learning. Graph embeddings exploit the locality structure of a graph and generate embeddings for nodes, which could be words in a language or products of a retail website, with nodes connected based on a context window. In this paper, we consider graph embeddings with an error-free associative learning update rule, which models the embedding vector of a node as a non-convex Gaussian mixture of the embeddings of the nodes in its immediate vicinity, with a constant variance that is reduced as iterations progress. Our algorithm is very easy to parallelize without any form of shared memory, which makes it possible to use it on very large graphs with a much higher dimensionality of the embeddings. We study the efficacy of the proposed method on several benchmark data sets and compare favourably with state-of-the-art methods. Further, the proposed method is applied to generate relevant recommendations for a large retailer.

1 Introduction

Graph embeddings learn vector representations of nodes in a graph. [3] and [6] give comprehensive surveys of graph embedding methods such as node2vec [7] as well as deep convolutional embeddings. The advantage of learning low dimensional embeddings is that they induce an order on the nodes of a graph, which could be authors in a citation network, products in a recommender system, or words in a text corpus. The order could be established using an inner product or using another machine learning algorithm such as a neural network or a random forest.

Our method uses error-free associative learning to learn the embeddings on graphs. The algorithm is quite simple, but very effective. We apply the learnt embeddings to the task of recommending items to users and to the task of link prediction and reconstruction.

Label propagation and message passing have been applied to many tasks like feature propagation [8], interest propagation, propagation of information in a population [19] and other network models of behavior like PageRank [18] and models of text like TextRank [14]. Instead of propagating a single unit of information, we propagate entire embeddings across the network. By propagating information on a graph iteratively, long distance similarities can also be learnt.

For link prediction and reconstruction, our results are directly comparable to [6]. We compare our results with the state-of-the-art results in [6] in tables 1, 5, 6 and 7 and find that our results are better (except for SEAL [24], which outperforms our method on 3 out of 4 data sets) compared with VGAE [11], node2vec [7], GF (graph factorization) [1], SDNE [23], HOPE [17] and LE [2].

Our method is similar to LLE [20] (we compare with algorithms that were developed after LLE), and it takes inspiration from PageRank [18] and annealing [12], which is theoretically sound and iteratively reduces the global variance. Our method is an instance of errorless learning which, as the results show, is effective and embarrassingly parallelizable.

Annealing is a process in which steel in a furnace is gradually cooled from a very high temperature to lower temperatures. This creates a self-organizing process that improves the ductility of the steel. A higher temperature corresponds to a higher variance. We take inspiration from this process: initially the variance is very high and it is gradually reduced, with the goal that the network gains more structure and settles into a stable state once the iterations complete [12].

2 Hebbian Graph Embeddings

Hebbian learning is the simplest form of learning, introduced by Donald Hebb in 1949 in his book "The Organization of Behavior" [9]. It is inspired by the dynamics of biological systems. A synapse between two neurons is strengthened when the neurons on either side of the synapse (input and output) have highly correlated outputs. In essence, when an input neuron fires and frequently leads to the firing of the output neuron, the synapse is strengthened. In simple terms: "neurons that fire together wire together" [9]. Recently, there has been renewed interest in Hebbian learning. [10] postulates that Hebbian learning predicts mirror-like neurons for sensations and emotions, and [21] applies Hebbian learning to the modelling of temporal-causal networks.

Hebbian learning consists of a parameter update rule based on the strength of the connection between two nodes; as applied to neural networks, this strength reflects the firing tendencies of the neurons at opposite ends of a synapse. We extend the idea to graphs. Based on a pre-computed transition probability between two nodes, we update the parameters (the embeddings of a node) iteratively using an error-free associative learning rule: nodes that are contextually connected should have similar embeddings, analogous to word2vec for words [15]. For a discussion of errorless learning, please see [13].

We first initialize all embeddings by sampling from a multivariate normal distribution with mean 0 and variance $\sigma^2$:

$$\theta_j \sim \mathcal{N}(\mathbf{0}, \sigma^2 I) \qquad (1)$$

We model the embedding at a node as a non-convex Gaussian mixture of the embeddings of the connected nodes. If there is an edge from node $i$ to node $j$, the embedding of node $j$ is modeled as follows:

$$\theta_j \sim \sum_{i:(i,j)\in E} w_{ij}\,\mathcal{N}(\theta_i, \sigma^2 I) \qquad (2)$$

The variance $\sigma^2$ starts off at a value of 10 and is divided by 1.1 every iteration, in the spirit of simulated annealing [12]. The embedding of node $j$ is updated as follows:

$$s_{ij} \sim \mathcal{N}(\theta_i, \sigma^2 I) \qquad (3)$$
$$\Delta\theta_j = \eta\, w_{ij}\, s_{ij} \qquad (4)$$
$$\theta_j \leftarrow \theta_j + \Delta\theta_j \qquad (5)$$

The updates $\Delta\theta_j$ are then simply added to the embedding at node $j$ (where there is an edge from node $i$ to node $j$); $w_{ij}$ is the transition probability and $\eta$ is the learning rate. The graph is weighted, asymmetric and undirected. Additionally, a random negative edge is selected at each node and the negative of the sampled embedding is propagated to both selected nodes with a fixed transition probability (we use 0.5). This iterative procedure learns the embeddings of all the nodes in the graph and is able to generate very effective embeddings, as the next section shows. As shown in figure 1, the embeddings are propagated across the graph iteratively.

Figure 1: Propagation of Embeddings Across a Graph
1: procedure FindEmbeddings(G)
2:     Inputs: Weighted, asymmetric and undirected graph G, with edge weights w_ij giving the transition probability from node i to node j
3:     Hyper-parameters:
4:         σ²: variance of the normal distribution (initial value = 10)
5:         T: number of iterations of Hebbian learning
6:         d: dimensionality of the node representation
7:         κ: variance reduction factor (value = 1.1)
8:     Initialization: initialize each node's representation θ_i by sampling from a zero-mean multivariate normal distribution of dimensionality d and variance σ²
9:     for each integer t in {1, …, T} do
10:         for each node i in G do
11:             for each node j adjacent to i do
                    s_ij ∼ N(θ_i, σ²I)    (6)
                    θ_j ← θ_j + η w_ij s_ij    (7)
12:             end for
13:         end for
            σ² ← σ² / κ    (8)
14:     end for
15: end procedure
Algorithm 1 Hebbian Graph Embeddings
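
A minimal single-machine NumPy sketch of Algorithm 1 is given below for concreteness. This is not the authors' Apache Spark implementation; the function name find_embeddings, the data layout, and the exact form of the negative-sampling step are illustrative assumptions based on the description in section 2.

```python
import numpy as np

def find_embeddings(edges, n_nodes, dim=200, iters=10,
                    sigma2=10.0, kappa=1.1, lr=1.0, neg_prob=0.5, seed=0):
    """Single-machine sketch of Hebbian graph embeddings (Algorithm 1).

    edges: dict mapping node i -> list of (j, w_ij), where w_ij is the
           (possibly asymmetric) transition probability from i to j.
    """
    rng = np.random.default_rng(seed)
    # Initialization: zero-mean multivariate normal with variance sigma2.
    theta = rng.normal(0.0, np.sqrt(sigma2), size=(n_nodes, dim))

    for _ in range(iters):
        for i, neighbours in edges.items():
            for j, w_ij in neighbours:
                # Sample around the source embedding and propagate it to the
                # target node, scaled by the transition probability (eqs. 6-7).
                s_ij = rng.normal(theta[i], np.sqrt(sigma2))
                theta[j] += lr * w_ij * s_ij
            # Negative sampling (assumed interpretation): push one random
            # non-neighbour away with a fixed transition probability of 0.5.
            neighbour_ids = {j for j, _ in neighbours}
            k = int(rng.integers(n_nodes))
            if k != i and k not in neighbour_ids:
                theta[k] += lr * neg_prob * (-rng.normal(theta[i], np.sqrt(sigma2)))
                theta[i] += lr * neg_prob * (-rng.normal(theta[k], np.sqrt(sigma2)))
        # Annealing step: shrink the variance after every full pass (eq. 8).
        sigma2 /= kappa
    return theta
```

For example, find_embeddings({0: [(1, 1.0)], 1: [(0, 1.0)]}, n_nodes=2, dim=8) returns an array of shape (2, 8) holding the learnt embeddings.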

3 Experiments and Results

We run our algorithm on three of the data sets used in [6] namely AstroPh, BlogCatalog and HepTh for both link prediction and reconstruction. Our algorithm outperforms several other algorithms that are implemented in [6] and [5]. We also compare our algorithm for link prediction using average precision and run-time with SEAL [24] and VGAE [11] in table 6 and table 7.

We start with an initial variance of 10 and use the variance reduction factor of 1.1. We run the algorithm for 10 iterations. The algorithm is shown in Algorithm 1.

Link prediction is the task of predicting links between nodes that were not part of the training data. Reconstruction tries to reconstruct the entire graph, all of which is used for training (i.e. there is no train/test split).

We also run our algorithm on our recommender system and find that it is able to achieve a very high hit rate. Future work will focus more on the recommender system.

3.1 Results on Reconstruction

We ran our algorithm for reconstruction on publicly available data sets. Reconstruction tries to reconstruct the entire original graph (without splitting into train/test). As in [6], we sample 1024 nodes for the calculation of the MAP. We run the algorithm for 10 iterations with a learning rate of 1.0. The results in table 1, table 2 and figure 2 show that our algorithm achieves good results on reconstruction when the dimensionality is large. As benchmarks, we use three data sets that [6] uses for reconstruction, and our results compare favourably on them. The remaining data sets are not used in [6], but the supporting code base [5] can be used for comparison.
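
As a rough illustration of how the MAP is computed (the exact protocol follows [6]; the function and argument names here are hypothetical), candidate neighbours are ranked by the inner product of the learnt embeddings and precision is averaged at the ranks of the true neighbours:

```python
import numpy as np

def mean_average_precision(theta, adjacency, sample_nodes):
    """MAP over sampled nodes: rank all other nodes by inner product with the
    query node's embedding and average precision at the positions of the true
    neighbours. `adjacency` maps a node to the set of its true neighbours."""
    aps = []
    for i in sample_nodes:
        true = adjacency.get(i, set())
        if not true:
            continue
        scores = theta @ theta[i]
        scores[i] = -np.inf                  # exclude the query node itself
        ranked = np.argsort(-scores)
        hits, precisions = 0, []
        for rank, j in enumerate(ranked, start=1):
            if int(j) in true:
                hits += 1
                precisions.append(hits / rank)
        aps.append(np.mean(precisions))
    return float(np.mean(aps))
```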

Algorithm Dimensionality AstroPh BlogCatalog HepTh
Hebbian Graph Embeddings 200 0.573 0.499 0.619
node2vec 256 0.56 0.24 0.42
GF 256 0.29 0.09 0.39
SDNE 256 0.46 0.33 0.5
HOPE 256 0.33 0.45 0.32
LE 256 0.26 0.09 0.4
Table 1: MAP Comparison with State of the Art for Reconstruction (see [6])
DataSet Nodes Edges Reconstruction MAP for Varying Dimensionality
Dimensionality: 10 20 50 100 200 300 400 500
CondMat 23,133 93,497 0.192 0.304 0.495 0.649 0.778 0.838 0.873 0.895
GrQc 5,242 14,496 0.245 0.407 0.625 0.763 0.860 0.894 0.910 0.918
HepPh 12,008 118,521 0.196 0.293 0.455 0.586 0.698 0.755 0.789 0.814
AstroPh 18,772 198,110 0.181 0.245 0.362 0.461 0.573 0.635 0.675 0.707
HepTh 27,770 352,807 0.188 0.261 0.402 0.509 0.619 0.679 0.709 0.732
BlogCatalog 10,312 333,983 0.432 0.432 0.458 0.491 0.499 0.507 0.508 0.496
Table 2: Mean Average Precision (MAP) results for network embeddings for Reconstruction of the entire graph
DataSet Nodes Edges Random MAP (no training, dimensionality 500)
CondMat 23,133 93,497 0.0139
GrQc 5,242 14,496 0.0126
HepPh 12,008 118,521 0.0233
AstroPh 18,772 198,110 0.0255
HepTh 27,770 352,807 0.0292
BlogCatalog 10,312 333,983 0.0364
Table 3: Random Mean Average Precision (MAP) results (no training) for network embeddings for Reconstruction
Figure 2: Mean Average Precision for Reconstruction with Varying Dimensionality.

3.2 Results on the Recommender System of a large retailer

In the recommender system at a large retailer, we used a sample of 200,000 items as our population for training and measurement. 10% of the users are held out as the test set. The number of nodes in the graph is 200,000 and the number of edges is about 13.1 billion (note that the weight of an incoming edge may differ from that of the outgoing edge between any two nodes).

We measure the performance of our algorithm using the hit rate. Top-10 recommendations are generated per item from the nearest neighbours of the generated embeddings under an inner product (using all 200,000 items). Then, one random item from the user's entire interaction history is chosen and recommendations for this item are computed. If any of the top-10 recommended items (other than the seed item) also occurs in the user's interaction history, it is counted as a hit; otherwise it is a miss. The average hit rate is then the number of hits divided by the number of users in the test set. Results are shown in table 4. We use 10 iterations and a learning rate of 1.0.
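
A minimal sketch of this hit-rate computation, under the assumption that item embeddings are rows of a NumPy matrix and user histories are sets of item indices (the helper names are illustrative, not from the production system):

```python
import numpy as np

def hit_rate_at_10(theta, test_users, interactions, k=10, seed=0):
    """For each test user, pick one random item from their interaction
    history, retrieve the k nearest items by inner product, and count a hit
    if any recommended item (other than the seed) is also in the history.
    `interactions` maps a user to a set of item indices."""
    rng = np.random.default_rng(seed)
    hits = 0
    for user in test_users:
        history = list(interactions[user])
        seed_item = history[rng.integers(len(history))]
        scores = theta @ theta[seed_item]
        scores[seed_item] = -np.inf              # exclude the seed item itself
        top_k = np.argpartition(-scores, k)[:k]  # unordered top-k is enough
        if any(int(item) in interactions[user] for item in top_k):
            hits += 1
    return hits / len(test_users)
```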

The edges are determined from the item graph induced by the consumer-product bipartite graph based on co-viewing of products. That is, if two products were viewed by the same consumer, we create an edge between them, weighted by the pre-computed transition probability described in section 2. A sketch of this construction follows below.
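
One plausible way to induce such co-view edges from view logs is sketched below; the paper does not spell out the exact weighting, so the row-normalised co-view counts used here are an assumption.

```python
from collections import defaultdict
from itertools import combinations

def coview_transition_probs(view_logs):
    """Build item-item transition probabilities from co-views.

    view_logs: dict mapping a consumer to the collection of items they viewed.
    Co-view counts are symmetric, but row normalisation per source item makes
    the resulting transition probabilities asymmetric, matching the weighted,
    asymmetric and undirected graph described in section 2 (assumed weighting).
    """
    coviews = defaultdict(lambda: defaultdict(float))
    for _, items in view_logs.items():
        for a, b in combinations(sorted(items), 2):
            coviews[a][b] += 1.0
            coviews[b][a] += 1.0
    probs = {}
    for a, neighbours in coviews.items():
        total = sum(neighbours.values())
        probs[a] = {b: count / total for b, count in neighbours.items()}
    return probs
```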

Dimensionality HitRate@10
100 24.2%
200 30.1%
250 31.1%
Table 4: Results on a very large graph for recommender systems at a large retailer

3.3 Results on Link Prediction

For link prediction, we use some of the data sets used in [16] and [6]. As in [6], we sample 1024 nodes for the calculation of the MAP, and we hold out 10% of the edges as a test set. We run the algorithm for 10 iterations with a learning rate of 1.0. The results in table 8, table 9 and figure 3 show that our algorithm achieves good results on link prediction when the dimensionality is large. As benchmarks, we use three data sets that [6] uses for link prediction, and our results compare favourably on them. [6] also has a supporting code base [5] which can be used for comparison on other data sets.
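
A hedged sketch of the edge hold-out used for link prediction (illustrative only): 10% of the edges are removed before training, embeddings are learnt on the remainder, and the held-out edges are then scored with the same inner-product ranking as for reconstruction.

```python
import random

def split_edges(edge_list, test_frac=0.1, seed=0):
    """Hold out a random fraction of edges for link prediction.

    edge_list: list of (i, j, w) tuples. Returns (train_edges, test_edges)."""
    rng = random.Random(seed)
    shuffled = edge_list[:]
    rng.shuffle(shuffled)
    n_test = int(test_frac * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]
```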

We also compare our algorithm with SEAL and VGAE and find that it outperforms VGAE on all four data sets and outperforms SEAL on one of the four. Note that since our algorithm is an Apache Spark application, some initial time is spent on initialization and allocation of resources; the larger the graph, the more noticeable the difference in run-time. For instance, it might be infeasible to run SEAL or VGAE on our recommender-system data set with 200,000 nodes and 13.1 billion edges.

Algorithm Dimensionality AstroPh BlogCatalog HepTh
Hebbian Graph Embeddings 200 0.317 0.202 0.339
node2vec 256 0.025 0.17 0.04
GF 256 0.15 0.02 0.17
SDNE 256 0.24 0.19 0.16
HOPE 256 0.25 0.07 0.17
LE 256 0.21 0.04 0.23
Table 5: MAP Comparison with State of the Art for Link Prediction (see [6])
Algorithm Power PB USAir C.Ele
Hebbian Graph Embeddings 93.11 93 95.92 84.99
SEAL 86.69 94.55 97.13 88.81
VGAE 75.91 90.38 89.27 78.32
Table 6: Comparison of Average Precision with other State of the Art algorithms for Link Prediction (see [24] for more details), randomly chosen 10% of edges held out as the test set
Algorithm Power PB USAir C.Ele
Hebbian Graph Embeddings 287 233 237 172
SEAL 1640 146 31 16
Table 7: Comparison of Run-Time (seconds) for Link Prediction (see [24] for more details)
DataSet Nodes Edges Link Prediction MAP for Varying Dimensionality
Dimensionality: 10 20 50 100 200 300 400 500
CondMat 23,133 93,497 0.070 0.130 0.251 0.350 0.450 0.507 0.531 0.544
GrQc 5,242 14,496 0.064 0.129 0.233 0.292 0.332 0.348 0.363 0.383
HepPh 12,008 118,521 0.065 0.121 0.213 0.289 0.346 0.384 0.401 0.424
AstroPh 18,772 198,110 0.060 0.092 0.179 0.235 0.317 0.357 0.388 0.409
HepTh 27,770 352,807 0.070 0.120 0.203 0.259 0.339 0.370 0.383 0.407
BlogCatalog 10,312 333,983 0.183 0.182 0.198 0.198 0.202 0.217 0.210 0.212
Table 8: Mean Average Precision (MAP) results for network embeddings for Link Prediction (10% randomly chosen edges are held out as the test set)
DataSet Nodes Edges Random MAP (no training, dimensionality 500)
CondMat 23,133 93,497 0.007
GrQc 5,242 14,496 0.007
HepPh 12,008 118,521 0.010
AstroPh 18,772 198,110 0.009
HepTh 27,770 352,807 0.009
BlogCatalog 10,312 333,983 0.014
Table 9: Random Mean Average Precision (MAP) results (no training) for network embeddings for Link Prediction
Figure 3: Mean Average Precision for Link Prediction with Varying Dimensionality.

The algorithm is easy to parallelize, and we implement it on Apache Spark. We run it for 10 iterations, which takes about 3 hours with the parallel implementation on the recommender-system data and from 5 minutes to 2 hours (depending on the dimensionality) on the publicly available data. We found that the learning rate does not affect the results in any significant way (we use 1.0).
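
The Spark code itself is not published; one assumed RDD-based design for a single propagation pass without shared memory is sketched below (the names propagate_once, edges and embeddings are illustrative). Edges are joined with the embeddings of their source nodes, scaled contributions are keyed by the destination node, and the contributions are summed before being added to the current embeddings.

```python
# Sketch of one propagation iteration on Spark RDDs (assumed design, not the
# authors' published implementation). `edges` is an RDD of (src, (dst, w)) and
# `embeddings` is an RDD of (node, vector) with NumPy vectors as values.
import numpy as np

def propagate_once(edges, embeddings, sigma2, lr=1.0):
    contributions = (
        edges.join(embeddings)                      # (src, ((dst, w), theta_src))
             .map(lambda kv: (kv[1][0][0],          # key by destination node
                              lr * kv[1][0][1] *
                              np.random.normal(kv[1][1], np.sqrt(sigma2))))
             .reduceByKey(lambda a, b: a + b)       # sum contributions per node
    )
    # Add the summed contributions to the current embeddings.
    return (embeddings.leftOuterJoin(contributions)
                      .mapValues(lambda v: v[0] + v[1] if v[1] is not None else v[0]))
```

In practice, the outer loop would call propagate_once once per iteration and divide sigma2 by the annealing factor after each pass; the negative-sampling step is omitted here for brevity.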

4 Conclusion

In this paper, we described a simple but very effective algorithm for learning embeddings on a graph. The results show that the algorithm, as applied to the tasks of link prediction and reconstruction, performs well when the dimensionality of the embeddings is large. This demonstrates the effectiveness of learning on graphs using iterative methods and provides a useful study of error-free (errorless) learning on graphs. Our method can learn long-distance similarities because the iterative nature of the algorithm percolates the embeddings across the weighted graph.

A distinctive advantage of our approach is that it is very easy to parallelize without any need for shared memory. It is straightforward to implement on platforms such as Apache Spark, which makes the algorithm amenable to very large graphs that cannot be processed on a single machine.

Our recommender-system work was tested live and performed very well. However, because our item graph has a very large number of nodes and edges, we do not run the implementations of [16] and [6] on our recommender-system data.

Other algorithms, such as those in [22] and [4], could also be compared with our work. There remains an opportunity to improve the algorithm through hyperparameter tuning, and it might be interesting to evaluate it with a much higher dimensionality of the embeddings.

Acknowledgments

We thank Ramasubbu Venkatesh, Nicholas Eggert, Sayon Majumdar and Jinghe Zhang for their valuable suggestions and comments.

References

  1. A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski and A. J. Smola (2013) Distributed large-scale natural graph factorization. In Proceedings of the 22nd International Conference on World Wide Web, pp. 37–48.
  2. M. Belkin and P. Niyogi (2002) Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems, pp. 585–591.
  3. H. Cai, V. W. Zheng and K. C. Chang (2018) A comprehensive survey of graph embedding: problems, techniques, and applications. IEEE Transactions on Knowledge and Data Engineering 30 (9), pp. 1616–1637.
  4. B. P. Chamberlain, S. R. Hardwick, D. R. Wardrope, F. Dzogang, F. Daolio and S. Vargas (2019) Scalable hyperbolic recommender systems. arXiv preprint arXiv:1902.08648.
  5. P. Goyal and E. Ferrara (2018) GEM: a Python package for graph embedding methods. Journal of Open Source Software 3 (29), pp. 876.
  6. P. Goyal and E. Ferrara (2018) Graph embedding techniques, applications, and performance: a survey. Knowledge-Based Systems 151, pp. 78–94.
  7. A. Grover and J. Leskovec (2016) node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864.
  8. C. Heaukulani and Z. Ghahramani (2013) Dynamic probabilistic models for latent feature propagation in social networks. In International Conference on Machine Learning, pp. 275–283.
  9. D. O. Hebb (1949) The Organization of Behavior. Wiley & Sons.
  10. C. Keysers and V. Gazzola (2014) Hebbian learning and predictive mirror neurons for actions, sensations and emotions. Philosophical Transactions of the Royal Society B: Biological Sciences 369 (1644), pp. 20130175.
  11. T. N. Kipf and M. Welling (2016) Variational graph auto-encoders. arXiv preprint arXiv:1611.07308.
  12. S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi (1983) Optimization by simulated annealing. Science 220 (4598), pp. 671–680.
  13. J. L. McClelland (2006) How far can you go with Hebbian learning, and when does it lead you astray? Processes of Change in Brain and Cognitive Development: Attention and Performance XXI 21, pp. 33–69.
  14. R. Mihalcea and P. Tarau (2004) TextRank: bringing order into text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404–411.
  15. T. Mikolov, K. Chen, G. Corrado and J. Dean (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
  16. M. Nickel and D. Kiela (2017) Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems, pp. 6338–6347.
  17. M. Ou, P. Cui, J. Pei, Z. Zhang and W. Zhu (2016) Asymmetric transitivity preserving graph embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1105–1114.
  18. L. Page, S. Brin, R. Motwani and T. Winograd (1999) The PageRank citation ranking: bringing order to the web. Technical report, Stanford InfoLab.
  19. A. Rapoport (1953) Spread of information through a population with socio-structural bias: I. Assumption of transitivity. The Bulletin of Mathematical Biophysics 15 (4), pp. 523–533.
  20. S. T. Roweis and L. K. Saul (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290 (5500), pp. 2323–2326.
  21. J. Treur (2016) Network-Oriented Modeling. Springer.
  22. T. D. Q. Vinh, Y. Tay, S. Zhang, G. Cong and X. Li (2018) Hyperbolic recommender systems. arXiv preprint arXiv:1809.01703.
  23. D. Wang, P. Cui and W. Zhu (2016) Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1225–1234.
  24. M. Zhang and Y. Chen (2018) Link prediction based on graph neural networks. In Advances in Neural Information Processing Systems, pp. 5165–5175.