Gravity-Inspired Graph Autoencoders for Directed Link Prediction
Abstract.
Graph autoencoders (AE) and variational autoencoders (VAE) recently emerged as powerful node embedding methods. In particular, graph AE and VAE were successfully leveraged to tackle the challenging link prediction problem, aiming at figuring out whether some pairs of nodes from a graph are connected by unobserved edges. However, these models focus on undirected graphs and therefore ignore the potential direction of the link, which is limiting for numerous real-life applications. In this paper, we extend the graph AE and VAE frameworks to address link prediction in directed graphs. We present a new gravity-inspired decoder scheme that can effectively reconstruct directed graphs from a node embedding. We empirically evaluate our method on three different directed link prediction tasks, for which standard graph AE and VAE perform poorly. We achieve competitive results on three real-world graphs, outperforming several popular baselines.
1. Introduction
Graphs are useful data structures to efficiently represent relationships among items. Due to the proliferation of graph data (Zhang et al., 2018; Wu et al., 2019b), a large variety of specific problems initiated significant research efforts from the Machine Learning community, aiming at extracting relevant information from such structures. This includes node clustering (Malliaros and Vazirgiannis, 2013), influence maximization (Kempe et al., 2003), graph generation (Simonovsky and Komodakis, 2018) and link prediction, on which we focus in this paper.
Link prediction consists in inferring the existence of new relationships or still unobserved interactions (i.e. new edges in the graph) between pairs of entities (nodes), based on observable links and on their properties (Liben-Nowell and Kleinberg, 2007; Wang et al., 2015). This challenging task has been widely studied and successfully applied to several domains. In biological networks, link prediction models were leveraged to predict new interactions between proteins (Kovács et al., 2019). It is also present in our daily lives, suggesting people we may know but are not yet connected to in our social networks (Liben-Nowell and Kleinberg, 2007; Wang et al., 2015; Haghani and Keyvanpour, 2017). Besides, link prediction is closely related to numerous recommendation tasks (Li et al., 2014; Zhao et al., 2016; Berg et al., 2018).
Link prediction has been historically addressed through graph mining heuristics, via the construction of similarity indices between nodes, capturing the likelihood of their connection in the graph. The Adamic-Adar and Katz indices (Liben-Nowell and Kleinberg, 2007), reflecting neighborhood structure and node proximity, are notorious examples of such similarity indices. More recently, along with the increasing efforts in extending Deep Learning methods to graph structures (Scarselli et al., 2009; Bruna et al., 2014; Wu et al., 2019b), these approaches have been outperformed by the node embedding paradigm (Tang et al., 2015; Grover and Leskovec, 2016; Zhang et al., 2018). In a nutshell, the strategy is to train graph neural networks to represent nodes as vectors in a low-dimensional vector space, namely the embedding space. Ideally, in such a space, nodes with a structural proximity in the graph should be close to each other. Therefore, one can resort to proximity measures such as inner products between vector representations to predict new unobserved links in the underlying graph. In this direction, the graph extensions of autoencoders (AE) (Rumelhart et al., 1986; Baldi, 2012) and variational autoencoders (VAE) (Kingma and Welling, 2013; Tschannen et al., 2018) recently appeared as state-of-the-art approaches for link prediction in numerous experimental analyses (Wang et al., 2016; Kipf and Welling, 2016b; Pan et al., 2018; Tran, 2018; Salha et al., 2019).
However, these models focus on undirected graphs and therefore ignore the potential direction of the link. As explained in Section 2, a graph autoencoder predicting that node $i$ is connected to node $j$ will also predict that node $j$ is connected to node $i$, with the same probability. This is limiting for numerous real-life applications, as directed graphs are ubiquitous. For instance, web graphs are made up of directed hyperlinks. In social networks such as Twitter, opinion leaders are usually followed by many users, but only few of these connections are reciprocal. Moreover, directed graphs are efficient abstractions in many domains where data are not explicitly structured as graphs. For instance, on music streaming platforms, the page providing information about an artist will usually display the most similar artists. Artist similarities can be represented in a graph, in which nodes are artists, connected to their most similar neighbors. Such a graph is definitely directed: indeed, while Bob Marley might be among the most similar artists of a new unknown reggae band, it is unlikely that this band would be presented among Bob Marley's top similar artists on his own page.
Directed link prediction has been tackled through graph mining asymmetric measures (Yu and Wang, 2014; Garcia Gasulla, 2015; Schall, 2015) and, recently, a few attempts at capturing asymmetric proximity when creating node embeddings were proposed (Miller et al., 2009; Ou et al., 2016; Zhou et al., 2017). However, the question of how to reconstruct directed graphs from vector space representations to effectively perform directed link prediction remains widely open. In particular, it is unclear how to extend graph AE and graph VAE to directed graphs, and to what extent the promising performances of these models on undirected graphs could also be achieved on directed link prediction tasks. We propose to address these questions in this paper, making the following contributions:

We present a new model to effectively learn node embeddings from directed graphs using the graph AE and VAE frameworks. We draw inspiration from Newton’s theory of universal gravitation to introduce a new decoder scheme, able to reconstruct asymmetric relationships from vector space node embeddings.

We empirically evaluate our approach on three different directed link prediction tasks, for which standard graph AE and VAE perform poorly. We achieve competitive results on three real-world datasets, outperforming popular baselines. To the best of our knowledge, these are the first graph AE/VAE experiments on directed graphs.

We publicly release our code (https://github.com/deezer/gravity_graph_autoencoders) for these experiments, for reproducibility and easier future use.
This paper is organized as follows. In Section 2, we recall key concepts related to graph AE and VAE and we explain why these models are not suitable for directed link prediction. In Section 3, we introduce our gravity-inspired method to reconstruct directed graphs using graph AE or VAE, and effectively perform directed link prediction. We present and discuss our experimental analysis in Section 4, and we conclude in Section 5.
2. Preliminaries
In this section, we provide an overview of graph AE, VAE and of their main applications to link prediction. In the following, we consider a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ without self-loops, with $n$ nodes and $m$ edges that can be directed. We denote by $A$ the adjacency matrix of $\mathcal{G}$, that is either binary or weighted. Moreover, nodes can possibly have feature vectors of size $f$, gathered in an $n \times f$ matrix $X$. Otherwise, $X$ is the $n \times n$ identity matrix $I_n$.
2.1. Graph Autoencoders
Graph autoencoders (Kipf and Welling, 2016b; Wang et al., 2016) are a family of unsupervised models extending autoencoders (Rumelhart et al., 1986; Baldi, 2012) to graph structures. Their goal is to learn a node embedding, i.e. a low dimensional vector space representation of the nodes. Graph AE are composed of two stacked models:

Firstly, an encoder model assigns a latent vector $z_i$ of size $d$, with $d \ll n$, to each node $i$ of the graph. The $n \times d$ matrix $Z$ of all latent vectors is usually the output of a Graph Neural Network (GNN) applied on $A$ and, where appropriate, on $X$, i.e. we have $Z = \text{GNN}(A, X)$.

Then, a decoder model aims at reconstructing the adjacency matrix $A$ from $Z$, using another GNN or a simpler alternative. For instance, in (Kipf and Welling, 2016b) and in several extensions of their model (Pan et al., 2018; Salha et al., 2019), decoding is obtained through an inner product between latent vectors, along with a sigmoid activation $\sigma(x) = 1/(1+e^{-x})$ or, if $A$ is weighted, some more complex thresholding. In other words, the larger the inner product $z_i^T z_j$, the more likely nodes $i$ and $j$ are connected in the graph according to the model. Denoting by $\hat{A}$ the reconstruction of $A$ from the decoder, we have $\hat{A}_{ij} = \sigma(z_i^T z_j)$.
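As a minimal illustration, this symmetric inner product decoding step can be sketched as follows (NumPy; variable and function names are ours, not the authors' code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inner_product_decoder(Z):
    """Reconstruct A_hat from the n x d latent matrix Z:
    A_hat[i, j] = sigmoid(z_i . z_j)."""
    return sigmoid(Z @ Z.T)

Z = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [-1.0, 2.0]])
A_hat = inner_product_decoder(Z)
# The inner product is symmetric, so A_hat[i, j] == A_hat[j, i] for all i, j.
```

The symmetry visible in the last line is precisely the property that makes this decoder unsuitable for directed graphs, as discussed in subsection 2.5.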
The intuition behind autoencoders is the following: if, starting from the latent vectors, the decoder is able to reconstruct an adjacency matrix $\hat{A}$ that is close to the original one, then it should mean that these representations preserve some important characteristics of the graph structure. Graph AE are trained by minimizing the reconstruction loss $\|A - \hat{A}\|_F$ of the graph structure (Wang et al., 2016), with $\|\cdot\|_F$ the Frobenius matrix norm, or alternatively a weighted cross entropy loss (Kipf and Welling, 2016b), by stochastic gradient descent (Goodfellow et al., 2016).
2.2. Graph Convolutional Networks
Throughout this paper, as (Kipf and Welling, 2016b) and most subsequent works (Pan et al., 2018; Grover et al., 2018; Salha et al., 2019; Do et al., 2019), we assume that the GNN encoder is a Graph Convolutional Network (GCN) (Kipf and Welling, 2016a). In a GCN with $L$ layers, with $H^{(0)} = X$ and $Z = H^{(L)}$, we have:
$$H^{(l)} = \text{ReLU}\big(\tilde{A} H^{(l-1)} W^{(l-1)}\big) \quad \text{for } l \in \{1, \dots, L-1\}, \qquad H^{(L)} = \tilde{A} H^{(L-1)} W^{(L-1)}.$$
In the above equation, $\tilde{A}$ denotes some normalized version of $A$. As undirected graphs were considered in existing models, a usual choice is the symmetric normalization $\tilde{A} = D^{-1/2}(A + I_n)D^{-1/2}$, where $D$ is the diagonal degree matrix of $A + I_n$. In a nutshell, at each layer $l$, we average the feature vectors from $H^{(l-1)}$ of the neighbors of a given node, together with its own feature information (thus the $A + I_n$) and with a ReLU activation: $\text{ReLU}(x) = \max(x, 0)$. Weight matrices $W^{(l)}$ are trained by stochastic gradient descent.
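To make the propagation rule concrete, here is a sketch of the symmetric normalization and of one GCN layer (NumPy; the weights are fixed random matrices here, whereas they would be trained in practice):

```python
import numpy as np

def normalize_sym(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, where D is the
    diagonal degree matrix of A + I (self-loops added)."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W, relu=True):
    """One GCN propagation step: H' = ReLU(A_norm H W); the output layer
    producing Z is typically linear (no ReLU)."""
    out = A_norm @ H @ W
    return np.maximum(out, 0.0) if relu else out

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))   # node features (H^(0) = X)
W0 = rng.normal(size=(4, 2))  # weights: random here, trained in practice
Z = gcn_layer(normalize_sym(A), X, W0, relu=False)  # 2-dimensional embedding
```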
We rely on GCN encoders for three main reasons: 1) consistency with previous efforts on graph AE, 2) capitalization on previous successes of GCN-based graph AE (see subsection 2.4) and, last but not least, 3) computational efficiency. Indeed, evaluating each layer of a GCN has a linear complexity w.r.t. the number of edges (Kipf and Welling, 2016a). Speed-up strategies to improve the training of GCNs were also proposed (Chen et al., 2018; Wu et al., 2019a). Nonetheless, we point out that the method we present in this article is not limited to GCN and would still be valid for any alternative encoder, e.g. for more complex encoders such as ChebNet (Defferrard et al., 2016) that sometimes empirically outperform GCN encoders (Salha et al., 2019).
2.3. Variational Graph Autoencoders
(Kipf and Welling, 2016b) introduced Variational Graph Autoencoders, denoted VGAE, a graph extension of VAE (Kingma and Welling, 2013). While sharing the name autoencoder, VAE are actually based on quite different mathematical foundations. Specifically, (Kipf and Welling, 2016b) assume a probabilistic model on the graph that involves some latent variables $z_i$ of length $d$ for each node $i$. Such vectors are the node representations in a low-dimensional embedding space. Denoting by $Z$ the $n \times d$ matrix of all latent vectors, the authors define the inference model as follows:
$$q(Z|A, X) = \prod_{i=1}^{n} q(z_i|A, X), \quad \text{with } q(z_i|A, X) = \mathcal{N}\big(z_i | \mu_i, \text{diag}(\sigma_i^2)\big).$$
The latent vectors themselves are random samples drawn from the learned distribution, and this inference step is referred to as the encoding part of the graph VAE. Parameters of the Gaussian distributions are learned using two GCNs. In other words, $\mu$, the matrix of mean vectors $\mu_i$, is defined as $\mu = \text{GCN}_{\mu}(A, X)$. Likewise, $\log \sigma = \text{GCN}_{\sigma}(A, X)$.
Then, a generative model attempts to reconstruct $A$ using, as for graph AE, inner products between latent variables:
$$p(A|Z) = \prod_{i=1}^{n}\prod_{j=1}^{n} p(A_{ij}|z_i, z_j), \quad \text{with } p(A_{ij} = 1|z_i, z_j) = \sigma(z_i^T z_j).$$
As before, $\sigma(\cdot)$ is the sigmoid activation function. This is the decoding part of the model. (Kipf and Welling, 2016b) optimize GCN weights by maximizing a tractable variational lower bound (ELBO) of the model's likelihood:
$$\mathcal{L} = \mathbb{E}_{q(Z|A,X)}\big[\log p(A|Z)\big] - \mathcal{D}_{KL}\big(q(Z|A,X)\,\|\,p(Z)\big),$$
with a Gaussian prior $p(Z) = \prod_i p(z_i) = \prod_i \mathcal{N}(z_i|0, I_d)$, using full-batch gradient descent and leveraging the reparameterization trick (Kingma and Welling, 2013). $\mathcal{D}_{KL}(\cdot\|\cdot)$ is the Kullback-Leibler divergence (Kullback and Leibler, 1951).
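The two ingredients of this training objective can be sketched as follows: the reparameterization trick used for sampling, and the Kullback-Leibler term that regularizes the posterior towards the standard Gaussian prior (NumPy; a simplified illustration, not the authors' implementation):

```python
import numpy as np

def reparameterize(mu, log_sigma, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I): the reparameterization
    trick, keeping the sampling step differentiable w.r.t. mu and sigma."""
    return mu + np.exp(log_sigma) * rng.normal(size=mu.shape)

def kl_to_standard_normal(mu, log_sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over all nodes and
    dimensions: the regularization term subtracted in the ELBO."""
    return 0.5 * np.sum(np.exp(2.0 * log_sigma) + mu ** 2 - 1.0 - 2.0 * log_sigma)

rng = np.random.default_rng(42)
mu = np.zeros((5, 2))         # means mu_i, output of GCN_mu
log_sigma = np.zeros((5, 2))  # log-standard deviations, output of GCN_sigma
Z = reparameterize(mu, log_sigma, rng)
# With mu = 0 and sigma = 1, the posterior matches the prior and the KL is 0.
```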
2.4. Graph AE and VAE for Undirected Link Prediction
In the last three years, graph AE/VAE and their extensions have been successfully leveraged to tackle several challenging tasks, such as node clustering (Wang et al., 2017; Pan et al., 2018; Salha et al., 2019), recommendation from bipartite graphs (Berg et al., 2018; Do et al., 2019) and graph generation, notably biologically plausible molecule generation from graph VAE's generative models (Liu et al., 2018; Ma et al., 2018; Jin et al., 2018; Simonovsky and Komodakis, 2018; Samanta et al., 2018). We refer to the aforementioned references for a broader overview of these applications, and focus on link prediction tasks in the remainder of this section.
Link prediction has been the main evaluation task for graph AE and VAE in the seminal work of (Kipf and Welling, 2016b) and in numerous extensions (Tran, 2018; Pan et al., 2018; Grover et al., 2018; Salha et al., 2019). In a nutshell, authors evaluate the global ability of their models to predict whether some pairs of nodes from an undirected graph are connected by unobserved edges, using the latent space representations of the nodes. More formally, in such a setting, autoencoders are usually trained on an incomplete version of the graph where a proportion of the edges, say 10%, were randomly removed. Then, a test set is created, gathering these missing edges and the same number of randomly picked pairs of unconnected nodes. Authors evaluate the model's ability to discriminate the true edges (i.e. $A_{ij} = 1$ in the complete adjacency matrix) from the fake ones ($A_{ij} = 0$) using the decoding of the latent vectors. In other words, they predict that nodes $i$ and $j$ are connected when $\hat{A}_{ij}$ is larger than some threshold. This is a binary classification task, typically assessed using the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) or the Average Precision (AP) scores. For such tasks, graph AE and VAE have been empirically shown to be competitive with, and often superior to, several popular node embedding baselines, notably Laplacian eigenmaps (Belkin and Niyogi, 2003) and word2vec-like models such as DeepWalk (Perozzi et al., 2014), LINE (Tang et al., 2015) and node2vec (Grover and Leskovec, 2016).
We point out that most of these experiments focus on medium-size graphs with up to a few thousand nodes and edges. This is mainly due to the limiting quadratic time complexity of the inner product decoder, which involves the multiplication of the dense matrices $Z$ and $Z^T$. However, (Salha et al., 2019) recently bypassed this scalability issue and introduced a general framework for more scalable graph AE and VAE, leveraging graph degeneracy concepts (Malliaros et al., 2019). They confirmed the competitive performance of graph AE and VAE for large-scale link prediction, based on experiments on undirected graphs with up to millions of nodes and edges.
2.5. Why do these models fail to perform Directed Link Prediction?
At this stage, we recall that all previously mentioned works assume, either explicitly or implicitly, that the input graph is undirected. By design, graph AE and VAE are not suitable for directed graphs, since they ignore directions when reconstructing the adjacency matrix from the embedding. Indeed, due to the symmetry of the inner product decoder, we have:
$$\hat{A}_{ij} = \sigma(z_i^T z_j) = \sigma(z_j^T z_i) = \hat{A}_{ji}.$$
In other words, if we predict the existence of an edge from node $i$ to node $j$, then we also necessarily predict the existence of the reverse edge from $j$ to $i$, with the same probability. As a consequence, as we empirically show in Section 4, standard graph AE and VAE significantly underperform on link prediction tasks in directed graphs, where relationships are not always reciprocal.
Replacing inner product decoders by a distance in the embedding (e.g. the Euclidean distance $\|z_i - z_j\|_2$) or by existing more refined decoder schemes (Grover et al., 2018) would lead to the same conclusion, since they are also symmetric. Recently, (Zhang et al., 2019) proposed D-VAE, a variational autoencoder for small Directed Acyclic Graphs (DAG) such as neural network architectures or Bayesian networks, focusing on neural architecture search and structure learning. However, the question of how to extend graph AE and VAE to general directed graphs, such as citation networks or web hyperlink networks, where directed link prediction is challenging, remains open.
2.6. On the Source/Target Vectors Paradigm
To conclude these preliminaries, we highlight that, outside the graph AE/VAE frameworks, a few recent node embedding methods proposed to tackle directed link prediction by actually learning two latent vectors for each node. More precisely:

HOPE, short for High-Order Proximity preserved Embedding (Ou et al., 2016), aims at preserving high-order node proximities and capturing asymmetric transitivity. Nodes are represented by two vectors: source vectors $s_i$, stacked up in an $n \times d$ matrix $S$, and target vectors $t_i$, gathered in another $n \times d$ matrix $T$. For a given similarity matrix $M$, the authors learn these vectors by approximately minimizing $\|M - ST^T\|_F$ using a generalized SVD. For directed graphs, a usual choice for $M$ is the Katz matrix $M = \sum_{k \geq 1} \beta^k A^k$, with $M = (I_n - \beta A)^{-1}\beta A$ if the parameter $\beta$ is smaller than the inverse of the spectral radius of $A$ (Katz, 1953). It computes the number of paths from a node to another one, these paths being exponentially weighted according to their length. For link prediction, one can assess the likelihood of a link from node $i$ to node $j$ using the asymmetric reconstruction $s_i^T t_j$.

APP (Zhou et al., 2017) is a scalable Asymmetric Proximity Preserving node embedding method, which preserves the Rooted PageRank score (Page et al., 1999) for any node pair. APP leverages random walk with restart strategies to learn, as HOPE, a source vector and a target vector for each node. As before, we predict that node $i$ is connected to node $j$ from the inner product of source vector $s_i$ and target vector $t_j$, with a sigmoid activation.
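The Katz similarity used by HOPE admits the closed form recalled above, which can be computed directly (NumPy sketch; the helper name is ours):

```python
import numpy as np

def katz_matrix(A, beta):
    """Katz similarity M = sum_{k >= 1} beta^k A^k = (I - beta A)^{-1} beta A,
    which converges when beta is below the inverse spectral radius of A."""
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) - beta * A) @ (beta * A)

# Directed path 0 -> 1 -> 2: the resulting Katz matrix is asymmetric.
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])
S = katz_matrix(A, beta=0.5)
# S[0, 1] = 0.5 (direct edge); S[0, 2] = 0.25 (length-2 path, weight beta^2);
# S[1, 0] = 0 (no path from node 1 back to node 0).
```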
One can derive a straightforward extension of this source/target vectors paradigm to graph AE and VAE. Indeed, considering GCN encoders returning $d$-dimensional latent vectors $z_i$, with $d$ being even, we can assume that the first $d/2$ dimensions (resp. the last $d/2$ dimensions) of $z_i$ actually correspond to the source (resp. target) vector of node $i$, i.e. $s_i = z_i[1\!:\!\frac{d}{2}]$ and $t_i = z_i[(\frac{d}{2}+1)\!:\!d]$. Then, we can replace the symmetric decoder by $\hat{A}_{ij} = \sigma(s_i^T t_j)$ and $\hat{A}_{ji} = \sigma(s_j^T t_i)$, both in the AE and VAE frameworks, to reconstruct directed links from encoded representations. We refer to this method as source/target graph AE (or VAE).
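A minimal sketch of this source/target decoding scheme (NumPy; the splitting convention and names are ours, for illustration only):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def source_target_decoder(Z):
    """Split each latent vector z_i into a source half s_i and a target half
    t_i, then decode A_hat[i, j] = sigmoid(s_i . t_j), which is asymmetric."""
    d = Z.shape[1] // 2
    S, T = Z[:, :d], Z[:, d:]
    return sigmoid(S @ T.T)

Z = np.array([[1.0, 0.0, 0.0, 0.0],   # node 0: strong source, null target
              [0.0, 0.0, 1.0, 0.0]])  # node 1: null source, strong target
A_hat = source_target_decoder(Z)
# A_hat[0, 1] = sigmoid(1) > A_hat[1, 0] = sigmoid(0) = 0.5: the edge
# 0 -> 1 is predicted as more likely than its reverse.
```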
However, in the remainder of this paper, we adopt a different approach and propose to come back to the original idea of learning a single node embedding, therefore representing each node via a single latent vector. Such an approach has a stronger interpretability power and, as we later show in the experimental part of this paper, it also significantly outperforms source/target graph AE and VAE on directed link prediction tasks.
3. A Gravity-Inspired Model for Directed Graph AE and VAE
In this section, we introduce a new model to learn node embeddings from directed graphs using the AE and VAE frameworks, and to address the directed link prediction problem. The main challenge is the following: how to effectively reconstruct asymmetric relationships from encoded representations that are (unique) latent vectors in a node embedding where inner product and standard distances are symmetric?
To overcome this challenge, we resort to classical mechanics and especially to Newton's theory of universal gravitation. We propose an analogy between latent node representations in an embedding and celestial objects in space. Specifically, even if the Earth-Moon distance is symmetric, the acceleration of the Moon towards the Earth due to gravity is larger than the acceleration of the Earth towards the Moon. As explained below, this is due to the fact that the Earth is more massive. In the remainder of this section, we transpose these notions of mass and acceleration to node embeddings to build up our asymmetric graph decoding scheme.
3.1. Newton’s Theory of Universal Gravitation
According to Newton's theory of universal gravitation (Newton, 1687), each particle in the universe attracts the other particles through a force called gravity. This force is proportional to the product of the masses of the particles, and inversely proportional to the squared distance between their centers. More formally, let us denote by $m_1$ and $m_2$ the positive masses of two objects $1$ and $2$, and by $r$ the distance between their centers. Then, the gravitational force $F$ attracting the two objects is:
$$F = G\frac{m_1 m_2}{r^2},$$
where $G$ is the gravitational constant (Cavendish, 1798). Then, using Newton's second law of motion (Newton, 1687), we derive $a_1$, the acceleration of object $1$ towards object $2$ due to gravity:
$$a_1 = \frac{F}{m_1} = G\frac{m_2}{r^2}.$$
Likewise, the acceleration $a_2$ of object $2$ towards object $1$ due to gravity is:
$$a_2 = \frac{F}{m_2} = G\frac{m_1}{r^2}.$$
We note that $a_1 \neq a_2$ when $m_1 \neq m_2$. More precisely, we have $a_1 > a_2$ when $m_2 > m_1$ and conversely, i.e. the acceleration of the less massive object towards the more massive object due to gravity is higher.
Despite being superseded in modern physics by Einstein's theory of general relativity (Einstein, 1915), describing gravity not as a force but as a consequence of space-time curvature, Newton's law of universal gravitation is still used in many applications, as the theory provides precise approximations of the effect of gravity when gravitational fields are not extreme. In the remainder of this paper, we directly draw inspiration from this theory, notably from the formulation of acceleration, to build our proposed autoencoder models. We highlight that Newtonian gravity concepts were already successfully leveraged in (Bannister et al., 2012) for graph visualization, and in (Wahid-Ul-Ashraf et al., 2017) where the force formula has been transposed to graph mining measures, to construct symmetric similarity scores among nodes.
3.2. From Physics to Node Representations
Let us come back to our initial analogy between celestial objects in space and node embeddings. In this subsection, let us assume that, in addition to a latent vector $z_i$ of dimension $d$, we have at our disposal a model that is also able to learn a new mass parameter $m_i \in \mathbb{R}^+$ for each node $i$ of a directed graph. Such a parameter would capture the propensity of $i$ to attract other nodes from its neighborhood in this graph, i.e. to make them point towards $i$ through a directed edge. From such an augmented model, we could apply Newton's equations in the resulting embedding. Specifically, we could use the acceleration $a_{i \to j}$ of a node $i$ towards a node $j$ due to gravity in the embedding as an indicator of the likelihood that $i$ is connected to $j$ in the directed graph, with $a_{i \to j} = G\frac{m_j}{\|z_i - z_j\|_2^2}$. In a nutshell:

The numerator captures the fact that some nodes are more influential than others in the graph. For instance, in a scientific publications citation network, seminal groundbreaking articles are more influential and should be more cited. Here, the bigger $m_j$, the more likely node $i$ will be connected to node $j$ via the directed edge $(i, j)$.

The denominator highlights that nodes with structural proximity in the graph, typically with a common neighborhood, are more likely to be connected, provided that the model effectively manages to embed these nodes close to each other in the latent space representation. For instance, in a scientific publications citation network, article $i$ will more likely cite article $j$ if it comes from a similar field of study.
More precisely, instead of directly dealing with $a_{i \to j}$, we use $\log a_{i \to j}$ in the remainder of this paper. Taking the logarithm has two advantages. Firstly, thanks to its concavity, it limits the potentially large values resulting from accelerations towards very central nodes. Also, $\log a_{i \to j}$ can be negative, which is more convenient to reconstruct an unweighted edge (i.e. in the adjacency matrix we have $A_{ij} = 1$ or $A_{ij} = 0$) using a sigmoid activation function, as follows:
$$\hat{A}_{ij} = \sigma(\log a_{i \to j}) = \sigma\big(\log(G m_j) - \log \|z_i - z_j\|_2^2\big).$$
3.3. Gravity-Inspired Directed Graph AE
For pedagogical purposes, we assumed in subsection 3.2 that we had at our disposal a model able to learn mass parameters $m_i$ for all nodes $i$. Let us now detail how we actually derive such parameters, using the graph autoencoder framework.
3.3.1. Encoder
For the encoder part of the model, we resort to a Graph Convolutional Network processing $A$ and, potentially, a node features matrix $X$. Such a GCN assigns a vector of size $d+1$ to each node of the graph, instead of $d$ as in standard graph autoencoders. The first $d$ dimensions correspond to the latent vector representation of the node, i.e. $z_i$, where $d$ is the dimension of the node embedding. The last value of the output vector is the model's estimate of $\tilde{m}_i = \log(G m_i)$. To sum up, we have:
$$\tilde{Z} = \text{GCN}(A, X),$$
where $Z$ is the $n \times d$ matrix of all latent vectors $z_i$, $\tilde{M}$ is the $n$-dimensional vector of all values of $\tilde{m}_i$, and $\tilde{Z}$ is the $n \times (d+1)$ matrix row-concatenating $Z$ and $\tilde{M}$. We note that learning $\tilde{m}_i$ is equivalent to learning $m_i$, but is also more convenient since we get rid of the gravitational constant $G$ and of the logarithm.
In this GCN encoder, as we process directed graphs, we replace the usual symmetric normalization of $A$, i.e. $D^{-1/2}(A + I_n)D^{-1/2}$, by the out-degree normalization $D_{\text{out}}^{-1}(A + I_n)$. Here, $D_{\text{out}}$ denotes the diagonal out-degree matrix of $A + I_n$, i.e. its $i$-th diagonal element corresponds to the number of edges (potentially weighted) going out of node $i$, plus one. Therefore, at each layer of the GCN, the feature vector of a node is the average of the feature vectors from the previous layer of the neighbors to which it points, together with its own feature vector, and with a ReLU activation.
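This out-degree normalization can be sketched as follows (NumPy; dense matrices for readability, whereas sparse representations would be used in practice):

```python
import numpy as np

def normalize_out_degree(A):
    """Out-degree normalization D_out^{-1} (A + I): row i of the result
    averages node i together with the neighbors it points to."""
    A_tilde = A + np.eye(A.shape[0])
    return A_tilde / A_tilde.sum(axis=1, keepdims=True)

A = np.array([[0., 1., 1.],   # node 0 points to nodes 1 and 2
              [0., 0., 0.],   # node 1 points to no one
              [1., 0., 0.]])  # node 2 points to node 0
A_norm = normalize_out_degree(A)
# Each row sums to 1: every node averages over its out-neighborhood + itself.
```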
3.3.2. Decoder
We leverage the previously defined logarithmic version of acceleration, together with a sigmoid activation, to reconstruct the adjacency matrix $A$ from $Z$ and $\tilde{M}$. Denoting by $\hat{A}$ the reconstruction of $A$, we have:
$$\hat{A}_{ij} = \sigma\big(\tilde{m}_j - \log \|z_i - z_j\|_2^2\big).$$
Contrary to the inner product decoder, we usually have $\hat{A}_{ij} \neq \hat{A}_{ji}$. This approach is therefore more relevant for directed graph reconstruction. Model training is similar to standard graph AE, i.e. we aim at minimizing the reconstruction loss from matrix $A$, formulated as a weighted cross entropy loss as in (Kipf and Welling, 2016b), by stochastic gradient descent.
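A minimal sketch of this gravity-inspired decoder, including an optional exponent parameter (the $\lambda$ of section 3.5; $\lambda = 1$ recovers the basic scheme) — NumPy, names are ours:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gravity_decoder(Z, m_tilde, lam=1.0):
    """Gravity-inspired decoder:
    A_hat[i, j] = sigmoid(m_tilde[j] - lam * log ||z_i - z_j||^2),
    where m_tilde[j] = log(G * m_j) is learned per node."""
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq_dist, 1.0)  # avoid log(0); diagonal unused (no self-loops)
    return sigmoid(m_tilde[None, :] - lam * np.log(sq_dist))

Z = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
m_tilde = np.array([2.0, 0.0, 0.0])  # node 0 has a large mass: it attracts others
A_hat = gravity_decoder(Z, m_tilde)
# Asymmetric output: A_hat[1, 0] (edge towards the massive node 0)
# exceeds A_hat[0, 1] (the reverse direction).
```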
3.4. Gravity-Inspired Directed Graph VAE
We also propose to extend our gravity-inspired method to the graph variational autoencoder framework.
3.4.1. Encoder
We extend (Kipf and Welling, 2016b) to build up an inference model for $\tilde{Z}$. In other words, the $(d+1)$-dimensional latent vector $\tilde{z}_i$ associated to each node $i$ concatenates the $d$-dimensional vector $z_i$ and the scalar $\tilde{m}_i$. We have:
$$q(\tilde{Z}|A, X) = \prod_{i=1}^{n} q(\tilde{z}_i|A, X),$$
with Gaussian hypotheses, as (Kipf and Welling, 2016b):
$$q(\tilde{z}_i|A, X) = \mathcal{N}\big(\tilde{z}_i | \mu_i, \text{diag}(\sigma_i^2)\big).$$
Parameters of the Gaussian distributions are learned using two GCNs, with a similar out-degree normalization as in subsection 3.3:
$$\mu = \text{GCN}_{\mu}(A, X), \quad \log \sigma = \text{GCN}_{\sigma}(A, X).$$
3.4.2. Decoder
From sampled vectors $\tilde{z}_i$ drawn from these distributions, we then incorporate our gravity-inspired decoding scheme into the generative model attempting to reconstruct $A$:
$$p(A|\tilde{Z}) = \prod_{i=1}^{n}\prod_{j=1}^{n} p(A_{ij}|\tilde{z}_i, \tilde{z}_j),$$
where:
$$p(A_{ij} = 1|\tilde{z}_i, \tilde{z}_j) = \sigma\big(\tilde{m}_j - \log \|z_i - z_j\|_2^2\big).$$
As (Kipf and Welling, 2016b), we train the model by maximizing the ELBO of the model's likelihood, using full-batch gradient descent and with a Gaussian prior $p(\tilde{Z}) = \prod_i p(\tilde{z}_i) = \prod_i \mathcal{N}(\tilde{z}_i|0, I_{d+1})$. We discuss these Gaussian assumptions in the experimental part of this paper.
3.5. Generalization of the Decoding Scheme
We point out that one can improve the flexibility of our decoding scheme, both in the AE and VAE settings, by introducing an additional parameter $\lambda \in \mathbb{R}^+$ and reconstructing $\hat{A}$ as follows:
$$\hat{A}_{ij} = \sigma\big(\tilde{m}_j - \lambda \log \|z_i - z_j\|_2^2\big).$$
Decoders from sections 3.3 and 3.4 are special cases where $\lambda = 1$. This parameter can be tuned by cross-validation on link prediction tasks (see Section 4). The interpretation of such a parameter is twofold. Firstly, it constitutes a simple tool to balance the relative importance of the node distance in the embedding for reconstruction w.r.t. the mass attraction parameter. Then, from a physical point of view, it is equivalent to replacing the squared distance in Newton's formula by a distance to the power of $2\lambda$. In our experimental analysis on link prediction, we provide insights on when and why deviating from Newton's actual theory (i.e. from $\lambda = 1$) is relevant.
3.6. On Complexity and Scalability
Assuming featureless nodes, a sparse representation of the adjacency matrix $A$ with $m$ nonzero entries, and considering that our models return a dense $n \times (d+1)$ embedding matrix $\tilde{Z}$, the space complexity of our approach is $O(dn + m)$, both in the AE and VAE frameworks. If nodes also have features to stack up in the matrix $X$, then the space complexity becomes $O((d+f)n + m)$, with $d \ll n$ and $f \ll n$ in practice. Therefore, as for standard graph AE and VAE models (Kipf and Welling, 2016b), space complexity increases linearly w.r.t. the size of the graph.
Moreover, due to the pairwise computations of distances between all $d$-dimensional vectors $z_i$ and $z_j$ involved in our gravity-inspired decoding scheme, our models have a quadratic time complexity w.r.t. the number of nodes in the graph, as standard graph AE and VAE. As a consequence, we focus on medium-size datasets, i.e. graphs with thousands of nodes and edges, in our experimental analysis. We nevertheless point out that extending our model to very large graphs (with millions of nodes and edges) could be achieved by applying the degeneracy framework proposed in (Salha et al., 2019) to scale graph autoencoders, or a variant of their approach involving directed graph degeneracy concepts (Giatsidis et al., 2013). Future works will provide a deeper investigation of these scalability concerns.
4. Experimental Analysis
In this section, we empirically evaluate and discuss the performance of our models on three real-world datasets and on three variants of the directed link prediction problem.
4.1. Three Directed Link Prediction Tasks
We consider the following three learning tasks for our experiments.
4.1.1. Task 1: General Directed Link Prediction
The first task is referred to as general directed link prediction. As in previous works (Kipf and Welling, 2016a; Grover et al., 2018; Pan et al., 2018; Salha et al., 2019), we train models on incomplete versions of graphs where 15% of edges were randomly removed. We take directionality into account in the masking process. In other words, if a link between nodes $i$ and $j$ is reciprocal, we can possibly remove the edge $(i, j)$ but still observe the reverse edge $(j, i)$ in the training incomplete graph. Then, we create validation and test sets from the removed edges and from the same number of randomly sampled pairs of unconnected nodes. We evaluate the performance of our models on a binary classification task consisting in discriminating the actual removed edges from the fake ones, and compare results using the AUC and AP scores. In the following, the validation set contains 5% of edges, and the test set contains 10% of edges. The validation set is only used for hyperparameters tuning.
This setting corresponds to the most general formulation of link prediction. However, due to the large number of unconnected pairs of nodes in most real-world graphs, we expect the impact of directionality on performances to be limited. Indeed, for each actual unidirectional edge in the graph, it is unlikely to retrieve the reverse (unconnected) pair among the negative samples of the test set. As a consequence, models focusing on graph proximity and ignoring the direction of the link, such as standard graph AE and VAE, might still perform fairly well on such a task.
For this reason, in the remainder of this subsection, we also propose two additional learning tasks, designed to reinforce the importance of directionality learning.
4.1.2. Task 2: Biased Negative Samples (B.N.S.) Link Prediction
For the second task, we also train models on incomplete versions of graphs where 15% of edges were removed: 5% for the validation set and 10% for the test set. However, removed edges are all unidirectional, i.e. $(i, j)$ exists but not $(j, i)$. In this setting, the reverse node pairs are included in validation and test sets and constitute negative samples. In other words, all node pairs from validation and test sets are included in both directions. As for the general directed link prediction task, we evaluate the performance of our models on a binary classification task consisting in discriminating actual edges from fake ones, and therefore evaluate the ability of our models to correctly reconstruct $A_{ij} = 1$ and $A_{ji} = 0$ simultaneously.
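The construction of these biased negative samples can be sketched as follows (Python; a simplified helper under an assumed edge-list input, not the authors' preprocessing code):

```python
def biased_negative_samples(test_edges, all_edges):
    """Task 2 sketch: keep only unidirectional held-out edges (i, j) whose
    reverse (j, i) is absent from the full graph; each reverse pair then
    serves as a negative sample."""
    positives, negatives = [], []
    for (i, j) in test_edges:
        if (j, i) not in all_edges:
            positives.append((i, j))
            negatives.append((j, i))
    return positives, negatives

# (0, 1) is reciprocal, so it is discarded; (2, 3) is unidirectional.
all_edges = {(0, 1), (1, 0), (2, 3)}
pos, neg = biased_negative_samples([(0, 1), (2, 3)], all_edges)
# pos == [(2, 3)], neg == [(3, 2)]
```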
This task has been presented in (Zhou et al., 2017) under the name biased negative samples link prediction. It is more challenging than general directed link prediction, as the ability to reconstruct asymmetric relationships is more crucial. Therefore, models ignoring directionality and only learning from symmetric graph proximity, such as standard graph AE and VAE, will fail in such a setting.
4.1.3. Task 3: Bidirectionality Prediction
As a third task, we evaluate the ability of our models to discriminate bidirectional edges, i.e. reciprocal connections, from unidirectional edges. Specifically, we create an incomplete training graph by removing at random one of the two directions of all bidirectional edges. Therefore, the training graph only has unidirectional connections. Then, a binary classification problem is once again designed, aiming at retrieving bidirectional edges in a test set composed of their removed directions and of the same number of reversed unidirectional edges (that are therefore fake edges). In other words, for each pair of nodes from the test set, we observe a connection from $i$ to $j$ in the incomplete training graph, but only half of them are reciprocal. This third evaluation task, referred to as bidirectionality prediction in this paper, also strongly relies on directionality learning. As a consequence, as for task 2, standard graph AE and VAE are expected to perform poorly.
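The construction of this third setting can be sketched as follows; this is an illustrative reading of the procedure above, not the authors' exact pipeline:

```python
# Sketch of the Task 3 setup: hide one direction of every bidirectional edge,
# then test positives (hidden directions) against the same number of reversed
# unidirectional edges, which are fake.
import random

def bidirectionality_sets(edges, seed=0):
    rng = random.Random(seed)
    edge_set = set(edges)
    bidirectional = [(i, j) for (i, j) in edges if (j, i) in edge_set and i < j]
    unidirectional = [(i, j) for (i, j) in edges if (j, i) not in edge_set]
    # Remove one direction of each reciprocal pair from the training graph.
    hidden = [rng.choice([(i, j), (j, i)]) for (i, j) in bidirectional]
    hidden_set = set(hidden)
    train = [e for e in edges if e not in hidden_set]
    # Negatives: reversed unidirectional edges (same count as positives).
    sampled = rng.sample(unidirectional, min(len(hidden), len(unidirectional)))
    fakes = [(j, i) for (i, j) in sampled]
    return train, hidden, fakes
```

In the resulting training graph every observed connection is unidirectional, so the model must infer reciprocity from structure alone.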
4.2. Experimental Setting
\begin{tabular}{lccc}
\toprule
Dataset & Number of nodes & Number of edges & Percentage of reciprocity \\
\midrule\midrule
Cora &  &  &  \\
Citeseer &  &  &  \\
Google &  &  &  \\
\bottomrule
\end{tabular}
4.2.1. Datasets
\begin{tabular}{llcccccc}
\toprule
Dataset & Model & \multicolumn{2}{c}{Task 1: General Link Prediction} & \multicolumn{2}{c}{Task 2: B.N.S. Link Prediction} & \multicolumn{2}{c}{Task 3: Bidirectionality Prediction} \\
 &  & AUC (in \%) & AP (in \%) & AUC (in \%) & AP (in \%) & AUC (in \%) & AP (in \%) \\
\midrule\midrule
Cora & Gravity Graph VAE (ours) &  &  &  &  &  &  \\
 & Gravity Graph AE (ours) &  &  &  &  &  &  \\
\cmidrule{2-8}
 & Standard Graph VAE &  &  &  &  &  &  \\
 & Standard Graph AE &  &  &  &  &  &  \\
 & Source/Target Graph VAE &  &  &  &  &  &  \\
 & Source/Target Graph AE &  &  &  &  &  &  \\
 & APP &  &  &  &  &  &  \\
 & HOPE &  &  &  &  &  &  \\
 & node2vec &  &  &  &  &  &  \\
\midrule\midrule
Citeseer & Gravity Graph VAE (ours) &  &  &  &  &  &  \\
 & Gravity Graph AE (ours) &  &  &  &  &  &  \\
\cmidrule{2-8}
 & Standard Graph VAE &  &  &  &  &  &  \\
 & Standard Graph AE &  &  &  &  &  &  \\
 & Source/Target Graph VAE &  &  &  &  &  &  \\
 & Source/Target Graph AE &  &  &  &  &  &  \\
 & APP &  &  &  &  &  &  \\
 & HOPE &  &  &  &  &  &  \\
 & node2vec &  &  &  &  &  &  \\
\midrule\midrule
Google & Gravity Graph VAE (ours) &  &  &  &  &  &  \\
 & Gravity Graph AE (ours) &  &  &  &  &  &  \\
\cmidrule{2-8}
 & Standard Graph VAE &  &  &  &  &  &  \\
 & Standard Graph AE &  &  &  &  &  &  \\
 & Source/Target Graph VAE &  &  &  &  &  &  \\
 & Source/Target Graph AE &  &  &  &  &  &  \\
 & APP &  &  &  &  &  &  \\
 & HOPE &  &  &  &  &  &  \\
 & node2vec &  &  &  &  &  &  \\
\bottomrule
\end{tabular}
We provide experiments on three publicly available real-world directed graphs, whose statistics are presented in Table 1. The Cora and Citeseer datasets (https://linqs.soe.ucsc.edu/data) are citation graphs consisting of scientific publications citing one another. The Google dataset (http://konect.uni-koblenz.de/networks/cfinder-google) is a web graph, whose nodes are web pages and whose directed edges represent hyperlinks between them. The Google graph is denser than Cora and Citeseer and has a higher proportion of bidirectional edges. Graphs are unweighted and featureless.
4.2.2. Standard and Gravity-Inspired Autoencoders
We train gravity-inspired AE and VAE models on each graph. For comparison purposes, we also train standard graph AE and VAE from (Kipf and Welling, 2016b). Each of these four models includes a two-layer GCN encoder with a 64-dim hidden layer and with the out-degree left normalization of the adjacency matrix defined in subsection 3.3.1. All models are trained for 200 epochs and return 32-dim latent node representations. We use the Adam optimizer (Kingma and Ba, 2015), apply a learning rate of 0.1 for Cora and Citeseer and 0.2 for Google, and train models without dropout, performing full-batch gradient descent and using the reparameterization trick (Kingma and Welling, 2013) for variational autoencoders. Also, for tasks 1 and 3 we picked a value of $\lambda$ for Cora and Citeseer and a larger one for Google; for task 2 we picked a smaller $\lambda$ for all three graphs, choices that we interpret in the next subsections. All hyperparameters were tuned w.r.t. the AUC score on task 1, i.e. on the general directed link prediction task.
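As a sketch, and assuming the reconstruction scheme of subsection 3.5, $\hat{A}_{ij} = \sigma(\tilde{m}_j - \lambda \log \|z_i - z_j\|_2^2)$, the gravity-inspired decoding step can be written in a few lines of NumPy; variable names are ours:

```python
# Minimal NumPy sketch of the gravity-inspired decoder: the probability of a
# directed edge i -> j combines the learned "mass" of the target node j with
# the symmetric embedding distance, making the reconstruction asymmetric.
import numpy as np

def gravity_decode(z, m_tilde, lam=1.0, eps=1e-8):
    """z: (n, d) node embeddings; m_tilde: (n,) learned mass parameters.
    Returns the (n, n) matrix A_hat with
    A_hat[i, j] = sigmoid(m_tilde[j] - lam * log ||z_i - z_j||^2)."""
    sq_dist = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)  # ||z_i - z_j||^2
    logits = m_tilde[None, :] - lam * np.log(sq_dist + eps)   # mass of target j
    return 1.0 / (1.0 + np.exp(-logits))
```

Since the distance term is symmetric, the asymmetry between $\hat{A}_{ij}$ and $\hat{A}_{ji}$ comes entirely from the difference between the two masses.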
4.2.3. Baselines
Besides standard AE and VAE models, we also compare the performance of our methods w.r.t. the alternative graph embedding methods introduced in subsection 2.6:

Our Source/Target Graph AE and VAE, extending the source/target vectors paradigm to graph AE and VAE, and trained with settings similar to those of the standard and gravity-inspired models.

HOPE (Ou et al., 2016), with source and target vectors of dimension 16, to learn 32-dim node representations.

APP (Zhou et al., 2017), learning source and target vectors of dimension 16 from directed random walks with restarts, to preserve asymmetric proximity.

For comparison purposes, we also train node2vec models (Grover and Leskovec, 2016) that, while dealing with directionality in random walks, only return one 32-dim embedding vector per node. We rely on symmetric inner products with sigmoid activation for link prediction, and therefore expect node2vec to underperform w.r.t. APP. We train models from 10 random walks of length 80 per node, with a window size of 5.
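To make the source/target paradigm concrete, here is a hedged sketch of a HOPE-like baseline: a truncated SVD of a Katz proximity matrix yields separate source and target vectors, so that edge scores are asymmetric. The $\beta$ value and the dense solver are illustrative simplifications of HOPE's actual generalized SVD:

```python
# Hedged sketch of a HOPE-style factorization: compute a Katz proximity
# matrix S = (I - beta*A)^{-1} (beta*A), then factorize it into low-dim
# source and target vectors via a truncated SVD.
import numpy as np

def hope_embeddings(A, dim=16, beta=0.01):
    n = A.shape[0]
    S = np.linalg.solve(np.eye(n) - beta * A, beta * A)  # Katz proximity
    U, sigma, Vt = np.linalg.svd(S)
    sqrt_sigma = np.sqrt(sigma[:dim])
    source = U[:, :dim] * sqrt_sigma     # out-going role of each node
    target = Vt[:dim, :].T * sqrt_sigma  # in-coming role of each node
    return source, target

def score(source, target, i, j):
    """Asymmetric score for a directed edge i -> j."""
    return source[i] @ target[j]
```

Because source and target vectors differ, `score(source, target, i, j)` and `score(source, target, j, i)` can disagree, unlike the symmetric inner product used for node2vec.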
We used Python, and especially the TensorFlow library, except for APP, for which we used the authors' Java implementation (Zhou et al., 2017). We trained models on an NVIDIA GTX 1080 GPU and ran other operations on a double Intel Xeon Gold 6134 CPU.
4.3. Results for Directed Link Prediction
Table 2 reports mean AUC and AP scores, along with standard errors over 100 runs, for each dataset and for the three tasks; the incomplete training graphs and test sets differ across the 100 runs. Overall, our gravity-inspired graph AE and VAE models achieve very competitive results.
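For reference, the AUC used throughout can be computed directly from decoder scores on true and fake edges. This rank-based formulation (a sketch, with hypothetical scores) also makes explicit why a direction-blind model lands at the 0.5 random level on task 2: there, positives and negatives carry identical symmetric scores, so every comparison is a tie.

```python
# AUC as the Mann-Whitney rank statistic: the probability that a randomly
# chosen true edge scores higher than a randomly chosen fake one, with ties
# counted as one half.
def auc_score(pos_scores, neg_scores):
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

When the positive and negative score multisets coincide, as for a symmetric decoder on the B.N.S. task, this statistic is exactly 0.5.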
On task 1, standard graph AE and VAE, despite ignoring directionality for graph reconstruction, still perform fairly well. This emphasizes the limited impact of directionality on performance for this task, as anticipated in subsection 4.1.1. Nonetheless, our gravity-inspired models significantly outperform the standard ones, confirming the relevance of capturing both proximity and directionality for general directed link prediction. Moreover, our models are competitive w.r.t. the baselines designed for directed graphs. Among them, APP is the best on our three datasets, together with the source/target graph AE on the Google graph.
On task 2, i.e. biased negative samples link prediction, our gravity-inspired models consistently achieve the best performances, notably reaching the top AUC on Citeseer, above the best baseline. We notice that models ignoring directionality for prediction, i.e. node2vec and standard graph AE and VAE, totally fail, with AUC and AP scores at the random classifier level on all graphs, which was expected since the test sets include both directions of each node pair. Experiments on task 3, i.e. on bidirectionality prediction, confirm the superiority of our approach on challenging tasks where directionality learning is crucial. Indeed, on this last task, gravity-inspired models also outperform the alternative approaches, with the gravity-inspired graph AE obtaining the top AUC on Google.
While the AE and VAE frameworks are based on different foundations, we found no significant performance gap in our experiments between (standard, asymmetric, or gravity-inspired) autoencoders and their variational counterparts. This result is consistent with previous insights from (Kipf and Welling, 2016b; Salha et al., 2019) on undirected graphs. Future works will investigate alternative prior distributions for graph VAE, aiming at challenging the traditional Gaussian hypothesis that, despite being very convenient for computations, might not be an optimal choice in practice (Kipf and Welling, 2016b). Last, we note that all AE/VAE models required a comparable training time of roughly 7 seconds (respectively 8 seconds, 5 minutes) for Cora (resp. for Citeseer, for Google) on our machine. Baselines were faster: for instance, on the largest Google graph, 1 minute (resp. 1.30 minutes, 2 minutes) was required to train HOPE (resp. APP, node2vec).
4.4. Discussion
To pursue our experimental analysis, we propose a discussion on the nature of the learned masses $\tilde{m}_i$, on the role of $\lambda$ in balancing node proximity and influence, and on some limits and openings of our work.
4.4.1. Deeper insights on $\tilde{m}_i$
Figure 1 displays a visualization of the Cora graph, using the embeddings and $\tilde{m}_i$ parameters learned through our gravity-inspired graph VAE. In this visualization, we observe that nodes with smaller "masses" tend to be connected to nodes with larger "masses" from their embedding neighborhood, which was expected by design of our decoding scheme.
From Figure 1, one might argue that $\tilde{m}_i$ tends to reflect centrality. To investigate this, we compare $\tilde{m}_i$ to the most common graph centrality measures. Specifically, in Table 3 we report Pearson correlation coefficients of $\tilde{m}_i$ w.r.t. the following measures:

The in-degree and out-degree of the node, i.e. respectively the number of edges coming into and going out of the node.

The betweenness centrality, which is, for a node $v$, the sum over all node pairs of the fraction of shortest paths that pass through $v$: $c_B(v) = \sum_{s,t} \frac{\sigma(s,t|v)}{\sigma(s,t)}$, where $\sigma(s,t)$ is the number of shortest paths from node $s$ to node $t$, and $\sigma(s,t|v)$ is the number of those paths going through $v$ (Brandes, 2008).

The PageRank (Page et al., 1999), computing a ranking of node importances based on the structure of incoming links. It was originally designed to rank web pages, and it can be seen as the stationary distribution of a random walk on the graph.

The Katz centrality, a generalization of the eigenvector centrality. The Katz centrality of node $i$ is $x_i = \alpha \sum_{j} A_{ij} x_j + \beta$, where $A$ is the adjacency matrix with largest eigenvalue $\lambda_{\max}$, with usually $\alpha < 1/\lambda_{\max}$, and with $\beta = 1$ (Katz, 1953).
\begin{tabular}{lccc}
\toprule
Centrality Measures & Cora & Citeseer & Google \\
\midrule\midrule
In-degree &  &  &  \\
Out-degree &  &  &  \\
Betweenness &  &  &  \\
PageRank &  &  &  \\
Katz &  &  &  \\
\bottomrule
\end{tabular}
As observed in Table 3, the $\tilde{m}_i$ parameter is positively correlated with all of these centrality measures, except for the out-degree, where the correlation is negative (or almost null for Google), meaning that nodes with few edges going out of them tend to have larger values of $\tilde{m}_i$. Correlations are not perfect, which emphasizes that our models do not exactly learn any one of these measures. We also note that correlations are lower for Google, which might be due to the structure of this graph and especially to its density.
In our experiments, we tried to replace $\tilde{m}_i$ by each of these (normalized) centrality measures when performing link prediction, learning optimal embedding vectors for these fixed mass values, and obtained underperforming results. For instance, using the betweenness centrality on Cora instead of the masses actually learned by the VAE led to an AUC above that of the standard graph VAE, but below that of the gravity VAE with learned masses. Also, using centrality measures as initial values of $\tilde{m}_i$ before model training did not significantly impact performances in our experiments.
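A hedged sketch of this correlation check, using only NumPy and hypothetical mass values standing in for those learned by the model:

```python
# Sketch of the Table 3 analysis: Pearson correlation between (hypothetical)
# learned masses and simple degree centralities of a directed graph.
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

def degree_centralities(A):
    """In-degree (column sums) and out-degree (row sums) of adjacency A."""
    return A.sum(axis=0), A.sum(axis=1)
```

With masses that grow with the number of incoming edges, this reproduces the qualitative pattern of Table 3: positive correlation with in-degree and negative correlation with out-degree.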
4.4.2. Impact of parameter $\lambda$
In subsection 3.5 we introduced a parameter $\lambda$ to tune the relative importance of node proximity w.r.t. mass attraction, leading to the reconstruction scheme $\hat{A}_{ij} = \sigma(\tilde{m}_j - \lambda \log \|z_i - z_j\|^2_2)$. In Figure 2, we show the impact of $\lambda$ on mean AUC scores for the VAE model and for all three datasets. For Cora and Citeseer, on tasks 1 and 3, $\lambda = 1$ is an optimal choice, consistently with Newton's formula (see Figure 2 (a) and (c)). However, for Google, on tasks 1 and 3, we obtained better performances for higher values of $\lambda$, notably for the value we used in our experiments. Increasing $\lambda$ reinforces the relative importance in the decoder of the symmetric node proximity, measured by $\|z_i - z_j\|^2_2$, w.r.t. the parameter $\tilde{m}_j$ capturing the global influence of a node on its neighbors and therefore the asymmetry in links. Since the Google graph is much denser than Cora and Citeseer, and has a higher proportion of symmetric relationships (see Table 1), putting the emphasis on node proximity appears as a relevant strategy.
On the contrary, on task 2 we achieved optimal performances for values of $\lambda$ below 1, for all three graphs (see Figure 2 (b)). With $\lambda < 1$, we improve scores by assigning more relative importance to the mass parameter $\tilde{m}_j$. Such a result is not surprising since, for the biased negative samples link prediction task, learning directionality is more crucial than learning proximity, as the node pairs from the test sets are all included in both directions. As displayed in Figure 2 (b), increasing $\lambda$ significantly deteriorates performances.
4.4.3. Extensions and openings
Throughout these experiments, we focused on featureless graphs, to fairly compete with HOPE, APP and node2vec. However, as explained in section 3, our models can easily leverage node features, in addition to the graph structure summarized in the adjacency matrix. Moreover, the gravity-inspired method is not limited to the GCN encoder and can be generalized to any alternative graph neural network. Future works will provide more evidence on such extensions, investigate better-suited priors for graph VAE, and generalize the existing scalable graph AE/VAE framework (Salha et al., 2019) to directed graphs. We also aim at exploring to which extent graph AE/VAE can tackle the node clustering problem in directed graphs.
5. Conclusion
In this paper, we presented a new method, inspired by Newtonian gravity, to learn node embeddings from directed graphs, using graph AE and VAE. We provided experimental evidence of its ability to effectively address the challenging directed link prediction problem. Our work also pinpointed several research directions that, in the future, should lead to improvements of our approach.
References
Autoencoders, unsupervised learning, and deep architectures.
Force-directed graph drawing using social gravity and scaling. In International Symposium on Graph Drawing.
Laplacian eigenmaps for dimensionality reduction and data representation. Vol. 15, pp. 1373–1396.
Graph convolutional matrix completion.
On variants of shortest-path betweenness centrality and their generic computation. Vol. 30, pp. 136–145.
Spectral networks and locally connected networks on graphs.
XXI. Experiments to determine the density of the earth. pp. 469–526.
FastGCN: fast learning with graph convolutional networks via importance sampling.
Convolutional neural networks on graphs with fast localized spectral filtering.
Matrix completion with variational graph autoencoders: application in hyperlocal air quality inference.
Erklärung der Perihelbewegung des Merkur aus der allgemeinen Relativitätstheorie. Vol. 47, pp. 831–839.
Graph drawing by force-directed placement. Vol. 21, pp. 1129–1164.
Link prediction in large directed graphs.
D-cores: measuring collaboration of directed graphs based on degeneracy. Vol. 35, pp. 311–343.
Deep learning. MIT Press.
node2vec: scalable feature learning for networks.
Graphite: iterative generative modeling of graphs.
A systemic analysis of link prediction in social network. pp. 1–35.
Junction tree variational autoencoder for molecular graph generation.
A new status index derived from sociometric analysis. Vol. 18, pp. 39–43.
Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 137–146.
Auto-encoding variational Bayes.
Adam: a method for stochastic optimization.
Semi-supervised classification with graph convolutional networks.
Variational graph auto-encoders.
Network-based prediction of protein interactions. Vol. 10, pp. 1240.
On information and sufficiency. Vol. 22, pp. 79–86.
Recommendation algorithm based on link prediction and domain knowledge in retail transactions. Vol. 31, pp. 875–881.
The link-prediction problem for social networks. Vol. 58, pp. 1019–1031.
Constrained graph variational autoencoders for molecule design.
Constrained generation of semantically valid graphs via regularizing variational autoencoders.
The core decomposition of networks: theory, algorithms and applications.
Clustering and community detection in directed networks: a survey. Vol. 533, pp. 95–142.
Nonparametric latent feature models for link prediction. In NIPS.
Philosophiae Naturalis Principia Mathematica.
Asymmetric transitivity preserving graph embedding.
The PageRank citation ranking: bringing order to the web.
Adversarially regularized graph autoencoder.
DeepWalk: online learning of social representations.
Learning internal representations by error propagation.
A degeneracy framework for scalable graph autoencoders.
NeVAE: a deep generative model for molecular graphs.
The graph neural network model. Vol. 20, pp. 61–80.
Link prediction for directed graphs. In Social Network-Based Recommender Systems, pp. 7–31.
GraphVAE: towards generation of small graphs using variational autoencoders.
LINE: large-scale information network embedding.
Learning to make predictions on graphs with autoencoders.
Recent advances in autoencoder-based representation learning.
Newton's gravitational law for link prediction in social networks. In International Conference on Complex Networks and their Applications.
MGAE: marginalized graph autoencoder for graph clustering.
Structural deep network embedding.
Link prediction in social networks: the state-of-the-art. Vol. 58, pp. 1–38.
Simplifying graph convolutional networks.
A comprehensive survey on graph neural networks.
Link prediction in directed network and its application in microblog.
Network representation learning: a survey.
D-VAE: a variational autoencoder for directed acyclic graphs.
Genre-based link prediction in bipartite graph for music recommendation. Vol. 91, pp. 959–965.
Scalable graph embedding for asymmetric proximity. In AAAI.