Exploring the Semantic Content of Unsupervised Graph Embeddings: An Empirical Study
Abstract
Graph embeddings have become a key and widely used technique within the field of graph mining, proving to be successful across a broad range of domains including social, citation, transportation and biological networks. Graph embedding techniques aim to automatically create a low-dimensional representation of a given graph, which captures key structural elements in the resulting embedding space. However, to date, there has been little work exploring exactly which topological structures are being learned in the embedding process. In this paper, we investigate whether graph embeddings are approximating something analogous to traditional vertex-level graph features. If such a relationship can be found, it could be used to provide theoretical insight into how graph embedding approaches function. We perform this investigation by predicting known topological features, using supervised and unsupervised methods, directly from the embedding space. If a mapping between the embeddings and topological features can be found, then we argue that the structural information encapsulated by the features is represented in the embedding space. To explore this, we present an extensive experimental evaluation of five state-of-the-art unsupervised graph embedding techniques, across a range of empirical graph datasets, measuring a selection of topological features. We demonstrate that several topological features are indeed being approximated by the embedding space, allowing key insight into how graph embeddings create good representations.
1 Introduction
Representing the complex and inherent links and relationships between and within datasets in the form of a graph is a widely performed practice across scientific disciplines newman2010networks (). One reason for its popularity is that the structure or topology of the resulting graph can reveal important and unique insights into the data it represents. Recently, analysing and making predictions about graphs using machine learning has shown significant advances over traditional approaches in a range of commonly performed tasks Goyal2017 (). Such tasks include predicting the formation of new edges within the graph and the classification of vertices Moyano2017 (). However, graphs are inherently complex structures and do not naturally lend themselves as input to existing machine learning methods, many of which operate on vectors of real numbers.
Graph embeddings
However, to date, there has been little research into why graph embedding approaches have been so successful. They all aim to capture as much topological information as possible during the embedding process, but how this is achieved, or even exactly what structure is being captured, is currently not known. In previous work bonner2017embedding (), we provided a framework which could be used to directly measure the ability of graph embeddings to capture a good representation of a graph’s topology. In this paper, we expand upon this work by attempting to provide insight into the graph embedding process itself. We explore whether the known and mathematically understood range of topological features newman2010networks () is being approximated in the embedding space. To achieve this, we investigate whether a mapping from the embedding space to a range of topological features is possible. We hypothesise that if such a mapping can be found, then the topological structure represented by that feature is also approximated in the embedding space. Such a discovery could begin to provide a theoretical framework for the use of graph embeddings, by experimentally demonstrating which topological structures are used to create the representations. Our methodology employs a combination of supervised and unsupervised models to predict topological features directly from the embeddings. The features we investigate are first binned into discrete classes so that the prediction can be framed as classification. We make the following contributions whilst exploring this area:

We propose to investigate whether graph embeddings are learning something analogous to traditional vertex-level graph features and, if so, whether a particular type of feature is being learned best.

We empirically show, to the best of our knowledge for the first time, that several known topological features are present in graph embeddings. This can help explain the graph embedding process by detailing which graph features are key in creating high-quality representations.

We provide detailed experimental evidence for our claims, with five state-of-the-art unsupervised graph embedding approaches, across seven topological features and six empirical graph datasets.
Reproducibility: we make all the experiments performed in this paper reproducible by open-sourcing our code.
In Section 2 we explore prior work, in Section 3 we detail our methodology for assessing which known topological features are approximated by graph embeddings, Section 4 details the experimental setup, in Section 5 we present our results and in Section 6 we conclude the paper and suggest further extensions of this work.
1.1 Notation
We adopt here the commonly used notation for representing a graph or network, $G = (V, E)$, where $V$ is the set of vertices and $E$ is the set of edges.
2 Previous Work
This section explores the prior research regarding graph embedding techniques and previous approaches for measuring known features in embeddings. We first introduce the notion of graph embeddings, detail supervised and factorization based approaches, explore in detail the state-of-the-art unsupervised approaches which will be used throughout the rest of the paper and finally review past attempts to provide a theoretical understanding of their functionality.
2.1 Graph Embeddings
The ability to automatically learn a descriptive numerical representation for a given graph is an attractive goal, and could provide a timely solution to some common problems within the field of graph mining. Traditional approaches have relied upon extracting features, such as various measures of a vertex’s centrality Page1998 (), that capture the required information about a graph’s topology and can then be used in some downstream prediction task Li2012 () bonner2016 () Berlingerio2012 () bonner2016gfp (). However, such a feature extraction based approach relies solely upon the hand-crafted features being a good representation of the target graph. Often a user must apply extensive domain knowledge to select the correct features for a given task, with a change in task often requiring the selection of new features Li2012 ().
Graph embedding models are a collection of machine learning techniques which attempt to learn key features from a graph’s topology automatically, in either a supervised or unsupervised manner, removing the often cumbersome task of end users manually selecting representative graph features Perozzi2014 (). This manual process, known as feature selection guyon2003introduction () in the machine learning literature, has clear disadvantages, as certain features may only be useful for a certain task. Features could even negatively affect model performance if utilised in a task for which they are not well suited. Arguably, many of the recent exciting advances seen in the field of Deep Learning have been driven by the removal of this feature selection process Grover2016 (), instead allowing models to learn the best data representations themselves Goodfellow2016 (). For a selection of recent review papers covering the complete family of graph embedding techniques, readers are referred to Hamilton2017 () cai2017 () zhang2017network () cui2017survey (). The work presented in this paper focuses on neural network based approaches for graph embedding, as these have demonstrated superior performance compared with traditional approaches Goyal2017 ().
The study of Neural Networks (NNs) is a field within machine learning inspired by the human brain Goodfellow2016 (). NNs model problems via the use of connected layers of artificial neurons, where each network has an input layer, at least one hidden layer and an output layer. The activation of each neuron is given by a pre-specified function applied to a weighted sum of the outputs of the neurons to which it is connected. These weights are learned from training examples which are fed through the network, with modifications made to the weights via backpropagation to increase the probability of the NN producing the desired result Goodfellow2016 ().
Supervised Approaches
Within the field of machine learning, approaches which are supervised are perhaps the most studied and understood Goodfellow2016 (). In supervised learning, the datasets contain labels which help guide the model in the learning process. In the field of graph analysis, these labels are often present at the vertex level and contain, for example, the metadata of a user in a social network.
Perhaps the largest area of supervised graph embeddings is that of Graph Convolutional Neural Networks (GCNs) Bruna2013 (), including both spectral Defferrard2016 () kipf2017semi () and spatial NiepertMATHIASNIEPERT2016 () approaches. Such approaches pass a filter over a graph, in a manner analogous to the sliding window of Convolutional Neural Networks from the computer vision field Goodfellow2016 (), but with the neighbourhood of a vertex replacing the sliding window. Current GCN approaches are supervised and thus require labels upon the vertices. This requirement has two significant disadvantages. Firstly, it limits the graph data which can be used to graphs with labelled vertices. Secondly, it means that the resulting embeddings are specialised for one specific task and cannot be generalised to a different problem without costly retraining of the model for the new task.
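To make the idea of a neighbourhood replacing the sliding window concrete, the following is a minimal NumPy sketch of a single graph convolution layer using the symmetric normalised propagation rule of the spectral approach kipf2017semi (), $H' = \mathrm{ReLU}(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2} H W)$ with $\tilde{A} = A + I$. The function name `gcn_layer`, the ReLU non-linearity and the random weights are illustrative assumptions; in a supervised GCN, $W$ would be learned from the vertex labels.

```python
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One graph convolution: aggregate each vertex's neighbourhood (plus itself),
    normalise by degree, then apply a linear map and a ReLU."""
    A_tilde = A + np.eye(A.shape[0])          # self-loops so a vertex keeps its own features
    deg = A_tilde.sum(axis=1)                  # degrees of the self-looped graph (always >= 1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))   # D~^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetric normalisation
    return np.maximum(A_hat @ H @ W, 0.0)      # ReLU(A_hat H W)

# Toy example: a path graph 0-1-2 with two input features per vertex.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))                    # hypothetical weights; learned in practice
H_next = gcn_layer(A, H, W)
print(H_next.shape)  # (3, 4)
```

Stacking several such layers lets information flow from vertices further away, one hop per layer.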
Factorization Approaches
Before the recent interest in learning graph embeddings via the use of neural networks, a variety of other approaches were explored. Often these approaches took the form of matrix factorization, in a similar vein to classical dimensionality reduction techniques such as Principal Component Analysis (PCA) Hamilton2017 () wold1987principal (). Such approaches first calculate the pairwise similarity between the vertices of a graph, then find a mapping to a lower-dimensional space such that the relationships observed in the higher dimensions are preserved. An early example of such an approach is Laplacian eigenmaps, which directly factorizes the Laplacian matrix of a given graph belkin2002laplacian (). Other approaches, often using the adjacency matrix, define the relationship between two vertices in the low-dimensional space as the dot product of their corresponding embeddings. Such approaches include Graph Factorization ahmed2013distributed (), GraRep cao2015grarep () and HOPE ou2016asymmetric (). Such dimensionality reduction based approaches are often quadratic in complexity zhang2017network (), and the predictive performance of their embeddings has largely been superseded by the recent neural network based methods Goyal2017 ().
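A minimal NumPy sketch of the Laplacian eigenmaps idea, assuming an undirected, connected graph given as a dense adjacency matrix: form the unnormalised Laplacian $L = D - A$, take the eigenvectors with the smallest eigenvalues, discard the trivial constant eigenvector, and use the remaining ones as vertex coordinates. The original method of belkin2002laplacian () solves the generalised problem $Ly = \lambda D y$; this simplified variant and the function name are our own assumptions. The dense eigendecomposition also hints at why such methods scale poorly to large graphs.

```python
import numpy as np

def laplacian_eigenmap(A: np.ndarray, dim: int) -> np.ndarray:
    """Embed each vertex into `dim` dimensions using eigenvectors of L = D - A."""
    D = np.diag(A.sum(axis=1))
    L = D - A
    # eigh returns the eigenvalues of a symmetric matrix in ascending order.
    _, eigvecs = np.linalg.eigh(L)
    # Skip the first eigenvector: for a connected graph it is constant (eigenvalue 0).
    return eigvecs[:, 1:dim + 1]

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
Y = laplacian_eigenmap(A, dim=2)
print(Y.shape)  # (6, 2)
```

In this toy graph the first embedding coordinate has one sign for the first triangle and the opposite sign for the second, so the low-dimensional space preserves the two-community structure.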
2.2 Unsupervised Stochastic Embeddings
DeepWalk Perozzi2014 () and Node2Vec Grover2016 () are the two main approaches for random walk based embedding. Both borrow key ideas from a technique called Word2Vec Mikolov2013 (), designed to embed words, taken from a sentence, into a vector space. The Word2Vec model learns an embedding for a word by using the surrounding words within a sentence as targets for a single hidden layer neural network to predict. Due to the nature of this technique, words which frequently co-occur in sentences will have positions which are close within the embedding space. The approach of using a target word to predict neighbouring words is called SkipGram and has been shown to be very effective for language modelling tasks Mikolov2013b ().
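The SkipGram training data can be illustrated with a short sketch: for every position in a sequence, each item within `window` positions on either side becomes a (target, context) pair for the model to predict. The function name and window size are illustrative assumptions; this is not Word2Vec’s actual implementation, which adds further optimisations on top of this basic pairing.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def skipgram_pairs(sequence: Sequence[T], window: int) -> List[Tuple[T, T]]:
    """Return (target, context) pairs for every item within `window` positions."""
    pairs = []
    for i, target in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # an item is not its own context
                pairs.append((target, sequence[j]))
    return pairs

print(skipgram_pairs(["the", "cat", "sat"], window=1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```

Because the function only needs a sequence, the same code applies unchanged when the “sentence” is a sequence of vertex IDs produced by a random walk.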
DeepWalk
The key insight of DeepWalk is to use random walks upon the graph, starting from each vertex, as the direct replacement for the sentences required by Word2Vec. A random walk can be defined as a traversal of the graph rooted at a vertex $v_i \in V$, where the next step in the walk is chosen uniformly at random from the vertices incident upon the current vertex Backstrom2010 (). These walks are recorded as $\mathcal{W}_{v_i} = (w_1, w_2, \ldots, w_l)$, where $\mathcal{W}_{v_i}$ is the walk starting from $v_i$ of length $l$, and $w_j \in V$; i.e. a sequence of the vertices visited along the random walk starting from $v_i$. DeepWalk is able to learn unsupervised representations of vertices by maximising the average log probability over the set of vertices $V$:

$$\frac{1}{|V|}\sum_{i=1}^{|V|}\;\sum_{-c \le j \le c,\; j \ne 0} \log p\left(v_{i+j} \mid v_i\right) \tag{1}$$

where $c$ is the size of the training context of vertex $v_i$, and $v_{i+j}$ denotes the vertex $j$ positions away from $v_i$ in its random walk.
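The walk generation step can be sketched as follows, assuming the graph is stored as a Python adjacency list (a dict from each vertex to its neighbours). Following the DeepWalk algorithm, the vertex order is shuffled on each pass and one walk is rooted at every vertex per pass. The function names, walk length and number of walks per vertex are illustrative parameters, not values used in this paper; the resulting walks would then be passed to SkipGram in the same way sentences are passed to Word2Vec.

```python
import random
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency list: vertex -> list of neighbours

def random_walk(graph: Graph, start: int, length: int, rng: random.Random) -> List[int]:
    """Uniform random walk: each step moves to a neighbour chosen uniformly at random."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph[walk[-1]]
        if not neighbours:  # dead end (e.g. a vertex with no out-edges): stop early
            break
        walk.append(rng.choice(neighbours))
    return walk

def deepwalk_corpus(graph: Graph, walks_per_vertex: int, length: int, seed: int = 0) -> List[List[int]]:
    """Root `walks_per_vertex` walks at every vertex, shuffling the vertex order on each pass."""
    rng = random.Random(seed)
    vertices = list(graph)
    corpus = []
    for _ in range(walks_per_vertex):
        rng.shuffle(vertices)
        for v in vertices:
            corpus.append(random_walk(graph, v, length, rng))
    return corpus

graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = deepwalk_corpus(graph, walks_per_vertex=2, length=5)
print(len(walks))  # 8: two passes over four vertices
```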
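The basic form of SkipGram used by DeepWalk defines the conditional probability of observing a nearby vertex $v_j$, given the vertex $v_i$ from the random walk $\mathcal{W}_{v_i}$, via the softmax function over the dot product between their features Perozzi2014 ():

$$p\left(v_j \mid v_i\right) = \frac{\exp\left(\Phi'(v_j)^{\top}\,\Phi(v_i)\right)}{\sum_{v \in V}\exp\left(\Phi'(v)^{\top}\,\Phi(v_i)\right)} \tag{2}$$

where $\Phi$ and $\Phi'$ are the hidden layer and output layer weights of the SkipGram neural network respectively.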
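The softmax conditional probability can be evaluated with a small NumPy sketch. It assumes `Phi` holds the hidden layer vectors and `Phi_out` the output layer vectors, one row per vertex, and computes the probability of every vertex given vertex `i` as the exponentiated dot product with the output vector, normalised over all vertices. The full softmax normalises over every vertex in $V$, which is expensive for large graphs; DeepWalk uses hierarchical softmax in practice Perozzi2014 (), so this sketch is illustrative only, and the function name is our own.

```python
import numpy as np

def skipgram_prob(Phi: np.ndarray, Phi_out: np.ndarray, i: int) -> np.ndarray:
    """Return p(v_j | v_i) for every vertex j via a softmax over dot products."""
    scores = Phi_out @ Phi[i]   # Phi'(v_j)^T Phi(v_i) for every j
    scores -= scores.max()      # subtract the maximum for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

rng = np.random.default_rng(0)
num_vertices, dim = 5, 3
Phi = rng.normal(size=(num_vertices, dim))      # hidden layer weights
Phi_out = rng.normal(size=(num_vertices, dim))  # output layer weights
p = skipgram_prob(Phi, Phi_out, i=0)
print(p.sum())  # 1.0 up to floating-point error
```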
Node2Vec
Whilst DeepWalk uses a uniform random transition probability to move from a vertex to one of its neighbours, Node2Vec biases the random walks. This biasing introduces two user-controllable parameters which dictate how far from, or close to, the source vertex the walk progresses. This is done to capture either the vertex’s role in its local neighbourhood (homophily), or alternatively its role in the global graph structure (structural equivalence) Grover2016 (). As a result of this biased walk, Node2Vec achieves higher accuracy than DeepWalk on a selection of vertex classification problems Grover2016 ().
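The bias can be sketched with the unnormalised transition weights described in Grover2016 (): when a walk has just moved from vertex `prev` to vertex `current`, a neighbour `x` of `current` is weighted by `1/p` if `x` is `prev` (returning), by `1` if `x` is also a neighbour of `prev`, and by `1/q` otherwise (moving outward). The names `p` (return parameter) and `q` (in-out parameter) follow the Node2Vec paper; the function names and the adjacency-list representation for an unweighted graph are our own assumptions, and the sketch omits the alias sampling Node2Vec uses for efficiency.

```python
import random
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency list of an unweighted graph

def node2vec_weights(graph: Graph, prev: int, current: int, p: float, q: float) -> List[float]:
    """Unnormalised weights for stepping from `current` to each of its neighbours,
    given that the walk arrived at `current` from `prev`."""
    prev_neighbours = set(graph[prev])
    weights = []
    for x in graph[current]:
        if x == prev:                 # distance 0 from prev: step back
            weights.append(1.0 / p)
        elif x in prev_neighbours:    # distance 1 from prev: stay in prev's neighbourhood
            weights.append(1.0)
        else:                         # distance 2 from prev: move further away
            weights.append(1.0 / q)
    return weights

def node2vec_step(graph: Graph, prev: int, current: int, p: float, q: float,
                  rng: random.Random) -> int:
    """Sample the next vertex of a second-order biased random walk."""
    weights = node2vec_weights(graph, prev, current, p, q)
    return rng.choices(graph[current], weights=weights, k=1)[0]

graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
print(node2vec_weights(graph, prev=0, current=1, p=2.0, q=0.5))
# [0.5, 1.0, 2.0]: back to 0, across to 2 (a neighbour of 0), outward to 3
```

Setting `p = q = 1` gives equal weights to every neighbour, recovering DeepWalk’s uniform walk.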
2.3 Unsupervised Hyperbolic Embeddings
Recently, a new family of graph embedding approaches has been introduced which embed vertices into hyperbolic, rather than Euclidean, space Nickel2017 () Chamberlain2017 (). Hyperbolic space has long been used to analyse graphs which exhibit high levels of hierarchical or community structure munzner1998 (), but it also has properties which could make it an interesting space for embeddings Chamberlain2017 (). Hyperbolic space can be considered “larger” than Euclidean space with the same number of dimensions: because the space is negatively curved, its total area grows exponentially with the radius Chamberlain2017 (). For graph embeddings, this key property means that one effectively has a much larger range of possible points into which the vertices can be embedded. This allows closely correlated vertices to be embedded close together, whilst also maintaining more distance between disparate vertices, resulting in an embedding which has the potential to capture more of the latent community structure of a graph.
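The hyperbolic approach we focus on was introduced by Chamberlain et al. Chamberlain2017 (), and uses the Poincaré Disk model of 2D hyperbolic space epstein1988 (). In their model, the authors use polar coordinates $u = (r, \theta)$, where $r \in [0, 1)$ and $\theta \in [0, 2\pi)$, to describe a point in space for each vertex in the Poincaré Disk, which allows the technique to be significantly simplified Chamberlain2017 (). Similar to DeepWalk, an inner product is used to define the similarity between two points within the space. The inner product of two vectors in a Poincaré Disk can be defined as follows Chamberlain2017 ():

$$\langle u, v \rangle_{\mathbb{H}} = \|u\|_{\mathbb{H}}\,\|v\|_{\mathbb{H}}\cos\left(\theta_u - \theta_v\right) \tag{3}$$

$$\langle u, v \rangle_{\mathbb{H}} = 4\,\operatorname{arctanh}(r_u)\,\operatorname{arctanh}(r_v)\cos\left(\theta_u - \theta_v\right) \tag{4}$$

where $u = (r_u, \theta_u)$ and $v = (r_v, \theta_v)$ are the two input vectors representing two vertices, $\|u\|_{\mathbb{H}} = 2\operatorname{arctanh}(r_u)$ is the hyperbolic distance of $u$ from the centre of the disk, and arctanh is the inverse hyperbolic tangent function Chamberlain2017 ().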
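The Poincaré Disk inner product, $4\,\operatorname{arctanh}(r_u)\,\operatorname{arctanh}(r_v)\cos(\theta_u - \theta_v)$ for points given in polar form Chamberlain2017 (), translates directly into code. A minimal sketch, assuming each vertex is stored as a polar pair `(r, theta)` with $0 \le r < 1$; the function name is our own. As $r \to 1$ the arctanh term grows without bound, which is one way to see why the disk offers so much room near its boundary.

```python
import math

def poincare_inner_product(r_u: float, theta_u: float, r_v: float, theta_v: float) -> float:
    """Hyperbolic inner product of two points in the Poincare disk given in polar form."""
    if not (0.0 <= r_u < 1.0 and 0.0 <= r_v < 1.0):
        raise ValueError("radii must lie in [0, 1) to be inside the disk")
    # 2 * atanh(r) is the hyperbolic distance from the origin, i.e. the hyperbolic norm.
    return 4.0 * math.atanh(r_u) * math.atanh(r_v) * math.cos(theta_u - theta_v)

print(poincare_inner_product(0.5, 0.0, 0.5, 0.0))
print(poincare_inner_product(0.99, 0.0, 0.99, 0.0))  # far larger: both points near the boundary
```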
To create their hyperbolic graph embedding, the authors use the softmax function of Equation 2, used by DeepWalk and others, but importantly replace the Euclidean inner products with the hyperbolic inner products of Equation 3. Aside from this, hyperbolic approaches share many similarities with the stochastic approaches with regard to their input data and training procedure. For example, the hyperbolic approaches are still trained upon pairs of vertex IDs, taken from sequences of vertices generated via random walks on graphs.
2.4 Unsupervised AutoEncoder Based Approaches
A different approach for graph embeddings, which does not use random walks as input, is Structural Deep Network Embedding (SDNE) Wang2016a (). Instead of adapting a technique designed to capture the meaning of language, SDNE is designed specifically for creating graph embeddings using Deep Learning Goodfellow2016 (), in the form of deep autoencoders Hinton2011 (). Autoencoders are unsupervised neural networks whose goal is to accurately reconstruct the input data through explicit encoder and decoder stages Salakhutdinov2009 ().
The authors of SDNE argue that a deep neural network, versus the shallow SkipGram model used by both DeepWalk and Node2Vec, is much more capable of capturing the complex structure of graphs. In addition, the authors argue that a successful embedding must capture both the first and second order proximity of vertices. Here the first order proximity measures the similarity of vertices which are directly incident upon one another, whereas the second order proximity measures the similarity of the vertices’ neighbourhoods. To capture both of these elements, SDNE has a dual objective loss function for the model to optimise. The input data to SDNE is the adjacency matrix $A$, where each row $x_v$ represents the neighbourhood of a vertex $v$.
The objective function for SDNE comprises two distinct terms: the first captures the second order proximity of the vertices’ neighbourhoods, whilst the second captures the first order proximity of the vertices by iterating over the set of edges $E$:

$$\mathcal{L} = \sum_{v \in V}\left\|\left(\hat{x}_v - x_v\right) \odot b_v\right\|_2^2 + \alpha \sum_{(u, v) \in E}\left\|y_u^{(K)} - y_v^{(K)}\right\|_2^2 \tag{5}$$

where $x_v$ and $\hat{x}_v$ are the input and its reconstructed representation, $\odot$ is the element-wise Hadamard product, $b_v$ is a scaling factor to penalise the technique if it predicts zero too frequently, $y_v^{(K)}$ is the output of the $K$-th (embedding) layer of the autoencoder for vertex $v$, and $\alpha$ is a user-controllable parameter defining the importance of the second term in the final loss score Wang2016a ().
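The two-term SDNE objective can be sketched in NumPy, assuming dense matrices: `X` is the adjacency matrix whose rows are the inputs, `X_hat` the autoencoder’s reconstruction, `Y` the embedding layer outputs (one row per vertex), and `edges` a list of vertex index pairs. The first term is the reconstruction error weighted element-wise by a penalty that, following Wang2016a (), is 1 for zero entries of the input and $\beta > 1$ for non-zero entries; the second term sums squared embedding distances over all edges, scaled by `alpha`. The default values of `alpha` and `beta` and the function name are illustrative assumptions, and the weight regularisation term of the original SDNE objective is omitted to keep to the two terms described here.

```python
import numpy as np

def sdne_loss(X: np.ndarray, X_hat: np.ndarray, Y: np.ndarray, edges, alpha: float = 1.0,
              beta: float = 5.0) -> float:
    """Weighted reconstruction (2nd order) + alpha * edge term (1st order)."""
    # Penalty b_v: errors on observed edges (non-zero entries) cost more, so the
    # autoencoder cannot simply predict zeros for a sparse adjacency row.
    B = np.where(X != 0, beta, 1.0)
    second_order = np.sum(((X_hat - X) * B) ** 2)                    # sum_v ||(x_hat_v - x_v) * b_v||^2
    first_order = sum(np.sum((Y[u] - Y[v]) ** 2) for u, v in edges)  # sum_(u,v) ||y_u - y_v||^2
    return float(second_order + alpha * first_order)

# Two connected vertices, a reconstruction of all zeros and embeddings 5 units apart.
X = np.array([[0.0, 1.0], [1.0, 0.0]])
X_hat = np.zeros((2, 2))
Y = np.array([[0.0, 0.0], [3.0, 4.0]])
print(sdne_loss(X, X_hat, Y, edges=[(0, 1)]))  # 75.0 = 50 (reconstruction) + 25 (edge)
```

Increasing `beta` makes missed edges in the reconstruction more costly, while increasing `alpha` pulls the embeddings of adjacent vertices closer together.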
To initialise the weights of the deep autoencoder used for this approach, an additional neural network must be trained to find a good starting region for the parameters. This pretraining neural network is called a Deep Belief Network, and is widely used within the literature to form the initialisation step of deeper models erhan2010 (). This pretraining step adds significant complexity and is not required by either the stochastic or hyperbolic approaches, which use random initialisation for their weights.
2.5 Observing Features Preserved in Embeddings
Graph Embeddings Features
To date, there has been little research exploring a theoretical basis for why graph embeddings are able to demonstrate such power in graph analytic tasks, or whether something approximating traditional graph features is being captured during the embedding process. Recently, Goyal and Ferrara Goyal2017 () presented an experimental review paper on a selection of graph embedding techniques. The authors use a range of tasks, including vertex classification, link prediction and visualization, to measure the quality of the embeddings. However, the authors do not provide any theoretical basis for why the embedding approaches they test are successful, or whether known features are present in the embeddings. In addition, the authors do not consider embeddings taken from promising unsupervised techniques, such as the family of hyperbolic approaches, nor do they explore performance across imbalanced classes during the classification.
Some recent work has speculated on the use of a graph’s topological features as a way to improve the quality of vertex embeddings by incorporating them into a supervised GCN based model Hamilton2017a (). The authors show how aggregating a vertex feature, even one as simple as its degree, can improve the performance of their model. Further, they present theoretical analysis to validate that their approach is able to learn the number of triangles a vertex is part of, arguing that this demonstrates the model is able to learn topological structure. We take inspiration from this work, but consider unsupervised approaches as well as exploring whether richer and more complicated topological features are being captured in the embedding process. In a similar vein, an approach for generating supervised graph embeddings using heat-kernel based methods is validated by visualizing whether a selection of topological features can be seen in a two-dimensional projection of the embedding space li2016deepgraph ().
Research has investigated the use of a graph’s topological features as a way of validating the accuracy of a neural network based graph generative model Liu2017 (). With the presented model, the authors aim to generate entirely new graph datasets which mimic the topological structure of a set of target graphs – a common task within the graph mining community Albert2002 (). To validate the quality of their model, they investigate if a new graph created from their generative procedure has a similar set of topological features to the original graph.
Perhaps most closely related to our present research is work exploring the use of random walk based graph embeddings as an approximation for more complex vertex-level centrality measures on social network graphs salehi2017 (). The authors argue that graph embeddings could be used as a replacement for centrality measures, as they potentially have a lower computational complexity and thus take less time to compute. The work explores the use of linear regression to try to directly predict four centrality measures for the vertices of three graph datasets, with limited success salehi2017 (). Our own work differs significantly: we attempt to provide insight into what exactly graph embeddings are learning with a view to explaining their success, explore a wider range of embedding approaches, use datasets from a wider range of domains, explore more topological features, use classification rather than regression as the basis for the analysis and address the inherently unbalanced nature of most graph datasets.
Feature Learning in Other Domains
Many of the successful unsupervised graph embedding approaches have adapted models originally designed for language modelling Grover2016 () Perozzi2014 (). Recent research has investigated how best to evaluate a variety of unsupervised approaches for embedding words into vectors schnabel2015 (). The authors chose a variety of Natural Language Processing (NLP) tasks, which capture some known and understood aspects of the structure of language, and investigated how well the chosen embedding models perform on these tasks. They concluded that no single word embedding model performs best across all the tasks they investigated, suggesting there is not a single optimal vector representation for a word. The features which help word embeddings achieve compositionality (constructing the meaning of an entire sentence from its component words) have also been explored li2015visualizing (). Further research has investigated the use of word embeddings to create representations for entire sentences using word features conneau2017 (). This work suggests that word features learned by the embeddings for natural language inference can be transferred to other tasks in NLP.
Outside of NLP, there has been work in the field of Computer Vision (CV) investigating which known features, already commonly used for image representation, are captured by deep convolutional neural networks, potentially helping to explain how they work. For example, it has been shown that convolutional networks, when trained for image classification, often detect the presence of edges in the images zeiler2014 (). The same work also shows how the complexity of the detected features increases with the depth of the network.
In this present work, we take inspiration from these approaches and attempt to provide insight and a potential theoretical basis for the use of graph embeddings by exploring which known graph features can be reconstructed from the embedding space.
3 Semantic Content of Graph Embeddings
Although unsupervised graph embedding approaches perform well on the tasks for which they were proposed (such as vertex classification and link prediction Goyal2017 ()), there has been little work exploring why these approaches are successful. Inspired by recent work in Computer Vision and Natural Language Processing which examines whether traditional features (the edges detected in images, for example) are captured by deep models, we explore, in this paper, the following research question:
Problem Statement: Are graph embedding techniques capturing something similar to traditional topological features as part of the embedding process?
Topological features are a known and mathematically understood way to accurately identify graphs and vertices Li2012 () bonner2016 (). We hypothesise that if topological structures within a graph, similar to the known and mathematically understood examples from the literature, are being captured in the embedding space, then this could be used to begin to provide a theoretical basis for the functionality of graph embeddings. This would suggest that graph embeddings are automatically learning detailed and known graph structures in order to create their representations, which could explain how they have been so successful in a variety of graph mining tasks. Effectively, the graph embedding techniques would be acting as an automated way of selecting the most representative topological feature(s) for a given objective function.
If graph embeddings are shown to be learning topological features, then other interesting research questions arise. For example: Do competing embedding approaches learn different topological structures? Does each graph dataset require different features to be approximated in order to create a good representation? What is the structural complexity of the features approximated by the embeddings? Are the embeddings capable of approximating multiple features simultaneously?
In order to explore these questions, we attempt to predict a selection of topological features directly from graph embeddings computed by a range of state-of-the-art approaches across a series of empirical datasets. We suggest that if a second mapping function $f$ can be found which accurately maps the embedding space to a given topological feature $t$, then this is strong evidence that something approximating the structural information represented by $t$ is indeed present in the embedding space. The mapping function could take the form of a linear regression, but for this work we investigate a range of classification algorithms; this is explored further in Section 3.3. We assess a range of known topological features, from simple to complex, to gain a better understanding of the expressive capabilities of the embedding techniques.
3.1 Predicting Topological Features
Numerous topological features have been identified in the literature, measuring various aspects of a graph’s topology at the vertex, edge and graph level Li2012 (). As our work focuses upon methods for creating vertex embeddings, we focus on features which are measured at the vertex level of a given graph. We have selected a range of vertex-level features from the graph mining literature, which capture information about a vertex’s local and global role within a graph Grover2016 (). This selection ranges from features which are simple to compute from directly adjacent vertices, to more complex features which can require information from vertices many hops away.
These features are defined in terms of a graph $G = (V, E)$ with its corresponding adjacency matrix $A$, where $|V|$ is the total number of vertices in the graph and $|E|$ the total number of edges. For each vertex $v \in V$, we also define $n_v$ to be the total number of neighbours of $v$, $d_v$ to be the number of connections $v$ has to other vertices, $N(v)$ to be the subset of vertices in $V$ with edges to $v$, and $\sigma_{st}(v)$ to be the total number of shortest paths from vertex $s$ to vertex $t$ which also pass through $v$.
Total Degree $d_v$: The total number of edges from $v$ to other vertices.
Degree Centrality $C_D(v) = d_v / |V|$: The degree of the vertex over the total number of vertices in the graph, providing a normalised centrality score bonner2016 ().
Number Of Triangles $T(v) = \frac{1}{2}\sum_{u \in N(v)}\sum_{w \in N(v)} A_{uw}$: The number of triangles containing the vertex $v$, i.e. the number of pairs of vertices in $N(v)$ which are also connected via an edge bonner2016 ().
Local Clustering Score $C(v) = \frac{2\,T(v)}{n_v(n_v - 1)}$: Represents the probability of two neighbours of $v$ also being neighbours of each other Watts1998 ().
Eigenvector Centrality $x_v = \frac{1}{\lambda}\sum_{u \in N(v)} x_u$: Used to calculate the importance of each vertex within a graph, where $\lambda$ is the largest eigenvalue of $A$ and $x$ is the eigenvector centrality bonacich2007 ().
PageRank Centrality $PR(v) = \frac{1 - \alpha}{|V|} + \alpha \sum_{u \in N(v)} \frac{PR(u)}{d_u}$: PageRank centrality is commonly used to measure the local influence of a vertex within a graph Page1998 () Han2014 (), where $\alpha$ is a constant damping factor (0.85 for this work).
Betweenness Centrality $C_B(v) = \sum_{s \ne v \ne t} \frac{\sigma_{st}(v)}{\sigma_{st}}$: The betweenness centrality of a vertex depends upon the frequency with which it acts as a bridge between two other vertices Han2014 (), where $\sigma_{st}$ is the total number of shortest paths from $s$ to $t$.
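Most of these features can be computed from the adjacency matrix with a few lines of NumPy. The sketch below assumes an undirected, unweighted, connected graph given as a dense symmetric 0/1 matrix. It computes eigenvector centrality and PageRank (damping factor 0.85) by simple power iteration with a fixed iteration count, which is our choice rather than a convergence test, and it divides the degree by $|V|$ as in the degree centrality definition above (some libraries, such as NetworkX, divide by $|V| - 1$ instead). Betweenness centrality is omitted for brevity; it is usually computed with Brandes’ algorithm, for example via NetworkX’s `betweenness_centrality`. The function name is our own.

```python
import numpy as np

def vertex_features(A: np.ndarray, alpha: float = 0.85, iters: int = 200) -> dict:
    """Compute several vertex-level features for an undirected, unweighted graph."""
    n = A.shape[0]
    degree = A.sum(axis=1)                  # d_v
    degree_centrality = degree / n          # d_v / |V|
    # (A^3)_vv counts closed walks of length 3 from v; each triangle is counted twice.
    triangles = np.diag(A @ A @ A) / 2
    # Local clustering: fraction of neighbour pairs that are themselves connected.
    pairs = degree * (degree - 1) / 2
    clustering = np.divide(triangles, pairs, out=np.zeros(n), where=pairs > 0)

    # Eigenvector centrality: power iteration on A + I (same eigenvectors as A,
    # but the shift keeps the iteration from oscillating on bipartite graphs).
    x = np.ones(n)
    for _ in range(iters):
        x = (A + np.eye(n)) @ x
        x /= np.linalg.norm(x)

    # PageRank: PR(v) = (1 - alpha)/|V| + alpha * sum_{u in N(v)} PR(u) / d_u
    pr = np.full(n, 1.0 / n)
    out_share = np.divide(1.0, degree, out=np.zeros(n), where=degree > 0)
    for _ in range(iters):
        pr = (1 - alpha) / n + alpha * (A.T @ (pr * out_share))

    return {"degree": degree, "degree_centrality": degree_centrality,
            "triangles": triangles, "clustering": clustering,
            "eigenvector": x, "pagerank": pr}

# A triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
feats = vertex_features(A)
print(feats["triangles"])  # [1. 1. 1. 0.]
print(feats["clustering"])
```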
3.2 PowerLaw Feature Distribution
Many empirical graphs, especially those representing social, hyperlink and citation networks, have been shown to have an approximately power-law distribution of degree values faloutsos1999 (). This power-law distribution poses a challenge for machine learning models, as it means the features we are trying to predict are extremely unbalanced, with a heavy skew towards the lower end of the range of values. Imbalanced class distributions create difficulties for machine learning models, as there are fewer examples of the minority classes for the model to learn from, which can often lead to poor predictive performance on these classes Goodfellow2016 (). It has been shown that the distributions of other topological features can also follow a power law in many graphs Albert2002 (). To demonstrate this phenomenon, Figure 1 shows the distribution of a range of topological feature values for the cit-HepTh dataset. The figure shows that, indeed, all the topological feature values tested largely follow an approximately power-law distribution. This has the potential to make predicting the value of a certain topological feature challenging, as the datasets will not be balanced and any model attempting to find the mapping $f$ will be prone to overfitting to the majority classes. Our approach for tackling this issue is outlined in the following section.
3.3 Methodology
Unlike previous studies salehi2017 (), we use classification, rather than regression, as a way to explore the embedding space. Predicting topological features directly via the use of regression has proven challenging in prior work salehi2017 (), owing largely to the imbalance problem explored in the previous section. With such an imbalanced dataset, using a classification based approach is often advantageous oord2016 () as techniques exist to over sample minority examples. However, the features we are attempting to predict are continuous, so must go through some transformation stage before classification can be performed. For our transformation stage, we follow a procedure similar to oord2016 (). We bin the realvalued features into a series of classes via the use of a histogram, where the bin a particular features is placed becomes it class label. One can consider each of these newly created classes as representing a range of possible values for a given feature. As an example, we could transform a vertex’s continuous PageRank score Page1998 () into a series of discrete classes via the use of a histogram with a bin size of three, where each of the newly created classes represented a low, medium or high PageRank score.
To allow for a good distribution of feature values, in our experiments we use six bins for the histogram function, meaning that six discrete classes are created for each of the features we explore. This value was chosen empirically from our datasets, as it fully covered the numerical range of the topological features we were measuring. Although this binning process helps with the feature imbalance, it still produces a skew in the number of vertices assigned to each class. To further address this issue, we take the logarithm of each feature value before it is passed to the binning function. Essentially, this means that feature values within the same order of magnitude will be assigned the same label: for example, vertices with low degrees would be placed in one class, whilst vertices whose degrees are an order of magnitude larger would be assigned to another. This was performed as it dramatically improved the balance of the datasets and, as we are only attempting to discover whether something approximating the topological features is present in the embedding space, we found predicting the order of magnitude to be sufficient.
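A sketch of the log-then-bin transformation, comparing class sizes with and without the logarithm. The heavy-tailed values are synthetic (drawn from a Zipf distribution as a stand-in for a power-law degree sequence), the natural logarithm and equal-width bins are assumptions, and the sketch requires strictly positive values; the text does not say how zero-valued features are handled.

```python
import numpy as np

N_BINS = 6  # six classes per feature, as in the experiments described above

def equal_width_bins(values, n_bins=N_BINS):
    """Label each value with the index (0..n_bins-1) of its equal-width histogram bin."""
    edges = np.histogram_bin_edges(values, bins=n_bins)
    return np.digitize(values, edges[1:-1])

def log_bins(values, n_bins=N_BINS):
    """Take the natural log first (values must be > 0), then bin with equal widths."""
    return equal_width_bins(np.log(values), n_bins)

# Synthetic heavy-tailed "degrees" from a Zipf distribution (illustrative only).
rng = np.random.default_rng(0)
degrees = rng.zipf(2.5, size=5000)

linear_counts = np.bincount(equal_width_bins(degrees), minlength=N_BINS)
log_counts = np.bincount(log_bins(degrees), minlength=N_BINS)
print("vertices per class, linear bins:", linear_counts)
print("vertices per class, log bins:   ", log_counts)
```

With the logarithm applied, the largest class shrinks noticeably, which is the balancing effect described above; the exact counts depend on the random seed.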
3.4 Embedding Approaches Compared
In this paper, we evaluate five state-of-the-art unsupervised graph embedding approaches as a way of exploring what semantic content is extracted from a graph to create its embedding. The approaches are as follows: DeepWalk, Poincaré Disk, Structural Deep Network Embedding (SDNE) and Node2Vec.
Approach  Year  Type  Published  Complexity 

DeepWalk  2014  stochastic  KDD Perozzi2014 ()  
Node2Vec  2016  stochastic  KDD Grover2016 ()  
SDNE  2016  autoencoder  KDD Wang2016a ()  
Poincaré Disk  2017  hyperbolic  MLG Chamberlain2017 () 
4 Experimental Setup and Classification Algorithm Selection
In the following section we detail the setup of the experiments and evaluate potential classification algorithms.
4.1 Metrics
Presented Results
All reported results are the mean of five replicated experimental runs, along with confidence intervals. For the runtime analysis, the presented results are the mean runtime for job completion, in minutes. For the classification results, all accuracy scores presented are the mean accuracy after $k$-fold cross validation, considered the gold standard for model testing Arlot2010 (). In $k$-fold cross validation, the original dataset is partitioned into $k$ equally sized partitions. Of these, $k-1$ partitions are used to train the model, with the remaining partition used for testing. The process is repeated $k$ times, using a different partition for testing in each repetition, and the mean is taken to produce the final result.
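A minimal sketch of this procedure with scikit-learn's `KFold`. The data are synthetic, logistic regression stands in for whichever classifier is being evaluated, and $k = 5$ is assumed to match the 5-fold setup used in Section 4.3; the five replicated runs could be obtained by repeating the loop with different seeds, which is an assumption about how the replicates were produced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold

# Synthetic stand-in for vertex embeddings (X) and six binned feature classes (y).
X, y = make_classification(n_samples=300, n_features=16, n_informative=8,
                           n_classes=6, random_state=0)

k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=0)
fold_scores = []
times_tested = np.zeros(len(y), dtype=int)  # how often each example is in a test fold

for train_idx, test_idx in kf.split(X):
    # Train on k-1 partitions and evaluate on the held-out partition.
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    fold_scores.append(f1_score(y[test_idx], clf.predict(X[test_idx]), average="micro"))
    times_tested[test_idx] += 1

print(f"mean micro-F1 over {k} folds: {np.mean(fold_scores):.3f}")
```

Every example lands in exactly one test partition, so the mean over folds uses all of the data for evaluation exactly once.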
Precision Metrics
For the vertex feature classification tasks, we report the macro-F1 and micro-F1 scores with varying percentages of labelled data available at training time. This is a similar setup to previous works Grover2016 () Goyal2017 ().
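The micro-F1 score computes the F1-score for the dataset globally, by counting the total number of true positives (TP), false positives (FP) and false negatives (FN) across the set of labels $\mathcal{L}$. Using the notation from Goyal2017 (), micro-F1 is defined as:
$$\text{Micro-F1} = \frac{2 \cdot P \cdot R}{P + R} \qquad (6)$$
where:
$$P = \frac{\sum_{l \in \mathcal{L}} TP(l)}{\sum_{l \in \mathcal{L}} \left( TP(l) + FP(l) \right)}, \qquad R = \frac{\sum_{l \in \mathcal{L}} TP(l)}{\sum_{l \in \mathcal{L}} \left( TP(l) + FN(l) \right)}$$
and $TP(l)$ denotes the number of true positives the model predicts for a given label $l$, $FP(l)$ denotes the number of false positives and $FN(l)$ the number of false negatives.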
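The macro-F1 score, when performing multi-label classification, is defined as the average of the per-label F1-scores over the whole set of labels $\mathcal{L}$:
$$\text{Macro-F1} = \frac{\sum_{l \in \mathcal{L}} F1(l)}{|\mathcal{L}|} \qquad (7)$$
where $F1(l)$ is the F1-score for the given label $l$.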
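To make Equations (6) and (7) concrete, the sketch below computes both scores directly from per-label TP, FP and FN counts and checks them against scikit-learn's `f1_score`. The labels are a toy example; for single-label multi-class predictions such as these, micro-F1 reduces to plain accuracy (5 of 8 correct here).

```python
import numpy as np
from sklearn.metrics import f1_score

def micro_macro_f1(y_true, y_pred, labels):
    """Micro-F1 (Eq. 6) pools counts over labels; macro-F1 (Eq. 7) averages per-label F1.

    Assumes every label occurs at least once in y_true or y_pred (no zero denominators).
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.array([np.sum((y_pred == l) & (y_true == l)) for l in labels])
    fp = np.array([np.sum((y_pred == l) & (y_true != l)) for l in labels])
    fn = np.array([np.sum((y_pred != l) & (y_true == l)) for l in labels])

    # Eq. (6): global precision and recall from counts summed over all labels.
    p = tp.sum() / (tp.sum() + fp.sum())
    r = tp.sum() / (tp.sum() + fn.sum())
    micro = 2 * p * r / (p + r)

    # Eq. (7): per-label F1 = 2TP / (2TP + FP + FN), then an unweighted mean.
    per_label = 2 * tp / (2 * tp + fp + fn)
    macro = per_label.mean()
    return micro, macro

y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 0, 2, 2]
micro, macro = micro_macro_f1(y_true, y_pred, labels=[0, 1, 2])
print(round(micro, 4), round(macro, 4))  # 0.625 0.6111
```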
4.2 Experimental Setup
Approach  Optimiser  Learning Rate  Specific Parameters 

SDNE  RMSProp  0.01  =500, =10, epochs=500 
Node2VecS  SGD  0.1  p=0.5, q=2, epochs=15 
Node2VecH  SGD  0.1  p=1.0, q=0.5, epochs=15 
DeepWalk  SGD  0.1  epochs=15 
Poincaré Disk (PD)  SGD  0.1  p=0.5, q=2, epochs=15 
Implementation Details
The approaches used for experimentation were re-implemented in TensorFlow abadi2016tensorflow (), as the author-provided versions were not all available in the same framework. We also attempted to ensure that the same TensorFlow-based optimisations were used across all the approaches shi2016 (). Neural networks contain many hyperparameters that a user can tune to improve performance on a given dataset, in terms of both predictive accuracy and runtime. This process can be extremely time consuming and often requires users to perform a grid search over a range of possible hyperparameter values to find the best-performing combination Goodfellow2016 (). To set the required hyperparameters, we took the default values provided by the authors in their respective papers Grover2016 () Chamberlain2017 () Wang2016a (), keeping them constant across all datasets. The key hyperparameters used for each approach are detailed in Table 2. We have open sourced our implementations of these approaches and made them available online.
Experimental Environment
Dataset  Vertices  Edges  Domain  Source 

fly-drosophila-medulla  1,800  33,500  Biological  rossi2015 () 
cit-HepTh  27,770  352,807  Citation  snapnets () 
email-Eu-core  1,005  25,571  Communication  snapnets () 
inf-openflights  2,900  30,500  Infrastructure  rossi2015 () 
soc-sign-bitcoin-otc  5,881  35,592  Blockchain  snapnets () 
ego-Facebook  4,039  88,234  Social  snapnets () 
Experimentation was performed on a compute system with two NVIDIA Tesla K40c GPUs, a 2.3GHz Intel Xeon E5-2650 v3, 64GB RAM and the following software stack: Ubuntu Server 16.04 LTS, CUDA 9.0, cuDNN v7, TensorFlow 1.5, scikit-learn 0.19.0, Python 3.5 and NetworkX 2.0.
Experimental Datasets
The empirical datasets used for evaluation were taken from the Stanford Network Analysis Project (SNAP) data repository snapnets () and the Network Repository rossi2015 (), and are detailed in Table 3. The domain labels are taken from the graph listings provided by SNAP snapnets () and Network Repository.
4.3 Classification Algorithm Selection
Feature  Classifier  F1-Micro  F1-Macro  Lift (Uniform)  Lift (Stratified)  Lift (Frequent) 

LR  
SVM(Lin)  
SVM(RBF)  
NN  
NN2  
LR  
SVM(Lin)  
SVM(RBF)  
NN  
NN2  
LR  
SVM(Lin)  
SVM(RBF)  
NN  
NN2 
Feature  Classifier  F1-Micro  F1-Macro  Lift (Uniform)  Lift (Stratified)  Lift (Frequent) 

LR  
SVM(Lin)  +59.08%  1.61%  
SVM(RBF)  +23.13%  
NN  
NN2  
LR  
SVM(Lin)  
SVM(RBF)  +25.57%  0.84%  
NN  
NN2  
LR  
SVM(Lin)  
SVM(RBF)  
NN  
NN2 
As highlighted throughout the paper, we focus our research on unsupervised graph embedding approaches. To use the embeddings for further analysis, they must be passed as input to a supervised classification model. Traditionally in the embedding literature, a simple logistic regression is used for any classification task Perozzi2014 () Mikolov2013 (), with seemingly little work exploring the use of more sophisticated classifiers.
In this section, we explore the effectiveness of five different models at classifying the embeddings produced by the different approaches: Logistic Regression (LR), a Support Vector Machine (SVM) with a linear kernel, an SVM with an RBF kernel, a Neural Network with a single hidden layer (NN) and, finally, a more complex Neural Network with two hidden layers and a larger number of hidden units (NN2). All the classifiers used in this section were taken from the scikit-learn Python package scikitlearn (). Additionally, given that our datasets do not have an equal distribution of classes, we also explore the effectiveness of weighting the loss function used by the model inversely proportional to the frequency of each class karakoulas1999optimizing (). This use of a weighted loss function, although common in other areas of machine learning, has not been explored in regard to graph embeddings.
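scikit-learn exposes this kind of inverse-frequency weighting through the `class_weight="balanced"` option. The sketch below shows the resulting weights on toy labels and applies them to the logistic regression and SVM models; the labels and features are invented, and because scikit-learn's `MLPClassifier` takes no `class_weight` argument, the neural network classifiers are left out rather than guessed at.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

# Toy imbalanced labels: 80 / 15 / 5 vertices in classes 0, 1 and 2.
y = np.array([0] * 80 + [1] * 15 + [2] * 5)
classes = np.unique(y)

# "balanced" weights are n_samples / (n_classes * class_count),
# i.e. inversely proportional to each class's frequency.
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
print(dict(zip(classes.tolist(), np.round(weights, 3).tolist())))  # {0: 0.417, 1: 2.222, 2: 6.667}

# Random stand-in features, only so that the weighted models can be fitted.
X = np.random.default_rng(0).normal(size=(len(y), 8))
weighted_models = {
    "LR": LogisticRegression(class_weight="balanced", max_iter=1000),
    "SVM(Lin)": SVC(kernel="linear", class_weight="balanced"),
    "SVM(RBF)": SVC(kernel="rbf", class_weight="balanced"),
}
for name, model in weighted_models.items():
    model.fit(X, y)  # each misclassified minority example now costs more in the loss
```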
For the results in this section, we present the mean macro-F1 and micro-F1 scores, introduced in Section 4.1.2, after 5-fold cross validation. To assess the performance of the classifiers against the imbalance present in the datasets, we also display the percentage lift in mean test set accuracy over three rule-based prediction methods, which act as baselines. These methods are Uniform Prediction (where the class of each item in the test set is chosen uniformly at random from the possible classes), Stratified Prediction (where predictions are drawn at random following the distribution of classes in the training set) and Frequent Class Prediction (where every item is assigned the most frequent class in the training set). A positive lift across all metrics strongly suggests that a mapping from the embedding space to the topological features is being learned, as the classification algorithm is overcoming the biased distribution of classes in the dataset.
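The three rule-based baselines correspond to the `uniform`, `stratified` and `most_frequent` strategies of scikit-learn's `DummyClassifier`. In this sketch the labels are synthetic, the classifier accuracy of 0.70 is hypothetical, and lift is assumed to be the relative improvement over each baseline's accuracy (an absolute percentage-point difference would be an equally plausible reading).

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

def percentage_lift(model_acc, baseline_acc):
    """Relative improvement over a baseline, in percent (assumed definition of 'lift')."""
    return 100.0 * (model_acc - baseline_acc) / baseline_acc

# Imbalanced six-class labels standing in for binned feature values.
rng = np.random.default_rng(0)
class_probs = [0.50, 0.20, 0.10, 0.10, 0.05, 0.05]
y_train = rng.choice(6, size=1000, p=class_probs)
y_test = rng.choice(6, size=500, p=class_probs)
# Dummy classifiers ignore the features, so placeholder columns suffice.
X_train, X_test = np.zeros((len(y_train), 1)), np.zeros((len(y_test), 1))

baseline_acc = {}
for strategy in ("uniform", "stratified", "most_frequent"):
    dummy = DummyClassifier(strategy=strategy, random_state=0).fit(X_train, y_train)
    baseline_acc[strategy] = accuracy_score(y_test, dummy.predict(X_test))

model_acc = 0.70  # hypothetical accuracy of a classifier trained on embeddings
for strategy, acc in baseline_acc.items():
    print(f"{strategy:>13}: baseline {acc:.3f}, lift {percentage_lift(model_acc, acc):+.1f}%")
```

On heavily skewed labels the most-frequent baseline is the hardest to beat, which is why a positive lift over it is the most informative of the three.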
We performed this experiment for all combinations of datasets, embedding approaches and features but, due to the large quantity of results, we present only a subset here. Specifically, we present the results for the ego-Facebook dataset, using embeddings generated by DeepWalk and SDNE and classifying Degree, Triangle Count and Eigenvector Centrality. It should be noted that the patterns displayed here are representative of those seen across all datasets.
Table 4 highlights the performance of the potential classifiers when using the DeepWalk embeddings taken from the ego-Facebook dataset. The results show that the choice of supervised classifier can have a large impact on the overall classification score. It can also be seen that the traditional choice of logistic regression does not produce the best results. Indeed, the neural network and SVM classifiers often gave the best scores, but no classifier is best overall, suggesting that the classifier needs to be chosen carefully for a given task.
Table 5 highlights the results for the potential classifiers when using the SDNE embeddings taken from the ego-Facebook dataset. Again, the variation in classification score across the set of tested classifiers is quite substantial, with the linear SVM and neural network approaches having perhaps a small margin of improvement over the others. It is interesting to note that logistic regression, frequently used in the literature, never has the highest score in any metric. It can also be seen that, when compared with the DeepWalk results in Table 4, SDNE performs worse at predicting all topological features which, although not the explicit purpose of this section, is worth highlighting.
Based on the results in this section, and in particular the generally higher macro-F1 scores of the single hidden layer neural network, which indicate better performance across all classes, all the classification results in Section 5 are presented using this classifier.
5 Results
This section presents both the supervised and unsupervised results for predicting topological features from graph embeddings.
5.1 Topological Feature Prediction
In this section, we present the experimental evaluation of the classification of topological features using the embeddings generated by the five approaches (DeepWalk, Node2VecH, Node2VecS, SDNE and PD) on the datasets detailed in Table 3. We present both the macro-F1 and micro-F1 scores plotted against a varying amount of labelled data available during the training process. A higher score equates to a better classification result, with a score of one meaning a perfect classification of every example in the data.
Figure 2 displays the classification F1 scores for predicting the simplest feature we measure: the degree of the vertices. Interestingly, we see a large spread of results across the datasets and between approaches, with no clear pattern emerging in this Figure. On certain datasets a high micro-F1 score can be seen, for example on the bitcoin-otc dataset, suggesting that an approximation of the degree value is present in the embedding. The figure also shows that SDNE and PD often have lower scores than the stochastic approaches.
Figure 3 highlights the macro-F1 and micro-F1 scores for the classification of the Degree Centrality value. As the Degree Centrality of a vertex is simply its degree normalised by the number of other vertices in the graph, it is perhaps unsurprising to observe patterns largely similar to those in Figure 2, again with bitcoin-otc giving the highest accuracies. As was seen in the previous figure, the three stochastic approaches generally have similar macro-F1 and micro-F1 scores.
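The link between the two features can be checked directly with NetworkX, the graph library listed in the experimental environment. The sketch uses Zachary's karate club graph as an arbitrary example (not one of the paper's datasets) and assumes that the log binning of Section 3.3 uses equal-width bins over the observed range. Under that assumption, dividing every degree by the same constant only shifts the log values, so both features receive identical class labels.

```python
import networkx as nx
import numpy as np

def log_bin(values, n_bins=6):
    """Equal-width bins over the natural log of strictly positive values (assumed scheme)."""
    logs = np.log(np.asarray(values, dtype=float))
    edges = np.histogram_bin_edges(logs, bins=n_bins)
    return np.digitize(logs, edges[1:-1])

G = nx.karate_club_graph()
nodes = list(G)
n = G.number_of_nodes()

degree = np.array([G.degree(v) for v in nodes])
dc_dict = nx.degree_centrality(G)
degree_centrality = np.array([dc_dict[v] for v in nodes])

# Degree centrality divides each degree by the maximum possible degree, n - 1.
print(np.allclose(degree_centrality, degree / (n - 1)))  # True

# A constant rescaling shifts all log values equally, so the bin labels coincide.
print(np.array_equal(log_bin(degree), log_bin(degree_centrality)))  # True
```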
The results for the classification of the vertices' Triangle Counts are presented in Figure 4. This is a more complex feature than the previous two, as it requires knowing which of a vertex's neighbours are connected to each other, information that is not available from the number of immediate neighbours alone. The Figure shows again that the feature can, to some degree of accuracy, be reconstructed from the embedding space, with bitcoin-otc having the highest micro-F1 score of all the datasets. SDNE and PD continue to have, on average, the lowest scores.
Classifying a vertex’s local clustering score across the datasets is explored in Figure 5. The figure shows that this feature, although more complicated to compute than a vertex's triangle count, appears to be easier for a classifier to reconstruct from the embedding space. With this more complicated feature, some interesting results regarding SDNE can be seen on the EmailEU and HepTh datasets, where the approach has the highest macro-F1 score, perhaps indicating that the more complex model is better able to learn a good representation for this more complicated feature.
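The relationship between these two features can be made explicit: a vertex with degree $k$ lying on $t$ triangles has local clustering coefficient $2t / (k(k-1))$, the share of its neighbour pairs that are themselves linked. The sketch below checks this identity with NetworkX on Zachary's karate club graph, an arbitrary small example rather than one of the paper's datasets.

```python
import networkx as nx

G = nx.karate_club_graph()
triangles = nx.triangles(G)     # number of triangles through each vertex
clustering = nx.clustering(G)   # unweighted local clustering coefficient

def clustering_from_triangles(t, k):
    """Share of the k*(k-1)/2 neighbour pairs that are linked; 0 for degree < 2."""
    return 0.0 if k < 2 else 2.0 * t / (k * (k - 1))

max_diff = max(abs(clustering[v] - clustering_from_triangles(triangles[v], G.degree(v)))
               for v in G)
print(max_diff < 1e-12)  # True
```

In other words, the clustering coefficient is the triangle count normalised by a function of the degree, so it combines the information of the two simpler features.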
Figure 6 displays the results for the classification of a vertex's Eigenvector Centrality. This figure is perhaps the most interesting so far, as it shows high classification accuracies across many of the empirical datasets, even though this feature is of greater complexity than the previous ones. This figure further supports the results presented in Table 4, which showed Eigenvector Centrality having not only the highest accuracies, but also the highest lifts in accuracy over the rule-based predictors. Interestingly, SDNE does not demonstrate higher macro-F1 scores in this experiment.
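Eigenvector centrality scores each vertex in proportion to the sum of its neighbours' scores, which makes it the leading eigenvector of the adjacency matrix and a genuinely global property of the graph. A minimal power-iteration sketch on the karate club graph (an arbitrary example) is shown below and compared with `nx.eigenvector_centrality`; iterating with $A + I$ instead of $A$ is a common trick that keeps the same leading eigenvector while avoiding oscillation on bipartite graphs.

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()
nodes = list(G)
A = nx.to_numpy_array(G, nodelist=nodes, weight=None)  # unweighted adjacency matrix
M = A + np.eye(len(nodes))  # shifted matrix: same leading eigenvector, no oscillation

x = np.ones(len(nodes))
for _ in range(1000):
    x_next = M @ x
    x_next /= np.linalg.norm(x_next)  # keep unit Euclidean norm
    converged = np.abs(x_next - x).sum() < 1e-12
    x = x_next
    if converged:
        break

nx_scores = nx.eigenvector_centrality(G, max_iter=1000, tol=1e-10)
nx_vec = np.array([nx_scores[v] for v in nodes])
print(np.allclose(x, nx_vec, atol=1e-6))  # True
```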
In Figure 7, the approaches' ability to correctly classify the PageRank score of the vertices is considered. Here we see generally lower classification accuracies than in the previous figure, perhaps owing to the more complicated nature of the PageRank algorithm. However, high classification accuracies can still be seen, particularly on the bitcoin-otc and Drosophila datasets.
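PageRank can be viewed as a damped, random-walk relative of eigenvector centrality: at each step the walker follows a random edge with probability alpha, or otherwise teleports to a uniformly chosen vertex. The sketch below runs this power iteration on the karate club graph (an arbitrary example) and compares it with `nx.pagerank`; the damping factor of 0.85 is the conventional default and an assumption, since the settings used for the experiments are not given here.

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()
nodes = list(G)
n = len(nodes)
A = nx.to_numpy_array(G, nodelist=nodes, weight=None)
P = A / A.sum(axis=1, keepdims=True)  # random-walk transition matrix (no isolated vertices here)

alpha = 0.85  # damping factor (assumed conventional default)
r = np.full(n, 1.0 / n)
for _ in range(200):
    # Follow an edge with probability alpha, otherwise jump to a uniformly random vertex.
    r = alpha * (r @ P) + (1 - alpha) / n

nx_pr = nx.pagerank(G, alpha=alpha, weight=None, tol=1e-12, max_iter=1000)
print(np.allclose(r, [nx_pr[v] for v in nodes], atol=1e-8))  # True
```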
Finally, Figure 8 highlights the ability of the graph embeddings to predict Betweenness Centrality. Here, the figure shows that this feature is, on average, harder to predict from the embeddings than the previous two centrality measures, as evidenced by the lower scores. Again, SDNE shows the highest macro-F1 scores on the Drosophila and HepTh datasets, suggesting that its embeddings capture something akin to this structural information better than the other approaches.
5.2 Confusion Matrices
One consideration that must be made is that the binning process used to transform the features into classification targets removes the inherent ordering present in continuous values. As an example, a vertex with a degree of 8 would be counted as misclassified whether the predicted class corresponded to a degree of 10 or of 100, yet clearly one prediction is more incorrect than the other. To address this, we present a selection of error (confusion) matrices to explore how ‘wrong’ an incorrect prediction is. This is possible because the labels produced by the histogram binning function are consecutively ordered, meaning that a prediction of 2 for a true label of 1 is more correct than a prediction of 5.
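Because the class labels are ordered, an error matrix can also be summarised by how far off the wrong predictions are. The sketch below uses scikit-learn's `confusion_matrix` on made-up labels; the mean bin distance of errors is an illustrative summary introduced here, not a metric reported in the paper.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up true and predicted bin labels (0..5) for ten vertices.
y_true = np.array([0, 1, 1, 2, 2, 3, 4, 5, 5, 1])
y_pred = np.array([0, 1, 2, 2, 1, 3, 5, 5, 4, 4])

# Rows are true labels, columns are predicted labels; the diagonal holds correct predictions.
cm = confusion_matrix(y_true, y_pred, labels=np.arange(6))

wrong = y_true != y_pred
distances = np.abs(y_true[wrong] - y_pred[wrong])  # how many bins away each error is
print("correct:", np.trace(cm), "of", cm.sum())          # correct: 5 of 10
print("mean bin distance of errors:", distances.mean())  # 1.4
print("errors within one bin:", np.mean(distances <= 1)) # 0.8
```

An embedding that places vertices with similar feature values close together would show most off-diagonal mass next to the diagonal, as in four of the five errors here.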
For brevity, Figure 9 displays the error matrices for a selection of the tested embedding approaches when classifying Eigenvector Centrality on the ego-Facebook dataset, although similar patterns were found across all datasets. In an error matrix, the diagonal values represent correctly classified labels, so a good prediction will produce an error matrix with a higher concentration of values on the diagonal. Figure 9 shows that, for the stochastic walk approaches DeepWalk and Node2Vec, the error matrices have a higher concentration of values around the diagonal. Interestingly, when the classification is incorrect for these approaches, the incorrect prediction tends to be close to the true label. This phenomenon can clearly be seen in these approaches for labels 1 and 2, suggesting that the embeddings of vertices with these particular Eigenvector Centrality values are similar. The Figure also shows that, for this particular vertex feature, the embeddings produced by SDNE seemingly do not contain the same topological information, as highlighted by the lack of structure on the diagonal of its error matrix.
5.3 Unsupervised Low-Dimensional Projections
As another way of assessing the semantic content of the graph embeddings, we utilise an embedding visualisation technique called t-SNE maaten2008visualizing (). This technique allows relatively high-dimensional data, such as the graph embeddings we are dealing with, to be projected into a low-dimensional space in a way that preserves the neighbourhood structure between points that was present in the original space. We therefore use t-SNE to project the embeddings down into two dimensions so they can be easily visualised. This process does not require any classification to be performed on the embeddings, avoiding the problems of classifying imbalanced datasets.
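A minimal t-SNE projection with scikit-learn's `TSNE` is sketched below. The random 64-dimensional vectors and class labels are placeholders for real embeddings and binned feature values, and the perplexity of 30 and PCA initialisation are assumed settings, not those used for the figures.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 64))       # placeholder 64-dimensional vertex embeddings
feature_class = rng.integers(0, 6, size=200)  # placeholder binned Eigenvector Centrality labels

# Project to two dimensions; t-SNE never sees the labels, so no classifier is involved.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
projection = tsne.fit_transform(embeddings)

# projection[:, 0] and projection[:, 1] can be scatter-plotted, coloured by feature_class.
print(projection.shape)  # (200, 2)
```

The labels are used only to colour the points after projection, which is what keeps this analysis free of the class-imbalance problems of supervised classification.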
Figure 10 displays a selection of t-SNE plots taken from the ego-Facebook data, where the points are coloured according to their Eigenvector Centrality value after being passed through the binning process. The figure shows that the SDNE embeddings seemingly have no clear structure in the low-dimensional space which correlates strongly with Eigenvector Centrality, as points in the same class are not clustered together. However, with the other embedding approaches, a clear clustering of points belonging to the same class can be seen. For example, in both Node2Vec approaches there is very clear clustering of classes 1, 4 and 5. This result provides further evidence for our observation that, even when exploring the embeddings using an unsupervised method, it is possible to find correlations between known topological features and the embedding space.
5.4 Discussion
This section has provided an extensive experimental evaluation to explore the questions raised in Section 3. Specifically, we investigated whether a broad range of topological features can be predicted from the embeddings created by a range of unsupervised graph embedding techniques. Across all the features and datasets tested, it can be seen that many topological features can be approximated by the different embedding approaches, with varying degrees of accuracy. The increases in accuracy over the rule-based predictions (Section 4.3) give a strong indication that the approaches are able to overcome the inherently imbalanced nature of graph datasets, and that a mapping from the embedding space to the features is being learned. It is also interesting to observe that numerous features can be approximated from the graph embeddings, suggesting that several structural properties are being captured automatically to create the best representation for a vertex. Of all the topological features measured in the experimentation section, the one which consistently gave the best results was Eigenvector Centrality. Particularly for the stochastic approaches, Eigenvector Centrality was predicted with a high degree of accuracy, suggesting that the topological structure represented by this feature is captured extremely well in the embedding space, and indicating that it is a useful feature for minimising the objective functions of the approaches. This is further reinforced by the unsupervised projections (Figure 10), which show clear and distinct clustering of classes, even without the use of a classification algorithm.
Another interesting observation from this study is that no single approach strongly outperforms the others when classifying a particular feature; seemingly, all the approaches are approximating similar topological structures. The figures show that the stochastic approaches (DeepWalk and Node2Vec) are the most consistent across all features and datasets, often having the highest macro-F1 and micro-F1 scores. SDNE demonstrates a more inconsistent performance profile for feature classification, in contrast to other studies which have found it to have the best performance in vertex labelling problems Goyal2017 (). The performance of SDNE demonstrated in this work could be explained by it being the only deep model tested, meaning that it contains many more parameters. This increase in complexity means that SDNE could be very sensitive to the correct selection of hyperparameters, or possibly that more complex topological features are being approximated by its embeddings, or even that entirely novel features are being learned. Finally, the performance of the hyperbolic approach PD is interesting to note, as it has far fewer latent dimensions in which to capture topological information, due to its limitation of modelling the space as a 2D disk. Empirically, PD shows largely similar performance to the other approaches on most datasets, providing strong evidence that hyperbolic space is an appropriate space in which to represent graphs.
6 Conclusion
Graph embeddings are increasingly becoming a key tool for solving numerous tasks within the field of graph mining. They have demonstrated state-of-the-art results by reportedly learning, automatically, a low-dimensional but highly expressive representation of vertices which captures the topological structure of the graph. However, to date there has been little work providing a theoretical grounding for why they have been so successful. In this paper, we take a step in this direction by investigating which traditional topological graph features can be reconstructed from the embedding space. Our hypothesis is that if a mapping from the embedding space to a particular topological feature can be found, then the topological structure encapsulated by this feature is also captured by the embedding. We present an extensive set of experiments exploring this issue across five unsupervised graph embedding techniques (detailed in Section 3.4), classifying seven graph features (detailed in Section 3.1), across a range of empirical datasets (detailed in Table 3). We find that a mapping from the embedding space of the tested approaches to many topological features is indeed possible, using both supervised and unsupervised techniques. This discovery suggests that graph embeddings are indeed learning approximations of known topological features, with our experiments showing that Eigenvector Centrality is best reconstructed by many of the approaches. This could provide key insight into how graph embeddings learn to create high-quality representations.
For future research, we plan to investigate whether other eigenvector-based topological features, known to be representative of a graph’s topology Li2012 (), are also captured well by the embedding approaches. We also plan to perform further experiments with synthetically created graphs with artificially balanced degree distributions. This will remove the imbalanced nature of empirical datasets and allow us to explore the structure of the embeddings in more detail. Furthermore, we plan to investigate whether directly predicting topological features during the embedding training process, perhaps in the form of a regularisation term, can produce embeddings which generalise better across other tasks.
Acknowledgments
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research. Additionally, we thank the Engineering and Physical Sciences Research Council UK (EPSRC) for funding. We also thank the authors of papers Grover2016 (), Chamberlain2017 () and Wang2016a () for making implementations of their code publicly available. For invaluable feedback and comments during this research, we also thank Nik Khadijah Nik Aznan, Philip Jackson and Amir Atapour-Abarghouei.
Footnotes
 In this work, we focus on vertex representation learning approaches.
 https://github.com/sbonner0/unsupervisedgraphembedding/
 To avoid confusion with neural networks, we will use the term graph throughout the remainder of the paper without loss of generality.
 Note if then we skip these from the sum we are past the start of the current work.
 Hops represent the length of the sequence of vertices that must be traversed to get from one vertex to another.
 Please note, we explore two variations of Node2Vec, bringing the total number of approaches to five.
 https://github.com/sbonner0/unsupervisedgraphembedding/
References
 Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. TensorFlow: a system for large-scale machine learning. USENIX Symposium on Operating Systems Design and Implementation, 16:265–283, 2016.
 Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, and Alexander J Smola. Distributed large-scale natural graph factorization. International conference on World Wide Web, pages 37–48, 2013.
 Réka Albert and Albert-László Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 2002.
 Sylvain Arlot and Alain Celisse. A survey of cross-validation procedures for model selection. Statistics Surveys, 2010.
 Lars Backstrom and Jure Leskovec. Supervised random walks: predicting and recommending links in social networks. Web Search and Data Mining (WSDM), 2011.
 Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in neural information processing systems, pages 585–591, 2002.
 Michele Berlingerio, Danai Koutra, Tina Eliassi-Rad, and Christos Faloutsos. NetSimile: A scalable approach to size-independent network similarity. arXiv preprint arXiv:1209.2684, 2012.
 Phillip Bonacich. Some unique properties of eigenvector centrality. Social networks, 29(4):555–564, 2007.
 Stephen Bonner, John Brennan, Georgios Theodoropoulos, Ibad Kureshi, and Andrew Stephen McGough. Deep topology classification: A new approach for massive graph classification. IEEE International Conference on Big Data, 2016.
 Stephen Bonner, John Brennan, Georgios Theodoropoulos, Ibad Kureshi, and Andrew Stephen McGough. GFP-X: A parallel approach to massive graph comparison using Spark. IEEE International Conference on Big Data, pages 3298–3307, 2016.
 Stephen Bonner, John Brennan, Georgios Theodoropoulos, Ibad Kureshi, Andrew Stephen McGough, and Boguslaw Obara. Evaluating the quality of graph embeddings via topological feature reconstruction. IEEE International Conference on Big Data, 2017.
 Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. International Conference on Learning Representations (ICLR), 2013.
 Hongyun Cai, Vincent W Zheng, and Kevin Chen-Chuan Chang. A comprehensive survey of graph embedding: problems, techniques and applications. arXiv preprint arXiv:1709.07604, 2017.
 Shaosheng Cao, Wei Lu, and Qiongkai Xu. GraRep: Learning graph representations with global structural information. ACM International on Conference on Information and Knowledge Management, pages 891–900, 2015.
 Ben Chamberlain, James Clough, and Marc Deisenroth. Neural embeddings of graphs in hyperbolic space. KDD Workshop on Mining and Learning with Graphs (MLG), 2017.
 Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, and Antoine Bordes. Supervised learning of universal sentence representations from natural language inference data. arXiv preprint arXiv:1705.02364, 2017.
 Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. A survey on network embedding. arXiv preprint arXiv:1711.08752, 2017.
 Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. Advances in Neural Information Processing Systems (NIPS), 2016.
 David BA Epstein, Robert C Penner, et al. Euclidean decompositions of non-compact hyperbolic manifolds. Journal of Differential Geometry, 1988.
 Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 2010.
 Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. On power-law relationships of the internet topology. ACM SIGCOMM Computer Communication Review, 1999.
 Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
 Palash Goyal and Emilio Ferrara. Graph embedding techniques, applications, and performance: a survey. arXiv preprint arXiv:1705.02801, 2017.
 Aditya Grover and Jure Leskovec. node2vec: scalable feature learning for networks. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
 Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. Journal of machine learning research, 3(Mar):1157–1182, 2003.
 William L Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. arXiv preprint arXiv:1706.02216, 2017.
 William L. Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs: methods and applications. arXiv preprint arXiv:1709.05584, 2017.
 Minyang Han, Khuzaima Daudjee, Khaled Ammar, M. Tamer Ozsu, Xingfang Wang, and Tianqi Jin. An experimental comparison of Pregel-like graph processing systems. Very Large Databases Endowment, 7(12):1047–1058, 2014.
 Geoffrey E. Hinton, Alex Krizhevsky, and Sida D. Wang. Transforming autoencoders. International Conference on Artificial Neural Networks, 2011.
 Grigoris I Karakoulas and John Shawe-Taylor. Optimizing classifiers for imbalanced training sets. Advances in neural information processing systems, pages 253–259, 1999.
 Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations (ICLR), 2017.
 Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, 2014.
 Cheng Li, Xiaoxiao Guo, and Qiaozhu Mei. Deepgraph: graph structure predicts network growth. arXiv preprint arXiv:1610.06251, 2016.
 Geng Li, Murat Semerci, Bülent Yener, and Mohammed J Zaki. Effective graph classification based on topological and label attributes. Statistical Analysis and Data Mining: The ASA Data Science Journal, 5(4):265–283, 2012.
 Jiwei Li, Xinlei Chen, Eduard Hovy, and Dan Jurafsky. Visualizing and understanding neural models in nlp. arXiv preprint arXiv:1506.01066, 2015.
 Weiyi Liu, Hal Cooper, Min Hwan Oh, Sailung Yeung, Pin-Yu Chen, Toyotaro Suzumura, and Lingli Chen. Learning graph topological features via GAN. arXiv preprint arXiv:1709.03545, 2017.
 Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
 Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. Conference on Neural Information Processing Systems (NIPS), 2013.
 Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. International Conference on Learning Representations (ICLR), 2013.
 Luis G. Moyano. Learning network representations. The European Physical Journal Special Topics, 2017.
 Tamara Munzner. Exploring large graphs in 3d hyperbolic space. IEEE Computer Graphics and Applications, 1998.
 Mark Newman. Networks: an introduction. Oxford university press, 2010.
 Maximilian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representations. arXiv preprint arXiv:1705.08039, 2017.
 Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning convolutional neural networks for graphs. International Conference on Machine Learning, 2016.
 Boguslaw Obara, Vicente Grau, and Mark D Fricker. A bioimage informatics approach to automatically extract complex fungal networks. Bioinformatics, 28(18):2374–2381, 2012.
Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. WaveNet: a generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016.
 Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. Asymmetric transitivity preserving graph embedding. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1105–1114, 2016.
L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: bringing order to the web. 1998.
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 2011.
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: online learning of social representations. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014.
 Ryan A. Rossi and Nesreen K. Ahmed. The network data repository with interactive graph analytics and visualization. AAAI Conference on Artificial Intelligence, 2015.
 Ruslan Salakhutdinov and Geoffrey Hinton. Semantic hashing. International Journal of Approximate Reasoning, 2009.
 Fatemeh Salehi Rizi, Michael Granitzer, and Konstantin Ziegler. Properties of vector embeddings in social networks. Algorithms, 10(4):109, 2017.
 Tobias Schnabel, Igor Labutov, David Mimno, and Thorsten Joachims. Evaluation methods for unsupervised word embeddings. Conference on Empirical Methods in Natural Language Processing, pages 298–307, 2015.
Shaohuai Shi, Qiang Wang, Pengfei Xu, and Xiaowen Chu. Benchmarking state-of-the-art deep learning software tools. arXiv preprint arXiv:1608.07249, 2016.
 Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 1998.
Svante Wold, Kim Esbensen, and Paul Geladi. Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1–3):37–52, 1987.
 Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. European conference on computer vision, pages 818–833, 2014.
 Daokun Zhang, Jie Yin, Xingquan Zhu, and Chengqi Zhang. Network representation learning: a survey. arXiv preprint arXiv:1801.05852, 2017.