MONET: Debiasing Graph Embeddings via the Metadata-Orthogonal Training Unit
Abstract
In many real-world graphs, the formation of edges can be influenced by certain sensitive features of the nodes (e.g. their gender, community, or reputation). In this paper we argue that when such influences exist, any downstream Graph Neural Network (GNN) will be implicitly biased by these structural correlations. To allow control over this phenomenon, we introduce the Metadata-Orthogonal Node Embedding Training (MONET) unit, a general neural network architecture component for performing training-time linear debiasing of graph embeddings. MONET operates by ensuring that the node embeddings are trained on a hyperplane orthogonal to that of the node features (metadata). Unlike debiasing approaches in similar domains, our method offers exact guarantees about the correlation between the resulting embeddings and any sensitive metadata. We illustrate the effectiveness of MONET through experiments on a variety of real-world graphs against challenging baselines (e.g. adversarial debiasing), showing superior performance in tasks such as preventing the leakage of political party affiliation in a blog network, and preventing the gaming of embedding-based recommendation systems.
1 Introduction
Graph embeddings – continuous, low-dimensional vector representations of nodes – have been eminently useful in network visualization, node classification, link prediction, and many other graph learning tasks [9]. While graph embeddings can be estimated directly by unsupervised algorithms using the graph’s structure [e.g. 22, 26, 13, 23], there is often additional (non-relational) information available for each node in the graph. This information, frequently referred to as node attributes or node metadata, can include demographic, geospatial, and/or textual features that are useful for prediction tasks.
The interplay between a node’s metadata and edges is a rich and active area of research. Interestingly, in a number of cases, this metadata can be measurably related to a graph’s structure [19], and in some instances there may be a causal relationship (the node’s attributes influence the formation of edges). As such, metadata can enhance graph learning models [28, 18], and conversely, graphs can be used as regularizers in supervised and semi-supervised models of node features [29, 10]. Furthermore, metadata are commonly used as evaluation data for graph embeddings [7]. For example, node embeddings trained on a Flickr user graph were shown to predict user-specified Flickr “interests” [22]. This is presumably because users (as nodes) in the Flickr graph tend to follow users with similar interests, which illustrates a potential causal connection between node topology and node metadata.
However, despite the usefulness and prevalence of metadata in graph learning, there are instances where it is desirable to design a system to avoid the effects of a particular kind of sensitive data. For instance, the designers of a recommendation system may want to make recommendations independent of a user’s demographic information or location.
At first glance, this may seem like an artificial dilemma – surely one could just avoid the problem by not adding such sensitive attributes to the model. However, such an approach (ignoring a sensitive attribute) does not control for correlations that may exist between the sensitive metadata and the edges of a node. In other words, if the edges of the graph are correlated with sensitive metadata, then any algorithm which does not explicitly model and remove this correlation will be biased as a result of it.
To this end, we propose two simple desiderata for handling sensitive metadata in GNNs:

D1. The influence of metadata on the graph topology is modeled in a partitioned subset of embedding space, providing interpretability to the overall graph representation, and the option to remove metadata dimensions.

D2. Non-metadata or “topology” embeddings are debiased from metadata embeddings with a provable guarantee on the level of remaining bias.
Existing embedding methods for graphs with metadata [e.g. 32, 28, 18, 31, 16, 14] do not consider D2, and most do not fully address D1 either. In the graph CNN literature [e.g. 15, 10, 2], metadata are treated as features in predictive models, rather than attributes to be controlled. The work most relevant to the above desiderata is [5], which introduces adversarial debiasing of metadata to graph-based recommender systems. However, that approach does not carry strong theoretical guarantees about the amount of bias removed from the graph representation, and therefore does not satisfy D2.
In this work we propose a novel technique for Graph Neural Networks (GNNs) that satisfies both of these desiderata. Our method, the Metadata-Orthogonal Node Embedding Training (MONET) unit, operates by ensuring that metadata embeddings are trained on a hyperplane orthogonal to that of the topology embeddings (as conceptually illustrated in Figure 1). Specifically, our contributions are the following:

The Metadata-Orthogonal Node Embedding Training (MONET) unit, a novel GNN algorithm which jointly embeds graph topology and graph metadata while enforcing linear decorrelation between the two embedding spaces.

Analysis which proves that addressing desideratum D1 alone – partitioning a metadata embedding space – still produces a biased topology embedding space, and that the MONET unit corrects this.

Experimental results on real-world graphs which show that MONET can successfully debias topology embeddings while relegating metadata information to separate dimensions – achieving superior results to state-of-the-art adversarial and NLP-based debiasing methods.
2 Preliminaries
Early graph embedding methods involved dimensionality reduction techniques like multidimensional scaling and singular value decomposition [7]. In this paper we use graph neural networks trained on random walks, similar to DeepWalk [22]. DeepWalk and many subsequent methods first generate a sequence of random walks from the graph, to create a “corpus” of node “sentences” which are then modeled via word embedding techniques (e.g. word2vec [17] or GloVe [21]) to learn low-dimensional representations that preserve the observed co-occurrence similarity.
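The walk-then-count pipeline described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names `random_walks` and `cooccurrence_counts`, the toy graph, and the 1/distance weighting of context pairs (borrowed from GloVe's convention) are all assumptions made for the example.

```python
import random
from collections import defaultdict


def random_walks(adj, walks_per_node, walk_length, seed=0):
    """Generate uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def cooccurrence_counts(walks, window):
    """Accumulate symmetric co-occurrence weights. A pair at walk
    distance k contributes 1/k (an assumed weighting scheme)."""
    counts = defaultdict(float)
    for walk in walks:
        for pos, center in enumerate(walk):
            for k in range(1, window + 1):
                if pos + k >= len(walk):
                    break
                context = walk[pos + k]
                counts[(center, context)] += 1.0 / k
                counts[(context, center)] += 1.0 / k
    return counts


# Toy graph: two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
walks = random_walks(adj, walks_per_node=20, walk_length=10)
C = cooccurrence_counts(walks, window=3)
```

The resulting dictionary plays the role of the co-occurrence statistics that a word-embedding model is then fit to.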
Let $W$ be a $d$-dimensional graph embedding matrix, $W \in \mathbb{R}^{n \times d}$, which aims to preserve the low-dimensional structure of a graph ($d \ll n$). Rows of $W$ correspond to nodes, and node pairs $i, j$ with large dot-products $W_i^T W_j$ should be structurally or topologically close in the graph. As a concrete example, in this paper we consider the debiasing of a recently proposed graph embedding using the GloVe model [6]. Its training objective is:
$$\text{GloVe}(U, V, a, b \mid C) = \sum_{i,j \le n} f_\alpha(C_{ij}) \left(a_i + b_j + U_i^T V_j - \log C_{ij}\right)^2 \tag{1}$$
Above, $U, V \in \mathbb{R}^{n \times d}$ are the “center” and “context” embeddings, $a, b \in \mathbb{R}^{n \times 1}$ are the biases, $C$ is the walk-distance-weighted context co-occurrences, and $f_\alpha$ is the loss smoothing function [21]. We use the GloVe model in the next section, and throughout the paper, to illustrate the incorporation of metadata embeddings and the MONET unit. However, these innovations are generally deployable modules. To illustrate this, we also describe a MONET unit for DeepWalk [22], a popular graph embedding algorithm.
Notation. In this paper, given a matrix $A \in \mathbb{R}^{n \times d}$ and an index $i \in \{1, \dots, n\}$, $A_i$ denotes the $i$-th row vector of $A$. Column indices will not be used. $0_{n \times d}$ denotes the $n \times d$ zero matrix, and $\|\cdot\|_F$ denotes the Frobenius norm.
3 Metadata Embeddings and Orthogonal Training
In this section we present the components of MONET. First, in Section 3.1, we extend traditional graph representation to include metadata embeddings, achieving desideratum D1. Next, in Section 3.2, we prove that a model with just D1 will still leak metadata information to the topology embeddings. Then, in Section 3.3 we present MONET’s key algorithm for training metadata and topology embeddings orthogonally, achieving desideratum D2. Finally, we conclude with some analysis of MONET in Section 3.4.
3.1 Jointly Modeling Metadata & Topology
A natural first approach to controlling metadata effects is to introduce a partition of the embedding space that models them explicitly, adding interpretability and separability to the overall representation. We denote the node metadata matrix as $M \in \mathbb{R}^{n \times m}$, where the metadata for node $i$ is contained in the row vector $M_i$. To achieve D1, we feed $M$ through a single-layer neural network with weights $T \in \mathbb{R}^{m \times d_z}$. Figure 2 shows a realization of this idea, using the GloVe embedding model, which we denote $\text{GloVe}_{\text{meta}}$. Here, two weight matrices $T_1, T_2$ are needed to embed the metadata for both center and context nodes. This network yields metadata embeddings $X := M T_1$, $Y := M T_2$, which correspond to the standard topology embeddings $U$, $V$ (respectively), and can be concatenated to form a combined metadata and topology vector. The resulting joint metadata and topology loss is:
$$\text{GloVe}_{\text{meta}}(U, V, T_1, T_2, a, b \mid C, M) = \sum_{i,j \le n} f_\alpha(C_{ij}) \left(a_i + b_j + U_i^T V_j + X_i^T Y_j - \log C_{ij}\right)^2 \tag{2}$$
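As an illustration of the joint loss, the following NumPy sketch evaluates a GloVe-style weighted least-squares objective with an added metadata term. It is a sketch under stated assumptions, not the authors' code: the names `glove_weight` and `glove_meta_loss` are hypothetical, the smoothing function uses the customary GloVe defaults (cap 100, exponent 0.75), and only observed (non-zero) co-occurrences are summed.

```python
import numpy as np


def glove_weight(c, c_max=100.0, alpha=0.75):
    """GloVe smoothing function; c_max and alpha are assumed defaults."""
    return np.minimum(1.0, (c / c_max) ** alpha)


def glove_meta_loss(U, V, T1, T2, a, b, C, M):
    """Weighted squared error with topology and metadata partitions."""
    X, Y = M @ T1, M @ T2                       # metadata embeddings
    pred = a[:, None] + b[None, :] + U @ V.T + X @ Y.T
    mask = C > 0                                # observed pairs only
    err = pred[mask] - np.log(C[mask])
    return float(np.sum(glove_weight(C[mask]) * err ** 2))


rng = np.random.default_rng(0)
n, d, m, d_z = 6, 3, 2, 2
U, V = rng.normal(size=(n, d)), rng.normal(size=(n, d))
T1, T2 = rng.normal(size=(m, d_z)), rng.normal(size=(m, d_z))
a, b = rng.normal(size=n), rng.normal(size=n)
M = rng.normal(size=(n, m))
C = rng.integers(0, 50, size=(n, n)).astype(float)  # toy co-occurrences

loss = glove_meta_loss(U, V, T1, T2, a, b, C, M)
```

Setting either metadata weight matrix to zero removes the metadata term, so the function then reduces to the plain GloVe objective on the topology embeddings.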
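While in this paper we demonstrate metadata embeddings using GloVe, we note that they can be incorporated in any GNN which utilizes a node-wise graph representation. For instance, the well-known DeepWalk [22] loss, which is based on word2vec [17], would incorporate metadata embeddings as follows:
$$\text{DeepWalk}_{\text{meta}}(U, V, T_1, T_2 \mid \mathcal{W}, M) = -\sum_{(i,j) \in \mathcal{W}} \left[ \log \sigma\left(U_i^T V_j + X_i^T Y_j\right) + \sum_{k \in \mathcal{N}_i} \log \sigma\left(-U_i^T V_k - X_i^T Y_k\right) \right]$$
Above, $\mathcal{W}$ is the set of context pairs from random walks, and $\mathcal{N}_i$ is a set of negative samples associated with node $i$.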
This new modeling approach, which augments standard graph representations with a metadata partition, provides users of graph embeddings with the option to include or exclude metadata-learned dimensions. Furthermore, suppose that the metadata (e.g. demographic information) are indeed associated with the formation of links in the graph. In this case, ostensibly, the dedicated metadata embedding space could relieve the topology dimensions of the responsibility to encode this relationship, thereby reducing metadata bias in those dimensions. In our empirical study (Section 4) we show that to some extent, this does actually occur. However, we find – empirically and theoretically – that this naïve approach does not guarantee that the topology embeddings are completely decorrelated from the metadata embeddings. In fact, surprisingly, we find that most of the metadata bias suffered by standard baselines remains in the topology dimensions. This is a phenomenon we call metadata leakage, which we formalize and prove in the next section.
3.2 Metadata Leakage in Graph Neural Networks
Here, we formally define metadata leakage for general topology and metadata embeddings, and show how it can occur even in embedding models with partitioned metadata embeddings. This motivates the need for both D1 and D2 in a complete approach to controlling metadata effects in GNNs. All proofs appear in the supplement.
Definition 1.
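The metadata leakage of metadata embeddings $Z \in \mathbb{R}^{n \times d_z}$ into topology embeddings $W \in \mathbb{R}^{n \times d}$ is defined $ML(Z, W) := \|Z^T W\|_F^2$. We say that there is no metadata leakage if and only if $ML(Z, W) = 0$.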
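A small numerical example may make the definition concrete. The sketch below assumes that leakage is measured as the squared Frobenius norm of the cross-product between the metadata and topology embedding matrices; the function name `metadata_leakage` and the toy matrices are illustrative only.

```python
import numpy as np


def metadata_leakage(Z, W):
    """Squared Frobenius norm of Z^T W (assumed form of the leakage)."""
    return float(np.linalg.norm(Z.T @ W, ord="fro") ** 2)


# One metadata dimension that is active only on the first node.
Z = np.array([[1.0], [0.0], [0.0]])

# Topology embeddings whose first row is zero carry no component along Z.
W_clean = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]])

# Changing the first row introduces a component along Z.
W_leaky = np.array([[2.0, 0.0], [1.0, 2.0], [3.0, 4.0]])

leak_clean = metadata_leakage(Z, W_clean)
leak_leaky = metadata_leakage(Z, W_leaky)
```

In this toy case the first matrix has zero leakage, while the second has leakage 4, contributed entirely by the single entry that overlaps with the metadata column.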
Without a more nuanced approach, metadata leakage can occur even in embedding models with a metadata partition, like those discussed in Section 3.1. To demonstrate this we consider a reduced metadata-laden GloVe loss with unique topology and metadata representations $W$ and $Z$:
(3) 
We now show that under a random update of GloVe under Eq. (3), the expected metadata leakage is nonzero. Specifically, let be a node pair from , and define as the incurred Stochastic Gradient Descent update . Suppose there is a “ground-truth” metadata transformation , and define ground-truth metadata embeddings , which represent the “true” dimensions of the metadata effect on the co-occurrences . Define and . With expectations taken with respect to the sampling of a pair for Stochastic Gradient Descent, define and . Define , similarly. Then our main theorem is as follows:
Theorem 1.
Assume for , , and . Suppose for some fixed we have . Let be a randomly sampled cooccurrence pair and the incurred update. Then if , we have
(4) 
Importantly, and are neural network hyperparameters, so we give a useful Corollary:
Corollary 1.
Under the assumptions of Theorem 1, as .
Note that under reasonable GNN initialization schemes, and are random perturbations. Thus, Corollary 1 implies the surprising result that incorporating a metadata embedding partition is not sufficient to prevent metadata leakage in practical settings.
3.3 MONET: Metadata-Orthogonal Node Embedding Training
Here, we introduce the Metadata-Orthogonal Node Embedding Training (MONET) unit for training joint topology-metadata graph representations without metadata leakage. MONET explicitly prevents the correlation between topology and metadata, by using the Singular Value Decomposition (SVD) of $Z$ to orthogonalize updates to $W$ during training.
MONET. The MONET unit is a two-step algorithm applied to the training of a topology embedding in a neural network, and is detailed in Algorithm 1. The input to a MONET unit is a metadata embedding $Z \in \mathbb{R}^{n \times d_z}$ and a target topology embedding $W \in \mathbb{R}^{n \times d}$ for debiasing. Then, let $Q_Z$ be the left-singular vectors of $Z$, and define the projection $P_Z := I_{n \times n} - Q_Z Q_Z^T$. In the forward pass procedure, debiased topology weights are obtained by using the projection $W^\perp = P_Z W$. Similarly, $W^\perp$ is used in place of $W$ in subsequent GNN layers. In the backward pass, MONET also debiases the backpropagation update to the topology embedding, $\delta_W$, using $\delta_W^\perp = P_Z \delta_W$.
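The two steps can be illustrated with a short NumPy sketch. It is not the paper's Algorithm 1 or its TensorFlow implementation; the function name `monet_projection`, the random toy data, the rank tolerance, and the plain gradient step with learning rate 0.05 are assumptions chosen for illustration.

```python
import numpy as np


def monet_projection(Z):
    """Return I - Q Q^T, with Q the left-singular vectors of Z."""
    Q, s, _ = np.linalg.svd(Z, full_matrices=False)
    Q = Q[:, s > 1e-10 * s.max()]           # drop numerically null directions
    return np.eye(Z.shape[0]) - Q @ Q.T


rng = np.random.default_rng(0)
n, d, d_z = 50, 8, 2
Z = rng.normal(size=(n, d_z))               # metadata embedding
W = rng.normal(size=(n, d)) + Z @ rng.normal(size=(d_z, d))  # leaky topology
grad_W = rng.normal(size=(n, d))            # stand-in for a backprop gradient

P = monet_projection(Z)
W_perp = P @ W                              # forward pass: debiased weights
grad_perp = P @ grad_W                      # backward pass: debiased update
W_next = W_perp - 0.05 * grad_perp          # one plain gradient step
```

Because the same projection is applied to the weights and to the update, the iterate after the step is still orthogonal to the metadata embedding, up to floating-point error.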
Straightforward properties of the SVD show that MONET directly prevents metadata leakage:
Theorem 2.
Using Algorithm 1, $Z^T W^\perp = 0_{d_z \times d}$ and $Z^T \delta_W^\perp = 0_{d_z \times d}$.
We note that in this work we have only considered linear metadata leakage; providing provable guarantees for debiasing nonlinear topology/metadata associations is an area of future work.
Implementation ($\text{MONET}_G$). We demonstrate MONET in our experiments by applying Algorithm 1 to Eq. (2), which we denote as $\text{MONET}_G$. We orthogonalize the input and output topology embeddings $U, V$ with the summed metadata embeddings $Z := X + Y$. By linearity, this implies $Z$-orthogonal training of the summed topology representation $W := U + V$. We note that working with the sums of center and context embeddings is the standard way to combine these matrices [21]. Figure 3 gives a full illustration of $\text{MONET}_G$.
Relaxation. A natural generalization of the MONET unit is to parameterize the level of orthogonalization incurred at each training step. Specifically, we introduce a parameter $\lambda \in [0, 1]$ which controls the extent to which topology embeddings are projected onto the metadata-orthogonal hyperplane:
$$P_Z^{(\lambda)} := I_{n \times n} - \lambda\, Q_Z Q_Z^T \tag{5}$$
$P_Z^{(\lambda)}$ takes the role of $P_Z$ in Algorithm 1. Using MONET with $\lambda = 1.0$ removes all linear bias, whereas using $\lambda = 0.0$ prevents any debiasing. In Section 4.2, we explore how $\lambda$ affects the tradeoff between linear debiasing and task accuracy.
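The effect of the relaxation parameter on a single projection step can be checked numerically. The sketch below assumes the relaxed projection is the identity minus a scaled copy of the metadata projector; `relaxed_projection` and the toy data are hypothetical names and values used only for this illustration.

```python
import numpy as np


def relaxed_projection(Z, lam):
    """Identity minus lam times the projector onto the column space of Z."""
    Q, _, _ = np.linalg.svd(Z, full_matrices=False)
    return np.eye(Z.shape[0]) - lam * (Q @ Q.T)


rng = np.random.default_rng(1)
Z = rng.normal(size=(40, 2))                               # metadata embedding
W = rng.normal(size=(40, 5)) + Z @ rng.normal(size=(2, 5))  # leaky topology

# Squared Frobenius norm of Z^T (P W) for three relaxation levels.
leak = {lam: float(np.linalg.norm(Z.T @ relaxed_projection(Z, lam) @ W) ** 2)
        for lam in (0.0, 0.5, 1.0)}
```

For a full-column-rank metadata matrix, one relaxed projection scales the cross-product with the metadata by a factor of one minus the relaxation parameter, so the squared leakage at 0.5 is a quarter of its unprojected value and vanishes at 1.0.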
3.4 Analysis
We briefly remark on the algorithmic complexity of MONET, and the interpretation of its parameters.
Algorithmic Complexity. The bottleneck of MONET occurs in the SVD computation and orthogonalization. In our setting, the SVD is $O(n d_z^2)$ [27]. The matrix $P_Z$ need not be computed to perform orthogonalization steps, as $P_Z W = W - Q_Z (Q_Z^T W)$, and the right-hand quantity is $O(n d\, d_z)$ to compute. Hence the general complexity of the MONET unit is $O(n d_z \max\{d, d_z\})$. In the experiments section we compare the wall clock time of MONET and baselines, showing only about a 23% overall wall time increase from standard GloVe.
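The observation that the projection matrix never has to be formed can be verified directly. In this sketch (toy sizes and variable names are assumptions), the factored form multiplies by the thin matrix of left-singular vectors twice, while the explicit form builds the full square projection for comparison.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, d_z = 500, 16, 2
Z = rng.normal(size=(n, d_z))                    # metadata embedding
W = rng.normal(size=(n, d))                      # topology embedding

Q, _, _ = np.linalg.svd(Z, full_matrices=False)  # thin SVD: Q is n x d_z

W_perp_fast = W - Q @ (Q.T @ W)                  # two thin products, no n x n matrix
W_perp_slow = (np.eye(n) - Q @ Q.T) @ W          # explicit n x n projection
```

Both expressions give the same result, but only the second allocates memory quadratic in the number of nodes.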
Metadata Parameter Interpretation. The terms in the sum of the loss for GloVe models with metadata ($\text{GloVe}_{\text{meta}}$ and $\text{MONET}_G$) involve the dot product $X_i^T Y_j = M_i^T T_1 T_2^T M_j$. That expansion suggests that the matrix $\Sigma_T := T_1 T_2^T$ contains all pairwise metadata dimension relationships. In other words, $\Sigma_T$ gives the direction and magnitude of the raw metadata effect on log co-occurrence, and is therefore a way to measure the extent to which the model has captured metadata information. We will refer to this interpretation in the experiments that follow. An important experiment will show that applying the MONET algorithm increases the magnitude of $\Sigma_T$ entries.
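The expansion behind this interpretation is easy to confirm numerically. The sketch below uses random toy matrices (all names and sizes are assumptions) to check that the metadata dot product between a center node and a context node equals a bilinear form in their raw metadata, with the product of the two weight matrices in the middle.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, d_z = 10, 2, 2
M = rng.normal(size=(n, m))        # raw node metadata
T1 = rng.normal(size=(m, d_z))     # center metadata weights
T2 = rng.normal(size=(m, d_z))     # context metadata weights

X, Y = M @ T1, M @ T2              # metadata embeddings
importance = T1 @ T2.T             # m x m pairwise metadata relationships

i, j = 3, 7
lhs = X[i] @ Y[j]                  # metadata term as it appears in the loss
rhs = M[i] @ importance @ M[j]     # same value via the m x m matrix
```

Because the middle matrix has one row and one column per metadata dimension, its entries can be read as the learned strength of each pairwise metadata interaction.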
4 Experimental Analysis
In this section we design and analyze experiments with the goal of answering the following questions:

Q1. Can MONET remove linear bias from topology embeddings and downstream prediction tasks?

Q2. Can MONET help reduce bias in topology embeddings even when they are used in nonlinear models?

Q3. Can MONET provide a reasonable tradeoff between debiasing and accuracy on a downstream retrieval task?

Q4. Does adding the MONET unit to a method incur a significant performance overhead?
In all experiment settings, our topology embedding of interest for evaluation is the sum of the center-context topology embeddings $U$ and $V$. All GloVe-based models are trained with TensorFlow [1] using the AdaGrad optimizer [11] with initial learning rate 0.05 and co-occurrence batch size 100. DeepWalk was trained using the gensim software [24]. Software for the MONET algorithm and code to reproduce the following experiments are available in the supplemental material.
4.1 Experiment 1: Debiasing a Political Blogs Graph
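Here we seek to answer Q1 and Q2, illustrating the effect of MONET debiasing by removing the effect of political ideology from a blogger network [3]. The political blog network is available within the graph-tool software [20] and contains two equally sized blog affiliation communities.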
Baseline Methods. In this experiment we compare against the following methods: (1) a random topology embedding generated from a multivariate Normal distribution; (2) DeepWalk; (3) GloVe; (4) $\text{GloVe}_{\text{meta}}$; (5) an adversarial version of GloVe, in which a 2-layer MLP discriminator is trained to predict political affiliation [12, 5]. All methods use 16-dimensional topology embeddings and (if appropriate) 2-dimensional metadata embeddings of blog affiliation. Full details on methods and training are given in the supplement.
Design. In this experiment we use two metrics to measure the bias of topology embeddings $W$:

Accuracy of a blog affiliation classifier trained on $W$.

Metadata Leakage $ML(Z, W)$.
Because we are studying debiasing potential, Accuracy scores close to 0.5 are desirable (due to equally sized blog affiliation communities), and Metadata Leakage equal to 0.0 is optimal. For methods without metadata embeddings, $ML$ is computed with the original metadata $M$.
One repetition of the experiment proceeds as follows. Graph representations are trained with each method (some include metadata embedding partitions). For each training-set proportion considered, a train set of that proportion of nodes is sampled uniformly. Classifiers of blog affiliation are trained using each method’s topology embeddings. We compute Accuracy on the test set and Metadata Leakage on all nodes. We report means and standard deviations across 10 independent repetitions.
Results. First we analyze the accuracy of a linear SVM at predicting political party affiliation, shown in Figure 3(a). The performance of $\text{MONET}_G$ under the linear classifier is indistinguishable from the random baseline, which shows that the MONET unit perfectly debiased the GloVe embeddings during training, answering question Q1 in the affirmative. We note that this result is a direct implication of Theorem 2.
Interestingly, although the naive $\text{GloVe}_{\text{meta}}$ also has a metadata embedding partition, its performance under the linear classifier is nearly the same as baselines with no metadata (GloVe and DeepWalk). This demonstrates the downstream effect of Theorem 1, and shows the necessity of desideratum D2 in a rigorous approach to graph embedding debiasing. We note that MONET also outperforms the adversarial extension of GloVe, even though adversarial approaches have performed well in similar domains [5].
Second, we observe the results from training a nonlinear classifier (the RBF SVM), as shown in Figure 3(b). Here, we see that all embedding methods contain nonlinear biases to exploit for the prediction of political affiliation. This includes $\text{MONET}_G$, because the MONET unit only performs linear debiasing. Surprisingly, here we find that $\text{MONET}_G$ still produces the least biased representations, even beating an adversarial approach. This answers question Q2, showing that MONET can reduce bias even when its representations are used as features inside of a nonlinear model.
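Table 1: Metadata leakage and metadata importance by method (mean ± standard deviation).

| Method | Metadata Leakage | Metadata Importance |
|---|---|---|
| GloVe | 6503 ± 196 | N/A |
| Random | 283 ± 22 | N/A |
| Adversary | 4430 ± 305 | N/A |
| DeepWalk | 2670 ± 104 | N/A |
| $\text{GloVe}_{\text{meta}}$ | 2103 ± 183 | |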
Empirical Analysis. Here we provide in-depth model understanding of MONET and, by comparison, $\text{GloVe}_{\text{meta}}$ and GloVe. The metadata leakage ($ML(Z, W)$), as defined in Section 3.2, is shown for each embedding set in Table 1. We see that embedding models without metadata control have high leakage. As predicted by Corollary 1, $\text{GloVe}_{\text{meta}}$’s metadata leakage also remains large – whereas, due to Theorem 2, $\text{MONET}_G$’s is at machine precision. This again shows that a metadata embedding partition is not sufficient to isolate the metadata effect.
The need for $\text{MONET}_G$ over the naïve $\text{GloVe}_{\text{meta}}$ can be observed in two other ways. First, recall from Section 3.4 that the “Metadata Importance” matrix $\Sigma_T$ encodes pairwise relationships between metadata dimensions in embedding space. There is a noticeable increase in $\Sigma_T$ magnitude when MONET is used, implying that the metadata embeddings of the naïve model are not capturing all possible metadata information. Second, Figure 5 shows the 2-D PCA of each model’s 16-dimensional topology embeddings, colored by political affiliation. Clear separation by affiliation is visible with the GloVe model’s PCA. The PCA of $\text{GloVe}_{\text{meta}}$ also shows separation, but less so. This shows that having a metadata partition in embedding space – desideratum D1 – does some work toward debiasing the topology dimensions. However, the orthogonalizing MONET unit does the bulk of the work, as shown by the overlap of the affiliation clusters in the $\text{MONET}_G$ PCA.
4.2 Experiment 2: Debiasing a Shilling Attack
In this experiment we investigate Q3 and Q4 in the context of using debiasing methods to defend against a shilling attack [8] on graph-embedding-based recommender systems [30]. In a shilling attack, a number of users act together to artificially increase the likelihood that a particular influenced item will be recommended for a particular target item.
Data. In a single repetition of this experiment, we inject an artificial shilling attack into the MovieLens 100k dataset.
Methods. We include all methods from the political blogs experiment in Section 4.1, with the exception of the adversarial baseline, as it was not implemented for continuous metadata. Instead, we applied a generalized correlation removal framework developed for removing word embedding bias [25, 4]. Specifically, we compute an “attack” direction (analogous to the “gender” direction in the language modeling literature) as follows. Let $W$ be the GloVe topology embedding, and let $M$ be the attacker metadata described above. Note that $M_i$ is simply the scalar number of known attackers on item $i$. Then we compute the weighted attack direction as:
(6) 
Following [25], we compute the “NLP” baseline debiased graph embeddings as
(7) 
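Since the exact formulas are not reproduced here, the following is only a hedged sketch of the general idea behind direction-removal debiasing, namely vector rejection along a single bias direction. The function name `reject_direction`, the choice of a count-weighted mean of the embedding rows as the attack direction, and the toy data are all assumptions of this example and may differ from the equations used in the experiments.

```python
import numpy as np


def reject_direction(W, counts):
    """Remove from each row of W its component along a count-weighted
    mean direction (the weighting is an assumption of this sketch)."""
    v = (counts[:, None] * W).sum(axis=0) / counts.sum()
    v_hat = v / np.linalg.norm(v)
    return W - np.outer(W @ v_hat, v_hat), v_hat


rng = np.random.default_rng(4)
W = rng.normal(size=(30, 8))              # toy item embeddings
counts = np.zeros(30)
counts[:5] = [3, 1, 4, 1, 5]              # toy known-attacker counts per item

W_nlp, v_hat = reject_direction(W, counts)
```

Note that this operation removes one direction in the embedding space, whereas a projection built from the metadata embedding acts along the node axis, so the two approaches are not interchangeable.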
We also run relaxed $\text{MONET}_G$ (see Section 3.3) for various values of $\lambda$, to investigate the tradeoff between accuracy and debiasing.
As we wish to study item recommendation, in the random walks, we simply remove user nodes each time they are visited (so the walks contain only pairwise co-occurrence information over items). All methods compute 128-dimensional topology embeddings for the items. As metadata, we allow MONET models to know the per-item attacker rating count for each attacked item. However, to better demonstrate real-world performance, we only allow 50% (randomly sampled) attackers from the original 5% sample to be “known” when constructing these metadata. Additional training details are given in the supplement.
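The two preprocessing steps described above can be sketched with the standard library alone. The helper names `item_only_walk` and `known_attacker_counts`, the string-prefix convention for telling users from items, and the toy ratings are hypothetical; only the idea of dropping user nodes and of exposing half of the attackers comes from the text.

```python
import random
from collections import Counter


def item_only_walk(walk, is_item):
    """Drop user nodes from a walk on the user-item graph."""
    return [node for node in walk if is_item(node)]


def known_attacker_counts(ratings, attackers, known_fraction=0.5, seed=0):
    """Sample a fraction of attackers as 'known' and count their
    ratings per item; the counts serve as scalar item metadata."""
    rng = random.Random(seed)
    k = round(known_fraction * len(attackers))
    known = set(rng.sample(sorted(attackers), k))
    counts = Counter(item for user, item in ratings if user in known)
    return counts, known


walk = ["u1", "m10", "u7", "m42", "u1", "m3"]
items = item_only_walk(walk, lambda node: node.startswith("m"))

ratings = [("u1", "m10"), ("u2", "m10"), ("u3", "m10"),
           ("u4", "m10"), ("u5", "m3")]
attackers = {"u1", "u2", "u3", "u4"}
counts, known = known_attacker_counts(ratings, attackers)
```

Items never rated by a known attacker simply receive a count of zero, so the metadata vector is defined for every item.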
Design. In this experiment we compute two metrics to investigate the bias-accuracy tradeoff:

The number of influenced items in the top-20 embedding-nearest-neighbor list of the target item. This is a measure of downstream bias on a retrieval task.

The mean reciprocal rank (MRR) for item retrieval. Let $c_i$ be the nearest neighbor of item $i$ by random-walk co-occurrence count. Let $r_i$ be the integer rank of item $c_i$ against $i$’s complete set of embedding cosine distances. Then
$$\text{MRR} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{r_i} \tag{8}$$
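Both metrics can be computed from a matrix of item embeddings and a matrix of co-occurrence counts. The sketch below is one possible reading of the definitions above, using cosine similarity for the embedding ranking; the function names and the four-item toy example are assumptions made for illustration.

```python
import numpy as np


def cosine_rank_matrix(W):
    """rank[i, j] is the 1-based rank of item j among the neighbors of
    item i by cosine similarity; the diagonal is set to 0."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    sim = Wn @ Wn.T
    np.fill_diagonal(sim, -np.inf)                 # exclude the item itself
    order = np.argsort(-sim, axis=1)               # best neighbor first
    n = W.shape[0]
    rank = np.empty((n, n), dtype=int)
    rank[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    np.fill_diagonal(rank, 0)
    return rank


def mean_reciprocal_rank(W, C):
    """MRR of each item's top co-occurrence neighbor under embedding ranks."""
    C = C.astype(float).copy()
    np.fill_diagonal(C, -np.inf)
    target = C.argmax(axis=1)                      # nearest neighbor by count
    r = cosine_rank_matrix(W)[np.arange(W.shape[0]), target]
    return float(np.mean(1.0 / r))


def influenced_in_top_k(W, target_item, influenced, k=20):
    """Count influenced items among the k nearest neighbors of target_item."""
    rank = cosine_rank_matrix(W)[target_item]
    top_k = np.flatnonzero((rank >= 1) & (rank <= k))
    return int(np.isin(top_k, influenced).sum())


# Four items placed on the unit circle at 0, 10, 25 and 90 degrees.
angles = np.deg2rad([0.0, 10.0, 25.0, 90.0])
W = np.column_stack([np.cos(angles), np.sin(angles)])
C = np.array([[0, 5, 1, 0], [5, 0, 2, 0], [1, 4, 0, 0], [3, 1, 1, 0]])

mrr = mean_reciprocal_rank(W, C)
n_influenced = influenced_in_top_k(W, target_item=0, influenced=[2, 3], k=2)
```

In the toy example three items rank their top co-occurrence neighbor first and one ranks it third, and exactly one of the two influenced items falls inside the top-2 list of item 0.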
Results. The results of the shilling experiment are shown in Figure 6, which compares the number of top-20 attacked items (a measure of bias) against MRR lift over a random baseline (a measure of accuracy).
First, we analyze how effectively each method debiased the embeddings. Here we see that the topology embeddings from $\text{MONET}_G$ ($\lambda = 1.0$) prevent the most attacker bias by a large margin, admitting the fewest influenced items into the top-20 neighbors of the target item. We note that this behavior occurs even though the majority of observed co-occurrences for the algorithm had nothing to do with the attack in question, and only half of the true attackers were known. All other baselines (including those that explicitly model the attacker metadata) left at least around half of the attacked items in the top-20 list.
Next, we consider the ranking accuracy of each method. Here we observe that $\text{MONET}_G$ outperforms the random baseline by at least 8.5x, and is comparable to one baseline (DeepWalk). Regarding the NLP debiasing method, we see that it outperformed $\text{MONET}_G$ in terms of ranking accuracy, but allowed a surprising number of attacked items (7.5) in top-20 lists. We note that this method cannot reduce bias further, and does not offer any guarantees of performance. Finally, we observe that relaxing MONET’s orthogonality constraint ($\lambda$, the level of orthogonal projection) decreases the effective debiasing, but improves embedding performance (as might be expected). Note that $\lambda = 0.0$ (not shown) recovers the $\text{GloVe}_{\text{meta}}$ solution. This reveals a tradeoff between embedding debiasing and prediction efficacy which has also been observed in other contexts [5]. This confirms Q3 – that MONET allows for a controllable tradeoff between debiasing and accuracy.
In addition to the ranking metrics, we also compute the Pearson correlation of each embedding w.r.t. the GloVe topology embeddings, shown in Table 2. We find that $\text{MONET}_G$ achieves comparable correlation to $\text{GloVe}_{\text{meta}}$ and NLP, showing that the SVD operation does not harshly corrupt the overall GloVe embedding space. We also report wall times for each method, showing that the metadata embedding and SVD operations of $\text{MONET}_G$ add negligibly to the runtime over GloVe. Together, these metrics answer Q4 in the negative.
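Table 2: Pearson correlation of embedding distances with the GloVe topology embeddings, and wall time, by method (mean ± standard deviation).

| Method | Distances | Wall Time (sec) |
|---|---|---|
| DeepWalk | 0.347 ± 0.003 | 45 ± 4 |
| GloVe | 1.000 ± 0.000 | 357 ± 54 |
| $\text{GloVe}_{\text{meta}}$ | 0.790 ± 0.002 | 359 ± 49 |
| $\text{MONET}_G$ | 0.758 ± 0.004 | 442 ± 58 |
| Random | 0.031 ± 0.001 | N/A |
| NLP | 0.770 ± 0.008 | N/A |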
5 Conclusion
This work introduced MONET, a novel GNN training technique which theoretically guarantees linear debiasing of graph embeddings from sensitive metadata. Our analyses illustrate the complexities of modeling metadata in GNNs, revealing that simply partitioning the embedding space to accommodate metadata does not achieve debiasing. In an empirical study, only MONET fully masked metadata signal from a linear classifier, and even outperformed adversarial techniques when used with a nonlinear model. A real-world item retrieval experiment showed that MONET successfully prevents shilling attack bias without severely impacting the performance of the underlying GNN.
There are two promising directions of future work in this new area. First, MONET only guarantees linear debiasing. Methods and exact guarantees for controlling nonlinear associations should be investigated. Toward this, the RBF SVM accuracy results on our political blogs experiment can serve as a benchmark. Second, we did not explore the performance of MONET in deeper or supervised GNNs, or its handling of high-dimensional metadata, each of which presents interesting modeling challenges. For instance, in a deep supervised GNN, it is not clear which hidden (embedding) layers would be worth debiasing, or whether associations between the metadata and prediction task are detrimental. While our paper makes a strong case for the use of MONET in current industry applications, answering these questions will greatly expand the technique’s potential impact.
Footnotes
1. Software and analysis code also available at http://urlremovedforreview
2. Available within the graph-tool software [20]
3. http://files.grouplens.org/datasets/movielens/ml-100k/
References
[1] (2016) TensorFlow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283.
[2] (2019) MixHop: higher-order graph convolutional architectures via sparsified neighborhood mixing. Proceedings of the 36th International Conference on Machine Learning 97, pp. 21–29.
[3] (2005) The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery, pp. 36–43.
[4] (2016) Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems, pp. 4349–4357.
[5] (2019) Compositional fairness constraints for graph embeddings. Proceedings of the 36th International Conference on Machine Learning.
[6] (2019) Global vectors for node representations. In The World Wide Web Conference, pp. 2587–2593.
[7] (2018) A tutorial on network embeddings. arXiv preprint arXiv:1808.02590.
[8] (2005) Preventing shilling attacks in online recommender systems. In Proceedings of the 7th Annual ACM International Workshop on Web Information and Data Management, WIDM ’05, pp. 67–74. ISBN 1595931945.
[9] (2018) A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering.
[10] (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852.
[11] (2011) Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12 (Jul), pp. 2121–2159.
[12] (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
[13] (2016) Node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864.
[14] (2018) Enhancing network embedding with auxiliary information: an explicit matrix factorization perspective. In International Conference on Database Systems for Advanced Applications, pp. 3–19.
[15] (2017) Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations.
[16] (2017) PPNE: property preserving network embedding. In International Conference on Database Systems for Advanced Applications, pp. 163–179.
[17] (2013) Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pp. 3111–3119.
[18] (2016) Structure and inference in annotated networks. Nature Communications 7, pp. 11863.
[19] (2017) The ground truth about metadata and community detection in networks. Science Advances 3 (5), pp. e1602548.
[20] (2014) The graph-tool python library. figshare.
[21] (2014) GloVe: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543.
[22] (2014) DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710.
[23] (2018) Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 459–467.
[24] (2010) Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta, pp. 45–50. http://is.muni.cz/publication/884893/en
[25] (2015) Rejecting the gender binary: a vector-space operation. Ben’s Bookworm Blog.
[26] (2015) LINE: large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077.
[27] (1997) Numerical linear algebra. Vol. 50, SIAM.
[28] (2015) Network representation learning with rich text information. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
[29] (2016) Revisiting semi-supervised learning with graph embeddings. In Proceedings of the 33rd International Conference on Machine Learning, Volume 48, ICML’16, pp. 40–48.
[30] (2018) Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 974–983.
[31] (2016) Homophily, structure, and content augmented network representation learning. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 609–618.
[32] (2007) Combining content and link for classification using matrix factorization. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 487–494.