MONET: Debiasing Graph Embeddings via the Metadata-Orthogonal Training Unit

In many real world graphs, the formation of edges can be influenced by certain sensitive features of the nodes (e.g. their gender, community, or reputation). In this paper we argue that when such influences exist, any downstream Graph Neural Network (GNN) will be implicitly biased by these structural correlations. To allow control over this phenomenon, we introduce the Metadata-Orthogonal Node Embedding Training (MONET) unit, a general neural network architecture component for performing training-time linear debiasing of graph embeddings. MONET operates by ensuring that the node embeddings are trained on a hyperplane orthogonal to that of the node features (metadata). Unlike debiasing approaches in similar domains, our method offers exact guarantees about the correlation between the resulting embeddings and any sensitive metadata. We illustrate the effectiveness of MONET through experiments on a variety of real world graphs against challenging baselines (e.g. adversarial debiasing), showing superior performance in tasks such as preventing the leakage of political party affiliation in a blog network, and preventing the gaming of embedding-based recommendation systems.

1 Introduction

Figure 1: Illustration of the Metadata-Orthogonal Node Embedding Training (MONET) model. Points are nodes from a toy graph, represented in 2-D, colored by known metadata. There is clear correlation and clustering by the metadata. MONET encodes the metadata signal, then debiases standard topology embeddings from the metadata embeddings. As described in Section 3.3, both representations are trained concurrently and orthogonally.

Graph embeddings – continuous, low-dimensional vector representations of nodes – have been eminently useful in network visualization, node classification, link prediction, and many other graph learning tasks [9]. While graph embeddings can be estimated directly by unsupervised algorithms using the graph’s structure [e.g. 22, 26, 13, 23], there is often additional (non-relational) information available for each node in the graph. This information, frequently referred to as node attributes or node metadata, can contain information that is useful for prediction tasks including demographic, geo-spatial, and/or textual features.

The interplay between a node’s metadata and edges is a rich and active area of research. Interestingly, in a number of cases, this metadata can be measurably related to a graph’s structure [19], and in some instances there may be a causal relationship (the node’s attributes influence the formation of edges). As such, metadata can enhance graph learning models [28, 18], and conversely, graphs can be used as regularizers in supervised and semi-supervised models of node features [29, 10]. Furthermore, metadata are commonly used as evaluation data for graph embeddings [7]. For example, node embeddings trained on a Flickr user graph were shown to predict user-specified Flickr “interests” [22]. This is presumably because users (as nodes) in the Flickr graph tend to follow users with similar interests, which illustrates a potential causal connection between node topology and node metadata.

However, despite the usefulness and prevalence of metadata in graph learning, there are instances where it is desirable to design a system to avoid the effects of a particular kind of sensitive data. For instance, the designers of a recommendation system may want to make recommendations independent of a user’s demographic information or location.

At first glance, this may seem like an artificial dilemma – surely one could just avoid the problem by not adding such sensitive attributes to the model. However, such an approach (ignoring a sensitive attribute) does not control for correlations that may exist between the sensitive metadata and the edges of a node. In other words, if the edges of the graph are correlated with sensitive metadata, then any algorithm which does not explicitly model and remove this correlation will be biased as a result of it.

To this end, we propose two simple desiderata for handling sensitive metadata in GNNs:

  • D1: The influence of metadata on the graph topology is modeled in a partitioned subset of embedding space, providing interpretability to the overall graph representation, and the option to remove metadata dimensions.

  • D2: Non-metadata or “topology” embeddings are debiased from metadata embeddings with a provable guarantee on the level of remaining bias.

Existing embedding methods for graphs with metadata [e.g. 32, 28, 18, 31, 16, 14] do not consider D2, and most do not fully address D1 either. In the graph CNN literature [e.g. 15, 10, 2], metadata are treated as features in predictive models, rather than attributes to be controlled. The work most relevant to the above desiderata is [5], which introduces adversarial debiasing of metadata to graph-based recommender systems. However, that approach does not carry strong theoretical guarantees about the amount of bias removed from the graph representation, and therefore does not satisfy D2.

In this work we propose a novel technique for Graph Neural Networks (GNNs) that satisfies both of these desiderata. Our method, the Metadata-Orthogonal Node Embedding Training (MONET) unit, operates by ensuring that metadata embeddings are trained on a hyperplane orthogonal to that of the topology embeddings (as conceptually illustrated in Figure 1). Specifically, our contributions are the following:

  1. The Metadata-Orthogonal Node Embedding Training (MONET) unit, a novel GNN algorithm which jointly embeds graph topology and graph metadata while enforcing linear decorrelation between the two embedding spaces.

  2. Analysis which proves that addressing desideratum D1 alone – partitioning a metadata embedding space – still produces a biased topology embedding space, and that the MONET unit corrects this.

  3. Experimental results on real world graphs which show that MONET can successfully debias topology embeddings while relegating metadata information to separate dimensions – achieving superior results to state-of-the-art adversarial and NLP-based debiasing methods.

2 Preliminaries

Early graph embedding methods involved dimensionality reduction techniques like multidimensional scaling and singular value decomposition [7]. In this paper we use graph neural networks trained on random walks, similar to DeepWalk [22]. DeepWalk and many subsequent methods first generate a sequence of random walks from the graph, to create a “corpus” of node “sentences” which are then modeled via word embedding techniques (e.g. word2vec [17] or GloVe [21]) to learn low dimensional representations that preserve the observed co-occurrence similarity.

Let W ∈ R^{n×d} be a d-dimensional graph embedding matrix, which aims to preserve the low-dimensional structure of a graph G with n nodes. Rows w_i of W correspond to nodes, and node pairs with large dot-products should be structurally or topologically close in the graph. As a concrete example, in this paper we consider the debiasing of a recently proposed graph embedding using the GloVe model [6]. Its training objective is:

    L_GloVe = Σ_{i,j} f(X_{ij}) (w_i^T u_j + b_i + c_j − log X_{ij})²    (1)

Above, W, U ∈ R^{n×d} are the “center” and “context” embeddings, b, c ∈ R^n are the biases, X_{ij} is the walk-distance-weighted context co-occurrence count for nodes i and j, and f is the loss smoothing function [21]. We use the GloVe model in the next section, and throughout the paper, to illustrate the incorporation of metadata embeddings and the MONET unit. However, these innovations are generally deployable modules. To illustrate this, we also describe a MONET unit for DeepWalk [22], a popular graph embedding algorithm.
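To make the objective concrete, the GloVe loss can be sketched in a few lines of NumPy. This is a minimal illustration with our own variable names and toy data, not the paper's implementation; `x_max` and `alpha` follow the standard GloVe smoothing function [21].

```python
import numpy as np

def glove_loss(W, U, b, c, X, x_max=100.0, alpha=0.75):
    """Weighted least-squares GloVe objective.
    W, U: (n, d) center/context embeddings; b, c: (n,) biases;
    X: (n, n) co-occurrence counts (zero entries are skipped)."""
    i, j = np.nonzero(X)
    x = X[i, j]
    f = np.minimum((x / x_max) ** alpha, 1.0)        # loss smoothing f(X_ij)
    resid = np.sum(W[i] * U[j], axis=1) + b[i] + c[j] - np.log(x)
    return float(np.sum(f * resid ** 2))

# A perfect fit on a one-entry co-occurrence matrix gives zero loss:
n, d = 3, 2
X = np.zeros((n, n)); X[0, 1] = np.e                  # log X_01 = 1
W, U = np.zeros((n, d)), np.zeros((n, d))
b, c = np.array([1.0, 0.0, 0.0]), np.zeros(n)         # b_0 + c_1 = 1 = log X_01
print(np.isclose(glove_loss(W, U, b, c, X), 0.0))     # True
```

Each summand penalizes the squared gap between the model's dot-product score and the observed log co-occurrence, weighted by the smoothing function.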

Notation. In this paper, given a matrix A and an index i, a_i denotes the i-th row vector of A. Column indices will not be used. 0_{p×q} denotes the p × q zero matrix, and ‖·‖_F denotes the Frobenius norm.

3 Metadata Embeddings and Orthogonal Training

In this section we present the components of MONET. First, in Section 3.1, we extend traditional graph representation to include metadata embeddings, achieving desideratum D1. Next, in Section 3.2, we prove that a model with just D1 will still leak metadata information to the topology embeddings. Then, in Section 3.3 we present MONET’s key algorithm for training metadata and topology embeddings orthogonally, achieving desideratum D2. Finally, we conclude with some analysis of MONET in Section 3.4.

3.1 Jointly Modeling Metadata & Topology

A natural first approach to controlling metadata effects is to introduce a partition of the embedding space that models them explicitly, adding interpretability and separability to the overall representation. We denote the node metadata matrix as M ∈ R^{n×m}, where the metadata for node i is contained in the row vector m_i. To achieve D1, we feed M through a single-layer neural network with weights W_1, W_2 ∈ R^{m×d_m}. Figure 2 shows a realization of this idea, using the GloVe embedding model, which we denote GloVe_meta. Here, two weight matrices are needed to embed the metadata for both center and context nodes. This network yields metadata embeddings MW_1, MW_2, which correspond to the standard topology embeddings W, U (respectively), and can be concatenated to form a combined metadata and topology vector. The resulting joint metadata and topology loss is:

    L_meta = Σ_{i,j} f(X_{ij}) (w_i^T u_j + (MW_1)_i^T (MW_2)_j + b_i + c_j − log X_{ij})²    (2)
While in this paper we demonstrate metadata embeddings using GloVe, we note that they can be incorporated in any GNN which utilizes a node-wise graph representation. For instance, the well-known DeepWalk [22] loss, which is based on word2vec [17], would incorporate metadata embeddings as follows:

    L_DW = − Σ_{(i,j)∈P} [ log σ([w_i, (MW_1)_i]^T [u_j, (MW_2)_j]) + Σ_{k∈N(i)} log σ(−[w_i, (MW_1)_i]^T [u_k, (MW_2)_k]) ]

Above, P is the set of context pairs from random walks, and N(i) is a set of negative samples associated with node i.
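As a sketch of how the concatenated [topology | metadata] representations enter a negative-sampling skip-gram loss (variable names and the toy sampling setup are ours, for illustration only):

```python
import math
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def deepwalk_meta_loss(W, U, M, W1, W2, pairs, negatives):
    """Negative-sampling skip-gram loss on concatenated representations.
    W, U: (n, d) center/context topology embeddings; M: (n, m) metadata;
    W1, W2: (m, d_m) metadata transformations; pairs: (i, j) context
    pairs from random walks; negatives: dict node -> negative samples."""
    Zc = np.hstack([W, M @ W1])   # center representations [w_i, (M W1)_i]
    Zx = np.hstack([U, M @ W2])   # context representations [u_j, (M W2)_j]
    loss = 0.0
    for i, j in pairs:
        loss -= math.log(sigmoid(Zc[i] @ Zx[j]))
        for k in negatives[i]:
            loss -= math.log(sigmoid(-(Zc[i] @ Zx[k])))
    return loss

# With all-zero embeddings every sigmoid is 0.5, so the loss is
# (1 + #negatives) * log 2 per context pair:
n, d, m, dm = 4, 3, 2, 2
W = U = np.zeros((n, d)); M = np.zeros((n, m))
W1 = W2 = np.zeros((m, dm))
loss = deepwalk_meta_loss(W, U, M, W1, W2, [(0, 1)], {0: [2, 3]})
print(np.isclose(loss, 3 * math.log(2)))   # True
```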

Figure 2: Illustration of GloVe_meta. W and U are topology embedding layers. A single-layer feed-forward neural network creates metadata embedding layers MW_1 and MW_2. The concatenations [W, MW_1] and [U, MW_2] create the output node representations.

This new modeling approach, which augments standard graph representations with a metadata partition, provides users of graph embeddings with the option to include or exclude metadata-learned dimensions. Furthermore, suppose that the metadata (e.g. demographic information) are indeed associated with the formation of links in the graph. In this case, ostensibly, the dedicated metadata embedding space could relieve the topology dimensions of the responsibility to encode this relationship, thereby reducing metadata bias in those dimensions. In our empirical study (Section 4) we show that to some extent, this does actually occur. However, we find – empirically and theoretically – that this naïve approach does not guarantee that the topology embeddings are completely decorrelated from the metadata embeddings. In fact, surprisingly, we find that most of the metadata bias suffered by standard baselines remains in the topology dimensions. This is a phenomenon we call metadata leakage, which we formalize and prove in the next section.

3.2 Metadata Leakage in Graph Neural Networks

Here, we formally define metadata leakage for general topology and metadata embeddings, and show how it can occur even in embedding models with partitioned metadata embeddings. This motivates the need for both D1 and D2 in a complete approach to controlling metadata effects in GNNs. All proofs appear in the supplement.

Definition 1.

The metadata leakage of metadata embeddings Z ∈ R^{n×d_m} into topology embeddings W ∈ R^{n×d} is defined MDL(Z, W) := ‖Z^T W‖_F. We say that there is no metadata leakage if and only if MDL(Z, W) = 0, i.e. Z^T W = 0_{d_m×d}.
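The leakage quantity is straightforward to compute; the sketch below (our variable names, random toy embeddings) also previews why projecting off the metadata column space removes it:

```python
import numpy as np

def metadata_leakage(Z, W):
    """MDL(Z, W) = ||Z^T W||_F: zero iff the topology embeddings W are
    linearly uncorrelated with the metadata embeddings Z."""
    return float(np.linalg.norm(Z.T @ W))

rng = np.random.default_rng(1)
Z = rng.normal(size=(100, 2))     # metadata embeddings
W = rng.normal(size=(100, 8))     # topology embeddings
print(metadata_leakage(Z, W) > 0)            # random embeddings leak

# Projecting W off the column space of Z removes all linear leakage:
Uz, _, _ = np.linalg.svd(Z, full_matrices=False)
W_perp = W - Uz @ (Uz.T @ W)
print(np.isclose(metadata_leakage(Z, W_perp), 0.0))   # True
```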

Without a more nuanced approach, metadata leakage can occur even in embedding models with a metadata partition, like those discussed in Section 3.1. To demonstrate this we consider a reduced metadata-laden GloVe loss with unique topology and metadata representations W and T = MW_1:

    L_r = Σ_{i,j} f(X_{ij}) (w_i^T w_j + (MW_1)_i^T (MW_1)_j + b_i + b_j − log X_{ij})²    (3)
We now show that under a random update of GloVe under Eq. (3), the expected metadata leakage is non-zero. Specifically, let (i, j) be a node pair from X, and define ΔW as the incurred Stochastic Gradient Descent update W ← W + ΔW. Suppose there is a “ground-truth” metadata transformation W_1^*, and define ground-truth metadata embeddings T^* = MW_1^*, which represent the “true” dimensions of the metadata effect on the co-occurrences X. With expectations taken with respect to the sampling of a pair (i, j) for Stochastic Gradient Descent, define the mean embeddings μ_{T^*} = E[t_i^*] and μ_W = E[w_i], and define μ_b = E[b_i] similarly. Then our main Theorem is as follows:

Theorem 1.

Assume E[w_i] = 0 for all i, and that W, b, and T^* are mutually independent. Suppose for some fixed λ > 0 we have log X_{ij} = λ t_i^{*T} t_j^* + b_i + b_j. Let (i, j) be a randomly sampled co-occurrence pair and ΔW the incurred update. Then if MDL(T^*, W) = 0, we have

    E[ MDL(T^*, W + ΔW) ] > 0.
Importantly, the initial scales of W and b are set by neural network hyperparameters, so we give a useful Corollary:

Corollary 1.

Under the assumptions of Theorem 1, E[MDL(T^*, W + ΔW)] remains bounded away from 0 as ‖W‖_F, ‖b‖_2 → 0.

Note that under reasonable GNN initialization schemes, W and b are small random perturbations around zero. Thus, Corollary 1 implies the surprising result that incorporating a metadata embedding partition is not sufficient to prevent metadata leakage in practical settings.

3.3 MONET: Metadata-Orthogonal Node Embedding Training

Here, we introduce the Metadata-Orthogonal Node Embedding Training (MONET) unit for training joint topology-metadata graph representations without metadata leakage. MONET explicitly prevents correlation between topology and metadata by using the Singular Value Decomposition (SVD) of the metadata embeddings Z to orthogonalize updates to the topology embeddings W during training.

Figure 3: The MONET unit adds a feed-forward transformation of the metadata M, resulting in metadata embeddings MW_1 and MW_2. Z = M(W_1 + W_2) gives the combined metadata representation, used to debias W and U via the projection I − P_Z.

MONET. The MONET unit is a two-step algorithm applied to the training of a topology embedding in a neural network, and is detailed in Algorithm 1. The input to a MONET unit is a metadata embedding Z and a target topology embedding W for debiasing. Then, let U_Z be the left-singular vectors of Z, and define the projection P_Z = U_Z U_Z^T. In the forward pass procedure, debiased topology weights W_⊥ = (I − P_Z) W are obtained by using the projection. Similarly, W_⊥ is used in place of W in subsequent GNN layers. In the backward pass, MONET also debiases the backpropagation update to the topology embedding, ΔW, using ΔW_⊥ = (I − P_Z) ΔW.

Algorithm 1 MONET Unit Training Step
Input: topology embedding W, metadata embedding Z
procedure ForwardPassDebiasing(W, Z)
    Compute left-singular vectors U_Z of Z and projection P_Z ← U_Z U_Z^T
    Compute orthogonal topology embedding: W_⊥ ← (I − P_Z) W
    return debiased graph representation [W_⊥, Z]
end procedure
procedure BackwardPassDebiasing(ΔW)
    Compute orthogonal topology embedding update: ΔW_⊥ ← (I − P_Z) ΔW
    Apply update: W_⊥ ← W_⊥ + ΔW_⊥
    return debiased topology embedding W_⊥
end procedure
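Algorithm 1 amounts to a few lines of linear algebra. Below is a NumPy sketch (our variable names; we assume the raw gradient update ΔW is supplied by the surrounding optimizer), which also demonstrates the zero-leakage guarantee of Theorem 2 numerically:

```python
import numpy as np

def monet_projection(Z):
    """P_Z = U_Z U_Z^T, built from the left-singular vectors of Z."""
    Uz, _, _ = np.linalg.svd(Z, full_matrices=False)
    return Uz @ Uz.T

def forward_debias(W, Z):
    """Forward pass: project W onto the hyperplane orthogonal to Z."""
    P = monet_projection(Z)
    return W - P @ W                      # W_perp = (I - P_Z) W

def backward_debias(W, dW, Z):
    """Backward pass: debias the gradient update before applying it."""
    P = monet_projection(Z)
    return W + (dW - P @ dW)              # W + (I - P_Z) dW

rng = np.random.default_rng(0)
n, d, d_m = 50, 8, 2
W = rng.normal(size=(n, d))               # topology embedding
Z = rng.normal(size=(n, d_m))             # metadata embedding
dW = rng.normal(size=(n, d))              # incoming gradient update

W_perp = forward_debias(W, Z)
W_next = backward_debias(W_perp, dW, Z)
# Zero leakage after both passes, up to machine precision:
print(np.allclose(Z.T @ W_perp, 0.0), np.allclose(Z.T @ W_next, 0.0))  # True True
```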

Straightforward properties of the SVD show that MONET directly prevents metadata leakage:

Theorem 2.

Using Algorithm 1, Z^T W_⊥ = 0_{d_m×d} and Z^T ΔW_⊥ = 0_{d_m×d}.

We note that in this work we have only considered linear metadata leakage; provable guarantees for debiasing nonlinear topology/metadata associations are an area of future work.

Implementation (MONET_G). We demonstrate MONET in our experiments by applying Algorithm 1 to Eq. (2), which we denote as MONET_G. We orthogonalize the input and output topology embeddings W and U with the summed metadata embeddings Z = M(W_1 + W_2). By linearity, this implies Z-orthogonal training of the summed topology representation W + U. We note that working with the sums of center and context embeddings is the standard way to combine these matrices [21]. Figure 3 gives a full illustration of MONET_G.

Relaxation. A natural generalization of the MONET unit is to parameterize the level of orthogonalization incurred at each training step. Specifically, we introduce a parameter δ ∈ [0, 1] which controls the extent to which topology embeddings are projected onto the metadata-orthogonal hyperplane:

    W_⊥ = (I − δ P_Z) W    (4)

δ P_Z takes the role of P_Z in Algorithm 1. Using MONET with δ = 1 removes all linear bias, whereas using δ = 0 prevents any debiasing. In Section 4.2, we explore how δ affects the trade-off between linear debiasing and task accuracy.
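A sketch of the relaxed projection, writing the relaxation parameter as delta. Since Z^T (I − δ P_Z) W = (1 − δ) Z^T W, the remaining leakage decays linearly in δ and vanishes at δ = 1 (our toy data):

```python
import numpy as np

def relaxed_debias(W, Z, delta):
    """W_perp = (I - delta * P_Z) W: delta=1.0 gives full debiasing,
    delta=0.0 gives no debiasing (sketch of the relaxed MONET unit)."""
    Uz, _, _ = np.linalg.svd(Z, full_matrices=False)
    return W - delta * (Uz @ (Uz.T @ W))

rng = np.random.default_rng(0)
W, Z = rng.normal(size=(30, 8)), rng.normal(size=(30, 2))
leak = lambda d: float(np.linalg.norm(Z.T @ relaxed_debias(W, Z, d)))
# Leakage shrinks monotonically in delta and vanishes at delta = 1:
print(leak(0.0) > leak(0.5) > leak(1.0), np.isclose(leak(1.0), 0.0))
```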

3.4 Analysis

We briefly remark about the algorithmic complexity of MONET, and the interpretation of its parameters.

Algorithmic Complexity. The bottleneck of MONET occurs in the SVD computation and orthogonalization. In our setting, the SVD is O(n d_m²) [27]. The matrix P_Z need not be computed to perform orthogonalization steps, as (I − U_Z U_Z^T) W = W − U_Z (U_Z^T W), and the right-hand quantity is O(n d d_m) to compute. Hence the general complexity of the MONET unit is O(n d_m (d + d_m)). In the experiments section we compare the wall clock time of MONET and baselines, showing only about a 23% overall wall time increase from standard GloVe.
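The complexity remark corresponds to a simple associativity choice: multiplying right-to-left avoids ever forming the n × n projection matrix. A quick numerical check (toy sizes of our choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, d_m = 500, 64, 4
W = rng.normal(size=(n, d))
Z = rng.normal(size=(n, d_m))
Uz, _, _ = np.linalg.svd(Z, full_matrices=False)   # O(n * d_m^2)

# Naive: materializes the n x n projection matrix (O(n^2 d) time, O(n^2) memory).
W_naive = W - (Uz @ Uz.T) @ W
# Efficient: right-to-left association, O(n * d * d_m) time, O(n * d_m) extra memory.
W_fast = W - Uz @ (Uz.T @ W)

print(np.allclose(W_naive, W_fast))   # True
```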

Metadata Parameter Interpretation. The terms in the sum of the loss for GloVe models with metadata (GloVe_meta and MONET_G) involve the dot product (MW_1)_i^T (MW_2)_j = m_i^T W_1 W_2^T m_j. That expansion suggests that the matrix B := W_1 W_2^T contains all pairwise metadata dimension relationships. In other words, B gives the direction and magnitude of the raw metadata effect on log co-occurrence, and ‖W_1 W_2^T‖_F is therefore a way to measure the extent to which the model has captured metadata information. We will refer to this interpretation in the experiments that follow. An important experiment will show that applying the MONET algorithm increases the magnitude of B’s entries.
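This interpretation rests on the bilinear identity (MW_1)_i · (MW_2)_j = m_i^T (W_1 W_2^T) m_j, which is easy to verify numerically (our toy data and names):

```python
import numpy as np

rng = np.random.default_rng(0)
m, dm = 2, 2
M = rng.normal(size=(6, m))                 # node metadata
W1, W2 = rng.normal(size=(m, dm)), rng.normal(size=(m, dm))

B = W1 @ W2.T                               # metadata importance matrix (m x m)
# Verify (M W1)_i . (M W2)_j = m_i^T B m_j for an arbitrary node pair:
i, j = 0, 1
lhs = (M @ W1)[i] @ (M @ W2)[j]
rhs = M[i] @ B @ M[j]
print(np.isclose(lhs, rhs))   # True
```

Entry (p, q) of B is thus the modeled interaction between metadata dimensions p and q in the log co-occurrence.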

4 Experimental Analysis

In this section we design and analyze experiments with the goal of answering the following questions:

  • Q1: Can MONET remove linear bias from topology embeddings and downstream prediction tasks?

  • Q2: Can MONET help reduce bias in topology embeddings even when they are used in nonlinear models?

  • Q3: Can MONET provide a reasonable trade-off between debiasing and accuracy on a downstream retrieval task?

  • Q4: Does adding the MONET unit to a method incur a significant performance overhead?

In all experiment settings, our topology embedding of interest for evaluation is the sum W + U of the center and context topology embeddings. All GloVe-based models are trained with TensorFlow [1] using the AdaGrad optimizer [11] with initial learning rate 0.05 and co-occurrence batch size 100. DeepWalk was trained using the gensim software [24]. Software for the MONET algorithm and code to reproduce the following experiments are available in the supplemental material1.

4.1 Experiment 1: Debiasing a Political Blogs Graph

Here we seek to answer Q1 and Q2, illustrating the effect of MONET debiasing by removing the effect of political ideology from a blogger network [3]. The political blog network2 has 1,107 nodes corresponding to blog websites, 19,034 hyperlink edges between the blogs (after converting the graph to be undirected), and two clearly defined, equally sized communities of liberal and conservative bloggers.

Baseline Methods. In this experiment we compare MONET_G against the following methods: (1) a random topology embedding generated from a multivariate Normal distribution; (2) DeepWalk; (3) GloVe; (4) GloVe_meta; (5) an adversarial version of GloVe, in which a 2-layer MLP discriminator is trained to predict political affiliation [12, 5]. All methods use 16-dimensional topology embeddings and (if appropriate) 2-dimensional metadata embeddings of blog affiliation. Full details on methods and training are given in the supplement.

Design. In this experiment we use two metrics to measure the bias of topology embeddings W + U:

  • Accuracy of a blog affiliation classifier trained on W + U.

  • Metadata Leakage MDL(Z, W + U) = ‖Z^T (W + U)‖_F.

Because we are studying debiasing potential, Accuracy scores close to 0.5 are desirable (due to equally-sized blog affiliation communities), and Metadata Leakage equal to 0.0 is optimal. For methods without metadata embeddings, leakage is computed with the original metadata M.

One repetition of the experiment proceeds as follows. Graph representations are trained with each method (some include metadata embedding partitions). A train set of nodes is sampled uniformly. Classifiers of blog affiliation are trained using each method’s topology embeddings. We compute Accuracy on the test set and Metadata Leakage on all nodes. We report means and standard deviations across 10 independent repetitions.

(a) Sensitive attribute prediction using linear model
(b) Sensitive attribute prediction using nonlinear model
Figure 4: Accuracy results from blog political affiliation classifiers. Lower is better (embeddings more debiased). MONET_G achieves this perfectly with a linear classifier, due to the MONET unit’s novel orthogonalization technique. Interestingly, on this task, MONET_G is dominant even under a nonlinear classifier.

Results. First we analyze the accuracy of a linear SVM at predicting political party affiliation, shown in Figure 4(a). The performance of MONET_G under the linear classifier is indistinguishable from the random baseline, which shows that the MONET unit perfectly debiased the GloVe embeddings during training, answering question Q1 in the affirmative. We note that this result is a direct implication of Theorem 2.

Interestingly, although the naive GloVe_meta also has a metadata embedding partition, its performance under the linear classifier is nearly the same as baselines with no metadata (GloVe and DeepWalk). This demonstrates the downstream effect of Theorem 1, and shows the necessity of desideratum D2 in a rigorous approach to graph embedding debiasing. We note that MONET also outperforms the adversarial extension of GloVe, even though adversarial approaches have performed well in similar domains [5].

Second, we observe the results from training a non-linear classifier (the RBF SVM), as shown in Figure 4(b). Here, we see that all embedding methods contain non-linear biases to exploit for the prediction of political affiliation. This includes MONET_G, because the MONET unit only performs linear debiasing. Surprisingly, here we find that MONET_G still produces the least biased representations, even beating an adversarial approach. This answers question Q2, showing that MONET can reduce bias even when its representations are used as features inside of a non-linear model.

Method        Metadata Leakage         Metadata Importance
GloVe         6503 ± 196               N/A
Random         283 ± 22                N/A
Adversary     4430 ± 305               N/A
DeepWalk      2670 ± 104               N/A
GloVe_meta    2103 ± 183
MONET_G       ≈ 0 (precision error)
Table 1: When trained to debias political affiliation, MONET removes metadata leakage to precision error. Metadata Importance values - computed from the W_1, W_2 layers of GloVe_meta and MONET_G - show that the MONET unit allows stronger metadata learning.

Empirical Analysis. Here we provide in-depth model understanding of MONET and, by comparison, GloVe_meta and GloVe. The metadata leakage (MDL), as defined in Section 3.2, is shown for each embedding set in Table 1. We see that embedding models without metadata control have high leakage. As predicted by Corollary 1, GloVe_meta’s metadata leakage also remains large – whereas, due to Theorem 2, MONET_G’s is at machine precision. This again shows that a metadata embedding partition is not sufficient to isolate the metadata effect.

The need for MONET_G over the naïve GloVe_meta can be observed in two other ways. First, recall from Section 3.4 that the “Metadata Importance” matrix W_1 W_2^T encodes pairwise relationships between metadata dimensions in embedding space. There is a noticeable increase in its magnitude when MONET is used, implying that GloVe_meta metadata embeddings are not capturing all possible metadata information. Second, Figure 5 shows the 2-D PCA of each model’s 16-dimensional topology embeddings, colored by political affiliation. Clear separation by affiliation is visible with the GloVe model’s PCA. The PCA of GloVe_meta also shows separation, but less so. This shows that having a metadata partition in embedding space – desideratum D1 – does some work toward debiasing the topology dimensions. However, the orthogonalizing MONET unit does the bulk of the work, as shown by the overlap of the affiliation clusters in the MONET_G PCA.

Figure 5: PCA of political blog graph embeddings. (a): affiliation separation clearly visible on standard GloVe embeddings. (b): affiliation separation reduces when GloVe_meta captures some metadata information. (c): affiliation separation disappears with MONET_G orthogonalized training.

4.2 Experiment 2: Debiasing a Shilling Attack

In this experiment we investigate Q3 and Q4 in the context of using debiasing methods to defend against a shilling attack [8] on graph-embedding based recommender systems [30]. In a shilling attack, a number of users act together to artificially increase the likelihood that a particular influenced item will be recommended for a particular target item.

Data. In a single repetition of this experiment, we inject an artificial shilling attack into the MovieLens 100k dataset3. The raw data is represented as a bipartite graph with 943 users, 1682 movies (“items”), and a total of 100,000 ratings (edges). Each user has rated at least 20 items. At random, we sample 10 items into an influence set I, and a target item t to be attacked. We take a random sample of 5% of the existing users to be the set of attackers A. We then create a new graph which, in addition to all the existing ratings, contains new ratings from each attacker to each item in I as well as the target item t.

Methods. We include all methods from the political blogs experiment in Section 4.1, with the exception of the adversarial baseline, as it was not implemented for continuous metadata. Instead, we applied a generalized correlation removal framework developed for removing word embedding bias [25, 4]. Specifically, we compute an “attack” direction (analogous to the “gender” direction in the language modeling literature) as follows. Let W be the GloVe topology embedding, and let a be the attacker metadata described above. Note that a_i is simply the scalar number of known attackers on item i. Then we compute the weighted attack direction x as:

    x = (Σ_i a_i w_i) / (Σ_i a_i)
Following [25], we compute the “NLP” baseline debiased graph embeddings as

    w_i ← w_i − (w_i^T x̂) x̂,   where x̂ = x / ‖x‖_2
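This baseline can be sketched as follows, in the style of hard word-embedding debiasing [25, 4] (our variable names and toy data):

```python
import numpy as np

def nlp_debias(W, a):
    """Project embeddings off a weighted 'attack' direction.
    W: (n, d) embeddings; a: (n,) known-attacker counts per item."""
    x = (a[:, None] * W).sum(axis=0) / a.sum()   # weighted attack direction
    x_hat = x / np.linalg.norm(x)
    return W - np.outer(W @ x_hat, x_hat)        # remove component along x_hat

rng = np.random.default_rng(0)
W = rng.normal(size=(20, 6))
a = np.arange(20, dtype=float)                   # toy attacker counts
W_db = nlp_debias(W, a)

# After debiasing, no embedding has any residual component along x_hat:
x = (a[:, None] * W).sum(axis=0) / a.sum()
print(np.allclose(W_db @ (x / np.linalg.norm(x)), 0.0))   # True
```

Note this removes only the single direction x̂, in contrast to MONET, which removes the full column space of the metadata embeddings.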
We also run relaxed MONET_G (see Section 3.3) for various values of δ, to investigate the trade-off between accuracy and debiasing.

As we wish to study item recommendation, in the random walks, we simply remove user nodes each time they are visited (so the walks contain only pairwise co-occurrence information over items). All methods compute 128 dimensional topology embeddings for the items. As metadata, we allow MONET models to know the per-item attacker rating count for each attacked item. However, to better demonstrate real-world performance, we only allow 50% (randomly sampled) attackers from the original 5% sample to be “known” when constructing these metadata. Additional training details are given in the supplemental.

Design. In this experiment we compute two metrics to investigate the bias-accuracy trade-off:

  • The number of influence items from I in the top-20 embedding-nearest-neighbor list of t. This is a measure of downstream bias on a retrieval task.

  • The mean reciprocal rank (MRR) for item retrieval. Let n(i) be the nearest neighbor of item i by random-walk co-occurrence count. Let r_i be the integer rank of item n(i) against item i’s complete set of embedding cosine distances. Then

    MRR = (1 / n_items) Σ_i (1 / r_i)
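The MRR metric can be sketched directly from the definition (our names; toy rankings):

```python
def mean_reciprocal_rank(true_neighbor, ranked_lists):
    """MRR over items: true_neighbor[i] is item i's nearest neighbor by
    random-walk co-occurrence; ranked_lists[i] is every other item
    sorted by embedding cosine distance from i (closest first)."""
    total = 0.0
    for i, ranking in ranked_lists.items():
        rank = ranking.index(true_neighbor[i]) + 1   # 1-indexed rank
        total += 1.0 / rank
    return total / len(ranked_lists)

# Toy example: item 0's true neighbor is ranked 1st, item 1's is ranked 2nd.
true_nb = {0: 2, 1: 3}
ranked = {0: [2, 1, 3], 1: [0, 3, 2]}
print(mean_reciprocal_rank(true_nb, ranked))   # (1/1 + 1/2) / 2 = 0.75
```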
Results. The results of the shilling experiment are shown in Figure 6, which compares the number of top-20 attacked items (a measure of bias) against MRR lift over a random baseline (a measure of accuracy).

First, we analyze how effectively each method debiased the embeddings. Here we see that the topology embeddings from MONET_G (δ = 1.0) prevent the most attacker bias by a large margin, letting less than one influenced item (on average) into the top-20 neighbors of t. We note that this behavior occurs even though the majority of observed co-occurrences for the algorithm had nothing to do with the attack in question, and only half of the true attackers were known. All other baselines (including those that explicitly model the attacker metadata) left around half or more of the attacked items in the top-20 list.

Next, we consider the ranking accuracy of each method. Here we observe that MONET_G outperforms the random baseline by at least 8.5x, and is comparable to one baseline (DeepWalk). Regarding the NLP debiasing method, we see that it outperformed MONET_G in terms of ranking accuracy, but allowed a surprising number of attacked items (7.5) in top-20 lists. We note that this method cannot reduce bias further, and does not offer any guarantees of performance. Finally, we observe that relaxing MONET’s orthogonality constraint (decreasing δ, the level of orthogonal projection) decreases the effective debiasing, but improves embedding performance (as might be expected). Note that δ = 1.0 (not shown) recovers the MONET_G solution. This reveals a trade-off between embedding debiasing and prediction efficacy which has also been observed in other contexts [5]. This confirms Q3 – that MONET allows for a controllable trade-off between debiasing and accuracy.

Figure 6: Shilling experiment accuracy-bias trade-off: the y-axis is the lift-above-random of the mean reciprocal rank (MRR) of embedding-based item retrieval against item random-walk co-occurrence ordering. The x-axis is the number of attacked items in the top-20 embedding-distance nearest-neighbors of the target item t. All points are averages over ten runs of the shilling experiment. Only MONET_G completely prevents attack bias, and maintains performance comparable to DeepWalk and greater than 8x better than a random baseline. Confidence intervals along both axes are provided in a supplemental table.

In addition to the ranking metrics, we also compute the Pearson correlation of each embedding’s cosine distances w.r.t. those of the GloVe topology embeddings, shown in Table 2. We find that MONET_G achieves comparable correlation to GloVe_meta and NLP, showing that the SVD operation does not harshly corrupt the overall GloVe embedding space. We also report wall times for each method, showing that the metadata embedding and SVD operations of MONET_G add negligibly to the runtime over GloVe. Together, these metrics answer Q4 in the negative.

Method        GloVe Distances Corr.   Wall Time (sec)
DeepWalk      0.347 ± 0.003           45 ± 4
GloVe         1.000 ± 0.000           357 ± 54
GloVe_meta    0.790 ± 0.002           359 ± 49
MONET_G       0.758 ± 0.004           442 ± 58
Random        0.031 ± 0.001           N/A
NLP           0.770 ± 0.008           N/A
Table 2: Shilling experiment performance metrics: “GloVe Distances Corr.” for a method is the Pearson correlation between (1) the cosine embedding distances produced by the GloVe topology embeddings and (2) those same distances produced by the method. These figures show that MONET_G does not overly corrupt the GloVe embedding signal, and does not add prohibitive computation time.

5 Conclusion

This work introduced MONET, a novel GNN training technique which theoretically guarantees linear debiasing of graph embeddings from sensitive metadata. Our analyses illustrate the complexities of modeling metadata in GNNs, revealing that simply partitioning the embedding space to accommodate metadata does not achieve debiasing. In an empirical study, only MONET fully masked metadata signal from a linear classifier, and even outperformed adversarial techniques when used with a nonlinear model. A real-world item retrieval experiment showed that MONET successfully prevents shilling attack bias without severely impacting the performance of the underlying GNN.

There are two promising directions of future work in this new area. First, MONET only guarantees linear debiasing. Methods and exact guarantees for controlling nonlinear associations should be investigated. Toward this, the RBF SVM accuracy results on our political blogs experiment can serve as a benchmark. Second, we did not explore the performance of MONET in deeper or supervised GNNs, or its handling of high-dimensional metadata, each of which presents interesting modeling challenges. For instance, in a deep supervised GNN, it is not clear which hidden (embedding) layers would be worth debiasing, or whether associations between the metadata and prediction task are detrimental. While our paper makes a strong case for the use of MONET in current industry applications, answering these questions will greatly expand the technique’s potential impact.


  1. Software and analysis code also available at http://urlremovedforreview
  2. Available within the Graph-Tool software [20]


  1. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving and M. Isard (2016) TensorFlow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283.
  2. S. Abu-El-Haija, B. Perozzi, A. Kapoor, N. Alipourfard, K. Lerman, H. Harutyunyan, G. V. Steeg and A. Galstyan (2019) MixHop: higher-order graph convolutional architectures via sparsified neighborhood mixing. Proceedings of the 36th International Conference on Machine Learning 97, pp. 21–29.
  3. L. A. Adamic and N. Glance (2005) The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery, pp. 36–43.
  4. T. Bolukbasi, K. Chang, J. Y. Zou, V. Saligrama and A. T. Kalai (2016) Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems, pp. 4349–4357.
  5. A. J. Bose and W. L. Hamilton (2019) Compositional fairness constraints for graph embeddings. Proceedings of the 36th International Conference on Machine Learning.
  6. R. Brochier, A. Guille and J. Velcin (2019) Global vectors for node representations. In The World Wide Web Conference, pp. 2587–2593.
  7. H. Chen, B. Perozzi, R. Al-Rfou and S. Skiena (2018) A tutorial on network embeddings. arXiv preprint arXiv:1808.02590.
  8. P. Chirita, W. Nejdl and C. Zamfir (2005) Preventing shilling attacks in online recommender systems. In Proceedings of the 7th Annual ACM International Workshop on Web Information and Data Management (WIDM '05), pp. 67–74.
  9. P. Cui, X. Wang, J. Pei and W. Zhu (2018) A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering.
  10. M. Defferrard, X. Bresson and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852.
  11. J. Duchi, E. Hazan and Y. Singer (2011) Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12 (Jul), pp. 2121–2159.
  12. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
  13. A. Grover and J. Leskovec (2016) node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864.
  14. J. Guo, L. Xu, X. Huang and E. Chen (2018) Enhancing network embedding with auxiliary information: an explicit matrix factorization perspective. In International Conference on Database Systems for Advanced Applications, pp. 3–19.
  15. T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations.
  16. C. Li, S. Wang, D. Yang, Z. Li, Y. Yang, X. Zhang and J. Zhou (2017) PPNE: property preserving network embedding. In International Conference on Database Systems for Advanced Applications, pp. 163–179.
  17. T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pp. 3111–3119.
  18. M. E. Newman and A. Clauset (2016) Structure and inference in annotated networks. Nature Communications 7, pp. 11863.
  19. L. Peel, D. B. Larremore and A. Clauset (2017) The ground truth about metadata and community detection in networks. Science Advances 3 (5), pp. e1602548.
  20. T. P. Peixoto (2014) The graph-tool python library. figshare.
  21. J. Pennington, R. Socher and C. Manning (2014) GloVe: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543.
  22. B. Perozzi, R. Al-Rfou and S. Skiena (2014) DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710.
  23. J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang and J. Tang (2018) Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 459–467.
  24. R. Řehůřek and P. Sojka (2010) Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta, pp. 45–50.
  25. B. Schmidt (2015) Rejecting the gender binary: a vector-space operation. Ben's Bookworm Blog.
  26. J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan and Q. Mei (2015) LINE: large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077.
  27. L. N. Trefethen and D. Bau III (1997) Numerical Linear Algebra. Vol. 50, SIAM.
  28. C. Yang, Z. Liu, D. Zhao, M. Sun and E. Chang (2015) Network representation learning with rich text information. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
  29. Z. Yang, W. W. Cohen and R. Salakhutdinov (2016) Revisiting semi-supervised learning with graph embeddings. In Proceedings of the 33rd International Conference on Machine Learning (ICML '16), pp. 40–48.
  30. R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton and J. Leskovec (2018) Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 974–983.
  31. D. Zhang, J. Yin, X. Zhu and C. Zhang (2016) Homophily, structure, and content augmented network representation learning. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 609–618.
  32. S. Zhu, K. Yu, Y. Chi and Y. Gong (2007) Combining content and link for classification using matrix factorization. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 487–494.