Abstract
We propose NetGAN, the first implicit generative model for graphs able to mimic real-world networks. We pose the problem of graph generation as learning the distribution of biased random walks over the input graph. The proposed model is based on a stochastic neural network that generates discrete output samples and is trained using the Wasserstein GAN objective. NetGAN is able to produce graphs that exhibit well-known network patterns without explicitly specifying them in the model definition. At the same time, our model exhibits strong generalization properties, as highlighted by its competitive link prediction performance, despite not being trained specifically for this task. Being the first approach to combine both of these desirable properties, NetGAN opens exciting avenues for further research.

1 Introduction
Generative models for graphs have a long-standing history, with applications including data augmentation, anomaly detection and recommendation (Chakrabarti & Faloutsos, 2006). Explicit probabilistic models such as Barabási-Albert or stochastic blockmodels are the de-facto standard in this field (Goldenberg et al., 2010). However, it has also been shown on multiple occasions that our intuitions about the structure and behavior of graphs may be misleading. For instance, heavy-tailed degree distributions in real graphs were in stark disagreement with the models existing at the time of their discovery (Barabási & Albert, 1999). More recent works like Dong et al. (2017) and Broido & Clauset (2018) keep bringing up other surprising characteristics of real-world networks that question the validity of the established models. This leads us to the question: “How do we define a model that captures all the essential (potentially still unknown) properties of real graphs?”
An increasingly popular way to address this issue in other fields is by switching from explicit (prescribed) models to implicit ones. This transition is especially notable in computer vision, where generative adversarial networks (GANs) (Goodfellow et al., 2014) significantly advanced the state of the art over the classic prescribed approaches like mixtures of Gaussians (Blanken et al., 2007). GANs achieve unparalleled results in scenarios such as image and 3D object generation (e.g., Karras et al., 2017; Berthelot et al., 2017; Wu et al., 2016). However, despite their massive success when dealing with real-valued data, adapting GANs to handle discrete objects like graphs or text remains an open research problem (Goodfellow, 2016). In fact, discreteness is only one of the obstacles when applying GANs to network data. Large repositories of graphs that all come from the same distribution are not available. This means that in a typical setting one has to learn from a single graph. Additionally, any model operating on a graph necessarily has to be permutation invariant, as graphs are isomorphic under node reordering.
In this work we introduce NetGAN, the first implicit generative model for graphs and networks that tackles all of the above challenges. We formulate the problem of learning the graph topology as learning the distribution of biased random walks over the graph. As in the typical GAN setting, the generator (in our case defined as a stochastic neural network with discrete output samples) learns to generate random walks that are plausible in the real graph, while the discriminator has to distinguish them from the true ones that are sampled from the original graph. The objective function of our model is based on the Wasserstein GAN (Arjovsky et al., 2017).
The main requirement for a graph generative model is the ability to generate realistic graphs. In the experimental section we compare NetGAN to other established prescribed models in this task. We observe that our proposed method consistently reproduces most known patterns inherent to real-world networks without explicitly specifying any of them in the model definition (e.g., degree distribution, as seen in Fig. 1). However, a model that simply replicates the original graph would also trivially fulfill this requirement, which clearly isn’t our goal. To show that this is not the case, we examine the generalization properties of NetGAN by evaluating its link prediction performance. As our experiments show, our model exhibits competitive performance in this task and even achieves state-of-the-art scores on some datasets. This result is especially impressive, since NetGAN is not trained explicitly for performing link prediction. To summarize, our main contributions are:

We introduce NetGAN, the first-of-its-kind GAN architecture that generates graphs via random walks. Our model tackles the associated challenges of staying permutation invariant, learning from a single graph and generating discrete output.

We show that our method preserves important topological properties, without having to explicitly specify them in the model definition. Moreover, we demonstrate how latent space interpolation leads to producing graphs with smoothly changing characteristics.

We highlight the generalization properties of NetGAN by its link prediction performance, which is competitive with the state of the art on real-world datasets, despite the model not being trained explicitly for this task.
2 Related work
So far, no GAN architectures applicable to real-world networks have been proposed. Liu et al. (2017) propose a GAN architecture for learning topological features of subgraphs. Tavakoli et al. (2017) apply GANs to graph data by trying to directly generate adjacency matrices. Because their model produces the entire adjacency matrix, including the zero entries, it requires computations and memory quadratic in the number of nodes. Such quadratic complexity is infeasible in practice, allowing only small graphs to be processed (with reported runtime of over 60 hours for a graph with only 154 nodes). In contrast, NetGAN operates on random walks, thus only considering the nonzero entries of the adjacency matrix and efficiently exploiting the sparsity of real-world graphs, and is readily applicable to networks with thousands of nodes.
Deep learning methods for graph data have mostly been studied in the context of node embeddings (Perozzi et al., 2014; Grover & Leskovec, 2016; Kipf & Welling, 2016). The main idea behind these approaches is that of modeling the probabilities of each individual edge’s existence, $p(A_{ij})$, as some function of the respective node embeddings, $p(A_{ij}) = f(h_i, h_j)$, where $f$ is represented by a neural network. The recently proposed GraphGAN (Wang et al., 2017) is another instance of such prescribed edge-level probabilistic models, where $f$ is optimized using the GAN objective instead of the traditional cross-entropy. Deep embedding based methods achieve state-of-the-art scores in tasks like link prediction and node classification. Nevertheless, as we show in Sec. 4.1, using such approaches for generating entire graphs produces samples that do not preserve any of the patterns inherent to real-world networks.
Prescribed generative models for graphs have a long history and are well-studied. For a survey we refer the reader to Chakrabarti & Faloutsos (2006) and Goldenberg et al. (2010). Typically, prescribed generative approaches are designed to capture and reproduce some predefined subset of graph properties (e.g., degree distribution, community structure, clustering coefficient). Notable examples include the configuration model (Bender & Canfield, 1978; Molloy & Reed, 1995), the degree-corrected stochastic blockmodel (Karrer & Newman, 2011), Exponential Random Graph Models (Holland & Leinhardt, 1981), and the block two-level Erdős-Rényi random graph model (Seshadhri et al., 2012). In Sec. 4 we compare NetGAN with the above-mentioned prescribed models in the tasks of graph generation and link prediction.
Due to the challenging nature of the problem, only a few approaches able to generate discrete data using GANs exist. Most approaches focus on generating discrete sequences such as text, with some of them using reinforcement learning techniques to enable backpropagation through sampling discrete random variables (Yu et al., 2017; Kusner & Hernández-Lobato, 2016; Li et al., 2017; Liang et al., 2017). Other approaches modify the GAN objective to tackle the same challenge (Che et al., 2017; Hjelm et al., 2017). Focusing on non-sequential discrete data, Choi et al. (2017) generate high-dimensional discrete features (e.g. binary indicators, counts) in patient records. None of these methods have considered graph-structured data.
3 Model
In this section we introduce NetGAN, a Generative Adversarial Network model for graph and network data. Its core idea lies in capturing the topology of a graph by learning the distribution over the random walks. We are given an input graph of $N$ nodes, defined by a binary adjacency matrix $A \in \{0, 1\}^{N \times N}$. First, we sample a set of random walks of length $T$ from $A$. This collection of random walks serves as a training set for our model. We use the biased second-order random walk sampling strategy described in Grover & Leskovec (2016), as it better captures both local and global graph structure. An important advantage of using random walks is their invariance under node reordering. Additionally, random walks only include the nonzero entries of $A$, thus efficiently exploiting the sparsity of real-world graphs.
Like any typical GAN architecture, NetGAN consists of two main components: a generator $G$ and a discriminator $D$. The goal of the generator is to generate synthetic random walks that are plausible in the input graph. At the same time, the discriminator learns to distinguish the synthetic random walks from the real ones that come from the training set. Both $G$ and $D$ are trained end-to-end using backpropagation. At any point of the training process it is possible to use $G$ to generate a set of random walks, which can then be used to produce an adjacency matrix of a new generated graph. In the rest of this section we describe each stage of this process and our design choices in more detail. An overview of our model’s complete architecture can be seen in Fig. 2.
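The sampling of training walks can be illustrated with a minimal pure-Python sketch of a second-order biased walk in the style of node2vec. The function name `biased_random_walk`, the toy graph and the parameter values are hypothetical; `p` and `q` are the return and in-out parameters of Grover & Leskovec (2016), and this is not the authors' implementation.

```python
import random

def biased_random_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """Sample one second-order biased walk with `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:                      # isolated node: stop early
            break
        if len(walk) == 1:                # first step is an unbiased choice
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:               # step back to the previous node
                weights.append(1.0 / p)
            elif nxt in adj[prev]:        # stay at distance 1 from the previous node
                weights.append(1.0)
            else:                         # move further away
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Toy undirected graph stored as adjacency sets (illustrative data only).
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
rng = random.Random(0)
walks = [biased_random_walk(adj, rng.choice(list(adj)), 16, p=1.0, q=1.0, rng=rng)
         for _ in range(5)]
```

Each walk only ever touches existing edges, which is why the training set scales with the number of nonzero entries of the adjacency matrix rather than with its full size.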
3.1 Architecture
Generator. The generator defines an implicit probabilistic model for generating random walks: $(v_1, \dots, v_T) \sim G$. We model $G$ as a sequential process based on a neural network $f_\theta$ parametrized by $\theta$. At each step $t$, $f_\theta$ produces two values: the probability distribution over the next node to be sampled, denoted as $p_t$, and the current memory state of the model, denoted as $m_t$. The new node $v_t$ (represented as a one-hot vector) is sampled from $p_t$, and together with $m_t$ passed into $f_\theta$ at the next step $t + 1$. Similarly to the classic GAN setting, a latent code $z$ drawn from a multivariate standard normal distribution is passed through a parametric function $g_{\theta'}$ to initialize $m_0$. The generative process of $G$ is summarized in the box below.
In this work we focus our attention on the Long short-term memory (LSTM) architecture for $f_\theta$, introduced by Hochreiter & Schmidhuber (1997). The memory state $m_t$ of an LSTM is represented by the cell state $C_t$ and the hidden state $h_t$. The latent code $z$ goes through two separate streams, each consisting of two fully connected layers with tanh activation, and is then used to initialize $(C_0, h_0)$.
A natural question might arise: “Why use a model with memory and temporal dependencies, when the random walks are Markov processes?” (2nd order Markov for biased RWs). Or put differently, what’s the benefit of using random walks of length greater than 2? In theory, a model with large enough capacity could simply memorize all existing edges in the graph and recreate them. However, for large graphs achieving this in practice is not feasible. More importantly, pure memorization is not the goal of NetGAN; rather, we want to generalize and to generate graphs with similar properties, not exact replicas. Having longer random walks combined with memory helps the model to learn the topology and general patterns in the data (e.g., community structure). Our experiments in Sec. 4.2 confirm this, showing that longer random walks are indeed beneficial.
After each time step, to generate the next node in the random walk, the network has to output the vector $p_t$ of length $N$. Operating in such a high-dimensional space within the LSTM cell leads to an unnecessary computational overhead. To tackle this issue the LSTM produces outputs $o_t \in \mathbb{R}^H$, with $H \ll N$, that are then up-projected to $\mathbb{R}^N$ using the matrix $W_{up} \in \mathbb{R}^{H \times N}$. This enables us to efficiently handle graphs with thousands of nodes.
Given the probability distribution over the next node in the random walk, $v_t \sim \mathrm{Cat}(\sigma(p_t))$, from which $v_t$ is to be drawn, we are faced with another challenge: sampling from a categorical distribution is a non-differentiable operation; thus, it blocks the flow of gradients and precludes backpropagation. We solve this problem by using the Straight-Through Gumbel estimator by Jang et al. (2016). More specifically, we perform the following transformation: First, we let $v_t^* = \sigma((p_t + g) / \tau)$, where $\tau$ is a temperature parameter, and the $g_i$’s are i.i.d. samples from a Gumbel distribution with zero location and unit scale. Then, the next sample is chosen as $v_t = \mathrm{onehot}(\arg\max v_t^*)$. While the one-hot sample $v_t$ is passed as input to the next time step, during the backward pass the gradients will flow through the differentiable $v_t^*$. The choice of $\tau$ allows us to trade off between better flow of gradients (large $\tau$, more uniform $v_t^*$) and more exact calculations (small $\tau$, $v_t^* \approx v_t$).
Now that a new node $v_t$ is sampled, it needs to be projected back to a lower-dimensional representation before feeding into the LSTM. This is done by means of a down-projection matrix $W_{down} \in \mathbb{R}^{N \times H}$.
Discriminator. The discriminator $D$ is based on the standard LSTM architecture. At every time step $t$, a one-hot vector $v_t$, denoting the node at the current position, is fed as input. After processing the entire sequence of $T$ nodes, the discriminator outputs a single score that represents the probability of the random walk being real.
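The sequential process described in this paragraph can be written out step by step as follows. This is a sketch reconstructed from the prose, not a verbatim copy of the original box: $f_\theta$ is the recurrent network, $g_{\theta'}$ the initialization function, $p_t$ and $m_t$ the output and memory state at step $t$; the softmax $\sigma$, the categorical distribution $\mathrm{Cat}$, the all-zero first input $\mathbf{0}$ and the latent dimension $d$ are notational assumptions.

```latex
\begin{align*}
& z \sim \mathcal{N}(0, I_d), \qquad m_0 = g_{\theta'}(z) \\
t = 1: \quad & (p_1, m_1) = f_\theta(m_0, \mathbf{0}), & v_1 &\sim \mathrm{Cat}(\sigma(p_1)) \\
t = 2: \quad & (p_2, m_2) = f_\theta(m_1, v_1), & v_2 &\sim \mathrm{Cat}(\sigma(p_2)) \\
& \;\;\vdots \\
t = T: \quad & (p_T, m_T) = f_\theta(m_{T-1}, v_{T-1}), & v_T &\sim \mathrm{Cat}(\sigma(p_T))
\end{align*}
```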
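The forward pass of the Gumbel-based sampling step can be sketched in plain Python as below. The function name `gumbel_softmax_sample` and the example logits are hypothetical; the sketch shows only how the relaxed sample and the one-hot sample are computed and does not implement the backward pass.

```python
import math
import random

def gumbel_softmax_sample(logits, tau, rng):
    """Forward pass only: returns (one-hot sample, relaxed sample)."""
    eps = 1e-12
    gumbel = []
    for _ in logits:
        u = min(max(rng.random(), eps), 1.0 - eps)   # keep u strictly inside (0, 1)
        gumbel.append(-math.log(-math.log(u)))        # standard Gumbel noise
    y = [(p + g) / tau for p, g in zip(logits, gumbel)]
    top = max(y)
    exps = [math.exp(v - top) for v in y]             # numerically stable softmax
    total = sum(exps)
    relaxed = [e / total for e in exps]               # differentiable relaxation
    k = max(range(len(relaxed)), key=relaxed.__getitem__)
    one_hot = [1 if i == k else 0 for i in range(len(relaxed))]
    return one_hot, relaxed

rng = random.Random(0)
logits = [0.0, 0.0, 10.0]                             # illustrative values
one_hot, relaxed = gumbel_softmax_sample(logits, tau=0.5, rng=rng)
# Node 2 dominates the softmax, so it should be drawn almost every time.
picks = sum(gumbel_softmax_sample(logits, 0.5, rng)[0][2] for _ in range(200))
```

In an automatic differentiation framework, the straight-through behavior is commonly obtained by returning `one_hot + relaxed - stop_gradient(relaxed)`, so that the forward value is the one-hot vector while gradients follow the relaxed sample.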
3.2 Training
Wasserstein GAN. We train our model based on the Wasserstein GAN (WGAN) framework (Arjovsky et al., 2017), as it prevents mode collapse and leads to more stable training overall. To enforce the Lipschitz constraint of the discriminator, we use the gradient penalty as in Gulrajani et al. (2017). The model parameters are trained using stochastic gradient descent with Adam (Kingma & Ba, 2014). Weights are regularized with an $L^2$ penalty.
Early stopping. Because we are interested in generalizing the input graph, the “trivial” solution where the generator has memorized all existing edges is of no interest to us. This means that we need to control how closely the generated graphs resemble the original one. To achieve this, we propose two possible early stopping strategies, either of which can be used depending on the task at hand. The first strategy, named Val-Criterion, is concerned with the generalization properties of NetGAN. During training, we keep a sliding window of the random walks generated in the last 1,000 iterations and use them to construct a matrix of transition counts. This matrix is then used to evaluate the link prediction performance on a validation set (i.e. ROC and AP scores, for more details see Sec. 4.2). We stop training when the validation performance stops improving.
The second strategy, named EO-Criterion, makes NetGAN very flexible and gives the user control over the graph generation. We stop training when we achieve a user-specified edge overlap between the generated graphs (see next section) and the original one at a given iteration. Based on her end task the user can choose to generate graphs with either small or large edge overlap with the original, while maintaining structural similarity. This will lead to generated graphs that either generalize better or are closer replicas respectively, yet still capture the properties of the original.
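For reference, the WGAN objective with gradient penalty from Gulrajani et al. (2017) has the following standard form. The notation here is an assumption made for illustration: $p_{data}$ is the distribution of training random walks, $p_G$ that of generated walks, $\hat{x}$ is sampled uniformly along straight lines between pairs of real and generated samples, and $\lambda$ is the penalty weight. The weight regularization term mentioned above is omitted.

```latex
\begin{align*}
\mathcal{L}_D &= \mathbb{E}_{\tilde{x} \sim p_G}\big[D(\tilde{x})\big]
              - \mathbb{E}_{x \sim p_{data}}\big[D(x)\big]
              + \lambda \, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big] \\
\mathcal{L}_G &= - \mathbb{E}_{\tilde{x} \sim p_G}\big[D(\tilde{x})\big]
\end{align*}
```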
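The edge-overlap stopping rule can be sketched as follows. The helper names `edge_overlap` and `should_stop` are hypothetical, and normalizing by the number of original edges is an assumption; when the generated graph has as many edges as the original, either normalization gives the same value.

```python
def edge_overlap(edges_generated, edges_original):
    """Share of original (undirected) edges that also appear in the generated graph."""
    def normalize(edges):
        return {tuple(sorted(e)) for e in edges}      # ignore edge direction
    gen, orig = normalize(edges_generated), normalize(edges_original)
    return len(gen & orig) / len(orig)

def should_stop(edges_generated, edges_original, target_overlap):
    """Stop training once the user-specified overlap is reached."""
    return edge_overlap(edges_generated, edges_original) >= target_overlap

original = [(1, 0), (1, 2), (2, 3), (0, 3)]           # illustrative data only
generated = [(0, 1), (2, 1), (3, 4)]
eo = edge_overlap(generated, original)
stop = should_stop(generated, original, target_overlap=0.5)
```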
| Graph | Max. degree | Assortativity | Triangle count | Power law exponent | Avg. inter-community density | Avg. intra-community density |
|---|---|---|---|---|---|---|
| Cora-ML | 240 | 0.075 | 2,814 | 1.860 | 4.3e-4 | 1.7e-3 |
| Conf. model (1% EO) | * | 0.030 | 322 | * | 1.6e-3 | 2.8e-4 |
| Conf. model (52% EO) | * | 0.051 | 626 | * | 9.8e-4 | 9.9e-4 |
| DCSBM (11% EO) | 165 | 0.052 | 1,403 | 1.814 | 6.7e-4 | 1.2e-3 |
| ERGM (56% EO) | 243 | 0.077 | 2,293 | 1.786 | 6.9e-4 | 1.2e-3 |
| BTER (2.2% EO) | 199 | 0.033 | 3,060 | 1.787 | 1.0e-3 | 7.5e-4 |
| VGAE (0.3% EO) | 13 | 0.009 | 14 | 1.674 | 1.4e-3 | 3.2e-4 |
| NetGAN Val (39% EO) | 199 | 0.060 | 1,410 | 1.773 | 6.5e-4 | 1.3e-3 |
| NetGAN EO (52% EO) | 233 | 0.066 | 1,588 | 1.793 | 6.0e-4 | 1.4e-3 |
3.3 Assembling the adjacency matrix
After finishing the training, we use the generator $G$ to construct a score matrix $S$ of transition counts, i.e. we count how often an edge appears in the set of generated random walks (typically, using a much larger number of random walks than for early stopping, e.g., 500K). While the raw counts matrix $S$ is sufficient for link prediction purposes, we need to convert it to a binary adjacency matrix if we wish to reason about the synthetic graph. First, $S$ is symmetrized by setting $s_{ij} = s_{ji} = \max\{s_{ij}, s_{ji}\}$. Because we cannot explicitly control the starting node of the random walks generated by $G$, some high-degree nodes will likely be overrepresented. Thus, a simple binarization strategy like thresholding or choosing the top-$k$ entries might lead to leaving out the low-degree nodes and producing singletons. To address this issue, we use the following approach. (i) We ensure that every node $i$ has at least one edge by sampling a neighbor $j$ with probability $p_{ij} = s_{ij} / \sum_v s_{iv}$. If an edge was already sampled before, we repeat the procedure. (ii) We continue sampling edges without replacement, using for each edge $(i, j)$ the probability $p_{ij} = s_{ij} / \sum_{u,v} s_{uv}$, until we reach the desired number of edges (e.g., as many edges as in the original graph). Note that this procedure is not guaranteed to produce a connected graph.
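The two-stage assembly procedure can be sketched in pure Python as below. The function names `score_matrix` and `assemble_graph` and the toy walks are hypothetical. The sketch stores the symmetrized scores once per unordered node pair, and it samples step (i) directly among edges not yet chosen, which is equivalent to repeating the draw until an unseen edge is hit.

```python
import random
from collections import Counter

def score_matrix(walks):
    """Symmetrized transition counts, keyed by unordered node pair (i < j)."""
    counts = Counter()
    for walk in walks:
        for a, b in zip(walk, walk[1:]):
            counts[(a, b)] += 1
    sym = {}
    for (a, b), c in counts.items():
        if a != b:
            key = (min(a, b), max(a, b))
            sym[key] = max(sym.get(key, 0), c)        # s_ij = s_ji = max(s_ij, s_ji)
    return sym

def assemble_graph(sym, nodes, n_edges, rng):
    edges = set()
    # (i) give every node one edge, sampled proportionally to its scores
    for i in nodes:
        cand = [e for e in sym if i in e and e not in edges]
        if cand:
            edges.add(rng.choices(cand, weights=[sym[e] for e in cand], k=1)[0])
    # (ii) keep sampling edges without replacement until n_edges are reached
    remaining = [e for e in sym if e not in edges]
    while len(edges) < n_edges and remaining:
        idx = rng.choices(range(len(remaining)),
                          weights=[sym[e] for e in remaining], k=1)[0]
        edges.add(remaining.pop(idx))
    return edges

walks = [[0, 1, 2, 0], [2, 3, 2, 1], [3, 2, 0, 1]]    # illustrative data only
sym = score_matrix(walks)
edges = assemble_graph(sym, nodes=range(4), n_edges=3, rng=random.Random(0))
```

Step (i) can by itself produce more than `n_edges` edges on very small graphs, as in this toy example; on real graphs the number of nodes is typically well below the target number of edges.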
4 Experiments
In this section we evaluate the quality of the graphs generated by NetGAN by computing various graph statistics. We quantify the generalization power of the proposed model by evaluating its link prediction performance. Furthermore, we demonstrate how we can generate graphs with smoothly changing properties via latent space interpolation.
Datasets. For the experiments we use 5 well-known citation datasets and the Political Blogs dataset. Cora-ML is the subset of machine learning papers from the original Cora dataset. For all the experiments we treat the graphs as undirected and only consider the largest connected component (LCC). Information about the datasets is listed in Table 2.

| Name | Nodes in LCC | Edges in LCC | Reference |
|---|---|---|---|
| Cora-ML | 2,810 | 7,981 | (McCallum et al., 2000) |
| Cora | 18,800 | 64,529 | (McCallum et al., 2000) |
| CiteSeer | 2,110 | 3,757 | (Sen et al., 2008) |
| Pubmed | 19,717 | 44,324 | (Sen et al., 2008) |
| DBLP | 16,191 | 51,913 | (Pan et al., 2016) |
| Pol. Blogs | 1,222 | 16,714 | (Adamic & Glance, 2005) |
4.1 Graph generation
Setup. In this task, we fit NetGAN to the Cora-ML and Citeseer citation networks in order to evaluate the quality of the generated graphs. We compare to the following baselines: configuration model (Molloy & Reed, 1995), degree-corrected stochastic blockmodel (DCSBM) (Karrer & Newman, 2011), exponential random graph model (ERGM) (Holland & Leinhardt, 1981) and the block two-level Erdős-Rényi random graph model (BTER) (Seshadhri et al., 2012). Additionally, we use the variational graph autoencoder (VGAE) (Kipf & Welling, 2016) as a representative of network embedding approaches. We randomly hide 15% of the edges (which are used for the stopping criterion; see Sec. 3.2) and fit all the models on the remaining graph. We sample 5 graphs from each of the trained models and report their average statistics in Table 1. Definitions of the statistics, additional metrics, standard deviations and details about the baselines are given in the supplementary material.
Evaluation. The general trend that becomes apparent from the results in Table 1 (and Table 2 in supplementary material) is that prescribed models excel at recovering the statistics that they directly model (e.g., degree sequence for DCSBM). At the same time, these models struggle when dealing with graph properties that they don’t account for (e.g., assortativity for BTER). On the other hand, NetGAN is able to capture all the graph properties well, although none of them are explicitly specified in its model definition. We also see that VGAE is not able to produce realistic graphs. This is expected, since the main purpose of VGAE is learning node embeddings, and not generating entire graphs.
Note that the surprisingly good performance of ERGM on Cora-ML is caused by overfitting. When performing link prediction using the same fitted ERGM we get both AUC and AP scores close to 0.5 (worst possible value), which clearly indicates overfitting. In contrast, NetGAN does a good job both at preserving properties in generated graphs and at generalizing, as we see in Sec. 4.2.
Is the good performance of NetGAN in this experiment only due to the overlapping edges (existing in the input graph)? To rule out this possibility we perform the following experiment: We take the graph generated by NetGAN, fix the overlapping edges and rewire the rest according to the configuration model. The properties of the resulting graph (row #3 in Table 1) deviate strongly from the input graph. This confirms that NetGAN does not simply memorize some edges and generate the rest at random, but rather captures the underlying structure of the network.
In line with our intuition, we can see that higher EO leads to generated graphs with statistics closer to the original. Figs. 2(b) and 2(c) show how the graph statistics evolve during the training process. Fig. 2(c) shows that the edge overlap increases smoothly with the number of epochs. We provide similar plots for other statistics and for Citeseer in the supplementary material.
4.2 Link prediction
Setup. Link prediction is a classical task in graph mining, where the goal is to predict the existence of unobserved links in a given graph. We use it to evaluate the generalization properties of NetGAN. We hold out 10% of edges from the graph for validation, and 5% as the test set, along with the same amount of randomly selected non-edges. We also ensure that the training network remains connected and does not contain any singletons. We measure the performance with two commonly used metrics: area under the ROC curve (AUC) and the average precision score (AP).
To evaluate NetGAN’s link prediction performance, we sample a specific number of random walks (500K/100M) from the trained generator. We use the observed transition counts between any two nodes as a measure of how likely an edge is to exist between them. We compare with DCSBM, node2vec and VGAE (like in the previous experiment), as well as Adamic/Adar (Adamic & Adar, 2003).
Evaluation. The results are listed in Table 3. There is no overall dominant method, with different methods achieving the best results on different datasets. NetGAN shows competitive performance for all datasets, even achieving state-of-the-art results for some of them (Citeseer and PolBlogs), despite not being explicitly trained for this task.
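Scoring held-out pairs by transition counts can be sketched as below. The helper names `edge_score` and `auc` and the toy counts are hypothetical; taking the maximum over both directions is an assumption borrowed from the symmetrization in Sec. 3.3, and the AUC is computed directly from its pairwise-ranking definition rather than with a library routine.

```python
def edge_score(counts, i, j):
    """Score an undirected node pair by its observed transition counts."""
    return max(counts.get((i, j), 0), counts.get((j, i), 0))

def auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Illustrative transition counts from generated walks (made-up values).
counts = {(0, 3): 5, (3, 0): 2, (1, 2): 1, (0, 4): 1}
test_edges = [(0, 3), (1, 2)]
test_non_edges = [(0, 4), (1, 4)]
pos = [edge_score(counts, i, j) for i, j in test_edges]
neg = [edge_score(counts, i, j) for i, j in test_non_edges]
score = auc(pos, neg)
```

The quadratic pairwise loop is only meant for small held-out sets; a rank-based computation would be preferable at scale.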
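| Method | Cora-ML AUC | Cora-ML AP | Cora AUC | Cora AP | Citeseer AUC | Citeseer AP | DBLP AUC | DBLP AP | Pubmed AUC | Pubmed AP | PolBlogs AUC | PolBlogs AP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Adamic/Adar | 92.16 | 85.43 | 93.00 | 86.18 | 88.69 | 77.82 | 91.13 | 82.48 | 84.98 | 70.14 | 85.43 | 92.16 |
| DCSBM | 96.03 | 95.15 | 98.01 | 97.45 | 94.77 | 93.13 | 97.05 | 96.57 | 96.76 | 95.64 | 95.46 | 94.93 |
| node2vec | 92.19 | 91.76 | 98.52 | 98.36 | 95.29 | 94.58 | 96.41 | 96.36 | 96.49 | 95.97 | 85.10 | 83.54 |
| VGAE | 95.79 | 96.30 | 97.59 | 97.93 | 95.11 | 96.31 | 96.38 | 96.93 | 94.50 | 96.00 | 93.73 | 94.12 |
| NetGAN (500K) | 94.00 | 92.32 | 82.31 | 68.47 | 95.18 | 91.93 | 82.45 | 70.28 | 87.39 | 76.55 | 95.06 | 94.61 |
| NetGAN (100M) | 95.19 | 95.24 | 84.82 | 88.04 | 96.30 | 96.89 | 86.61 | 89.21 | 93.41 | 94.59 | 95.51 | 94.83 |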

Interestingly, the performance of NetGAN increases with the number of random walks sampled from the generator. This is especially true for the larger networks (Cora, DBLP, Pubmed), since given their size we need more random walks to cover the entire graph. This suggests that for an additional computational cost one can get significant gains in link prediction performance. Note that while 100M may seem like a large number, the sampling procedure can be trivially parallelized.
Sensitivity analysis. Although NetGAN has many hyperparameters – typical for a GAN model – in practice most of them are not critical for performance.
One important exception is the random walk length $T$. To choose the optimal value, we evaluate how the link prediction performance on the Cora-ML dataset changes as we vary $T$. We train the model with different random walk lengths, and then evaluate the scores obtained by sampling 500K random walks. Results averaged over 5 runs are given in Fig. 6. We empirically confirm the choice of a model that generates random walks of length $T = 16$ as opposed to just edges (i.e. random walks of length $T = 2$). The performance gain for random walk length 20 over 16 is marginal and does not outweigh the additional computational cost; therefore, we use random walks of length 16 for all experiments.
4.3 Latent variable interpolation
Setup. Latent space interpolation is a good way to gain insight into what kind of structure the generator was able to capture. To be able to visualize the properties of the generated graphs we train our model using a 2-dimensional noise vector $z$ drawn as before from a bivariate standard normal distribution. This corresponds to a 2-dimensional latent space $\Omega = \mathbb{R}^2$. Then, instead of sampling $z$ from the entire latent space $\Omega$, we now sample from subregions of $\Omega$ and visualize the results. More specifically, we divide $\Omega$ into subregions (bins) of equal probability mass using the standard normal cumulative distribution function $\Phi$. For each bin we generate 62.5K random walks. We evaluate properties of both the generated random walks themselves, as well as properties of the resulting graphs (obtained by sampling a binary adjacency matrix for each bin), visualizing them as heatmaps.
Evaluation. In Fig. 3(a) and 3(b) we see properties of the generated random walks; in Fig. 3(c) and 3(d), we visualize properties of graphs sampled from the random walks in the respective bins. In all four heatmaps, we see distinct patterns, e.g. higher average degree of starting nodes for the bottom-right region of Fig. 3(a), or higher degree distribution inequality in the top-right area of Fig. 3(c). While Fig. 3(c) and 3(d) show that certain regions of $\Omega$ correspond to generated graphs with very different degree distributions, recall that sampling from the entire latent space ($\Omega$) yields graphs with degree distribution similar to the original graph (see Fig. 1(c)). The model was trained on Cora-ML. More heatmaps for other metrics (16 in total) and visualizations for Citeseer can be found in the supplementary material.
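The division of the latent space into bins of equal probability mass can be sketched with the standard library's normal distribution. The function names `bin_edges` and `sample_in_bin` and the grid size `k = 4` are hypothetical choices for illustration and are not taken from the experiments.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()   # standard normal, provides cdf and inv_cdf

def bin_edges(k):
    """Edges that split one latent axis into k intervals of probability mass 1/k."""
    inner = [STD_NORMAL.inv_cdf(i / k) for i in range(1, k)]
    return [float("-inf")] + inner + [float("inf")]

def sample_in_bin(row, col, k, rng):
    """Draw a 2-d latent code restricted to bin (row, col) of a k x k grid."""
    eps = 1e-12
    z = []
    for idx in (row, col):
        u = (idx + rng.random()) / k            # uniform inside the bin's CDF interval
        u = min(max(u, eps), 1.0 - eps)         # inv_cdf is undefined at 0 and 1
        z.append(STD_NORMAL.inv_cdf(u))
    return tuple(z)

k = 4                                           # illustrative grid size
edges = bin_edges(k)
z = sample_in_bin(row=0, col=3, k=k, rng=random.Random(0))
```

Because each axis is cut at quantiles of the standard normal, every cell of the grid carries the same probability mass $1/k^2$ under the bivariate standard normal prior.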
This experiment clearly demonstrates that by interpolating in the latent space we can obtain graphs with smoothly changing properties. The smooth transitions in the heatmaps provide evidence that our model learns to map specific parts of the latent space to specific properties of the graph.
We can also see this mapping from latent space to the generated graph properties in the community distribution histograms on a grid in Fig. 5. Marked by (*) and ($\Omega$) we see the community distributions for the input graph and the graph obtained by sampling on the complete latent space respectively. In Fig. 4(b) and 4(c), we see the evolution of selected community shares when following a trajectory from top to bottom, and left to right, respectively. The community histograms resulting from sampling random walks from opposing regions of the latent space are very different; again the transitions between these histograms are smooth, as can be seen in the trajectories in Fig. 4(b) and 4(c).
5 Discussion and future work
When evaluating different graph generative models in Sec. 4.1, we observed a major limitation of explicit models. While the prescribed approaches excel at recovering the properties directly included in their definition, they perform significantly worse with respect to the rest. This clearly indicates the need for implicit graph generators such as NetGAN. Indeed, we notice that our model is able to consistently capture all the important graph characteristics (see Table 1). Moreover, NetGAN generalizes beyond the input graph, as can be seen by its strong link prediction performance in Sec. 4.2. Still, being the first model of its kind, NetGAN possesses certain limitations, and a number of related questions could be addressed in follow-up works:
Scalability. We have observed in Sec. 4.2 that it takes a large number of generated random walks to get representative transition counts for large graphs. While sampling random walks from NetGAN is trivially parallelizable, a possible extension of our model is to use a conditional generator, i.e. the generator can be provided a desired starting node, thus ensuring a more even coverage. In addition, the sampling procedure itself can be sped up by incorporating a hierarchical softmax output layer, a method commonly used in natural language processing.
Evaluation. It is nearly impossible to judge whether a graph is realistic by visually inspecting it (unlike images, for example). In this work we already quantitatively evaluate the performance of NetGAN on a large number of standard graph statistics. However, developing new measures applicable to (implicit) graph generative models will deepen our understanding of their behavior.
Experimental scope. In the current work we focus on the setting of a single large graph. Adaptation to other scenarios, such as a collection of smaller i.i.d. graphs, that frequently occur in other fields (e.g., chemistry, biology), would be an important extension of our model. Studying the influence of the graph topology (e.g., sparsity, diameter) on NetGAN’s performance will shed more light on the model’s properties.
Other types of graphs. While plain graphs are ubiquitous, many important applications deal with attributed, k-partite or heterogeneous networks. Adapting the NetGAN model to handle these other modalities of the data is a promising direction for future research. Especially important would be an adaptation to the dynamic / inductive setting, where new nodes are added over time.
6 Conclusion
In this work we introduce NetGAN, an implicit generative model for network data. NetGAN is able to generate graphs that capture important topological properties of complex networks, such as community structure and degree distribution, without having to manually specify any of them. Moreover, our proposed model shows strong generalization properties, as highlighted by its competitive link prediction performance on a number of datasets. NetGAN can also be used for generating graphs with continuously varying characteristics using latent space interpolation. Combined, our results provide strong evidence that implicit generative models for graphs are well-suited for capturing the complex nature of real-world networks.
Appendix A Graph statistics
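| Metric name | Computation | Description |
|---|---|---|
| Maximum degree | $\max_{v \in V} d(v)$ | Maximum degree of all nodes in a graph. |
| Assortativity | $\rho = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$ | Pearson correlation of degrees of connected nodes, where the $(x_i, y_i)$ pairs are the degrees of connected nodes. |
| Triangle count | $\lvert \{ \{u, v, w\} : u \sim v,\ v \sim w,\ w \sim u \} \rvert$ | Number of triangles in the graph, where $u \sim v$ denotes that $u$ and $v$ are connected. |
| Power law exponent | $1 + N \left( \sum_{u \in V} \log \frac{d(u)}{d_{\min}} \right)^{-1}$ | Exponent of the power law distribution, where $d_{\min}$ denotes the minimum degree in a network and $N$ the number of nodes. |
| Inter-community density |  | Fraction of possible inter-community edges present in the graph. |
| Intra-community density |  | Fraction of possible intra-community edges present in the graph. |
| Wedge count | $\sum_{v \in V} \binom{d(v)}{2}$ | Number of wedges (2-stars), i.e. two-hop paths in an undirected graph. |
| Rel. edge distr. entropy |  | Entropy of the degree distribution; 1 means uniform, 0 means a single node is connected to all others. |
| LCC | $\max_{f \in F} \lvert f \rvert$ | Size of the largest connected component, where $F$ are all connected components of the graph. |
| Claw count | $\sum_{v \in V} \binom{d(v)}{3}$ | Number of claws (3-stars). |
| Gini coefficient | $\frac{2 \sum_{i=1}^{N} i \hat{d}_i}{N \sum_{i=1}^{N} \hat{d}_i} - \frac{N + 1}{N}$ | Common measure for inequality in a distribution, where $\hat{d}$ is the sorted list of degrees in the graph. |
| Community distribution |  | Share of in- and outgoing edges of each community, normalized by the number of edges in the graph. |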
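As an illustration of how the degree-based statistics listed above can be evaluated, the following is a minimal sketch in plain Python. It relies on the standard textbook definitions (for instance the maximum-likelihood estimate for the power law exponent), which are an assumption here and not a transcription of the Computation column. The function name `graph_statistics` and the dictionary keys are hypothetical. The graph is given as an undirected edge list, so isolated nodes are not represented.

```python
import math
from collections import defaultdict


def graph_statistics(edges):
    """Statistics of a simple undirected graph given as (u, v) pairs.

    Assumes orderable node ids (e.g. ints); self-loops are skipped.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    if not adj:
        raise ValueError("graph has no edges")

    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    d_sorted = sorted(deg.values())
    n = len(d_sorted)

    # Assortativity: Pearson correlation over both orientations of every edge.
    # The pair list is symmetric, so both marginals share mean and variance.
    pairs = [(deg[u], deg[v]) for u in adj for v in adj[u]]
    mean = sum(x for x, _ in pairs) / len(pairs)
    cov = sum((x - mean) * (y - mean) for x, y in pairs)
    var = sum((x - mean) ** 2 for x, _ in pairs)
    assortativity = cov / var if var > 0 else float("nan")

    # Triangles: common neighbours per edge; each triangle is found three times.
    triangles = sum(len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v) // 3

    # Power law exponent: maximum-likelihood estimate relative to d_min.
    log_sum = sum(math.log(d / d_sorted[0]) for d in d_sorted)
    power_law = 1 + n / log_sum if log_sum > 0 else float("inf")

    # Gini coefficient of the ascending degree list (0 = all degrees equal).
    ranked = sum(i * d for i, d in enumerate(d_sorted, start=1))
    gini = 2 * ranked / (n * sum(d_sorted)) - (n + 1) / n

    # Largest connected component via iterative depth-first search.
    seen, lcc = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node] - seen:
                seen.add(nb)
                stack.append(nb)
        lcc = max(lcc, size)

    return {
        "max_degree": d_sorted[-1],
        "assortativity": assortativity,
        "triangle_count": triangles,
        "power_law_exponent": power_law,
        # Wedges and claws: choose 2 (or 3) neighbours around each centre node.
        "wedge_count": sum(math.comb(d, 2) for d in d_sorted),
        "claw_count": sum(math.comb(d, 3) for d in d_sorted),
        "lcc": lcc,
        "gini": gini,
    }


# A triangle (0, 1, 2) with one pendant node attached to node 2.
stats = graph_statistics([(0, 1), (1, 2), (0, 2), (2, 3)])
for name, value in stats.items():
    print(f"{name}: {value}")
```

The community-based statistics are left out because they additionally require a community assignment for every node.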
Appendix C Properties of generated graphs
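| Graph | Max. degree (Avg.) | Max. degree (Std.) | Assortativity (Avg.) | Assortativity (Std.) | Triangle count (Avg.) | Triangle count (Std.) | Power law exp. (Avg.) | Power law exp. (Std.) |
|---|---|---|---|---|---|---|---|---|
| Citeseer | 77 |  | 0.022 |  | 451 |  | 2.239 |  |
| NetGAN (42% EO) | 54 | 4.2 | 0.082 | 0.009 | 316 | 11.2 | 2.154 | 0.003 |
| NetGAN (76% EO) | 63 | 4.3 | 0.054 | 0.006 | 227 | 13.3 | 2.204 | 0.003 |
| DCSBM (6.6% EO) | 53 | 5.6 | 0.022 | 0.018 | 257 | 30.9 | 2.066 | 0.014 |
| Conf. model | * | * | 0.017 | 0.006 | 20 | 6.50 | * | * |
| Conf. model (42% EO) | * | * | 0.020 | 0.009 | 54 | 8.8 | * | * |
| Conf. model (76% EO) | * | * | 0.024 | 0.006 | 207 | 11.8 | * | * |
| ERGM (27% EO) | 66 | 1 | 0.052 | 0.005 | 415.6 | 8 | 2.0 | 0.01 |
| BTER (2% EO) | 70 | 7.2 | 0.065 | 0.014 | 449 | 33 | 2.049 | 0.01 |
| VGAE (0.2% EO) | 9.2 | 0.7 | 0.057 | 0.016 | 2 | 1 | 2.039 | 0.00 |
| CoraML | 240 |  | 0.075 |  | 2,814 |  | 1.86 |  |
| NetGAN (39% EO) | 199 | 6.7 | 0.060 | 0.004 | 1,410 | 30 | 1.773 | 0.002 |
| NetGAN (52% EO) | 233 | 3.6 | 0.066 | 0.003 | 1,588 | 59 | 1.793 | 0.003 |
| DCSBM (11% EO) | 165 | 9.0 | 0.052 | 0.004 | 1,403 | 67 | 1.814 | 0.008 |
| Conf. model | * | * | 0.030 | 0.003 | 322 | 31 | * | * |
| Conf. model (39% EO) | * | * | 0.050 | 0.005 | 420 | 14 | * | * |
| Conf. model (52% EO) | * | * | 0.051 | 0.002 | 626 | 19 | * | * |
| ERGM (56% EO) | 243 | 1.94 | 0.077 | 0.000 | 2,293 | 23 | 1.786 | 0.003 |
| BTER (2% EO) | 199 | 13 | 0.033 | 0.008 | 3,060 | 114 | 1.787 | 0.004 |
| VGAE (0.3% EO) | 13.1 | 1 | 0.010 | 0.014 | 14 | 3 | 1.674 | 0.001 |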
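| Graph | Wedge count (Avg.) | Wedge count (Std.) | Rel. edge distr. entropy (Avg.) | Rel. edge distr. entropy (Std.) | LCC (Avg.) | LCC (Std.) | Claw count (Avg.) | Claw count (Std.) | Gini coefficient (Avg.) | Gini coefficient (Std.) | Edge overlap (Avg.) | Edge overlap (Std.) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Citeseer | 16,824 |  | 0.959 |  | 2,110 |  | 125,701 |  | 0.404 |  | 1 |  |
| NetGAN (42% EO) | 12,998 | 84.6 | 0.969 | 0.000 | 2,079 | 12.6 | 57,654 | 4,226 | 0.354 | 0.001 | 0.42 | 0.006 |
| NetGAN (76% EO) | 15,202 | 378 | 0.963 | 0.000 | 2,053 | 23 | 94,149 | 11,926 | 0.385 | 0.002 | 0.76 | 0.01 |
| DCSBM (6.6% EO) | 15,531 | 592 | 0.938 | 0.001 | 1,697 | 27 | 69,818 | 11,969 | 0.502 | 0.005 | 0.066 | 0.011 |
| Conf. model | * | * | 0.955 | 0.001 | 2,011 | 6.8 | * | * | * | * | 0.008 | 0.001 |
| Conf. model (42% EO) | * | * | 0.956 | 0.001 | 2,045 | 12.5 | * | * | * | * | 0.42 | 0.002 |
| Conf. model (76% EO) | * | * | 0.957 | 0.001 | 2,065 | 10.2 | * | * | * | * | 0.76 | 0.0 |
| ERGM (27% EO) | 16,346 | 101 | 0.945 | 0.001 | 1,753 | 15 | 80,510 | 1,337 | 0.474 | 0.003 | 0.27 | 0.01 |
| BTER (2% EO) | 18,193 | 661 | 0.940 | 0.001 | 1,708 | 14 | 113,425 | 19,737 | 0.491 | 0.007 | 0.02 | 0.002 |
| VGAE (0.2% EO) | 8,141 | 47 | 0.986 | 0.000 | 2,110 | 0 | 6,611 | 144 | 0.256 | 0.003 | 0.002 | 0.001 |
| CoraML | 101,872 |  | 0.941 |  | 2,810 |  |  |  | 0.482 |  | 1 |  |
| NetGAN (39% EO) | 75,724 | 1,401 | 0.959 | 0.000 | 2,809 | 1.6 |  | 141,795 | 0.398 | 0.002 | 0.39 | 0.004 |
| NetGAN (52% EO) | 86,763 | 1,096 | 0.954 | 0.001 | 2,807 | 1.6 |  | 103,667 | 0.42 | 0.003 | 0.52 | 0.001 |
| DCSBM (11% EO) | 73,921 | 3,436 | 0.934 | 0.001 | 2,474 | 18.9 |  | 170,045 | 0.523 | 0.003 | 0.11 | 0.003 |
| Conf. model | * | * | 0.928 | 0.002 | 2,785 | 4.9 | * | * | * | * | 0.013 | 0.001 |
| Conf. model (39% EO) | * | * | 0.931 | 0.002 | 2,793 | 2.0 | * | * | * | * | 0.39 | 0.0 |
| Conf. model (52% EO) | * | * | 0.933 | 0.001 | 2,793 | 6.0 | * | * | * | * | 0.52 | 0.0 |
| ERGM (56% EO) | 98,615 | 385 | 0.932 | 0.001 | 2,489 | 11 |  | 57,092 | 0.517 | 0.002 | 0.56 | 0.014 |
| BTER (2% EO) | 91,813 | 3,546 | 0.935 | 0.000 | 2,439 | 19 |  | 280,945 | 0.515 | 0.003 | 0.02 | 0.001 |
| VGAE (0.3% EO) | 31,290 | 178 | 0.990 | 0.000 | 2,810 | 0 | 46,586 | 937 | 0.223 | 0.003 | 0.003 | 0.001 |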
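The "EO" percentages in the row labels coincide with the values in the final column of the table above, which suggests that they denote edge overlap with the input graph. A minimal sketch of such a measure is given below, assuming that edge overlap is the fraction of the input graph's undirected edges that reappear in the generated graph. The function name `edge_overlap` and the normalization by the number of input edges are assumptions made for illustration.

```python
def edge_overlap(original_edges, generated_edges):
    """Fraction of the original undirected edges that reappear in the generated graph."""
    # frozenset makes (u, v) and (v, u) compare equal, as required for undirected edges.
    original = {frozenset(e) for e in original_edges}
    generated = {frozenset(e) for e in generated_edges}
    if not original:
        raise ValueError("original graph has no edges")
    return len(original & generated) / len(original)


# Toy example: a 4-cycle versus a graph that keeps two of its four edges.
original = [(0, 1), (1, 2), (2, 3), (3, 0)]
generated = [(1, 0), (2, 1), (0, 2), (1, 3)]
print(f"{edge_overlap(original, generated):.0%} EO")  # prints "50% EO"
```

When the generated graph has the same number of edges as the input graph, normalizing by either edge count gives the same value.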






Appendix E Latent space interpolation heatmaps
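[Figure: latent space interpolation heatmaps for CoraML. Panels (input-graph value in parentheses): "... of start node", Gini coefficient (0.48), max. degree (240), assortativity (0.075), claw count, rel. edge distr. entropy (0.94), LCC (2,810), edge overlap, power law exp. (1.86), two "... link prediction" panels, "... in single community", triangle count (2,814), wedge count (101,872).]
[Figure: latent space interpolation heatmaps for Citeseer. Panels (input-graph value in parentheses): "... of start node", Gini coefficient (0.404), max. degree (77), assortativity (0.022), claw count (125,701), rel. edge distr. entropy (0.96), LCC (2,110), edge overlap, power law exp. (2.239), two "... link prediction" panels, "... in single community", triangle count (451), wedge count (16,824).]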
Appendix F Latent space interpolation community histograms – Citeseer