# Adversarial Network Embedding

## Abstract

Learning low-dimensional representations of networks has proved effective in a variety of tasks such as node classification, link prediction and network visualization. Existing methods can effectively encode different structural properties into the representations, such as neighborhood connectivity patterns, global structural role similarities and other high-order proximities. However, except for objectives to capture network structural properties, most of them suffer from lack of additional constraints for enhancing the robustness of representations. In this paper, we aim to exploit the strengths of generative adversarial networks in capturing latent features, and investigate its contribution in learning stable and robust graph representations. Specifically, we propose an Adversarial Network Embedding (ANE) framework, which leverages the adversarial learning principle to regularize the representation learning. It consists of two components, i.e., a structure preserving component and an adversarial learning component. The former component aims to capture network structural properties, while the latter contributes to learning robust representations by matching the posterior distribution of the latent representations to given priors. As shown by the empirical results, our method is competitive with or superior to state-of-the-art approaches on benchmark network embedding tasks.

## 1Introduction

Graph is a natural way of organizing data objects with complicated relationships, and encodes rich information of nodes in the graph. For example, paper citation networks capture the information of innovation flow, and can reflect topic relatedness between papers. To analyze graphs, an efficient and effective way is to learn low-dimensional representations for nodes in the graph, i.e., node embedding [25]. The learned representations should encode meaningful semantic, relational and structural information, so that they can be used as features for downstream tasks such as network visualization, link prediction and node classification. Network embedding is a challenging research problem because of the high-dimensionality, sparsity and non-linearity of the graph data.

In recent years, many methods for network embedding have been proposed, such as DeepWalk [25], LINE [30] and node2vec [9]. They aim to capture various connectivity patterns in network during representation learning. These patterns include relations of local neighborhood connectivity, first and second order proximities, global structural role similarities (i.e. structural equivalence), and other high-order proximities. As demonstrated in the literature, network embedding methods were shown to be more effective in many network analysis tasks than some classical approaches, such as Common Neighbors [19] and Spectral Clustering [28].

Though existing methods are effective in structure preserving with different carefully designed objectives, they suffer from lack of additional constraints for enhancing the robustness of the learned representations. When processing noisy network data, which is very common in real-world applications, these unsupervised network embedding techniques can easily result in poor representations. Thus, it is critical to consider some amount of uncertainty in the process of representation learning. One famous technique for robust representation learning in unsupervised manner is denoising autoencoder [33]. It obtains stable and robust representations by recovering clean input from the corrupted one, that is denoising criterion. In [3], the authors have applied this criterion for network embedding. Recently, many generative adversarial models [26] have also been proposed for learning robust and reusable representations. They have been shown to be effective in learning representations for image [26] and text data [7]. However, none of such models have been specially designed for dealing with graph data.

In this paper, we propose a novel approach called Adversarial Network Embedding (ANE) for learning robust network representations by leveraging the principle of adversarial learning [8]. In addition to optimize the objective for preserving structure, a process of adversarial learning is introduced for modeling the data uncertainty. Figure ? presents an illustrative example with the well-known Zachary’s Karate network on the effect of adversarial learning. By comparing representations of two schemes without/with adversarial learning regularization, it can be easily found that the latter scheme obtains more meaningful and robust representations.

More specifically, ANE naturally combines a structure preserving component and an adversarial learning component in a unified framework. The former component can help capture network structural properties, while the latter contributes to the learning of more robust representations through adversarial training with samples from some prior distribution. For structure preserving, we propose an inductive variant of DeepWalk that is suitable for our ANE framework. It maintains random walk for exploring neighborhoods of nodes and optimizes similar objective function, but employs parameterized function to generate embedding vectors. Besides, the adversarial learning component consists of two parts, i.e., a generator and a discriminator. It is acting as a regularizer for learning stable and robust feature extractor, which is achieved by imposing a prior distribution on the embedding vectors through adversarial training. To the best of our knowledge, this is the first work to design network embedding model with the adversarial learning principle. We empirically evaluate the proposed ANE approach through network visualization and node classification on benchmark datasets. The qualitative and quantitative results prove the effectiveness of our method.

## 2Related Work

### 2.1Network Embedding Methods

In recent years, many unsupervised network embedding methods have been proposed, which can be divided into three groups according to the techniques they use, i.e., probabilistic methods, matrix factorization based methods and autoencoder based methods. The probabilistic methods include DeepWalk [25], LINE [30], node2vec [9] and so on. DeepWalk firstly obtains node sequences from the original graph through random walk, and then learns the latent representations using Skip-gram model [22] by regarding node sequences as word sentences. LINE tries to preserve first-order and second-order proximities in two separate objective functions, and then directly concatenates the representations. In [9], the authors proposed to use biased random walk to determine neighboring structure, which can strike a balance between homophily and structural equivalence. It is actually a variant of DeepWalk.

Matrix factorization based methods first preprocess the adjacency matrix to capture different kinds of high-order proximities and then decompose the processed matrix to obtain graph embeddings. For example, GraRep [2] employs positive pointwise mutual information (PPMI) matrix as the preprocessing based on a proof of the equivalence between a -step random walk in DeepWalk and a -step probability transition matrix. HOPE [24] preprocesses the adjacency matrix of the directed graph with high-order proximity measurements, such as Katz Index [13], which can help capture asymmetric transitivity property. M-NMF [34] learns embeddings that can well capture community structure by building upon the modularity based community detection model [23].

Autoencoder is a widely used model for learning compact representations of high-dimensional data, which aims to preserve as much information in the latent space as possible for the reconstruction of the original data [11]. DNGR [3] firstly calculates the PPMI matrix, and then learns the representations through stacked denosing autoencoder. SDNE [35] is a variant of stacked autoencoder which adds a constraint in the loss function to force the connected nodes to have similar embedding vectors. In [15], the authors proposed a variational graph autoencoder (VGAE) by using a graph convolutional network [16] encoder for capturing network structural properties. Compared to variational autoencoder (VAE) [14], our ANE approach explicitly regularizes the posterior distribution of the latent space while VAE only assumes a prior distribution.

### 2.2Generative Adversarial Networks

Generative Adversarial Networks (GANs) [8] are deep generative models, of which the framework consists of two components, i.e., a generator and a discriminator. GANs can be formulated as a minimax adversarial game, where the generator aims to map data samples from some prior distribution to data space, while the discriminator tries to tell fake samples from real data. This framework is not directly suitable for unsupervised representation learning, due to the lack of explicit structure for inference.

There are three possible solutions for this problem as demonstrated by existing works. Firstly, some works managed to integrate some structures into the framework to do inference, i.e., projecting sample in data space back into the space of latent features, such as BiGAN [4], ALI [5] and EBGAN [38]. These methods can learn robust representations in many applications, such as image classification [4] and document retrieval [7]. The second approach is to generate representations from the hidden layer of the discriminator, like DCGANs [26]. By employing fractionally-strided convolutional layers, DCGANs can learn expressive image representations from both the generator and discriminator networks for supervised tasks. The third idea is to use adversarial learning process to regularize the representations. One successful practice is the Adversarial Autoencoders [20], which can learn powerful representations from unlabeled data without any supervision.

## 3Adversarial Network Embedding

In this section, we will first introduce the problem definition and notations to be used. Then, we will present an overview of the proposed adversarial network embedding framework, followed by detailed descriptions of each component.

### 3.1Problem Definition and Notations

Network embedding is aimed at learning meaningful representations for nodes in information network. An information network can be denoted as , where is the node set, is a set of edges with each representing the relationship between a pair of nodes, and is a weighted adjacency matrix with its entries quantifying the strength of the corresponding relations. Particularly, the value of each entry in is either 0 or 1 in an unweighted graph specifying whether an edge exists between two nodes. Given an information network , network embedding is doing a mapping from nodes to low-dimensional vectors with the formal format as follows: , where is the th row of (, ) and is the dimension of representations. We call representation matrix. These representations should encode structural information of networks.

### 3.2An Overview of the Framework

In this work, we leverage adversarial learning principle to help learn stable and robust representations. Figure 1 shows the proposed framework of *Adversarial Network Embedding* (ANE), which mainly consists of two components, i.e., a structure preserving component and an adversarial learning component. Specifically, the structure preserving component is dedicated to encoding network structural information into the representations. These information include the local neighborhood connectivity patterns, global structural role similarities, and other high-order proximities. There are many possible alternatives for the implementation of this component. Actually, existing methods [25] can be considered as structure preserving models, but without any constraints to help enhance the robustness of the representations. In this paper, we propose an inductive DeepWalk for structure preserving. It maintains random walk for exploring neighborhoods of nodes and optimizes similar objective function, but employs parameterized function to generate embedding vectors. In the training process, parameters of are directly updated instead of the embedding vectors. Besides, the adversarial learning component consists of two parts, i.e., a generator and a discriminator . It is acting as a regularizer for learning stable and robust feature extractor, which is achieved by imposing a prior distribution on the embedding vectors through adversarial training. It needs to emphasize that the parameterized function is shared by both the structure preserving component and the adversarial learning component. These two components will update the parameters of alternatively in the training process.

### 3.3Graph Preprocessing

In real-world applications, information network is usually extremely sparse, which may result in serious over-fitting problem when training deep models. To help alleviate the sparsity problem, one commonly used method is to preprocess the adjacency matrix with high-order proximities [30]. In this paper, we employ the shifted PPMI matrix [17] as input features for the generator^{1}

where can capture different high-order proximities, is the -step probability transition matrix obtained from the weighted adjacency matrix after a row-wise normalization, and is set to in this paper. Row vector in is the feature vector characterizing the context information of node in the graph , but with high-dimension.

### 3.4Structure Preserving Model

Ideally, existing unsupervised network embedding methods can be utilized as structure preserving component in our framework for encoding node dependencies into representations. However, many of them are transductive methods with an embedding lookup as embedding generator such as DeepWalk and LINE, which are not directly suitable for the generator of the adversarial learning component since we utilize parameterized generator as standard GANs. With parameterized generator, our framework can well deal with networks with node attributes and explore nonlinear properties of network with deep learning models. In this work, we design an inductive variant of DeepWalk that is applicable for both weighted and unweighted graphs. Theoretically, it can also generalize to unseen nodes for networks with node attributes as some inductive methods do [36], but we do not explore it in this paper. Besides, we also investigate to use denoising autoencoder [33] as the structure preserving component.

#### Inductive DeepWalk (IDW)

The IDW model uses random walk to sample node sequences as that in DeepWalk. Starting from each node , sequences are randomly sampled with the length as . In every step, a new node is randomly selected from the neighbors of the current node with the probability proportional to the corresponding weight in matrix . To improve efficiency, the alias table method [18] is employed to sample node from the candidate node set in every sampling step. It only takes time in a single sampling step. Then, positive node pairs can be constructed from node sequences. For every node sequence , we determine the positive target-context pairs as the set , where is the th node in sequence and denotes the context size.

Similar to Skip-gram [22], a node has two different representations, i.e., a target representation and a context representation , which are generated by the target generator and context generator , respectively. The generators are parameterized functions which are implemented with neural networks in this work. Given row vector in corresponding to node , we have and . To capture network structural properties, we define the following objective function for each positive target-context pair with negative sampling approach:

where is the sigmoid function, is the number of negative samples for each positive pair, is the noise distribution for sampling negative context nodes, and are parameters to be learnt. As suggested in [22], can achieve quite good performance in practice, where is the degree of node .

### 3.5Adversarial Learning

The adversarial learning component is employed to regularize the representations. It consists of a generator and a discriminator . Specifically, represents a non-linear transformation of input high-dimensional features to embedding vectors. represents the probability of a sample coming from real data. The generator function is shared with the structure preserving component. Different from GANs [8], in our framework, a prior distribution is selected as the data distribution for generating real data, while the embedding vectors are regarded as fake samples. In the training process, the discriminator is trained to tell apart the prior samples from the embedding vectors, while the generator is aimed to fit embedding vectors to the prior distribution. This process can be considered as a two-player minimax game with the generator and discriminator playing against each other. The utility function of the discriminator is:

In order to camouflage its output as prior samples, the generator is trained to improve the following payoff:

We argue that the adversarial learning component can help improve the learned representations in terms of robustness and structural meanings. We instantiate our framework with two structure preserving models, i.e., inductive DeepWalk (IDW) and denoising autoencoder (DAE). We call the ANE framework with IDW as Adversarial Inductive DeepWalk (AIDW) for easy illustration. Actually, with DAE as the structure preserving component, the ANE framework will become an adversarial autoencoder [20], and we represent it as ADAE to highlight the importance of the denoising criterion in learning representations.

During adversarial learning, it is also important to choose a proper prior distribution. Like many practices in GANs research [26], the prior distribution is usually defined as Uniform or Gaussian noise which enables GANs to learn meaningful and robust representations against uncertainty. In our experiments, we also considered the ANE framework with both kinds of prior distributions but find no significant difference. One possible reason is both kinds of uncertainty can help the ANE framework to achieve a certain level of robustness against noise. It is likely that a careful choice of prior distribution, possibly guided by prior domain knowledge, may further improve application-specific performance.

### 3.6Algorithm

To implement the ANE approach, we consider a joint training procedure with two phases, including a structure preserving phase and an adversarial learning phase. In the structure preserving phase, we optimize objective function (Equation 1) for AIDW. In the adversarial learning phase, a prior distribution is imposed on representations through a minimax optimization problem. Firstly, the discriminator is trained to distinguish between prior samples and embedding vectors. Then, the parameters of the generator are updated to fit the embedding vectors to prior space to fool the discriminator. Besides, some tricks proposed in [1] can be employed to help improve the stability of learning and avoid the mode collapse problem in traditional GANs training.

## 4Experiments

### 4.1Experiment Settings

#### Datasets

We conduct experiments on four real-world datasets with the statistics presented in Table ?, where denotes the label set. Cora and Citeseer are paper citation networks constructed by [21]. Wiki [27] is a network with nodes as web pages and edges as the hyperlinks between web pages. We regard these three networks as undirected networks, and do some preprocessing on the original datasets by deleting self-loops and nodes with zero degree. Cit-DBLP is a paper citation network extracted from DBLP dataset [29].

Dataset | |||

Cora | 2,708 | 5,278 | 7 |

Citeseer | 3,264 | 4,551 | 6 |

Wiki | 2,363 | 11,596 | 17 |

Cit-DBLP | 5,318 | 28,085 | 3 |

#### Baselines

We compare our model with several baseline methods, including DeepWalk, LINE, GraRep and node2vec. There are many other network embedding methods, but we do not consider them here, because their performances are inferior to these baseline models as shown in corresponding papers. The descriptions of the baselines are as follows.

DeepWalk

[25]: DeepWalk first transforms the network into node sequences by truncated random walk, and then uses it as input to the Skip-gram model to learn representations.

LINE

[30]: LINE can preserve both first-order and second-order proximities for undirected graph through modeling node co-occurrence probability and node conditional probability.

GraRep

[2]: GraRep preserves node proximities by constructing different -step probability transition matrices.

node2vec

[9]: node2vec develops a biased random walk procedure to explore neighborhood of a node, which can strike a balance between local properties and global properties of a network.

Besides, we consider inductive DeepWalk and denoising autoencoder as another two baseline methods. Note that both of them employ shifted PPMI matrix as preprocessing.

#### Parameter Settings

For LINE, we follow the settings of parameters in [30]. The embedding vectors are normalized by L2-norm. Besides, we specially preprocess the original sparse networks by adding two-hop neighbors to low degree nodes. For both DeepWalk and node2vec, the window size , the walk length and the number of walks per node are set to 10, 80 and 10, respectively, for fair comparison. For GraRep, the maximum matrix transition step is set to 4, and the settings of other parameters follow those in [2]. Note that the dimension of representations for all methods are set to 128 for fair comparison.

For our methods, we only use the most simple structure for the generator. Specifically, the generator is a single-layer network with leaky ReLU activations (with a leak of 0.2) and batch normalization [12] on the output. The shifted PPMI matrix is obtained by setting as 4 for Cora and Citeseer, and 3 for Wiki. For inductive DeepWalk, the number of negative samples is set to 5, and other parameters are set the same as DeepWalk. For denoising autoencoder, it has only one hidden layer with dimension as 128. For the discriminator of the framework, it is a three-layer neural networks, with the layer structure as 512-512-1. For the first two layers, we use leaky ReLU activations (with leak of 0.2) and batch normalization. For the output layer, we use sigmoid activation. For AIDW and ADAE, the settings of the structure preserving component are the same as those of IDW and DAE, respectively. The prior distribution of the adversarial learning component is set to . We use RMSProp optimizer with learning rate as 0.001.

### 4.2Network Visualization

%Labeled Nodes | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |

DeepWalk | 71.43 | 73.83 | 75.61 | 76.92 | 77.79 | 77.78 | 78.47 | 79.17 | 79.04 |

LINE | 71.26 | 74.50 | 76.04 | 76.81 | 77.68 | 77.99 | 78.46 | 79.00 | 79.11 |

GraRep | 74.78 | 76.78 | 78.56 | 78.99 | 79.39 | 79.85 | 79.96 | 80.94 | 81.29 |

node2vec | 75.06 | 78.49 | 80.06 | 80.94 | 81.52 | 82.07 | 82.39 | 83.28 | 83.17 |

DAE | 75.21 | 78.07 | 79.39 | 80.51 | 80.91 | 81.41 | 82.36 | 83.23 | 83.32 |

ADAE | 75.01 | 77.45 | 79.65 | 80.96 | 81.64 | 82.11 | 82.62 | 83.52 | 84.10 |

IDW | 66.32 | 72.21 | 75.23 | 76.65 | 77.66 | 78.32 | 79.20 | 79.93 | 80.63 |

AIDW | 76.93 |
79.50 |
81.31 |
82.01 |
82.28 |
83.03 |
83.23 |
84.46 |
84.21 |

%Labeled Nodes | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |

DeepWalk | 49.45 | 52.68 | 54.60 | 55.71 | 56.44 | 57.04 | 57.42 | 58.04 | 59.11 |

LINE | 47.70 | 51.04 | 52.95 | 54.19 | 55.00 | 55.82 | 56.02 | 57.08 | 57.52 |

GraRep | 51.62 | 53.29 | 53.59 | 53.83 | 54.55 | 54.62 | 54.97 | 54.90 | 56.27 |

node2vec | 52.50 | 55.47 | 56.66 | 57.70 | 58.81 | 59.26 | 60.10 | 60.34 | 60.58 |

DAE | 51.08 | 54.54 | 55.84 | 56.50 | 57.47 | 58.30 | 58.50 | 59.19 | 60.15 |

ADAE | 52.40 | 55.58 | 56.64 | 57.32 | 58.34 | 59.60 | 60.27 | 60.63 | 61.31 |

IDW | 45.45 | 50.47 | 52.32 | 53.38 | 54.75 | 55.18 | 55.98 | 56.23 | 57.19 |

AIDW | 53.25 |
56.76 |
57.95 |
59.06 |
59.45 |
59.95 |
60.28 |
60.87 |
62.26 |

%Labeled Nodes | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |

DeepWalk | 57.23 | 61.22 | 63.53 | 64.68 | 65.96 | 66.24 | 67.17 | 68.63 | 68.69 |

LINE | 56.29 | 61.63 | 63.98 | 65.43 | 66.25 | 67.04 | 67.94 | 68.86 | 68.61 |

GraRep | 57.68 |
61.14 | 62.73 | 63.89 | 64.86 | 65.55 | 66.01 | 67.55 | 68.02 |

node2vec | 57.61 | 61.52 | 63.47 | 64.83 | 65.54 | 66.16 | 67.17 | 68.44 | 68.69 |

DAE | 57.08 | 61.63 | 63.71 | 65.20 | 66.84 | 67.41 | 67.91 | 69.03 | 69.45 |

ADAE | 57.24 | 61.67 | 63.85 | 65.34 | 66.67 | 67.11 | 67.79 | 69.68 | 70.59 |

IDW | 56.01 | 60.77 | 63.08 | 64.37 | 65.66 | 66.47 | 67.15 | 67.86 | 68.52 |

AIDW | 57.43 | 62.14 |
64.18 |
65.53 |
67.07 |
68.00 |
69.44 |
71.63 |
72.03 |

Network visualization is an indispensable way to analyze high-dimensional graph data, which can help reveal intrinsic structure of the data intuitively [31]. In this section, we visualize the representations of nodes generated by several different models using *t-SNE* [32]. We construct a paper citation network, namely Cit-DBLP, from DBLP with papers from three different publication divisions, including Information Sciences, ACM Transactions on Graphics and Human-Computer Interaction. Some statistics of this dataset have been presented in Table ?. These papers are naturally classified into three categories based on the research fields they belong to.

Figure ? shows the visualization of embedding vectors obtained from different models using *t-SNE* tookit under the same parameter configuration. For both DeepWalk and LINE, papers from different categories are mixed with each other in the center of the figure. For LINE, there are 6 clusters with each category corresponding to two separate clusters, which is in conflict with the true structure of the network. Besides, the boundaries between different clusters are not clear. The visualizations of node2vec and IDW form three main clusters, which are better than those of DeepWalk and LINE. However, the boundary between blue cluster and green cluster for node2vec is not clear, while that of red cluster and green cluster is a little messy for IDW. AIDW performs better compared with baseline methods. We can observe that the visualization of AIDW has three clusters with quite large margin between each other. Furthermore, each cluster is linearly separable with another cluster in the figure, which can not be achieved by other baselines as showed by the figures. Intuitively, this experiment demonstrates that adversarial learning regularization can help learn more meaningful and robust representations.

### 4.3Node Classification

The label information can indicate interests, beliefs or other characteristics of nodes, which can help facilitate many applications, such as friend recommendation in online social networks and targeted advertising. However, in many real-world contexts, only a subset of nodes are labeled. Thus, node classification can be conducted to dig out information of unlabeled nodes. In this section, we conduct multi-class classification on three benchmark datasets, i.e., Cora, Citeseer and Wiki. We range the training ratio from 10% to 90% for comprehensive evaluation. All experiments are carried out with support vector classifier in Liblinear package^{2}

Results and discussion.

To ensure the reliability, we obtain the experimental results by taking an average of that of 10 runs, which are shown in Tables ?, ? and ?. We have the following observations:

IDW produces similar results on Cora and Wiki with DeepWalk, and slightly inferior performance on Citeseer. The proposed model AIDW is built upon IDW with additional adversarial learning component. It consistently outperforms both IDW and DeepWalk on three datasets across all training ratios. For example, on Cora, AIDW gives more than 4% gain in accuracy over DeepWalk under all training ratio settings. It demonstrates that adversarial learning regularization can significantly improve the robustness and discrimination of the learned representations. The quantitative results also verify our previous qualitative findings in network visualization analysis.

ADAE achieves about 1% gain in accuracy over DAE on Citeseer when varying the training ratio from 10% to 90%, slightly better results on Cora, and comparable performance on Wiki. It shows that ANE framework can also guide the learning of more robust embeddings when building upon DAE. However, we notice that ADAE does not achieve obvious improvements over the corresponding structure preserving model as AIDW does. One reason is that denoising criterion already contributes to learning stable and robust representations [33].

Overall, the proposed method AIDW consistently outperforms all the baselines. As shown in Tables ?, ? and ?, node2vec produces better results than DeepWalk, LINE and GraRep on average. Our method can further achieve improvements over node2vec. More specifically, AIDW achieves the best classification accuracy on all three benchmark datasets across different training ratios, with only one exception on Wiki with training ratio as 10%.

### 4.4Model Sensitivity

In this section, we investigate the performance of AIDW w.r.t parameters and the type of prior on Cora dataset. Specifically, for parameter sensitivity analysis, we examine how the representation dimension , walk-length and context-size affect the performance of node classification with the training ratio as 50%. Note that except for the parameter being tested, all other parameters are set to default values. We also compare the performance of AIDW with two different priors, i.e., a Gaussian distribution () and a Uniform distribution ().

Figure ?(a) displays the results on the test of dimension . When the dimension increases from 8 to 512, the accuracy shows apparent increase at first, and then tends to saturate once the dimension reaches around 128. Besides, the performance of AIDW is not sensitive on walk-length and context-size. As shown in Figure ?(b), the accuracy slightly increases first, and then becomes stable when the walk-length varies from 40 to 100. With the increase of context-size, the performance keeps stable first, and then slightly degrades after the context-size is over 6, as shown in Figure ?(c). The degradation might be caused by the noisy neighborhood information brought in by the large context-size, since the average node degree of Cora is just about 1.95.

Figure 2 shows the results of multi-class classification on Cora with training ratio ranging from 10% to 90%. The accuracy curve of AIDW with uniform prior is almost coincided with that of AIDW with gaussian prior. It demonstrates that both types of prior can contribute to learning robust representations with no significant difference.

## 5Conclusion

An adversarial network embedding framework has been proposed for learning robust graph representations. This framework consists of a structure preserving component and an adversarial learning component. For structure preserving, we proposed inductive DeepWalk to capture network structural properties. For adversarial learning, we formulated a minimax optimization problem to impose a prior distribution on representations to enhance the robustness. Empirical evaluations in network visualization and node classification confirmed the effectiveness of the proposed method.

## 6Acknowledgments

The authors would like to thank Dr. Liang Zhang of Data Science Lab at JD.com and Prof. Xiaoming Wu of The Hong Kong Polytechnic University for their valuable discussion. Dan Wang’s work is supported in part by HK PolyU G-YBAG.

### Footnotes

- Note that we can use other ways to preprocess raw graph data to obtain the input feature X with lower dimension for large graphs. One simple way is to directly use existing scalable methods, e.g. DeepWalk and LINE, to obtain initial embeddings X as input.
- https://www.csie.ntu.edu.tw/ cjlin/liblinear/

### References

**2017.**

Arjovsky, M.; Chintala, S.; and Bottou, L. Wasserstein generative adversarial networks.**2015.**

Cao, S.; Lu, W.; and Xu, Q. Grarep: Learning graph representations with global structural information.**2016.**

Cao, S.; Lu, W.; and Xu, Q. Deep neural networks for learning graph representations.**2017.**

Donahue, J.; Krähenbühl, P.; and Darrell, T. Adversarial feature learning.**2017.**

Dumoulin, V.; Belghazi, I.; Poole, B.; Lamb, A.; Arjovsky, M.; Mastropietro, O.; and Courville, A. C. Adversarially learned inference.**2008.**

Fan, R.; Chang, K.; Hsieh, C.; Wang, X.; and Lin, C. LIBLINEAR: A library for large linear classification.**2016.**

Glover, J. Modeling documents with generative adversarial networks.**2014.**

Goodfellow, I. J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A. C.; and Bengio, Y. Generative adversarial nets.**2016.**

Grover, A., and Leskovec, J. node2vec: Scalable feature learning for networks.**2017.**

Hamilton, W. L.; Ying, R.; and Leskovec, J. Inductive representation learning on large graphs.**2006.**

Hinton, G. E., and Salakhutdinov, R. Reducing the Dimensionality of Data with Neural Networks.**2015.**

Ioffe, S., and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift.**1953.**

Katz, L. A new status index derived from sociometric analysis.**2014.**

Kingma, D. P., and Welling, M. Auto-encoding variational bayes.**2016.**

Kipf, T. N., and Welling, M. Variational graph auto-encoders.**2017.**

Kipf, T. N., and Welling, M. Semi-supervised classification with graph convolutional networks.**2014.**

Levy, O., and Goldberg, Y. Neural word embedding as implicit matrix factorization.**2014.**

Li, A. Q.; Ahmed, A.; Ravi, S.; and Smola, A. J. Reducing the sampling complexity of topic models.**2007.**

Liben-Nowell, D., and Kleinberg, J. M. The link-prediction problem for social networks.**2016.**

Makhzani, A.; Shlens, J.; Jaitly, N.; and Goodfellow, I. Adversarial autoencoders.**2000.**

McCallum, A.; Nigam, K.; Rennie, J.; and Seymore, K. Automating the construction of internet portals with machine learning.**2013.**

Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. Distributed representations of words and phrases and their compositionality.**2006.**

Newman, M. E. J. Finding community structure in networks using the eigenvectors of matrices.**2016.**

Ou, M.; Cui, P.; Pei, J.; Zhang, Z.; and Zhu, W. Asymmetric transitivity preserving graph embedding.**2014.**

Perozzi, B.; Al-Rfou, R.; and Skiena, S. Deepwalk: online learning of social representations.**2016.**

Radford, A.; Metz, L.; and Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks.**2008.**

Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Galligher, B.; and Eliassi-Rad, T. Collective classification in network data.**2011.**

Tang, L., and Liu, H. Leveraging social media networks for classification.**2008.**

Tang, J.; Zhang, J.; Yao, L.; Li, J.; Zhang, L.; and Su, Z. Arnetminer: extraction and mining of academic social networks.**2015.**

Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; and Mei, Q. LINE: large-scale information network embedding.**2016.**

Tang, J.; Liu, J.; Zhang, M.; and Mei, Q. Visualizing large-scale and high-dimensional data.**2008.**

van der Maaten, L., and Hinton, G. Visualizing data using t-sne.**2010.**

Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; and Manzagol, P. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion.**2017.**

Wang, X.; Cui, P.; Wang, J.; Pei, J.; Zhu, W.; and Yang, S. Community preserving network embedding.**2016.**

Wang, D.; Cui, P.; and Zhu, W. Structural deep network embedding.**2016.**

Yang, Z.; Cohen, W. W.; and Salakhutdinov, R. Revisiting semi-supervised learning with graph embeddings.**1977.**

Zachary, W. W. An information flow model for conflict and fission in small groups.**2016.**

Zhao, J. J.; Mathieu, M.; and LeCun, Y. Energy-based generative adversarial network.