# Deep Representation Learning for Social Network Analysis

###### Abstract

Social network analysis is an important problem in data mining. A fundamental step in analyzing social networks is to encode network data into low-dimensional representations, i.e., network embeddings, so that the network topology and other attribute information are effectively preserved. Network representation learning facilitates downstream applications such as classification, link prediction, anomaly detection and clustering. In addition, techniques based on deep neural networks have attracted great interest over the past few years. In this survey, we conduct a comprehensive review of the current literature on network representation learning with neural network models. First, we introduce the basic models for learning node representations in homogeneous networks. We also introduce extensions of the base models that tackle more complex scenarios, such as attributed networks, heterogeneous networks and dynamic networks. Then, we introduce the techniques for embedding subgraphs. After that, we present the applications of network representation learning. Finally, we discuss some promising research directions for future work.

## 1 Keywords:

Deep learning, social networks, deep social network analysis, representation learning, network embedding

Deep Representation Learning for Social Network Analysis

Qiaoyu Tan, Ninghao Liu and Xia Hu

## 2 Introduction

Social networks, such as Facebook, Twitter and LinkedIn, have greatly facilitated communication between web users around the world. The analysis of social networks helps summarize the interests and opinions of users (nodes), discover patterns from the interactions (links) between users, and mine the events that take place on online platforms. The information obtained by analyzing social networks could be especially valuable for many applications. Some typical examples include online advertisement targeting (Li et al., 2015), personalized recommendation (Song et al., 2006), viral marketing (Chen et al., 2010; Leskovec et al., 2007), social healthcare (Tang and Yang, 2012), social influence analysis (Peng et al., 2017), and academic network analysis (Dietz et al., 2007; Guo et al., 2014).

One central problem in social network analysis is how to extract useful features from non-Euclidean structured networks, to enable the deployment of downstream machine learning prediction models for specific analysis. For example, in the case of recommending new friends to a user in a social network, the key challenge might be how to embed network users into a low-dimensional space so that the closeness between users could be easily measured with distance metrics. To process the structure information in networks, most previous efforts mainly rely on hand-crafted features, such as kernel functions (Vishwanathan et al., 2010), graph statistics (e.g., degrees or clustering coefficients) (Bhagat et al., 2011), or other carefully engineered features (Liben-Nowell and Kleinberg, 2007). However, such a feature engineering process could be very time-consuming and expensive, making it impractical for many real-world applications. An alternative way to avoid this limitation is to automatically learn feature representations that capture various information sources in networks (Bengio et al., 2013; Liao et al., 2018). The goal is to learn a transformation function that maps nodes, subgraphs or even the whole network to vectors in a low-dimensional feature space, where the spatial relations between the vectors reflect the structures or contents in the original network. Given these feature vectors, subsequent machine learning models such as classification models, clustering models and outlier detection models could be directly used towards target applications.

Along with the substantial performance improvement gained by deep learning on image recognition, text mining, and natural language processing tasks (Bengio et al., 2009), developing network representation methods using neural network models has received increasing attention in recent years. In this survey, we provide a comprehensive overview of recent advancements in network representation learning using neural network models. After introducing the notations and problem definitions, we first review the basic representation learning models for node embedding in homogeneous networks. Specifically, based on the type of representation generation modules, we divide the existing approaches into three categories: embedding look-up based, autoencoder based and graph convolution based. Then, we give an overview of approaches that learn representations for subgraphs in networks, which to some extent rely on the techniques of node representation learning. After that, we list some applications of network representation models. Finally, we discuss some promising research directions for future work.

## 3 Notations and Problem Definitions

In this section, we define some important terminologies that will be used in later sections, and then give the formal definition of the network representation learning problem. In general, we use boldface uppercase letters (e.g., $\mathbf{A}$) to denote matrices, boldface lowercase letters (e.g., $\mathbf{a}$) to denote vectors, and lowercase letters (e.g., $a$) to denote scalars. The $(i,j)$-th entry, the $i$-th row and the $j$-th column of a matrix $\mathbf{A}$ are denoted as $\mathbf{A}_{ij}$, $\mathbf{A}_{i*}$ and $\mathbf{A}_{*j}$, respectively.

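Definition 1 (Network). Let $G = \{V, E, \mathbf{X}, \mathbf{Y}\}$ be a network with $N = |V|$ nodes, where the $i$-th node (or vertex) is denoted as $v_i \in V$ and $e_{i,j} \in E$ denotes the edge between nodes $v_i$ and $v_j$. $\mathbf{X}$ and $\mathbf{Y}$ are node attributes and labels, if available. Besides, we let $\mathbf{A} \in \mathbb{R}^{N \times N}$ denote the associated adjacency matrix of $G$. $\mathbf{A}_{ij}$ is the weight of $e_{i,j}$, where $\mathbf{A}_{ij} > 0$ indicates that the two nodes are connected, and otherwise $\mathbf{A}_{ij} = 0$. For undirected graphs, $\mathbf{A}_{ij} = \mathbf{A}_{ji}$.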
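Definition 1 can be made concrete with a small sketch. The helper name `build_adjacency` and the toy edge list are illustrative assumptions, not part of the survey; the code only shows how a weighted edge list turns into an adjacency matrix that is symmetric in the undirected case.

```python
def build_adjacency(num_nodes, weighted_edges, directed=False):
    """Return a dense N x N adjacency matrix as a list of lists.

    weighted_edges holds (i, j, weight) triples; a zero entry means "no edge".
    """
    A = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j, w in weighted_edges:
        A[i][j] = w
        if not directed:
            A[j][i] = w  # undirected graphs have a symmetric adjacency matrix
    return A


# Hypothetical toy network with 4 nodes and 3 weighted edges.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0)]
A = build_adjacency(4, edges)
is_symmetric = all(A[i][j] == A[j][i] for i in range(4) for j in range(4))
```

A dense matrix is used only for readability; for large social networks a sparse representation would normally be preferred.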

In many scenarios, the nodes and edges in $G$ can also be associated with type information. Let $\tau_v: V \rightarrow T^v$ be a node-type mapping function and $\tau_e: E \rightarrow T^e$ be an edge-type mapping function, where $T^v$ and $T^e$ denote the sets of node and edge types, respectively. Here, each node $v_i \in V$ has one specific type, i.e., $\tau_v(v_i) \in T^v$. Similarly, for each edge $e_{i,j} \in E$, $\tau_e(e_{i,j}) \in T^e$.

Definition 2 (Homogeneous Network). A homogeneous network is a network in which $|T^v| = |T^e| = 1$. All nodes and edges in $G$ belong to one single type.

Definition 3 (Heterogeneous Network). A heterogeneous network is a network with $|T^v| + |T^e| > 2$. There are at least two different types of nodes or edges in heterogeneous networks.

Given a network $G$, the task of network representation learning is to train a mapping function $f$ that maps certain components in $G$, such as nodes or subgraphs, into a latent space. Let $D$ be the dimension of the latent space; usually $D \ll |V|$. In this work, we focus on the problems of node representation learning and subgraph representation learning.

Definition 4 (Node Representation Learning). Suppose $\mathbf{z} \in \mathbb{R}^{D}$ denotes the latent vector of node $v$. Node representation learning aims to build a mapping function $f$ so that $\mathbf{z} = f(v)$. It is expected that nodes with similar roles or characteristics, which are defined according to specific application domains, are mapped close to each other in the latent space.

Definition 5 (Subgraph Representation Learning). Let $g$ denote a subgraph of $G$. The nodes and edges in $g$ are denoted as $V_S$ and $E_S$, respectively, and we have $V_S \subset V$ and $E_S \subset E$. Subgraph representation learning aims to learn a mapping function $f$ so that $\mathbf{z} = f(g)$, where in this case $\mathbf{z} \in \mathbb{R}^{D}$ corresponds to the latent vector of $g$.

Figure 1 shows a toy example of network embedding. There are three subgraphs in this network, distinguished with different colors. Given a network as input, the example generates one representation for each node, as well as for each of the three subgraphs.

## 4 Neural Network Based Models

Neural networks have been demonstrated to have powerful capabilities in capturing complex patterns in data, and have achieved substantial success in fields such as computer vision, audio recognition and natural language processing. Recently, some efforts have been made to extend neural network models to learn representations from network data. Based on the type of base neural network that is applied, we categorize them into three subgroups: look-up table based models, autoencoder based models, and graph convolutional network (GCN) based models. In this section, we first give an overview of network representation learning from the perspective of encoding and decoding. Then we discuss the details of some well-known network embedding models and how they fulfill the two steps. Here we only discuss representation learning for nodes; the models dealing with subgraphs are introduced in Section 5.

### 4.1 Framework Overview from the Encoder-Decoder Perspective

To organize the diversity of neural network architectures, we argue that different techniques can be characterized by their encoding and decoding schemes, together with the target network structure that the low-dimensional feature space is constrained to preserve. Specifically, existing methods can be reduced to solving the following optimization problem:

$$\min_{\Psi} \sum_{\phi \in \Phi_{tar}} \mathcal{L}\big(\psi_{dec}(\psi_{enc}(V_{\phi})),\ \phi \mid \Psi\big) \tag{1}$$

where $\Phi_{tar}$ is the set of target relations that the embedding algorithm expects to preserve, and $V_{\phi}$ denotes the nodes involved in relation $\phi$. $\psi_{enc}$ is the encoding function that maps nodes into representation vectors, and $\psi_{dec}$ is a decoding function that reconstructs the original network structure from the representation space. $\Psi$ denotes the trainable parameters in encoders and decoders. By minimizing the loss function above, model parameters are trained so that the desired network structure is preserved. As we will show in subsequent sections, the primary distinctions between network representation methods lie in how they define these three components: the target relations, the encoder and the decoder.
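The encoder-decoder view can be illustrated with a deliberately simple sketch. Everything below is an assumption made for illustration: the encoder is a look-up table `Z`, the decoder is an inner product, the target relations are the entries of a toy adjacency matrix, and the loss is squared error minimized by plain gradient descent. It is not a method from the surveyed literature, only one possible instance of the framework.

```python
import numpy as np

# Hypothetical toy undirected network with 4 nodes.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
N, D = A.shape[0], 2

rng = np.random.default_rng(0)
Z = rng.normal(scale=0.1, size=(N, D))  # trainable parameters: one row per node


def encode(Z, i):
    """Encoder: look up the embedding of node i."""
    return Z[i]


def decode(z_i, z_j):
    """Decoder: reconstruct the link weight from two embeddings."""
    return float(z_i @ z_j)


def total_loss(Z):
    """Squared reconstruction error summed over all node pairs."""
    return sum((decode(encode(Z, i), encode(Z, j)) - A[i, j]) ** 2
               for i in range(N) for j in range(N))


initial_loss = total_loss(Z)
learning_rate = 0.01
for _ in range(500):
    residual = Z @ Z.T - A                      # decoded minus target relations
    Z -= learning_rate * 4.0 * residual @ Z     # gradient of ||Z Z^T - A||_F^2
final_loss = total_loss(Z)
```

Swapping the target relations, the encoder, or the decoder in this sketch is what distinguishes the model families discussed in the rest of this section.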

### 4.2 Models with Embedding Look-up Tables

Instead of using multiple layers of nonlinear transformation, network representation learning could be achieved simply using look-up tables which directly map a node index into its corresponding representation vector. Specifically, a look-up table could be implemented using a matrix, where each row corresponds to the representation of one node. The diversity of different models mainly lies in the definition of target relations in the network data that we hope to preserve. In the rest of this subsection, we will first introduce DeepWalk (Perozzi et al., 2014) to discuss the basic concepts and techniques in network embedding, and then extend the discussion to more complex and practical scenarios.

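Skip-Gram Based Models. As a pioneering network representation model, DeepWalk treats nodes as words, samples random walks as sentences, and utilizes the skip-gram model (Mikolov et al., 2013) to learn the representations of nodes, as shown in Figure 2. In this case, the encoder $\psi_{enc}$ is implemented as two embedding look-up tables $\mathbf{Z} \in \mathbb{R}^{N \times D}$ and $\mathbf{Z}^{c} \in \mathbb{R}^{N \times D}$, respectively for target embeddings and context embeddings. The network information that we try to preserve is defined as $\Phi_{tar} = \{(v_i, v_j) \mid v_j \in N(v_i)\}$, i.e., the node-context pairs observed in the random walks, where $N(v_i)$ denotes the context nodes (or neighborhood) of $v_i$. The objective is to maximize the probability of observing a node’s neighborhood conditioned on embeddings, which amounts to minimizing:

$$\mathcal{L} = -\sum_{v_i \in V} \sum_{v_j \in N(v_i)} \log p(\mathbf{e}_j \mathbf{Z}^{c} \mid \mathbf{e}_i \mathbf{Z}) \tag{2}$$

where $\mathbf{e}_i$ is a one-hot row vector of length $N$ that picks the $i$-th row of $\mathbf{Z}$. Let $\mathbf{z}_i = \mathbf{e}_i \mathbf{Z}$ and $\mathbf{z}^{c}_j = \mathbf{e}_j \mathbf{Z}^{c}$; the conditional probability above is formulated as

$$p(\mathbf{z}^{c}_j \mid \mathbf{z}_i) = \frac{\exp(\mathbf{z}^{c}_j \mathbf{z}_i^{\top})}{\sum_{k=1}^{N} \exp(\mathbf{z}^{c}_k \mathbf{z}_i^{\top})} \tag{3}$$

so that the decoder $\psi_{dec}$ could be regarded as link reconstruction based on the normalized proximity between different nodes. In practice, the computation of the probability is expensive due to the summation over every node in the network, but hierarchical softmax or negative sampling can be applied to reduce time complexity.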
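The cost argument above can be seen in a short NumPy sketch. The function names, the table names `Z` and `Zc`, and the uniform choice of negative nodes are assumptions for illustration (DeepWalk itself uses hierarchical softmax, and word2vec-style samplers draw negatives from a smoothed frequency distribution). The full softmax touches all $N$ context rows, while the negative-sampling surrogate touches only one positive and a handful of sampled rows.

```python
import numpy as np


def softmax_context_prob(Z, Zc, i):
    """Full softmax of Equation 3: probability of every node being a context of node i."""
    scores = Zc @ Z[i]              # one inner product per node, shape (N,)
    scores = scores - scores.max()  # shift for numerical stability
    expd = np.exp(scores)
    return expd / expd.sum()


def negative_sampling_loss(Z, Zc, i, j, negatives):
    """Surrogate for -log p(v_j | v_i) that avoids the sum over all nodes."""
    def log_sigmoid(x):
        return -np.logaddexp(0.0, -x)

    positive = log_sigmoid(Zc[j] @ Z[i])
    negative = log_sigmoid(-(Zc[negatives] @ Z[i])).sum()
    return -(positive + negative)


rng = np.random.default_rng(0)
N, D = 6, 4
Z = rng.normal(size=(N, D))    # target embeddings (hypothetical values)
Zc = rng.normal(size=(N, D))   # context embeddings (hypothetical values)

probs = softmax_context_prob(Z, Zc, i=0)
negatives = rng.integers(0, N, size=3)  # uniform negatives, for simplicity only
loss = negative_sampling_loss(Z, Zc, i=0, j=1, negatives=negatives)
```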

There are also some approaches that are developed based on similar ideas. LINE (Tang et al., 2015) defines the first-order and second-order proximity for learning node embedding, where the latter can be seen as a special case of DeepWalk with the context window length set to $1$. Meanwhile, node2vec (Grover and Leskovec, 2016) applies biased random walk strategies, which provide a trade-off between breadth-first search (BFS) and depth-first search (DFS) when exploring the network. Planetoid (Yang et al., 2016) extends skip-gram models for semi-supervised learning, predicting the class label of nodes along with the context in the input network data. In addition, it has been shown that there exists a close relationship between skip-gram models and matrix factorization algorithms (Qiu et al., 2018; Levy and Goldberg, 2014). Therefore, network embedding models that utilize matrix factorization techniques, such as LE (Belkin and Niyogi, 2002), GraRep (Cao et al., 2015), and HOPE (Ou et al., 2016), may also be implemented in a similar manner. Random sampling based approaches allow a flexible and stochastic measure of node similarity, which not only leads to higher performance in many applications but also makes them more scalable to large-scale datasets.
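The BFS/DFS trade-off in node2vec comes from a second-order random walk controlled by a return parameter `p` and an in-out parameter `q`. The sketch below is a minimal, unoptimized version under stated assumptions: the graph is an unweighted adjacency dictionary, the function name `node2vec_walk` is hypothetical, and sampling is done directly rather than with the precomputed alias tables used in practice.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one biased walk; adj maps each node to the set of its neighbors."""
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break  # dead end: stop early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)   # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)       # stay close to prev (BFS-like)
            else:
                weights.append(1.0 / q)   # move away from prev (DFS-like)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Hypothetical toy graph.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
walk = node2vec_walk(adj, start=0, length=8, p=0.5, q=2.0)
```

With `p = q = 1` the walk reduces to the uniform random walk used by DeepWalk; the resulting node sequences are then fed to the skip-gram model as sentences.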

Attributed Network Embedding Models. Social networks are rich in side information, where nodes could be associated with various attributes that characterize their properties. Inspired by the idea of inductive matrix completion (Natarajan and Dhillon, 2014), TADW (Yang et al., 2015) extends the framework of DeepWalk by incorporating features of vertices into network representation learning. Besides sampling from plain networks, FeatWalk (Huang et al., 2019) proposes a novel feature-based random walk strategy to generate node sequences by considering node similarity on attributes. With the random walks based on both topological and attribute information, the skip-gram model is then applied to learn node representations.

Heterogeneous Network Embedding Models. Nodes in networks could be of different types, which poses the challenge of how to preserve relations among them. HERec (Shi et al., 2019) and metapath2vec++ (Dong et al., 2017) propose meta-path based random walk schemes to discover the context across different types of nodes. The skip-gram architecture in metapath2vec++ is also modified, so that the normalization term in the softmax only considers the nodes of the same type as the target node. In a more complex scenario where we have both nodes and attributes of different types, HNE (Chang et al., 2015) combines feed-forward neural networks and embedding models into a unified framework. Suppose $\mathbf{x}$ and $\mathbf{y}$ denote the latent vectors of two different types of nodes. HNE defines two additional transformation matrices $\mathbf{U}$ and $\mathbf{V}$ to respectively map $\mathbf{x}$ and $\mathbf{y}$ to the joint space. Let $\tilde{\mathbf{x}} = \mathbf{U}^{\top}\mathbf{x}$ and $\tilde{\mathbf{y}} = \mathbf{V}^{\top}\mathbf{y}$; intra-type node similarity and inter-type node similarity are defined as

$$s(\mathbf{x}_i, \mathbf{x}_j) = \tilde{\mathbf{x}}_i^{\top}\tilde{\mathbf{x}}_j, \qquad s(\mathbf{y}_i, \mathbf{y}_j) = \tilde{\mathbf{y}}_i^{\top}\tilde{\mathbf{y}}_j, \qquad s(\mathbf{x}_i, \mathbf{y}_j) = \tilde{\mathbf{x}}_i^{\top}\tilde{\mathbf{y}}_j \tag{4}$$

where we hope to preserve various types of similarities during training. To obtain $\mathbf{x}$ and $\mathbf{y}$, HNE applies different feed-forward neural networks to map raw input (e.g., images and texts) to latent spaces, thus enabling an end-to-end training framework. Specifically, the authors use a CNN to process images and a fully-connected neural network to process texts.

Dynamic Embedding Models. Real-world social networks are not static and evolve over time with the addition and deletion of nodes and links. To deal with this challenge, DNE (Du et al., 2018a) presents a decomposable objective to learn the representation of each node separately, where the impact of network changes on existing nodes is measurable and the most affected nodes are chosen for update as the learning process proceeds. In addition, DANE (Li et al., 2017b) leverages matrix perturbation theory to tackle online embedding updates.

### 4.3 Autoencoder Techniques

In this section, we discuss network representation models based on the autoencoder architecture (Hinton and Salakhutdinov, 2006; Bengio et al., 2013). As shown in Figure 3, an autoencoder consists of two neural network modules: an encoder and a decoder. The encoder maps the features of each node into a latent space, and the decoder reconstructs the information about the network from the latent space. Usually the hidden representation layer has a smaller size than that of the input/output layer, forcing it to create a compressed representation that captures the non-linear structure of the network. Formally, following Equation 1, the objective of an autoencoder is to minimize the reconstruction error between the input and the output decoded from low-dimensional representations.

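Deep Neural Graph Representation (DNGR). DNGR (Cao et al., 2016) attempts to preserve a node’s local neighborhood information using a stacked denoising autoencoder. Specifically, assume $\mathbf{S}$ is the PPMI matrix (Bullinaria and Levy, 2007) constructed from $\mathbf{A}$, then DNGR minimizes the following loss:

$$\mathcal{L} = \sum_{v_i \in V} \|\psi_{dec}(\mathbf{z}_i) - \mathbf{s}_i\|_2^2, \quad \text{s.t. } \mathbf{z}_i = \psi_{enc}(\mathbf{s}_i) \tag{5}$$

where $\psi_{enc}$ and $\psi_{dec}$ are the encoder and decoder, and $\mathbf{s}_i$, the $i$-th row of $\mathbf{S}$, denotes the associated neighborhood information of $v_i$. In this case, $\Phi_{tar} = \{\mathbf{s}_i \mid v_i \in V\}$ and DNGR aims to reconstruct the PPMI matrix. $\mathbf{z}_i$ is the embedding of node $v_i$ in the hidden layer.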
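For readers unfamiliar with PPMI (positive pointwise mutual information), the following NumPy sketch shows how such a matrix can be computed from a matrix of co-occurrence counts. The function name `ppmi` and the toy count matrix are assumptions; in particular, how the counts themselves are gathered from the network is outside this sketch.

```python
import numpy as np


def ppmi(C):
    """Positive pointwise mutual information of a co-occurrence count matrix C."""
    C = np.asarray(C, dtype=float)
    total = C.sum()
    row = C.sum(axis=1, keepdims=True)   # marginal counts of target nodes
    col = C.sum(axis=0, keepdims=True)   # marginal counts of context nodes
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(C * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0         # zero counts or empty rows/columns
    return np.maximum(pmi, 0.0)          # keep only positive associations


# Hypothetical co-occurrence counts for 3 nodes.
C = np.array([[0, 2, 0],
              [2, 0, 1],
              [0, 1, 0]])
S = ppmi(C)
```

Each row of the resulting matrix can then serve as the input vector that the autoencoder is trained to reconstruct.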

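Structural Deep Network Embedding (SDNE). SDNE (Wang et al., 2016) is another autoencoder-based model for network representation learning. The objective function of SDNE is:

$$\mathcal{L} = \sum_{v_i \in V} \|(\psi_{dec}(\mathbf{z}_i) - \mathbf{s}_i) \odot \mathbf{b}_i\|_2^2 + \alpha \sum_{v_i, v_j \in V} \mathbf{A}_{ij} \|\mathbf{z}_i - \mathbf{z}_j\|_2^2 \tag{6}$$

where $\mathbf{s}_i$ is the input vector of node $v_i$ (the $i$-th row of a matrix $\mathbf{S}$), $\mathbf{z}_i = \psi_{enc}(\mathbf{s}_i)$ is its embedding, $\odot$ is the element-wise product, $\mathbf{b}_i$ is a weight vector, and $\alpha$ balances the two terms. The first term is an autoencoder as in Equation 5, except that the reconstruction error is weighted by $\mathbf{b}_i$, so that more emphasis is put on recovering non-zero entries in $\mathbf{s}_i$. The second term is motivated by Laplacian Eigenmaps, which encourages nearby nodes to have similar embeddings. Besides, SDNE differs from DNGR in the definition of $\mathbf{S}$: DNGR defines $\mathbf{S}$ as the PPMI matrix, while SDNE sets $\mathbf{S}$ as the adjacency matrix.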
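The two ingredients described above, a reconstruction error that is weighted more heavily on non-zero entries and a Laplacian-Eigenmaps-style penalty on connected nodes, can be written down in a few lines. The sketch below is an illustration under assumptions: the names `beta` (extra weight on non-zero entries) and `alpha` (trade-off between the terms) are chosen here, the reconstruction `S_hat` is passed in rather than produced by a trained network, and any additional regularization is omitted.

```python
import numpy as np


def sdne_style_loss(S, S_hat, Z, A, beta=5.0, alpha=1.0):
    """Weighted reconstruction error plus a first-order proximity penalty."""
    B = np.where(S != 0, beta, 1.0)            # heavier weight on non-zero entries
    reconstruction = np.sum(((S_hat - S) * B) ** 2)
    diff = Z[:, None, :] - Z[None, :, :]       # pairwise embedding differences
    proximity = np.sum(A * np.sum(diff ** 2, axis=-1))
    return reconstruction + alpha * proximity


# Hypothetical two-node network joined by one edge.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
S = A.copy()                    # SDNE-style input: rows of the adjacency matrix
S_hat = np.zeros_like(S)        # a decoder that reconstructs nothing
Z = np.array([[0.0, 0.0],
              [1.0, 0.0]])      # embeddings one unit apart
loss = sdne_style_loss(S, S_hat, Z, A, beta=2.0, alpha=1.0)
perfect = sdne_style_loss(S, S, np.zeros((2, 2)), A)
```

In the example, the two missed non-zero entries each cost $(1 \cdot 2)^2 = 4$ and the single edge is counted in both directions with squared distance 1, giving a total of 10.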

It is worth noting that, unlike Equation 2, which uses a one-hot indicator vector for embedding look-up, DNGR and SDNE transform each node’s information into an embedding by training neural network modules. This distinction allows autoencoder-based methods to directly model a node’s neighborhood structure and features, which is not straightforward for random walk approaches. As a result, it is easy to incorporate richer information sources (e.g., node attributes) into representation learning, as introduced below. However, autoencoder-based methods may suffer from scalability issues because the input dimension is $N$, the number of nodes, which may result in significant time costs on massive real-world datasets.

Autoencoder-Based Attributed Network Embedding. The structure of autoencoders facilitates the incorporation of multiple information sources towards joint representation learning. Instead of only mapping nodes to the latent space, CAN (Meng et al., 2019) proposes to learn the representation of nodes and attributes in the same latent space by using variational autoencoders (VAEs) (Doersch, 2016), in order to capture the affinities between nodes and attributes. DANE (Gao and Huang, 2018) utilizes the correlation between topological and attribute information of nodes by building two autoencoders for each information source, and then encourages the two sets of latent representations to be consistent and complementary. (Li et al., 2017a) adopts another strategy, where topological feature vector and content information vector (learned by doc2vec (Le and Mikolov, 2014)) are directly concatenated and put into a VAE to capture the nonlinear relationship between them.

### 4.4 Graph Convolutional Approaches

Inspired by the significant performance improvement of convolutional neural networks (CNNs) in image recognition, recent years have witnessed a surge in adapting convolutional modules to learn representations of network data. The intuition is to generate a node’s embedding by aggregating information from its local neighborhood, as shown in Figure 4. Different from autoencoder-based approaches, the encoding function of graph convolutional approaches leverages a node’s local neighborhood as well as attribute information. Some efforts (Bruna et al., 2013; Henaff et al., 2015; Defferrard et al., 2016; Hamilton et al., 2017a) have been made in the past few years to extend traditional convolutional networks to network data in order to generate network embeddings. The convolutional filters of these approaches are either spatial filters or spectral filters. Spatial filters operate directly on the adjacency matrix, whereas spectral filters operate on the spectrum of the graph Laplacian (Defferrard et al., 2016).

Graph Convolutional Networks (GCN). GCN (Bronstein et al., 2017) is a well-known semi-supervised graph convolutional network. It defines a convolutional operator on the network, iteratively aggregates the embeddings of a node’s neighbors, and uses the aggregated embedding together with the node’s own embedding from the previous iteration to generate the node’s new representation. The layer-wise propagation rule of the encoding function is defined as:

$$\mathbf{H}^{(k+1)} = \sigma\big(\tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}} \mathbf{H}^{(k)} \mathbf{W}^{(k)}\big) \tag{7}$$

where $\mathbf{H}^{(k)}$ denotes the learned embeddings in layer $k$, and $\mathbf{H}^{(0)} = \mathbf{X}$. $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}_N$ is the adjacency matrix with added self-connections. $\mathbf{I}_N$ is the identity matrix, and $\tilde{\mathbf{D}}_{ii} = \sum_j \tilde{\mathbf{A}}_{ij}$. $\mathbf{W}^{(k)}$ is a layer-wise trainable weight matrix. $\sigma(\cdot)$ denotes an activation function such as ReLU. The loss function for supervised training evaluates the cross-entropy error over all labeled nodes:

$$\mathcal{L} = -\sum_{v_i \in V_L} \sum_{c=1}^{C} \mathbf{Y}_{ic} \ln \hat{\mathbf{Y}}_{ic}, \qquad \hat{\mathbf{Y}} = \psi_{dec}(\mathbf{H}^{(K)}) \tag{8}$$

where $V_L$ is the set of labeled nodes, $K$ is the number of layers, and $\hat{\mathbf{Y}}$ is the predictive matrix with $C$ candidate labels. $\psi_{dec}$ can be viewed as a fully-connected network with the softmax activation function that maps representations to predicted labels. Note that unlike autoencoders, which explicitly treat each node’s neighborhood as features or reconstruction goals as in Equation 5 or Equation 6, GCN implicitly uses the local neighborhood links in each encoding layer as pathways to aggregate embeddings from neighbors, so that higher-order network structures are utilized. Since Equation 8 is a supervised loss function, $\Phi_{tar}$ is not applicable here. However, the loss function can also be formulated in an unsupervised manner, similar to the skip-gram model (Hamilton et al., 2017a; Kipf and Welling, 2016). GCN may suffer from scalability problems when the size of $\mathbf{A}$ is large. Training algorithms have been proposed to tackle this challenge (Ying et al., 2018a), where the network data is processed in small batches and a node’s local neighbors are sampled instead of using all of them.
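The propagation rule and the supervised loss can be sketched in NumPy as follows. This is a forward pass only, under stated assumptions: the weights are random rather than trained, node features are one-hot, the last propagation layer feeds the softmax directly, and the function names are chosen here for illustration.

```python
import numpy as np


def normalized_adjacency(A):
    """Add self-connections, then apply symmetric degree normalization."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(A, X, weights):
    """Stack propagation layers; ReLU in hidden layers, softmax at the output."""
    A_hat = normalized_adjacency(A)
    H = X
    for W in weights[:-1]:
        H = np.maximum(A_hat @ H @ W, 0.0)
    logits = A_hat @ H @ weights[-1]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    expd = np.exp(logits)
    return expd / expd.sum(axis=1, keepdims=True)


def cross_entropy(probs, labeled_idx, labels):
    """Cross-entropy summed over labeled nodes only (semi-supervised setting)."""
    return -np.sum(np.log(probs[labeled_idx, labels]))


# Hypothetical toy network: 4 nodes, one-hot features, 2 classes, 2 labeled nodes.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(3, 2))]

probs = gcn_forward(A, X, weights)
loss = cross_entropy(probs, labeled_idx=[0, 3], labels=[0, 1])
pair_norm = normalized_adjacency(np.array([[0.0, 1.0], [1.0, 0.0]]))
```

Because the normalized adjacency is applied once per layer, a two-layer model already mixes information from two-hop neighborhoods.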

Inductive Training With GCN. So far, many of the basic models we have reviewed generate network representations in a transductive manner. GraphSAGE (Hamilton et al., 2017a) emphasizes the inductive capability of GCN. Inductive learning is essential for high-throughput machine learning systems, especially when operating on evolving networks that constantly encounter unseen nodes (Yang et al., 2016; Guo et al., 2018). The core representation update scheme of GraphSAGE is similar to that of traditional GCN, except that the operation on the whole network is replaced by sample-based representation aggregators:

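$$\mathbf{h}_i^{(k+1)} = \sigma\Big(\mathbf{W}^{(k)} \cdot \text{CONCAT}\big(\mathbf{h}_i^{(k)},\ \text{AGGREGATE}_k(\{\mathbf{h}_j^{(k)}, \forall v_j \in N(v_i)\})\big)\Big) \tag{9}$$

where $\mathbf{h}_i^{(k)}$ is the hidden representation of node $v_i$ in the $k$-th layer, $\mathbf{W}^{(k)}$ is a trainable weight matrix and $\sigma(\cdot)$ is an activation function. CONCAT denotes the concatenation operator and $\text{AGGREGATE}_k$ represents the neighborhood aggregation function of the $k$-th layer (e.g., element-wise mean or max operator). $N(v_i)$ denotes the neighbors of $v_i$. Compared with Equation 7, GraphSAGE only needs to aggregate feature vectors from a sampled subset of neighbors, making it scalable to large-scale data. Given the attribute features and neighborhood relations of an unseen node, GraphSAGE can generate the embedding of this node by leveraging its local neighbors as well as attributes via forward propagation.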
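A minimal sketch of one such sample-and-aggregate layer with a mean aggregator is given below. The function name `sage_layer`, the neighbor dictionary, and the all-ones weight matrix are assumptions for illustration; the weights are not trained and ReLU is used as the activation.

```python
import numpy as np


def sage_layer(H, neighbors, W, num_samples, rng):
    """One layer: sample neighbors, average them, concatenate with self, transform."""
    N, D = H.shape
    out = []
    for i in range(N):
        nbrs = list(neighbors[i])
        if len(nbrs) > num_samples:
            # Aggregate over a fixed-size random subset instead of all neighbors.
            nbrs = rng.choice(nbrs, size=num_samples, replace=False)
        agg = H[nbrs].mean(axis=0) if len(nbrs) else np.zeros(D)
        out.append(np.concatenate([H[i], agg]) @ W)
    return np.maximum(np.stack(out), 0.0)  # ReLU activation


# Hypothetical toy input: 3 nodes with 2-dimensional features.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
neighbors = {0: [1, 2], 1: [0], 2: [0]}
W = np.ones((4, 1))                      # maps the 2D + 2D concatenation to 1 output
rng = np.random.default_rng(0)

full = sage_layer(H, neighbors, W, num_samples=5, rng=rng)     # no sampling needed
sampled = sage_layer(H, neighbors, W, num_samples=1, rng=rng)  # one neighbor per node
```

Since the layer depends only on a node's own features and those of its sampled neighbors, the same weights can be applied to a node that was not present during training, which is the inductive property discussed above.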

Graph Attention Mechanisms. Attention mechanisms have become a standard technique in many sequence-based tasks, allowing models to focus on the most relevant parts of the input when making decisions. We can also utilize attention mechanisms to aggregate the most important features from a node’s local neighbors. GAT (Velickovic et al., 2017) extends the framework of GCN by replacing the standard aggregation function with an attention layer that aggregates messages from the most important neighbors. Also, (Thekumparampil et al., 2018) proposes to remove all intermediate fully-connected layers in conventional GCN and replace the propagation layers with attention layers. This allows the model to learn a dynamic and adaptive local summary of neighborhoods, greatly reduces the number of parameters, and also achieves more accurate predictions.
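As an illustration of attention-based aggregation, the sketch below computes single-head attention coefficients in the style of GAT for one node: a shared linear map, a score from the concatenated pair of transformed features passed through a LeakyReLU, and a softmax over the node's neighborhood including itself. The function name, the random parameters and the toy neighborhood are assumptions; multi-head attention and training are left out.

```python
import numpy as np


def gat_attention(H, W, a, neighbors, i, slope=0.2):
    """Return attention weights over {i} and its neighbors, and the aggregated feature."""
    Wh = H @ W                                     # shared linear transformation
    idx = [i] + list(neighbors[i])                 # the node attends to itself too
    scores = np.array([np.concatenate([Wh[i], Wh[j]]) @ a for j in idx])
    scores = np.where(scores > 0, scores, slope * scores)  # LeakyReLU
    scores = scores - scores.max()                 # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax over the neighborhood
    h_new = (alpha[:, None] * Wh[idx]).sum(axis=0) # attention-weighted aggregation
    return alpha, h_new


rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))        # hypothetical node features
W = rng.normal(size=(3, 2))        # hypothetical weight matrix
a = rng.normal(size=(4,))          # attention vector over the concatenated pair
neighbors = {0: [1, 2]}

alpha, h_new = gat_attention(H, W, a, neighbors, i=0)
```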

## 5 Subgraph Embedding

Besides learning representations for nodes, recent years have also witnessed a growing body of research that tries to learn representations for a set of nodes and edges as a whole. The goal is thus to represent a subgraph with a low-dimensional vector. Many traditional methods that operate on subgraphs rely on graph kernels (Haussler, 1999), which decompose a network into atomic substructures such as graphlets, subtree patterns and paths, and treat these substructures as features to obtain an embedding through further transformation. In this section, however, we focus on reviewing methods that seek to automatically learn embeddings of subgraphs using deep models. Readers who are interested in graph kernels are referred to (Vishwanathan et al., 2010).

According to the literature, most existing methods are built upon the techniques used for node embedding, as introduced in Section 4. However, in graph representation problems, the label information is associated with particular subgraphs instead of individual nodes or links. In this survey, we divide the approaches of subgraph representation learning into two categories based on how they aggregate node-level embeddings in each subgraph. The detailed discussion for each category is as below.

### 5.1 Flat Aggregation

Assume $V_S$ denotes the set of nodes in a particular subgraph and $\mathbf{z}_S$ represents the subgraph’s embedding. $\mathbf{z}_S$ could be obtained by aggregating the embeddings of all individual nodes in the subgraph:

$$\mathbf{z}_S = \psi_{aggr}(\{\mathbf{z}_i \mid v_i \in V_S\}) \tag{10}$$

where $\psi_{aggr}$ denotes the aggregation function and $\mathbf{z}_i$ is the embedding of node $v_i$. Methods based on such flat aggregation usually define a $\psi_{aggr}$ that captures simple correlations among nodes. For example, (Niepert et al., 2016) directly concatenates node embeddings together and utilizes standard convolutional neural networks as the aggregation function to generate the graph representation. (Dai et al., 2016) employs a simple element-wise summation operation to define $\psi_{aggr}$, and learns the graph embedding by summing the embeddings of all individual nodes.
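Flat aggregation in its simplest form is a permutation-invariant reduction over the rows of the node embedding matrix. The helper name `flat_aggregate` and the toy embeddings below are assumptions for illustration; the sum variant corresponds to the element-wise summation mentioned above, while mean and max are shown as common alternatives.

```python
import numpy as np


def flat_aggregate(Z, subgraph_nodes, how="sum"):
    """Aggregate the embeddings of the given nodes into one subgraph vector."""
    rows = Z[list(subgraph_nodes)]
    if how == "sum":
        return rows.sum(axis=0)
    if how == "mean":
        return rows.mean(axis=0)
    if how == "max":
        return rows.max(axis=0)
    raise ValueError(f"unknown aggregation: {how}")


# Hypothetical node embeddings for 3 nodes; the subgraph contains nodes 0 and 2.
Z = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
z_sum = flat_aggregate(Z, [0, 2], how="sum")
z_mean = flat_aggregate(Z, [0, 2], how="mean")
z_max = flat_aggregate(Z, [0, 2], how="max")
```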

In addition, some methods apply recurrent neural networks (RNNs) for representing graphs. Some typical methods first sample a number of graph sequences from the input network, and then apply RNN-based autoencoders to generate embedding for each graph sequence. The final graph representation is obtained by either averaging (Jin et al., 2018) or concatenating (Taheri et al., 2018) these graph sequence embeddings.

### 5.2 Hierarchical Aggregation

In contrast to flat aggregation, the motivation behind hierarchical aggregation is to preserve the hierarchical structure that might be present in the subgraph by aggregating neighborhood information in a hierarchical way. (Bruna et al., 2013) and (Defferrard et al., 2016) attempt to utilize such hierarchical structure of networks by combining convolutional neural networks with graph coarsening. The main idea is to stack multiple graph coarsening and convolutional layers. In each layer, they first apply graph clustering algorithms to group nodes, and then merge node embeddings within each cluster using element-wise max-pooling. After clustering, they generate a new coarse network by stacking the embeddings of clusters together, which is again fed into convolutional layers, and the same process repeats. Clusters in each layer can be viewed as subgraphs, and clustering algorithms are used to learn the assignment matrix of subgraphs, so that the hierarchical structure of the network is also propagated through layers. Although these methods work well in certain applications, they follow a two-stage fashion, where the stages of clustering and embedding may not reinforce each other.

To avoid this limitation, DiffPool (Ying et al., 2018b) proposes an end-to-end model that does not depend on a deterministic clustering subroutine. The layer-wise propagation rule is formulated as below:

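$$\mathbf{X}^{(k+1)} = {\mathbf{M}^{(k)}}^{\top} \mathbf{Z}^{(k)}, \qquad \mathbf{A}^{(k+1)} = {\mathbf{M}^{(k)}}^{\top} \mathbf{A}^{(k)} \mathbf{M}^{(k)} \tag{11}$$

where $\mathbf{Z}^{(k)}$ denotes node embeddings and $\mathbf{M}^{(k)}$ is the cluster assignment matrix learned from the previous layer. The goal of the left equation is to generate the $(k+1)$-th coarser network embedding $\mathbf{X}^{(k+1)}$ by aggregating node embeddings according to the cluster assignment $\mathbf{M}^{(k)}$, while the right equation computes a new coarsened adjacency matrix $\mathbf{A}^{(k+1)}$ from the previous adjacency matrix $\mathbf{A}^{(k)}$, which stores the similarity between each pair of clusters. Here, instead of applying a deterministic clustering algorithm to learn $\mathbf{M}^{(k)}$, they adopt graph neural networks (GNNs) to learn it. Specifically, they use two separate GNNs on the input embedding matrix $\mathbf{X}^{(k)}$ and coarsened adjacency matrix $\mathbf{A}^{(k)}$ to generate the assignment matrix $\mathbf{M}^{(k)}$ and the embedding matrix $\mathbf{Z}^{(k)}$, respectively. Formally, $\mathbf{Z}^{(k)} = \text{GNN}_{embed}(\mathbf{A}^{(k)}, \mathbf{X}^{(k)})$ and $\mathbf{M}^{(k)} = \text{softmax}(\text{GNN}_{pool}(\mathbf{A}^{(k)}, \mathbf{X}^{(k)}))$. The two steps could reinforce each other to improve the performance. DiffPool may suffer from computational issues brought by the computation of the soft clustering assignment, which is further addressed in (Cangea et al., 2018).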
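A single coarsening step of this kind can be sketched in NumPy. The sketch makes simplifying assumptions: the node embeddings `Z` and the unnormalized assignment scores `pool_logits` are supplied directly as random arrays, whereas in the method described above both are outputs of separate GNNs; the function name `soft_coarsen` is hypothetical.

```python
import numpy as np


def soft_coarsen(A, Z, pool_logits):
    """Pool N nodes into C soft clusters: returns cluster features, adjacency, assignment."""
    logits = pool_logits - pool_logits.max(axis=1, keepdims=True)
    M = np.exp(logits)
    M = M / M.sum(axis=1, keepdims=True)   # row-wise softmax: each node spreads over clusters
    X_next = M.T @ Z                       # cluster embeddings, shape (C, D)
    A_next = M.T @ A @ M                   # connectivity strength between clusters, (C, C)
    return X_next, A_next, M


# Hypothetical toy input: 4 nodes, 3-dimensional embeddings, 2 clusters.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 3))
pool_logits = rng.normal(size=(4, 2))

X_next, A_next, M = soft_coarsen(A, Z, pool_logits)
```

Because every row of the assignment matrix sums to one, the total edge weight is preserved by the coarsening, which gives a quick sanity check on an implementation.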

## 6 Applications

The representations learned from networks can be easily applied to downstream machine learning models for further analysis on social networks. Some common applications include node classification, link prediction, anomaly detection and clustering.

### 6.1 Node Classification

In social networks, people are often associated with semantic labels that describe certain aspects of them, such as affiliations, interests or beliefs. However, in real-world scenarios, people are usually partially or sparsely labeled, since labeling is expensive and time consuming. The goal of node classification is to predict the labels of unlabeled nodes by leveraging their connections with the labeled ones through the network structure. According to (Bhagat et al., 2011), existing methods can be classified into two categories, i.e., random walk based and feature extraction based methods. The former propagates labels with random walks (Baluja et al., 2008), while the latter extracts features from a node’s surrounding information and network statistics.

In general, network representation approaches follow the second principle. A number of existing network representation models, such as (Yang et al., 2015; Wang et al., 2016; Liao et al., 2018), focus on extracting node features from the network using representation learning techniques, and then apply machine learning classifiers such as support vector machines, naive Bayes classifiers, and logistic regression for prediction. In contrast to separating the steps of node embedding and node classification, some recent work (Hamilton et al., 2017a; Dai et al., 2016; Monti et al., 2017) designs an end-to-end framework that combines the two tasks, so that the discriminative information inferred from labels can directly benefit the learning of network embeddings.

### 6.2 Link Prediction

Social networks are not necessarily complete, as some links might be missing. For example, the friendship link between two users in a social network can be missing even if they actually know each other in the real world. The goal of link prediction is to infer the existence of new interactions or emerging links between users in the future, based on the observed links and the network evolution mechanism (Lü and Zhou, 2011; Al Hasan and Zaki, 2011; Liben-Nowell and Kleinberg, 2007). In network embedding, an effective model is expected to preserve both the network structure and the inherent dynamics of the network in the low-dimensional space. In general, the majority of previous work focuses on predicting missing links between users under homogeneous network settings (Grover and Leskovec, 2016; Ou et al., 2016; Zhou et al., 2017), and some efforts also attempt to predict missing links in heterogeneous networks (Liu et al., 2017b, 2018b). Although beyond the scope of this survey, applying network embedding to build recommender systems (Ying et al., 2018a) may also be a direction that is worth exploring.
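Once node embeddings are available, a common and simple way to use them for link prediction is to score every unconnected pair by a similarity between the two embeddings and rank the pairs. The sketch below assumes an inner-product score, a hypothetical helper `rank_candidate_links`, and hand-picked toy embeddings; other scoring functions (e.g., cosine similarity or a learned classifier on pair features) are equally possible.

```python
import numpy as np


def rank_candidate_links(Z, A, top_k):
    """Rank currently unconnected node pairs by the inner product of their embeddings."""
    scores = Z @ Z.T
    n = A.shape[0]
    candidates = [(scores[i, j], i, j)
                  for i in range(n) for j in range(i + 1, n)
                  if A[i, j] == 0]          # only score pairs without an observed link
    candidates.sort(reverse=True)
    return [(i, j) for _, i, j in candidates[:top_k]]


# Hypothetical embeddings: nodes 0 and 1 are similar, nodes 2 and 3 are similar.
Z = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])
A = np.zeros((4, 4))
A[0, 1] = A[1, 0] = 1.0                     # the only observed link

predicted = rank_candidate_links(Z, A, top_k=2)
```

In this toy setting the highest-ranked missing link is between nodes 2 and 3, whose embeddings are nearly identical.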

### 6.3 Anomaly Detection

Another challenging task in social network analysis is anomaly detection. Malicious activities in social networks, such as spamming, fraud and phishing, can be interpreted as rare or unexpected behaviors that deviate from those of the majority of normal users. While numerous algorithms have been proposed for spotting anomalies and outliers in networks (Savage et al., 2014; Akoglu et al., 2015; Liu et al., 2017a), anomaly detection methods based on network embedding techniques have recently received increasing attention (Hu et al., 2016; Peng et al., 2018; Liang et al., 2018). The discrete and structural information in networks is merged and projected into a continuous latent space, which facilitates the application of various statistical or geometrical algorithms for measuring the degree of isolation or outlierness of network components. In addition, in contrast to detecting malicious activities in a static way, (Sricharan and Das, 2014) and (Yu et al., 2018) also attempt to study the problem in dynamic networks.

### 6.4 Node Clustering

In addition to the above applications, node clustering is another important network analysis problem. The goal of node clustering is to partition a network into a set of clusters (or subgraphs), so that nodes in the same cluster are more similar to each other than to those in other clusters. In social networks, such clusters commonly appear as communities, such as groups of people who share similar affiliations or interests. Most previous work focuses on clustering networks with various metrics of proximity or connection strength between nodes. For example, Shi and Malik (2000) and Ding et al. (2001) seek to maximize the number of connections within clusters while minimizing the connections between clusters. Recently, many efforts have resorted to network representation techniques for node clustering. Some methods treat embedding and clustering as disjoint tasks: they first embed nodes into low-dimensional vectors and then apply traditional clustering algorithms to produce clusters (Tian et al., 2014; Cao et al., 2015; Wang et al., 2017). Other methods, such as Tang et al. (2016) and Wei et al. (2017), consider the optimization problem of clustering and network embedding in a unified objective function and generate cluster-induced node embeddings.
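
The disjoint "embed, then cluster" pipeline can be sketched as follows, assuming the node embeddings have already been produced by some embedding method. The clustering step here is a small k-means (Lloyd's algorithm) with a deterministic farthest-point initialization, used purely as a stand-in for the "traditional clustering algorithms" mentioned above; the function name `kmeans`, the initialization scheme and the toy vectors are assumptions made for illustration.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = [sum(x) / len(members) for x in zip(*members)]
    return labels

# Toy node embeddings (made-up values) forming two well-separated groups.
node_ids = ["a", "b", "c", "d", "e", "f"]
vectors = [
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.2],
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
]

labels = kmeans(vectors, k=2)
clusters = dict(zip(node_ids, labels))
print(clusters)
```

Because the two stages are decoupled, the embeddings are not optimized for the clustering objective, which is the limitation that the unified approaches aim to address.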

## 7 Conclusion and Future Directions

Recent years have witnessed a surge in leveraging representation learning techniques for network analysis. In this survey, we have provided an overview of the recent efforts on this topic. Specifically, we summarize existing techniques into three subgroups based on the type of the core learning modules: representation look-up tables, autoencoders and graph convolutional networks. Although many techniques have been developed for a wide spectrum of social network analysis problems in the past few years, we believe there still remain many promising directions worthy of further exploration.

Dynamic networks. Social networks are inherently highly dynamic in real-life scenarios. The overall set of nodes, the underlying network structure, as well as attribute information, might evolve over time. As an example, in real-world social networks such as Facebook, these elements correspond to users, connections and personal profiles, respectively. This property makes existing static learning techniques fail to work properly. Although several methods have been proposed to tackle dynamic networks, they often rely on certain assumptions, such as assuming that the node set is fixed and only dealing with dynamics caused by edge deletion and addition (Li et al., 2017b). Also, changes in attribute information are rarely considered in existing work. Therefore, how to design effective and efficient network embedding techniques for truly dynamic networks is still an open question.

Hierarchical network structure. Most existing techniques focus on designing advanced encoding or decoding functions to capture pairwise relationships between nodes. Nevertheless, pairwise relations can only provide insights about local neighborhoods and might not reveal global hierarchical network structures, which are crucial for more complex networks (Benson et al., 2016). How to design effective network embedding methods that are capable of preserving the hierarchical structures of networks is a promising direction for future work.

Heterogeneous networks. Existing network embedding methods mainly deal with homogeneous networks. However, many relational systems in real-life scenarios can be abstracted as heterogeneous networks with multiple types of nodes or edges. In this case, it is hard to evaluate semantic proximity between different network elements in the low-dimensional space. While some work has investigated the use of metapaths (Dong et al., 2017; Huang and Mamoulis, 2017) to approximate semantic similarity for heterogeneous network embedding, many tasks on heterogeneous networks have not been fully evaluated. Learning embeddings for heterogeneous networks is still at an early stage, and more comprehensive techniques are needed to fully capture the relations between different types of network elements, towards modeling more complex real systems.

Scalability. Although deep learning based network embedding methods have achieved substantial performance due to their great capacity, they still suffer from efficiency problems. These problems will become more severe when dealing with real-life massive datasets with billions of nodes and edges. Designing deep representation learning frameworks that are scalable to real network datasets is another driving factor for advancing research in this domain. In addition, similar to using GPUs for traditional deep models built on grid-structured data, developing computational paradigms for large-scale network processing could be an alternative way towards efficiency improvement (Bronstein et al., 2017).

Interpretability. Despite the superior performance achieved by deep models, one fundamental limitation is their lack of interpretability (Liu et al., 2018a). Different dimensions in the embedding space usually have no specific meaning, so it is difficult to comprehend the underlying factors that have been preserved in the latent space. Since the interpretability of machine learning models has recently been receiving more and more attention (Montavon et al., 2018; Du et al., 2018b), it might also be important to explore how to understand the representation learning outcome, how to develop interpretable network representation learning models, and how to utilize interpretation to improve the representation models. Answering these questions would help in learning more meaningful and task-specific embeddings for various social network analysis problems.

## References

- Akoglu et al. (2015) Akoglu, L., Tong, H., and Koutra, D. (2015). Graph based anomaly detection and description: a survey. Data Mining and Knowledge Discovery 29, 626–688
- Al Hasan and Zaki (2011) Al Hasan, M. and Zaki, M. J. (2011). A survey of link prediction in social networks. In Social Network Data Analytics (Springer). 243–275
- Baluja et al. (2008) Baluja, S., Seth, R., Sivakumar, D., Jing, Y., Yagnik, J., Kumar, S., et al. (2008). Video suggestion and discovery for youtube: taking random walks through the view graph. In International Conference on World Wide Web (ACM), 895–904
- Belkin and Niyogi (2002) Belkin, M. and Niyogi, P. (2002). Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems. 585–591
- Bengio et al. (2013) Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 1798–1828
- Bengio et al. (2009) Bengio, Y. et al. (2009). Learning deep architectures for ai. Foundations and trends® in Machine Learning 2, 1–127
- Benson et al. (2016) Benson, A. R., Gleich, D. F., and Leskovec, J. (2016). Higher-order organization of complex networks. Science 353, 163
- Bhagat et al. (2011) Bhagat, S., Cormode, G., and Muthukrishnan, S. (2011). Node classification in social networks. In Social Network Data Analytics (Springer). 115–148
- Bronstein et al. (2017) Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., and Vandergheynst, P. (2017). Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine 34, 18–42
- Bruna et al. (2013) Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. (2013). Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203
- Bullinaria and Levy (2007) Bullinaria, J. A. and Levy, J. P. (2007). Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior research methods 39, 510–526
- Cangea et al. (2018) Cangea, C., Veličković, P., Jovanović, N., Kipf, T., and Liò, P. (2018). Towards sparse hierarchical graph classifiers. arXiv preprint arXiv:1811.01287
- Cao et al. (2015) Cao, S., Lu, W., and Xu, Q. (2015). Grarep: Learning graph representations with global structural information. In ACM International Conference on Information and Knowledge Management (ACM), 891–900
- Cao et al. (2016) Cao, S., Lu, W., and Xu, Q. (2016). Deep neural networks for learning graph representations. In AAAI Conference on Artificial Intelligence (AAAI), 1145–1152
- Chang et al. (2015) Chang, S., Han, W., Tang, J., Qi, G.-J., Aggarwal, C. C., and Huang, T. S. (2015). Heterogeneous network embedding via deep architectures. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM), 119–128
- Chen et al. (2010) Chen, W., Wang, C., and Wang, Y. (2010). Scalable influence maximization for prevalent viral marketing in large-scale social networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM), 1029–1038
- Dai et al. (2016) Dai, H., Dai, B., and Song, L. (2016). Discriminative embeddings of latent variable models for structured data. In International Conference on Machine Learning. 2702–2711
- Defferrard et al. (2016) Defferrard, M., Bresson, X., and Vandergheynst, P. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems. 3844–3852
- Dietz et al. (2007) Dietz, L., Bickel, S., and Scheffer, T. (2007). Unsupervised prediction of citation influences. In International Conference on Machine learning (ACM), 233–240
- Ding et al. (2001) Ding, C. H., He, X., Zha, H., Gu, M., and Simon, H. D. (2001). A min-max cut algorithm for graph partitioning and data clustering. In IEEE International Conference on Data Mining (IEEE), 107–114
- Doersch (2016) Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908
- Dong et al. (2017) Dong, Y., Chawla, N. V., and Swami, A. (2017). metapath2vec: Scalable representation learning for heterogeneous networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM), 135–144
- Du et al. (2018a) Du, L., Wang, Y., Song, G., Lu, Z., and Wang, J. (2018a). Dynamic network embedding: An extended approach for skip-gram based network embedding. In International Joint Conference on Artificial Intelligence. 2086–2092
- Du et al. (2018b) Du, M., Liu, N., and Hu, X. (2018b). Techniques for interpretable machine learning. arXiv preprint arXiv:1808.00033
- Gao and Huang (2018) Gao, H. and Huang, H. (2018). Deep attributed network embedding. In International Joint Conference on Artificial Intelligence. 3364–3370
- Grover and Leskovec (2016) Grover, A. and Leskovec, J. (2016). node2vec: Scalable feature learning for networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM), 855–864
- Guo et al. (2018) Guo, J., Xu, L., and Chen, E. (2018). Spine: Structural identity preserved inductive network embedding. arXiv preprint arXiv:1802.03984
- Guo et al. (2014) Guo, Z., Zhang, Z. M., Zhu, S., Chi, Y., and Gong, Y. (2014). A two-level topic model towards knowledge discovery from citation networks. IEEE Transactions on Knowledge and Data Engineering 26, 780–794
- Hamilton et al. (2017a) Hamilton, W., Ying, Z., and Leskovec, J. (2017a). Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems. 1024–1034
- Hamilton et al. (2017b) Hamilton, W. L., Ying, R., and Leskovec, J. (2017b). Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584
- Haussler (1999) Haussler, D. (1999). Convolution kernels on discrete structures. Technical report, Department of Computer Science, University of California at Santa Cruz
- Henaff et al. (2015) Henaff, M., Bruna, J., and LeCun, Y. (2015). Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163
- Hinton and Salakhutdinov (2006) Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science 313, 504–507
- Hu et al. (2016) Hu, R., Aggarwal, C. C., Ma, S., and Huai, J. (2016). An embedding approach to anomaly detection. In IEEE International Conference on Data Engineering. 385–396
- Huang et al. (2019) Huang, X., Song, Q., Yang, F., and Hu, X. (2019). Large-scale heterogeneous feature embedding. In AAAI Conference on Artificial Intelligence
- Huang and Mamoulis (2017) Huang, Z. and Mamoulis, N. (2017). Heterogeneous information network embedding for meta path based proximity. arXiv preprint arXiv:1701.05291
- Jin et al. (2018) Jin, H., Song, Q., and Hu, X. (2018). Discriminative graph autoencoder. In IEEE International Conference on Big Knowledge (ICBK)
- Kipf and Welling (2016) Kipf, T. N. and Welling, M. (2016). Variational graph auto-encoders. arXiv preprint arXiv:1611.07308
- Le and Mikolov (2014) Le, Q. and Mikolov, T. (2014). Distributed representations of sentences and documents. In International Conference on Machine Learning. 1188–1196
- Leskovec et al. (2007) Leskovec, J., Adamic, L. A., and Huberman, B. A. (2007). The dynamics of viral marketing. ACM Transactions on the Web 1, 5
- Levy and Goldberg (2014) Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems. 2177–2185
- Li et al. (2017a) Li, H., Wang, H., Yang, Z., and Odagaki, M. (2017a). Variation autoencoder based network representation learning for classification. In Proceedings of ACL 2017, Student Research Workshop
- Li et al. (2017b) Li, J., Dani, H., Hu, X., Tang, J., Chang, Y., and Liu, H. (2017b). Attributed network embedding for learning in a dynamic environment. In ACM Conference on Information and Knowledge Management. 387–396
- Li et al. (2015) Li, Y., Zhang, D., and Tan, K.-L. (2015). Real-time targeted influence maximization for online advertisements. Proceedings of the VLDB Endowment 8, 1070–1081
- Liang et al. (2018) Liang, J., Jacobs, P., Sun, J., and Parthasarathy, S. (2018). Semi-supervised embedding in attributed networks with outliers. In SIAM International Conference on Data Mining (SIAM), 153–161
- Liao et al. (2018) Liao, L., He, X., Zhang, H., and Chua, T.-S. (2018). Attributed social network embedding. IEEE Transactions on Knowledge and Data Engineering
- Liben-Nowell and Kleinberg (2007) Liben-Nowell, D. and Kleinberg, J. (2007). The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology 58, 1019–1031
- Liu et al. (2017a) Liu, N., Huang, X., and Hu, X. (2017a). Accelerated local anomaly detection via resolving attributed networks. In International Joint Conference on Artificial Intelligence. 2337–2343
- Liu et al. (2018a) Liu, N., Huang, X., Li, J., and Hu, X. (2018a). On interpretation of network embedding via taxonomy induction. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD)
- Liu et al. (2017b) Liu, Z., Zheng, V. W., Zhao, Z., Zhu, F., Chang, K. C.-C., Wu, M., et al. (2017b). Semantic proximity search on heterogeneous graph by proximity embedding. In AAAI Conference on Artificial Intelligence. 154–160
- Liu et al. (2018b) Liu, Z., Zheng, V. W., Zhao, Z., Zhu, F., Chang, K. C.-C., Wu, M., et al. (2018b). Distance-aware dag embedding for proximity search on heterogeneous graphs. In AAAI Conference on Artificial Intelligence
- Lü and Zhou (2011) Lü, L. and Zhou, T. (2011). Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and Its Applications 390, 1150–1170
- Meng et al. (2019) Meng, Z., Liang, S., Bao, H., and Zhang, X. (2019). Co-embedding attributed networks. In ACM International Conference on Web Search and Data Mining. 393–401
- Mikolov et al. (2013) Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781
- Montavon et al. (2018) Montavon, G., Samek, W., and Müller, K.-R. (2018). Methods for interpreting and understanding deep neural networks. Digital Signal Processing
- Monti et al. (2017) Monti, F., Boscaini, D., Masci, J., Rodola, E., Svoboda, J., and Bronstein, M. M. (2017). Geometric deep learning on graphs and manifolds using mixture model cnns. In The IEEE Conference on Computer Vision and Pattern Recognition. vol. 1, 3
- Natarajan and Dhillon (2014) Natarajan, N. and Dhillon, I. S. (2014). Inductive matrix completion for predicting gene–disease associations. Bioinformatics 30, i60–i68
- Niepert et al. (2016) Niepert, M., Ahmed, M., and Kutzkov, K. (2016). Learning convolutional neural networks for graphs. In International Conference on Machine Learning. 2014–2023
- Ou et al. (2016) Ou, M., Cui, P., Pei, J., Zhang, Z., and Zhu, W. (2016). Asymmetric transitivity preserving graph embedding. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM), 1105–1114
- Peng et al. (2017) Peng, S., Wang, G., and Xie, D. (2017). Social influence analysis in social networking big data: Opportunities and challenges. IEEE Network 31, 11–17
- Peng et al. (2018) Peng, Z., Luo, M., Li, J., Liu, H., and Zheng, Q. (2018). Anomalous: A joint modeling approach for anomaly detection on attributed networks. In International Joint Conference on Artificial Intelligence. 3513–3519
- Perozzi et al. (2014) Perozzi, B., Al-Rfou, R., and Skiena, S. (2014). Deepwalk: Online learning of social representations. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM), 701–710
- Qiu et al. (2018) Qiu, J., Dong, Y., Ma, H., Li, J., Wang, K., and Tang, J. (2018). Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In ACM International Conference on Web Search and Data Mining. 459–467
- Savage et al. (2014) Savage, D., Zhang, X., Yu, X., Chou, P., and Wang, Q. (2014). Anomaly detection in online social networks. Social Networks 39, 62–70
- Shi et al. (2019) Shi, C., Hu, B., Zhao, W. X., and Philip, S. Y. (2019). Heterogeneous information network embedding for recommendation. IEEE Transactions on Knowledge and Data Engineering 31(2), 357–370
- Shi and Malik (2000) Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 888–905
- Song et al. (2006) Song, X., Tseng, B. L., Lin, C.-Y., and Sun, M.-T. (2006). Personalized recommendation driven by information flow. In International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM), 509–516
- Sricharan and Das (2014) Sricharan, K. and Das, K. (2014). Localizing anomalous changes in time-evolving graphs. In ACM SIGMOD International Conference on Management of Data (ACM), 1347–1358
- Taheri et al. (2018) Taheri, A., Gimpel, K., and Berger-Wolf, T. (2018). Learning graph representations with recurrent neural network autoencoders. KDD Deep Learning Day
- Tang et al. (2015) Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., and Mei, Q. (2015). Line: Large-scale information network embedding. In International Conference on World Wide Web. 1067–1077
- Tang et al. (2016) Tang, M., Nie, F., and Jain, R. (2016). Capped lp-norm graph embedding for photo clustering. In ACM Multimedia Conference (ACM), 431–435
- Tang and Yang (2012) Tang, X. and Yang, C. C. (2012). Ranking user influence in healthcare social media. ACM Transactions on Intelligent Systems and Technology 3, 73
- Thekumparampil et al. (2018) Thekumparampil, K. K., Wang, C., Oh, S., and Li, L.-J. (2018). Attention-based graph neural network for semi-supervised learning. arXiv preprint arXiv:1803.03735
- Tian et al. (2014) Tian, F., Gao, B., Cui, Q., Chen, E., and Liu, T.-Y. (2014). Learning deep representations for graph clustering. In AAAI Conference on Artificial Intelligence. 1293–1299
- Velickovic et al. (2017) Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903
- Vishwanathan et al. (2010) Vishwanathan, S. V. N., Schraudolph, N. N., Kondor, R., and Borgwardt, K. M. (2010). Graph kernels. Journal of Machine Learning Research 11, 1201–1242
- Wang et al. (2016) Wang, D., Cui, P., and Zhu, W. (2016). Structural deep network embedding. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM), 1225–1234
- Wang et al. (2017) Wang, X., Cui, P., Wang, J., Pei, J., Zhu, W., and Yang, S. (2017). Community preserving network embedding. In AAAI Conference on Artificial Intelligence. 203–209
- Wei et al. (2017) Wei, X., Xu, L., Cao, B., and Yu, P. S. (2017). Cross view link prediction by learning noise-resilient representation consensus. In International Conference on World Wide Web. 1611–1619
- Yang et al. (2015) Yang, C., Liu, Z., Zhao, D., Sun, M., and Chang, E. Y. (2015). Network representation learning with rich text information. In International Joint Conference on Artificial Intelligence. 2111–2117
- Yang et al. (2016) Yang, Z., Cohen, W., and Salakhutdinov, R. (2016). Revisiting semi-supervised learning with graph embeddings. In International Conference on Machine Learning. 40–48
- Ying et al. (2018a) Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W. L., and Leskovec, J. (2018a). Graph convolutional neural networks for web-scale recommender systems. arXiv preprint arXiv:1806.01973
- Ying et al. (2018b) Ying, R., You, J., Morris, C., Ren, X., Hamilton, W. L., and Leskovec, J. (2018b). Hierarchical graph representation learning with differentiable pooling. arXiv preprint arXiv:1806.08804
- Yu et al. (2018) Yu, W., Cheng, W., Aggarwal, C. C., Zhang, K., Chen, H., and Wang, W. (2018). Netwalk: A flexible deep embedding approach for anomaly detection in dynamic networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2672–2681
- Zhou et al. (2017) Zhou, C., Liu, Y., Liu, X., Liu, Z., and Gao, J. (2017). Scalable graph embedding for asymmetric proximity. In AAAI Conference on Artificial Intelligence. 2942–2948