Attributed Network Embedding for Incomplete Attributed Networks
Abstract
Attributed networks are ubiquitous, since a network often comes with auxiliary attribute information, e.g. a social network with user profiles. Attributed Network Embedding (ANE), which aims to learn unified low-dimensional node embeddings while preserving both structural and attribute information, has recently attracted considerable attention. The resulting node embeddings can then facilitate various downstream network tasks, e.g. link prediction. Although there are several ANE methods, most of them cannot deal with incomplete attributed networks, i.e. networks with missing links and/or missing node attributes, which often occur in real-world scenarios. To address this issue, we propose a robust ANE method. The general idea is to reconstruct a unified denser network by fusing the two sources of information for information enhancement, and then employ a random walks based network embedding method to learn node embeddings. Experiments on link prediction, node classification, visualization, and parameter sensitivity analysis over six real-world datasets validate the effectiveness of our method on incomplete attributed networks.
Chengbin Hou, Shan He, Ke Tang
Southern University of Science and Technology; University of Birmingham
chengbin.hou10@foxmail.com, s.he@cs.bham.ac.uk, tangk3@sustech.edu.cn
1 Introduction
A network/graph, which consists of a set of nodes/vertices and links/edges, is a widely used data representation. In real-world scenarios, it often comes with auxiliary/side information [?; ?; ?]. An attributed network can naturally include such auxiliary information as node attributes to better describe complex systems [?; ?; ?]. For example, for a citation network, one may transform paper titles into attributes using NLP techniques [?]; for a social network, one may transform user profiles into attributes using one-hot encoding [?]; and even for a pure network, one may encode node degrees as attributes [?].
Network Embedding (NE), a.k.a. Network Representation Learning, has become an emerging topic in Data Mining, Machine Learning, and Network Science [?; ?; ?]. Typically, NE aims to learn low-dimensional node embeddings while preserving one or more network properties [?]. The resulting node embeddings (essentially, data points in a low-dimensional vector space, so that off-the-shelf distance metrics and Machine Learning techniques can be easily applied) can then facilitate various downstream network analytic tasks [?] such as link prediction [?; ?; ?] and node classification [?; ?; ?].
There have been many successful Pure-structure based Network Embedding (PNE) methods [?; ?; ?; ?; ?; ?]. However, PNE methods cannot utilize widely accessible attribute information, which is highly correlated with structural information (so-called homophily) [?; ?]. Attributed Network Embedding (ANE), which aims to learn unified low-dimensional node embeddings while preserving both structural and attribute information, has recently attracted considerable attention [?; ?; ?].
1.1 Incomplete Attributed Networks
Nevertheless, most existing ANE methods have not considered incomplete attributed networks, i.e. networks with missing links and/or missing node attributes.
Incomplete structural information, i.e. missing links, can be observed in many real-world networks: in a social network, some abnormal users, e.g. criminals, may intentionally hide their friendships, and some newly registered users may have no or very limited friendships; in a terrorist-attack network, where each node denotes an attack and two linked attacks are committed by the same organization, it is well known that many anonymous attacks have not yet been clearly attributed [?].
Incomplete attribute information, i.e. missing node attributes, may also exist in real-world networks: in a social network, many users nowadays are unwilling to provide personal information due to privacy concerns. Furthermore, it becomes harder to crawl complete attributed networks due to the development of anti-crawler techniques, especially while crawling data from world-leading companies such as Facebook and Tencent.
1.2 The Challenges
Incomplete attributed networks bring several challenges to existing NE methods. Firstly, most PNE methods, such as DeepWalk [?] and Node2Vec [?], may obtain less accurate node embeddings, because they can only utilize the (incomplete) structural information. Secondly, some ANE methods, such as GCN [?] and SAGE [?], which rely on links to aggregate node attributes, are likely to fail, especially for nodes with no or few links. Thirdly, ANE methods based on matrix factorization, such as TADW [?] and AANE [?], may not converge due to factorizing an overly sparse matrix caused by missing links and/or missing attributes. Finally, ANE methods based on dense deep neural networks, like ASNE [?], may lack training samples due to missing links, since they require links to build training samples.
To sum up, the existing methods have not considered incomplete attributed networks, and hence they lack a mechanism to compensate for missing information.
1.3 Our Idea
To tackle these challenges, we design a mechanism that compensates for incomplete structural information with the available (but possibly also incomplete) attribute information, and vice versa. In general, our idea is to 1) reconstruct a unified denser network, in which all nodes gain much richer relationships, by fusing the two sources of information via transition matrices for information enhancement; and 2) employ a weighted random walks based PNE method to learn node embeddings on the reconstructed network. In particular, the information enhancement step is designed so that each source of information compensates for what is missing in the other. The proposed method, Attributed Biased Random Walks (ABRW), is illustrated in Figure 1.
1.4 Contributions
The contributions are summarized as follows:

We justify and investigate a largely ignored real-world problem of embedding incomplete attributed networks.

We propose an ANE method for incomplete attributed networks that learns embeddings on the reconstructed denser network after information enhancement. Several experiments show that our method consistently outperforms the state-of-the-art methods in most cases.

An open-source framework including several network embedding methods is available at https://github.com/houchengbin/OpenANE to benefit future research and industrial applications.
2 Notations and Definitions
Let $G = (\mathcal{V}, \mathcal{E}, W, A)$ be a given attributed network, where $\mathcal{V} = \{v_1, \ldots, v_n\}$ denotes a set of $n$ nodes; $\mathcal{E}$ denotes a set of links; the weight associated with each link $e_{ij}$ is a scalar $W_{ij}$; the attributes associated with each node $v_i$ form a row vector $A_i$; and $i, j$ are node subscripts. Note that the proposed method can accept directed or undirected, and weighted or unweighted, attributed networks.
Definition 1.
Structural Information Matrix $W$: The structural information refers to the network linkage information, which is encoded in a matrix $W \in \mathbb{R}^{n \times n}$. There are several popular choices for encoding structural information [?], such as the first-order proximity, which gives the information of the immediate/one-hop neighbors of a node, and the second-order proximity, which gives the information of the two-hop neighbors of a node. In this work, the first-order proximity is used to define $W$, a.k.a. the adjacency matrix.
Definition 2.
Attribute Information Matrix $A$: The attribute information refers to the network auxiliary information associated with each node, which is encoded in a matrix $A \in \mathbb{R}^{n \times q}$ where each row $A_i$ corresponds to the attribute information of node $v_i$. To obtain the vector representation $A_i$, one may employ a word embedding technique if the auxiliary information is textual [?], or a one-hot encoding technique if it is categorical [?].
Definition 3.
Attributed Network Embedding: It aims to find a mapping function $f(W, A) = Z$, where $Z \in \mathbb{R}^{n \times d}$ with $d \ll n$, and each row vector $Z_i$ is the embedding of node $v_i$. The pairwise similarity of node embeddings should reflect the pairwise similarity of the nodes in the original attributed network.
3 The Proposed Method
3.1 Preprocessing
A transition matrix, a.k.a. a Markov or stochastic matrix, is a square matrix in which each entry is a non-negative real number. In this work, row $i$ of a transition matrix gives a discrete probability distribution indicating the probability of a walker moving from node $v_i$ to each possible next node.
Structural Transition Matrix $T^S$: This matrix is used to sample the next node from the current node $v_i$ according to the discrete probability distribution given by the row vector $T^S_i$, i.e. row $i$ of $T^S$. To calculate $T^S$, we have:

$T^S_{ij} = \frac{W_{ij}}{\sum_{k=1}^{n} W_{ik}}$  (1)
where the normalization operates on each row of $W$ such that each row becomes a probability distribution. Note that the structural transition matrix might not be a strict transition matrix, since an isolated node leads to an all-zero row. One may assign the uniform distribution to such rows; nevertheless, we retain the all-zero rows to avoid meaningless (or misleading) links in the later reconstructed network.
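For concreteness, the row normalization of Eq. (1) can be sketched in NumPy as follows; the function name and the 3-node toy matrix are illustrative, not taken from our released code:

```python
import numpy as np

def row_normalize(W):
    """Row-normalize a non-negative matrix into a (near-)transition matrix.

    All-zero rows (isolated nodes) are kept as all zeros rather than being
    replaced by a uniform distribution, mirroring the discussion of Eq. (1).
    """
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1, keepdims=True)
    # Avoid division by zero; rows that sum to 0 stay all-zero.
    safe = np.where(row_sums == 0, 1.0, row_sums)
    return W / safe

# A tiny 3-node example: node 2 is isolated.
W = np.array([[0, 2, 0],
              [2, 0, 2],
              [0, 0, 0]])
T_S = row_normalize(W)
```

Note how the all-zero row of the isolated node is retained rather than turned into a uniform distribution.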
Attribute Similarity Matrix $S$: This matrix stores the similarity of attribute information between every pair of nodes in a network. Recall that the given attribute information is $A \in \mathbb{R}^{n \times q}$, where each row $A_i$ corresponds to the attribute information of node $v_i$. To calculate $S$, we have:

$S_{ij} = \frac{A_i A_j^{\top}}{\lVert A_i \rVert \, \lVert A_j \rVert}$  (2)
which measures the similarity between every pair of rows of $A$. In this work, we adopt cosine similarity as the measure. Previous work [?] has shown that cosine similarity is a good measure in both continuous and binary vector spaces, although one may try other similarity measures, e.g. Jaccard similarity.
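A minimal NumPy sketch of the pairwise cosine similarity of Eq. (2); the function name is illustrative, and the handling of zero-norm rows (nodes with no attributes) is a convention of our choosing rather than part of the method's definition:

```python
import numpy as np

def cosine_similarity_matrix(A):
    """Pairwise cosine similarities between the rows of attribute matrix A.

    Rows with zero norm (nodes with no attributes) get zero similarity to
    every node, a convenient convention for the later top-k sparsification.
    """
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    norms = np.where(norms == 0, 1.0, norms)   # avoid division by zero
    A_unit = A / norms                         # unit-length rows
    return A_unit @ A_unit.T                   # S[i, j] = cos(A_i, A_j)

# Toy attributes: nodes 0 and 1 are identical, node 2 is orthogonal to them.
A = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
S = cosine_similarity_matrix(A)
```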
A sparse operator: The aim of this operator is to make the attribute similarity matrix $S$ sparser. In practice, $S$ is often a very dense matrix with few zeros, which would lead to a high computational cost in the subsequent sampling stage. Our sparse operator is defined, for each row $S_i$, as:

$\hat{S}_{ij} = \begin{cases} S_{ij} & \text{if } S_{ij} \text{ is among the } k \text{ largest values of } S_i \\ 0 & \text{otherwise} \end{cases}$  (3)
where the sparse operator operates on each row of $S$, preserving the $k$ largest values and setting the rest to zero. This avoids building links between dissimilar (or not-so-similar) nodes in the later reconstructed network.
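The top-k sparsification of Eq. (3) can be sketched with NumPy's argpartition (which implements introselect, avoiding a full sort, as discussed in Section 3.5); the function name and toy row are illustrative:

```python
import numpy as np

def topk_sparsify(S, k):
    """Keep the k largest entries in each row of S and zero out the rest,
    as in Eq. (3). np.argpartition finds the k largest without a full sort."""
    S = np.asarray(S, dtype=float)
    out = np.zeros_like(S)
    for i, row in enumerate(S):
        if k >= len(row):
            out[i] = row            # nothing to prune
            continue
        idx = np.argpartition(row, -k)[-k:]  # indices of the k largest values
        out[i, idx] = row[idx]
    return out

# Toy similarity row: keep only the 2 largest similarities.
S = np.array([[0.9, 0.1, 0.5, 0.3]])
S_sparse = topk_sparsify(S, 2)
```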
Attribute Transition Matrix $T^A$: It is defined analogously to the structural transition matrix in Eq. (1), i.e. by row-normalizing the sparsified attribute similarity matrix $\hat{S}$.
3.2 Information Enhancement
In order to compensate for the incomplete information of one source with the other, we obtain a biased transition matrix by fusing the above two transition matrices for information enhancement, which results in a denser network with much richer relationships.
Biased Transition Matrix $T$: This transition matrix fuses the two sources of information. The information enhancement is achieved by the following equation:

$T_i = \begin{cases} T^A_i & \text{if } T^S_i = \mathbf{0} \\ \alpha \, T^S_i + (1 - \alpha) \, T^A_i & \text{otherwise} \end{cases}$  (4)
where $T_i$, $T^S_i$ and $T^A_i$ are the $i$-th rows of the corresponding transition matrices. For isolated nodes, the row vector $T^S_i$ is all zeros and we directly assign the attribute transition probabilities $T^A_i$ to $T_i$ for compensation. In all other cases, we apply a balancing factor $\alpha \in [0, 1]$ to trade off the two sources of information.
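The fusion of Eq. (4) can be sketched as follows; alpha = 0.8 matches the balancing factor reported in Section 4.1, while the toy matrices and the function name are illustrative:

```python
import numpy as np

def fuse_transition_matrices(T_S, T_A, alpha=0.8):
    """Biased transition matrix of Eq. (4): rows of isolated nodes (all-zero
    rows of T_S) fall back to the attribute transitions T_A; all other rows
    are the convex combination alpha * T_S + (1 - alpha) * T_A."""
    T_S, T_A = np.asarray(T_S, float), np.asarray(T_A, float)
    isolated = T_S.sum(axis=1) == 0
    T = alpha * T_S + (1 - alpha) * T_A
    T[isolated] = T_A[isolated]       # compensation for isolated nodes
    return T

T_S = np.array([[0.0, 1.0],
                [0.0, 0.0]])          # node 1 is isolated
T_A = np.array([[0.5, 0.5],
                [1.0, 0.0]])
T = fuse_transition_matrices(T_S, T_A, alpha=0.8)
```

Each resulting row is again a probability distribution, so the reconstructed network can be walked on directly.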
The reconstructed network: The reconstructed network is then established based on the biased transition matrix $T$ after information enhancement. It has several properties: 1) it reflects both structural and attribute information; 2) it is a weighted and directed network, which is more informative than an unweighted and undirected one; and 3) it contains no isolated nodes, and each node gains much richer relationships.
3.3 Learning Node Embeddings
The problem is now transformed into learning node embeddings on the reconstructed network (without attributes). There are many successful PNE methods; considering scalability, we follow random walks based PNE methods, e.g. DeepWalk, to learn node embeddings. Since the reconstructed network is weighted, unlike DeepWalk, we apply weighted random walks starting from each node, so as to generate a list of node sequences.
For each node sequence, a fixed-size window is slid along it. For each window position, several training pairs $(v_{cen}, v_{nei})$ are generated, where $v_{cen}$ is the center node and $v_{nei}$ is one of the remaining/neighboring nodes in the window. By doing so, we obtain a list of training pairs over all sequences, which is then fed into the Skip-Gram Negative Sampling (SGNS) model [?] for training node embeddings. For each pair $(v_{cen}, v_{nei})$, we maximize the following objective function:

$\log \sigma(Z_{nei} \cdot Z_{cen}) + \sum_{m=1}^{M} \mathbb{E}_{v_m \sim P_n(v)} \left[ \log \sigma(-Z_{v_m} \cdot Z_{cen}) \right]$  (5)
where $\sigma(x) = 1/(1 + e^{-x})$ is the Sigmoid function, $Z_i$ is the node embedding vector of node $v_i$, $M$ is the number of negative samples, and $v_m$ is a negative sample drawn from the unigram distribution $P_n(v)$ [?]. Maximizing Eq. (5) makes embedding vectors similar if the nodes co-occur, and dissimilar if they are negative samples.
The overall objective is the sum of Eq. (5) over all training pairs. Intuitively, the more frequently a pair of nodes co-occurs, the more similar their embeddings become.
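The walk generation and training-pair extraction described above can be sketched as follows; this is a simplified, unoptimized version with illustrative names, not our released implementation:

```python
import numpy as np

def weighted_random_walk(T, start, walk_length, rng):
    """One truncated weighted random walk over transition matrix T."""
    walk = [start]
    for _ in range(walk_length - 1):
        probs = T[walk[-1]]
        if probs.sum() == 0:          # dead end: stop the walk early
            break
        walk.append(rng.choice(len(probs), p=probs))
    return walk

def skipgram_pairs(walk, window):
    """(center, neighbor) training pairs from one node sequence."""
    pairs = []
    for i, center in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

rng = np.random.default_rng(0)
T = np.array([[0.0, 1.0],
              [1.0, 0.0]])           # deterministic 2-node cycle
walk = weighted_random_walk(T, 0, 4, rng)
pairs = skipgram_pairs(walk, 1)
```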
3.4 Algorithm Implementation
For better reproducibility and understanding, we summarize the core implementation details in Algorithm 1.
To save memory and further reduce time complexity while generating the list of walks, as shown in lines 9 and 10, we directly use the corresponding probability distribution given by the biased transition matrix $T$, so as to avoid explicitly reconstructing the network; we then employ the alias sampling method to efficiently simulate each random walk.
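A self-contained sketch of the alias method mentioned above: after $O(n)$ table construction, each draw from the discrete distribution takes $O(1)$ time. This is the standard textbook formulation, not our exact implementation:

```python
import numpy as np

def build_alias_table(probs):
    """O(n) preprocessing for Walker's alias method; sampling is then O(1)."""
    n = len(probs)
    prob = np.zeros(n)
    alias = np.zeros(n, dtype=int)
    scaled = np.array(probs, dtype=float) * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]           # bucket s keeps its own mass...
        alias[s] = l                  # ...and borrows the rest from l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:           # leftovers are numerically ~1
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias, rng):
    """Draw one sample in O(1): pick a bucket, then its value or its alias."""
    i = rng.integers(len(prob))
    return i if rng.random() < prob[i] else alias[i]

rng = np.random.default_rng(42)
prob, alias = build_alias_table([0.5, 0.3, 0.2])
samples = [alias_draw(prob, alias, rng) for _ in range(20000)]
```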
Once a list of walks/sequences is obtained by Algorithm 1, we follow [?] and employ the well-developed Python library Gensim [?] and its efficient Word2Vec API to learn node embeddings. Following Section 3.3, the key parameters used in the API are the SGNS model, the window size, the number of negative samples, and the exponent of the negative sampling distribution.
3.5 Complexity Analysis
Regarding Algorithm 1, for lines 1 and 2, the time-consuming operation is the sparse operator in Eq. (3), which seeks the top-$k$ most similar nodes for each node. Instead of fully sorting all elements, we employ the introselect algorithm to find the element at the top-$k$ position without fully sorting the others, which runs in linear time on average. For lines 3-13, generating $r$ walks of length $l$ from each of the $n$ nodes takes $O(nrl)$ overall; note that alias sampling in line 10 only requires $O(1)$ time per step [?]. Algorithm 1 finally returns $nr$ walks. Sliding a window of size $s$ along each walk of length $l$ gives $O(ls)$ training pairs per walk, and the overall number of pairs is $O(nrls)$. To train node embeddings, we maximize Eq. (5) by feeding in all training pairs; with embedding dimension $d$ and $M$ negative samples, the cost per pair is $O(d(M+1))$ and the overall complexity is $O(nrlsd(M+1))$.
4 Experiments
The attributed networks tested in the experiments are summarized in Table 1. MIT, Stanford, and UIllinois are three Facebook social networks, one per university, with seven properties associated with each Facebook user: status flag, gender, major, second major, dorm, year, and high school [?]. We take out "year" as class labels, and the remaining six properties are converted into attributes using one-hot encoding; missing values are encoded as all-zero vectors. For the citation networks, we use the data preprocessed by [?]: the attributes of Cora and Citeseer are binary vectors, while those of Pubmed are continuous vectors.
Table 1: Statistics of the attributed networks.

Datasets   | Nodes | Links   | Attributes | Classes
-----------|-------|---------|------------|--------
MIT        | 6402  | 251230  | 2804       | 32
Stanford   | 11586 | 568309  | 3306       | 37
UIllinois  | 30795 | 1264421 | 2921       | 34
Citeseer   | 3327  | 4732    | 3703       | 6
Cora       | 2708  | 5429    | 1433       | 7
Pubmed     | 19717 | 44338   | 500        | 3
To simulate missing links, i.e. incomplete structural information, we randomly remove a certain percentage of links from each of the six networks. To also investigate missing attributes, i.e. incomplete attribute information, we include the three social networks with inherently missing attributes.
4.1 Baseline Methods and Settings
All the methods compared in the experiments are unsupervised, i.e. no labels are required during embedding.

DeepWalk [?]: It is one of the most successful PNE methods based on random walks, which considers only structural information.

AttrPure: It considers only attribute information, applying SVD for dimensionality reduction on the attribute similarity matrix introduced in Eq. (2).

TADW [?]: It jointly models attribute and structural information as a bi-convex optimization problem under the framework of matrix factorization.

AANE [?]: It is similar to TADW, but the problem is solved in a distributed manner.

SAGE-Mean [?]: The idea is to aggregate attribute information from the neighboring nodes and then take the element-wise mean over them. For a fair comparison, we adopt its unsupervised version.

SAGE-GCN: GCN was first proposed by [?] for semi-supervised learning. For a fair comparison, we adopt the unsupervised, generalized GCN by [?].
We adopt the original source code of TADW, AANE, SAGE-GCN and SAGE-Mean. For the hyperparameters, we follow the suggestions of the original papers: 1) all methods share the same node embedding dimension; 2) DeepWalk and ABRW share the same number of walks per node, walk length, and window size, and ABRW additionally uses a fixed top-k value; 3) the balancing factors of TADW, AANE and ABRW are set to 0.2, 0.05 and 0.8 respectively; 4) for SAGE-GCN and SAGE-Mean, the learning rate, dropout rate, batch size, normalization, weight decay rate, and number of epochs are set to [searched over 0.01, 0.001, 0.0001], 0.5, 128, true, 0.001, and 100 respectively. We repeat each experiment ten times and report the averages.
4.2 Link Prediction
The link prediction task is intended to predict missing or potential links. We randomly pre-remove a set of links as positive samples and generate an equal number of non-existing links as negative samples; together, these serve as ground truth. We further randomly remove different percentages of links and use only the remaining links while learning embeddings. We employ cosine similarity between node embeddings as the link score and report AUC scores, as shown in Figure 2.
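The evaluation protocol can be illustrated with a toy example: score each candidate link by the cosine similarity of its endpoint embeddings, then compute AUC as the probability that a positive pair outranks a negative one. The quadratic-time AUC below and all names are purely illustrative:

```python
import numpy as np

def cosine_score(z_u, z_v):
    """Link score: cosine similarity between two node embeddings."""
    return float(z_u @ z_v / (np.linalg.norm(z_u) * np.linalg.norm(z_v)))

def auc(scores_pos, scores_neg):
    """AUC = probability that a random positive pair scores above a random
    negative pair, counting ties as 1/2 (O(|pos|*|neg|), fine for a demo)."""
    total = 0.0
    for p in scores_pos:
        for q in scores_neg:
            total += 1.0 if p > q else (0.5 if p == q else 0.0)
    return total / (len(scores_pos) * len(scores_neg))

Z = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0]])                 # toy node embeddings
pos = [cosine_score(Z[0], Z[1])]           # a held-out (removed) link
neg = [cosine_score(Z[0], Z[2])]           # a sampled non-existing link
score = auc(pos, neg)
```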
For the two citation networks, Citeseer and Cora, our method outperforms all baseline methods across all percentages of preserved links. DeepWalk receives the worst results because it cannot utilize attribute information, which is clearly helpful for Citeseer and Cora, as shown by the impressive results of AttrPure. All ANE methods except AANE (which sometimes fails to converge) obtain better results as the available structural information increases.
For the two social networks, MIT and UIllinois, surprisingly, DeepWalk receives the best results on MIT when the percentage of preserved links is high, but our method obtains the best (or equally good) results in all other cases on MIT and UIllinois. The contrasting results of DeepWalk on the social networks w.r.t. the citation networks can be explained from two aspects: 1) the social networks have far more links than the citation networks, as shown in Table 1; and 2) the social networks have inherently missing attributes, as mentioned above, which makes attribute information less helpful and hence degrades the performance of ANE methods. Note that the results of AttrPure on the social networks are much worse than those on the citation networks.
4.3 Node Classification
The node classification task is intended to assign existing labels to unlabeled nodes, e.g. classifying social network users into groups for precision marketing. We randomly pick a set of labeled nodes for training a classifier, and the remaining ones serve as ground truth. Besides, we also randomly remove different percentages of links before embedding. We take one-vs-rest Logistic Regression as the classifier and report micro-F1 scores, as shown in Figure 3.
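As a sanity check on the metric, here is a minimal implementation of micro-F1; for single-label multi-class prediction it reduces to plain accuracy, which is why it is a natural summary score for this task. The toy labels are illustrative:

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes. For single-label
    multi-class prediction this reduces to plain accuracy."""
    classes = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in classes:
        for t, p in zip(y_true, y_pred):
            tp += (t == c) and (p == c)
            fp += (t != c) and (p == c)
            fn += (t == c) and (p != c)
    return 2 * tp / (2 * tp + fp + fn)

y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]   # one of five predictions is wrong
score = micro_f1(y_true, y_pred)
```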
For Citeseer, attribute information dominates structural information, since AttrPure significantly outperforms DeepWalk. Even in such an extreme case, our method obtains superior results in most cases, which implies that it can robustly exploit attribute information even when the structural information is not very helpful, i.e. when only a few links are preserved for embedding. For Cora, our method receives the best results in all cases, and TADW, SAGE-GCN and SAGE-Mean obtain comparable results when there are sufficient links. Moreover, TADW drops sharply on Citeseer and Cora when only a small percentage of links is preserved, since factorizing a too-sparse structural information matrix leads to divergence.
For the two social networks, the general tendency and findings are similar to those of the link prediction task on the same datasets. Note that our method outperforms DeepWalk by a large margin when the percentage of preserved links is low, since our method can exploit attribute information to compensate for the highly incomplete structural information.
4.4 2D Visualization
The visualization task further reduces the dimensionality of node embeddings to 2D by PCA, and then assigns different colors to nodes according to their labels. This task paves a way to intuitively explain why the resulting node embeddings can benefit downstream tasks. Here we show the 2D visualization results trained on MIT with randomly removed links and inherently missing attributes. The MIT attributed social network has 32 classes/years, as shown in Table 1, and there are 4932 non-isolated users (out of 6402) from year 2004 to 2009.
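The PCA projection used here can be sketched with a plain SVD; the random toy embeddings are illustrative, and in the experiments the input would be the learned $d$-dimensional node embeddings:

```python
import numpy as np

def pca_2d(Z):
    """Project node embeddings Z (n x d) onto their top-2 principal
    components for 2D visualization."""
    Zc = Z - Z.mean(axis=0)                      # center the embeddings
    U, sing, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:2].T                         # coordinates in top-2 PC basis

rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 8))                     # toy 8-dimensional embeddings
coords = pca_2d(Z)                               # 50 points in 2D
```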
According to the results in Figures 2 and 3, the best four methods on MIT are selected for demonstration, as shown in Figure 4. Regarding the overview of node embeddings, ABRW and DeepWalk show better results (more distinct clusters) than TADW and SAGE-GCN, which is consistent with Figures 2 and 3. However, DeepWalk obtains degraded results when a network becomes sparse, as discussed above; a 2D visualization on Cora is also provided via hyperlink.
Besides, we observe an interesting phenomenon, namely a temporal trend, in our method's results, as shown in the six right-hand subfigures of the ABRW overview. It accords with common sense that students within a university have very close relationships if they graduate in the same year. More interestingly, students also have relatively close relationships if they graduate in nearby years (usually within 4 years). Beyond this 4-year window, however, the students who graduated in 2009 show a clear gap w.r.t. those who graduated in 2005 and 2004.
4.5 Parameter Sensitivity
We conduct link prediction (a held-out percentage of removed links and an equal number of non-existing links serve as ground truth) to analyze the sensitivity of the hyperparameters $\alpha$ and top-k.
The hyperparameter $\alpha$ is the balancing factor: a smaller $\alpha$ biases toward attribute information, whereas a larger $\alpha$ biases toward structural information. The three citation networks all receive comparable results, so the choice of $\alpha$ can be flexible. The reason is that structural and attribute information are each informative enough, which can be observed in Figures 2 and 3: both AttrPure and DeepWalk (with sufficient links) obtain satisfactory results. For the three social networks, a larger $\alpha$ gives better results. The reason is that structural information is more informative than attribute information there, since DeepWalk (with sufficient links) significantly outperforms AttrPure.
The hyperparameter top-k is used to preserve the top-k attribute-similar nodes and remove the remaining dissimilar ones. For the three citation networks, the results vary noticeably w.r.t. different choices of top-k; whereas for the three social networks, the results are quite stable. This is because the social networks have far more links than the citation networks, as shown in Table 1, so the richer structural information can better offset the inappropriate neighboring nodes added by attribute information while reconstructing the network.
5 Related Works
There are two broad categories of related work: 1) PNE methods, which consider only structural information; and 2) ANE methods, which consider both structural and attribute information.
Pure-structure based NE (PNE) methods: Although this category of methods ignores attribute information, they can still obtain node embeddings from structural information alone. DeepWalk [?] applies truncated random walks to obtain node sequences, which are then fed into the Word2Vec model so as to embed nodes closer if they co-occur more frequently. Node2Vec [?] can be viewed as an extension, as it employs more flexible truncated walks to capture network structure. Besides, there are many other PNE methods, such as LINE [?] and HOPE [?]. Nevertheless, these methods are not ideal for incomplete attributed networks, since they can only utilize the incomplete structural information.
Convex optimization based ANE methods: TADW [?] and AANE [?] fall into this category. They first transform structural and attribute information into two matrices, and then formulate the ANE problem as a bi-convex optimization problem whose objective is to jointly minimize a distance measure between the structural information matrix and the embedding matrix, and between the attribute information matrix and the embedding matrix. We find that they may not converge when the structural information matrix becomes too sparse.
Graph convolution based ANE methods: Two representative methods of this category are GCN [?] and GraphSAGE [?]. They first define node neighbors or receptive fields based on the network structure, and then aggregate neighboring attribute information for further computation. These methods are not robust, as different levels of incompleteness change the definition of node neighbors, and hence the attributes to be aggregated.
Deep neural network based ANE methods: ASNE [?] is the representative method in this category. It uses carefully designed stacked neural networks to learn a mapping whose input and output are the two node embeddings of a pair of linked nodes. In other words, one link gives one training sample; obviously, incomplete structural information yields fewer training samples, which is undesirable for training a deep neural network model.
6 Conclusion and Future Work
To address the challenges of embedding incomplete attributed networks, we propose an ANE method, namely Attributed Biased Random Walks (ABRW). The idea is to reconstruct a unified denser network by fusing structural and attribute information for information enhancement, and then employ a weighted random walks based network embedding method for learning node embeddings. Several experiments confirm the effectiveness of our method on incomplete attributed networks. Besides, ABRW can be viewed as a general framework for learning embeddings of any objects (not necessarily forming a network) with multiple sources of information, as long as the information can be encoded in transition matrices (which is all ABRW requires); this will serve as future work.
References
 [Cai et al., 2018] Hongyun Cai, Vincent W Zheng, and Kevin Chang. A comprehensive survey of graph embedding: Problems, techniques and applications. IEEE Transactions on Knowledge and Data Engineering, 2018.
 [Cao et al., 2015] Shaosheng Cao, Wei Lu, and Qiongkai Xu. Grarep: Learning graph representations with global structural information. In CIKM, pages 891–900, New York, NY, USA, 2015. ACM.
 [Cui et al., 2018] Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering, 2018.
 [Gao and Huang, 2018] Hongchang Gao and Heng Huang. Deep attributed network embedding. In IJCAI, pages 3364–3370, 2018.
 [Goyal and Ferrara, 2018] Palash Goyal and Emilio Ferrara. Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems, 151:78–94, 2018.
 [Grover and Leskovec, 2016] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In KDD, pages 855–864. ACM, 2016.
 [Hamilton et al., 2017a] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NIPS, pages 1024–1034, 2017.
 [Hamilton et al., 2017b] William L. Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs: Methods and applications. IEEE Data Engineering Bulletin, 40(3):52–74, 2017.
 [Huang et al., 2017] Xiao Huang, Jundong Li, and Xia Hu. Accelerated attributed network embedding. In SDM, pages 633–641. SIAM, 2017.
 [Kipf and Welling, 2017] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
 [Levy and Goldberg, 2014] Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization. In NIPS, pages 2177–2185, 2014.
 [Liao et al., 2018] Lizi Liao, Xiangnan He, Hanwang Zhang, and Tat-Seng Chua. Attributed social network embedding. IEEE Transactions on Knowledge and Data Engineering, 2018.
 [Lin et al., 2012] Wangqun Lin, Xiangnan Kong, Philip S Yu, Quanyuan Wu, Yan Jia, and Chuan Li. Community detection in incomplete information networks. In WWW, pages 341–350. ACM, 2012.
 [Lü and Zhou, 2011] Linyuan Lü and Tao Zhou. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390(6):1150–1170, 2011.
 [McPherson et al., 2001] Miller McPherson, Lynn Smith-Lovin, and James M Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1):415–444, 2001.
 [Mikolov et al., 2013] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111–3119, 2013.
 [Ou et al., 2016] Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. Asymmetric transitivity preserving graph embedding. In KDD, pages 1105–1114. ACM, 2016.
 [Pan et al., 2016] Shirui Pan, Jia Wu, Xingquan Zhu, Chengqi Zhang, and Yang Wang. Tri-party deep network representation. In IJCAI, pages 1895–1901, 2016.
 [Perozzi et al., 2014] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In KDD, pages 701–710. ACM, 2014.
 [Řehůřek and Sojka, 2010] Radim Řehůřek and Petr Sojka. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta, May 2010. ELRA. http://is.muni.cz/publication/884893/en.
 [Strehl et al., 2000] Alexander Strehl, Joydeep Ghosh, and Raymond Mooney. Impact of similarity measures on webpage clustering. In Workshop on artificial intelligence for web search (AAAI 2000), volume 58, page 64, 2000.
 [Tang et al., 2015] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale information network embedding. In WWW, pages 1067–1077, 2015.
 [Traud et al., 2012] Amanda L Traud, Peter J Mucha, and Mason A Porter. Social structure of facebook networks. Physica A: Statistical Mechanics and its Applications, 391(16):4165–4180, 2012.
 [Tsur and Rappoport, 2012] Oren Tsur and Ari Rappoport. What’s in a hashtag?: Content based prediction of the spread of ideas in microblogging communities. In WSDM, pages 643–652. ACM, 2012.
 [Wang et al., 2016] Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. In KDD, pages 1225–1234, New York, NY, USA, 2016. ACM.
 [Wei et al., 2017] Xiaokai Wei, Linchuan Xu, Bokai Cao, and Philip S. Yu. Cross-view link prediction by learning noise-resilient representation consensus. In WWW, pages 1611–1619, Republic and Canton of Geneva, Switzerland, 2017. International World Wide Web Conferences Steering Committee.
 [Yang et al., 2015] Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Y Chang. Network representation learning with rich text information. In IJCAI, pages 2111–2117. AAAI Press, 2015.
 [Yang et al., 2016] Zhilin Yang, William Cohen, and Ruslan Salakhudinov. Revisiting semisupervised learning with graph embeddings. In ICML, pages 40–48, 2016.