ActiveHNE: Active Heterogeneous Network Embedding
Abstract
Heterogeneous network embedding (HNE) is a challenging task due to the diverse node types and/or diverse relationships between nodes. Existing HNE methods are typically unsupervised. To make the most of the rare and valuable supervised information in HNE, we develop a novel Active Heterogeneous Network Embedding (ActiveHNE) framework, which includes two components: Discriminative Heterogeneous Network Embedding (DHNE) and Active Query in Heterogeneous Networks (AQHN). In DHNE, we introduce a novel semi-supervised heterogeneous network embedding method based on graph convolutional neural networks. In AQHN, we first introduce three active selection strategies based on uncertainty and representativeness, and then derive a batch selection method that assembles these strategies using a multi-armed bandit mechanism. ActiveHNE aims to improve the performance of HNE by feeding the most valuable supervision obtained by AQHN into DHNE. Experiments on public datasets demonstrate the effectiveness of ActiveHNE and its advantage in reducing the query cost.
Xia Chen , Guoxian Yu , Jun Wang , Carlotta Domeniconi , Zhao Li , Xiangliang Zhang
College of Computer and Information Sciences, Southwest University, Chongqing, China
Department of Computer Science, George Mason University, VA, USA
Alibaba Group, Hangzhou, China
CEMSE, King Abdullah University of Science and Technology, Thuwal, SA
{xchen, gxyu, kingjun}@swu.edu.cn, carlotta@cs.gmu.edu, lizhao.lz@alibaba-inc.com, xiangliang.zhang@kaust.edu.sa
1 Introduction
Networks are pervasive in a wide variety of real-world scenarios, ranging from popular social networks to citation graphs and gene regulatory networks. Network embedding (NE), also known as network representation learning (NRL), captures the intrinsic information of network data by embedding it into a low-dimensional space. Effective NE approaches can facilitate downstream network analysis tasks, such as node classification, community discovery, and link prediction [?].
Heterogeneous information networks (HINs), which involve diverse node types and/or diverse relationships between nodes, are ubiquitous in real-world scenarios [?]. Although NE for homogeneous networks with a single type of nodes and a single type of relationships has been extensively studied [?; ?; ?; ?], the rich structure of HINs presents a major challenge for heterogeneous network embedding (HNE), since nodes of different types should be treated differently (Challenge 1) [?; ?; ?; ?; ?].
Most current HNE approaches are unsupervised. The performance of HNE can be improved by properly leveraging supervised information (Challenge 2). However, label acquisition is usually difficult and expensive because it involves human experts (Challenge 3). For Challenge 3, active learning (AL), a technique widely used to acquire labels of nodes during learning, can be adopted to reduce the cost. The choice of labeled data used for model training can significantly influence the prediction stage, and AL aims to find the most valuable nodes to label at a reduced query cost [?]. However, since nodes in a heterogeneous network are not independently and identically distributed (non-i.i.d.) but connected by links, AL on networks should account for data dependency. In addition, for HINs, the different node types should also be considered.
Motivated by the high efficiency of graph convolutional networks (GCNs) [?] in utilizing label information, we propose a novel Active Heterogeneous Network Embedding framework (called ActiveHNE) to address the above three challenges. ActiveHNE includes two components, Discriminative Heterogeneous Network Embedding (DHNE) and Active Query in Heterogeneous Networks (AQHN), as illustrated in Figure 1. They are described below.

In DHNE, we introduce a semi-supervised discriminative heterogeneous network embedding method based on graph convolutional neural networks. Since different types of nodes and relationships should be treated differently, we first decompose the original HIN into homogeneous networks and bipartite networks. For each convolutional layer, DHNE separately learns the deep semantic meanings of nodes in each obtained network, and then concatenates the output vectors of each node from all networks.

In AQHN, in addition to network centrality, we introduce two active selection strategies for HINs, namely convolutional information entropy and convolutional information density, which target uncertainty and representativeness, respectively. In particular, these strategies take advantage of the dependency among nodes and the heterogeneity of HINs by using local convolution, whose filter parameters are defined by node importance (measured by the degree of a node and the number of node types among its neighbors). Then, we iteratively query the most valuable batch of nodes by combining the three strategies using the multi-armed bandit mechanism [?].
This work makes the following contributions. (i) We formalize the active heterogeneous network embedding problem, whose objective is to seek the most valuable nodes to query and to improve the performance of HNE using the queried labels. (ii) We present a novel heterogeneous graph convolutional neural network model for node embedding and node classification. (iii) Considering the data dependency among nodes and the heterogeneity of networks, we propose a new active learning method that selects the most valuable nodes to label by leveraging local convolution and the multi-armed bandit mechanism. An extensive experimental study on three real-world HINs demonstrates the effectiveness of ActiveHNE in embedding HINs and in saving query cost.
2 Related Work
Most previous approaches to HNE are unsupervised [?; ?; ?; ?]. Recently, methods have been proposed to leverage meta-paths, either specified by users or derived from additional supervision [?; ?; ?]. However, the choice of meta-paths strongly depends on the task at hand, which limits their ability to generalize [?]. In addition, meta-paths enrich the neighborhood of nodes, resulting in a denser network and higher training costs [?].
Graph neural networks (GNNs) are another widely studied approach to leveraging supervision [?]. GNNs can extract multi-scale localized spatial features and compose them into highly expressive representations. Among GNN approaches, graph convolutional networks (GCNs) play a central role in capturing structural dependencies [?; ?]. A comprehensive survey of the literature shows that most current GNNs are designed for homogeneous networks only. GNNs are rarely explored for heterogeneous networks [?], and they are trained on arbitrarily chosen, rather than actively selected, supervision.
The embedding performance can be improved by acquiring the labels of the most valuable nodes via AL. However, AL on non-i.i.d. network data is seldom studied, and the diversity of node types in HINs makes the query criterion of AL even harder to design. Although attempts have been made to improve embedding performance by incorporating AL, they consider neither the dependence between nodes nor the heterogeneity of networks [?; ?; ?].
3 The ActiveHNE Framework
In this section, we present our Active Heterogeneous Network Embedding framework, called ActiveHNE. The architecture of ActiveHNE is given in Figure 1. ActiveHNE consists of two components: Discriminative Heterogeneous Network Embedding (DHNE) and Active Query in Heterogeneous Networks (AQHN), which are elaborated in the following subsections.
3.1 Discriminative Heterogeneous Network Embedding (DHNE)
It is difficult to perform convolutions on networks due to the lack of a Euclidean representation space. In addition, HINs involve different types of nodes and relationships, each requiring its own processing, which further increases the challenge of computing convolutions. To address this issue, we first divide the original HIN into homogeneous networks and bipartite networks (the latter involving two types of nodes). Let $\mathcal{G} = \{G_1, G_2, \ldots, G_{|\mathcal{G}|}\}$ be the collection of obtained homogeneous and bipartite networks, and let $\mathcal{A} = \{A_1, A_2, \ldots, A_{|\mathcal{G}|}\}$ denote the adjacency matrices corresponding to $\mathcal{G}$. The spectral graph convolution theorem defines the convolution in the Fourier domain based on the normalized graph Laplacian $L_t = I - D_t^{-1/2} A_t D_t^{-1/2}$, where $I$ is the identity matrix and $D_t$ denotes the degree matrix of $A_t$ [?]. For each convolutional layer in a layer-wise convolutional neural network, DHNE separately convolves and learns the deep semantic meanings of nodes in each obtained network, and then concatenates the output vectors of each node from all networks.
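The decomposition step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the HIN is given as a typed edge list and forms one subnetwork per unordered pair of node types, so same-type pairs yield homogeneous networks and cross-type pairs yield bipartite ones. The name `decompose_hin` and this grouping rule are assumptions; an HIN with several relation types between the same two node types would need a finer key (e.g., the relation name).

```python
from collections import defaultdict


def decompose_hin(node_type, edges):
    """Group the edges of an HIN into homogeneous and bipartite subnetworks.

    node_type: dict mapping node id -> node type, e.g. "paper" or "author".
    edges: iterable of (u, v) pairs.
    Returns a dict keyed by a sorted pair of node types: a key such as
    ("paper", "paper") is a homogeneous subnetwork, while ("author", "paper")
    is a bipartite subnetwork involving two node types.
    """
    subnets = defaultdict(list)
    for u, v in edges:
        key = tuple(sorted((node_type[u], node_type[v])))
        subnets[key].append((u, v))
    return dict(subnets)


node_type = {"p1": "paper", "p2": "paper", "a1": "author", "c1": "conf"}
edges = [("p1", "p2"), ("a1", "p1"), ("a1", "p2"), ("p1", "c1")]
for key, es in decompose_hin(node_type, edges).items():
    print(key, es)
```

Each resulting edge list can then be turned into its own adjacency matrix $A_t$.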
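Since the degree distribution of nodes in an HIN may vary greatly, and the interaction between two connected nodes may be directed, the asymmetric matrix $\bar{L}_t = D_t^{-1} A_t$ (with $A_t$ and $D_t$ the adjacency and degree matrices of network $G_t$), rather than the symmetric normalized Laplacian $L_t$, is more suitable for defining the Fourier domain. $\bar{L}_t$ is the transition probability matrix of $G_t$.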
In this paper, we separately convolve on each obtained network, using the transition probability matrix $\bar{L}_t$ to define the Fourier basis. Specifically, let $\bar{L}_t = U_t \Lambda_t U_t^{-1}$, where $U_t$ and $\Lambda_t$ are the eigenvector matrix and the diagonal matrix of eigenvalues of $\bar{L}_t$, respectively. The convolution on each obtained network $G_t$ is defined as follows:
$$g_\theta \star X_t = U_t \, g_\theta \, U_t^{-1} X_t \qquad (1)$$
where $X_t \in \mathbb{R}^{n_t \times d_t}$ is the input signal of the network $G_t$ ($n_t$ and $d_t$ denote the number of nodes and the number of features of each node in $G_t$, respectively). Eq. (1) gives the product of the signal $X_t$ with a filter $g_\theta$ in the Fourier domain, where $U_t^{-1} X_t$ denotes the Fourier transform of $X_t$.
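To convolve the local neighbors of the target node, we define $g_\theta$ as a polynomial filter up to order $K$ [?; ?] as follows:
$$g_\theta(\Lambda_t) = \sum_{k=0}^{K} \theta_k \Lambda_t^{k} \qquad (2)$$
where $\theta = (\theta_0, \ldots, \theta_K) \in \mathbb{R}^{K+1}$ is a vector of polynomial coefficients. Since $U_t \Lambda_t^{k} U_t^{-1} = \bar{L}_t^{k}$, we have:
$$g_\theta \star X_t = U_t \Big( \sum_{k=0}^{K} \theta_k \Lambda_t^{k} \Big) U_t^{-1} X_t = \sum_{k=0}^{K} \theta_k \bar{L}_t^{k} X_t \qquad (3)$$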
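The convolution on $G_t$ therefore depends only on the nodes that are at most $K$ steps away from the target node, because the $(i, j)$ entry of $\bar{L}_t^{k}$ is nonzero only if a walk of length $k$ connects $v_i$ to $v_j$. The filter parameters $\theta$ can be shared over the whole network $G_t$. Moreover, we generalize Eq. (3) to $d'$ filters for feature maps, i.e., we map the original feature dimension $d_t$ to $d'$. We then define the convolution operation on the network $G_t$ as follows:
$$Z_t = \sigma\Big( \sum_{k=0}^{K} \bar{L}_t^{k} X_t \Theta_t \Big) \qquad (4)$$
where $\Theta_t \in \mathbb{R}^{d_t \times d'}$ and $Z_t \in \mathbb{R}^{n_t \times d'}$ denote the matrix of filter parameters (the trainable weight matrix) and the convolved signal matrix (output signals), respectively. We use $\mathrm{ReLU}(\cdot)$ for $\sigma(\cdot)$ as the activation function.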
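The locality claim can be checked numerically. The pure-Python sketch below evaluates the single-network convolution $Z_t = \mathrm{ReLU}(\sum_{k=0}^{K} \bar{L}_t^{k} X_t \Theta_t)$ on a three-node path graph, with $\bar{L}_t = D_t^{-1} A_t$. It assumes one shared weight matrix $\Theta_t$ (so the coefficients $\theta_k$ are absorbed into it) and leaves the rows of isolated nodes at zero; the helper names (`matmul`, `transition_matrix`, `conv_eq4`) are hypothetical. A one-hot signal on node 2 does not reach node 0 (two hops away) when $K = 1$, but does when $K = 2$.

```python
def matmul(P, Q):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def transition_matrix(A):
    """Row-normalise an adjacency matrix: L_bar = D^{-1} A.

    Rows of isolated nodes (degree 0) stay all-zero (an assumption)."""
    L_bar = []
    for row in A:
        deg = sum(row)
        L_bar.append([a / deg if deg else 0.0 for a in row])
    return L_bar


def conv_eq4(A, X, Theta, K):
    """ReLU(sum_{k=0}^{K} L_bar^k X Theta) with one shared weight matrix Theta."""
    L_bar = transition_matrix(A)
    term = X                           # L_bar^0 X
    acc = [row[:] for row in X]        # running sum of L_bar^k X
    for _ in range(K):
        term = matmul(L_bar, term)     # next power: L_bar^(k+1) X
        acc = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(acc, term)]
    return [[max(0.0, z) for z in row] for row in matmul(acc, Theta)]


# Path graph 0 - 1 - 2 with a one-hot signal on node 2 and an identity filter.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[0.0], [0.0], [1.0]]
Theta = [[1.0]]
print(conv_eq4(A, X, Theta, K=1))  # [[0.0], [0.5], [1.0]]
print(conv_eq4(A, X, Theta, K=2))  # [[0.5], [0.5], [1.5]]
```

Dense lists keep the example dependency-free; a practical implementation would use sparse matrices, and, as here, compute $\bar{L}_t^{k} X_t$ by $K$ repeated products instead of forming $\bar{L}_t^{k}$ explicitly.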
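So far, we have performed the convolutions separately on each individual network. We then concatenate, in a fixed order of networks, the vectors of the convolved signals to obtain the final output signals of each node, according to the networks it belongs to. For a node that does not belong to a network, we use a zero vector to represent the corresponding output signals. Letting $H^{(l+1)}$ denote the concatenated convolved signals of all nodes in the HIN, we define the layer-wise convolution as follows:
$$H^{(l+1)} = \Big\Vert_{t=1}^{|\mathcal{G}|} \, \sigma\Big( \sum_{k=0}^{K} \bar{L}_t^{k} H_t^{(l)} \Theta_t^{(l)} \Big) \qquad (5)$$
where $\Vert$ denotes node-wise concatenation (with zero vectors for nodes outside $G_t$), $H_t^{(l)}$ contains the rows of $H^{(l)}$ for the nodes of $G_t$, and $H^{(l)}$, $\Theta_t^{(l)}$, and $H^{(l+1)}$ denote the activations (input signals), the matrix of filter parameters (the trainable weight matrix), and the convolved signal matrix (output signals) in the $l$-th layer, respectively. $d^{(l)}$ is the embedding dimension of the $l$-th layer, and $|\mathcal{G}|$ is the number of networks, so that $H^{(l+1)} \in \mathbb{R}^{n \times |\mathcal{G}| d^{(l+1)}}$. Specifically, $H^{(0)} = X$, the input feature matrix of all nodes.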
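One DHNE layer can be sketched as: convolve each subnetwork separately, scatter each result back to all $n$ nodes with zero rows for nodes outside that subnetwork, and concatenate the blocks node by node. This sketch assumes each subnetwork is given as a pair (list of global node indices, local adjacency matrix), one weight matrix per subnetwork, and the ReLU-activated $K$-order form used for a single network; `dhne_layer` and `k_order_conv` are hypothetical names.

```python
def matmul(P, Q):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def k_order_conv(A, X, Theta, K):
    """ReLU(sum_{k=0}^{K} (D^{-1} A)^k X Theta) on a single subnetwork."""
    L_bar = []
    for row in A:
        deg = sum(row)
        L_bar.append([a / deg if deg else 0.0 for a in row])
    term, acc = X, [r[:] for r in X]
    for _ in range(K):
        term = matmul(L_bar, term)
        acc = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(acc, term)]
    return [[max(0.0, z) for z in row] for row in matmul(acc, Theta)]


def dhne_layer(H, subnets, thetas, K=1):
    """One layer: per-subnetwork convolution, zero-padding, node-wise concatenation.

    H: n x d input signals for all nodes of the HIN.
    subnets: list of (global_node_ids, local_adjacency) pairs.
    thetas: one d x d_out weight matrix per subnetwork.
    """
    n = len(H)
    blocks = []
    for (nodes, A), Theta in zip(subnets, thetas):
        X_t = [H[i] for i in nodes]                        # rows of H for nodes in G_t
        Z_t = k_order_conv(A, X_t, Theta, K)
        block = [[0.0] * len(Theta[0]) for _ in range(n)]  # zeros outside G_t
        for local, i in enumerate(nodes):
            block[i] = Z_t[local]
        blocks.append(block)
    # Concatenate the blocks of every node in a fixed subnetwork order.
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]


# Nodes 0 and 1 are papers, node 2 is an author.
H = [[1.0], [2.0], [3.0]]
subnets = [([0, 1], [[0, 1], [1, 0]]),   # homogeneous paper-paper network
           ([0, 2], [[0, 1], [1, 0]])]   # bipartite paper-author network
thetas = [[[1.0]], [[1.0]]]
print(dhne_layer(H, subnets, thetas))    # [[3.0, 4.0], [3.0, 0.0], [0.0, 4.0]]
```

Node 1 does not belong to the bipartite subnetwork, so its second block is the zero vector, and every output row has length $|\mathcal{G}| \cdot d_{out}$.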
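After $M$ layers of convolutions and concatenations, we obtain the final output vectors of all nodes as $H^{(M)}$. To obtain a discriminative embedding, we leverage supervision (i.e., label information) by adding a fully connected layer to predict the labels of nodes as follows:
$$\hat{Y} = \mathrm{softmax}\big( H^{(M)} W \big) \qquad (6)$$
where $W$ is the hidden-to-output weight matrix, and $\hat{Y} \in \mathbb{R}^{n \times C}$ is the probability indicator matrix between nodes and labels in the $C$-dimensional label space. The activation function in the last layer is the softmax function, defined row-wise as $\mathrm{softmax}(z)_c = \exp(z_c) / \sum_{c'=1}^{C} \exp(z_{c'})$. Finally, the supervised loss function is defined as the cross-entropy error over all labeled nodes as follows:
$$\mathcal{J} = - \sum_{i \in \mathcal{Y}_L} \sum_{c=1}^{C} Y_{ic} \ln \hat{Y}_{ic} \qquad (7)$$
where $\mathcal{Y}_L$ is the set of labeled nodes, $Y_{ic} = 1$ if the $i$-th node is labeled with class $c$ and $0$ otherwise, and $\hat{Y}_{ic}$ stores the predicted probability that the $i$-th node belongs to class $c$. As such, Eqs. (6) and (7) enable a semi-supervised model for discriminative node embedding. The label of node $v_i$ can be predicted as $\arg\max_{c} \hat{Y}_{ic}$.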
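The supervised part of DHNE reduces to a row-wise softmax followed by a cross-entropy summed only over labeled nodes. The sketch below assumes the logits (final embeddings multiplied by the hidden-to-output weight matrix) have already been computed and represents the labels as a dict from node index to class index, which is our assumption; unlabeled nodes still receive predictions but contribute nothing to the loss.

```python
import math


def softmax_rows(Z):
    """Row-wise softmax; the row maximum is subtracted for numerical stability."""
    out = []
    for z in Z:
        m = max(z)
        e = [math.exp(v - m) for v in z]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def cross_entropy_labeled(Y_hat, labels):
    """Cross-entropy summed over labeled nodes only.

    labels: dict mapping node index -> true class index (one-hot Y implied)."""
    return -sum(math.log(Y_hat[i][c]) for i, c in labels.items())


def predict(Y_hat):
    """argmax_c Y_hat[i][c] for every node (the first index wins ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in Y_hat]


# Logits for three nodes and two classes; node 1 is unlabeled.
Z = [[2.0, 0.0], [0.0, 0.0], [0.0, 3.0]]
Y_hat = softmax_rows(Z)
labels = {0: 0, 2: 1}
print(cross_entropy_labeled(Y_hat, labels))
print(predict(Y_hat))  # [0, 0, 1]
```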
3.2 Active Query in Heterogeneous Networks (AQHN)
In DHNE, we perform semi-supervised heterogeneous network embedding, which requires label information. However, label acquisition is usually difficult and expensive due to the involvement of human experts. More importantly, different supervision may lead to different embedding performance. To train a more effective DHNE, we propose an active query component, AQHN, to acquire the most valuable supervision within a given budget.
Uncertainty and representativeness are widely used criteria for selecting samples to query in AL. Uncertainty favors the sample about which the current classification model is least certain, while representativeness favors the sample that best represents the overall input patterns of the unlabeled data. Empirical studies have shown that combining the two criteria yields more effective selection strategies [?]. In the following, we first introduce three active selection strategies for HINs (Network Centrality, Convolutional Information Entropy, and Convolutional Information Density) based on uncertainty and representativeness. Then, we leverage the multi-armed bandit mechanism [?] to combine these strategies and adaptively and iteratively select the most valuable batch of nodes to query.
Selection Strategy
Network centrality (NC). NC (e.g., degree centrality and closeness centrality) [?] is an effective measure for evaluating the representativeness of nodes. In this paper, we simply use degree centrality, defined as the number of direct neighbors $NC(v_i) = |N_i|$, to evaluate the centrality of nodes, where $N_i$ includes all the direct neighbors of $v_i$. Other measures of network centrality in HINs will be studied in future work.
Nodes in an HIN are non-i.i.d. and are connected by links, which reflect the dependency among nodes. Inspired by spectral graph convolution, which defines the convolved signal of a node as a linear weighted sum of its neighbors' signals, we propose two novel active strategies that select nodes for query in HINs based on a convolution over neighbors. We first define the convolution parameters (i.e., the weight parameters) and then the two selection strategies. Let $\phi(v_i)$, obtained by applying the hyperbolic tangent function $\tanh(\cdot)$ to the normalized quantities $|N_i|/n$ and $|T_i|/|T|$, quantify the importance of node $v_i$. Here $|N_i|$ and $|T_i|$ represent the number of neighbor nodes of $v_i$ and the number of node types among these neighbors, while $n$ and $|T|$ are the total number of nodes and of node types in the whole network, respectively. A larger value of $|N_i|$ or $|T_i|$ implies that $v_i$ conveys more complex information, and thus $v_i$ may be more important to its neighbor nodes. In the following, we use $\phi(\cdot)$ as the weight parameters for convolving neighbors.
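A small sketch of the two node-level quantities used so far: degree centrality (NC) and the importance weight $\phi$. The exact way $\phi$ combines $|N_i|/n$ and $|T_i|/|T|$ is not fixed by the description above, so the sum inside $\tanh$ below is an assumption made for illustration; graphs are given as a dict from each node to its set of neighbors, and the function names are hypothetical.

```python
import math


def degree_centrality(neighbors):
    """NC(v) = |N_v|, the number of direct neighbors of v."""
    return {v: len(nbrs) for v, nbrs in neighbors.items()}


def node_importance(neighbors, node_type):
    """Hypothetical phi(v) = tanh(|N_v| / n + |T_v| / |T|).

    The additive combination inside tanh is an ASSUMPTION for illustration."""
    n = len(node_type)                          # total number of nodes
    num_types = len(set(node_type.values()))    # total number of node types |T|
    phi = {}
    for v, nbrs in neighbors.items():
        nbr_types = {node_type[u] for u in nbrs}   # T_v: types among v's neighbors
        phi[v] = math.tanh(len(nbrs) / n + len(nbr_types) / num_types)
    return phi


neighbors = {"p1": {"p2", "a1", "c1"}, "p2": {"p1"}, "a1": {"p1"}, "c1": {"p1"}}
node_type = {"p1": "paper", "p2": "paper", "a1": "author", "c1": "conf"}
print(degree_centrality(neighbors))
print(node_importance(neighbors, node_type))
```

Because $\tanh$ saturates, $\phi$ stays in $(0, 1)$ and grows with both the degree and the type diversity of the neighborhood, which is the qualitative behavior the text asks for.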
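Convolutional Information Entropy (CIE). Information entropy is a widely used metric for evaluating uncertainty. In this paper, we evaluate the uncertainty of node $v_i$ using CIE as follows:
$$CIE(v_i) = - \sum_{v_j \in N_i \cup \{v_i\}} \phi(v_j) \sum_{c=1}^{C} \hat{Y}_{jc} \log \hat{Y}_{jc} \qquad (8)$$
where $\hat{Y}_{jc}$ is the probability, predicted by DHNE, that node $v_j$ belongs to class $c$. The uncertainty of $v_i$ is thus a weighted sum of the uncertainties of its neighbors and itself.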
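CIE can be sketched as a $\phi$-weighted sum of prediction entropies over a node and its neighbors. The class probabilities are assumed to come from the DHNE softmax output and $\phi$ is assumed to be precomputed; `entropy` and `cie` are hypothetical names. In the toy example, node `b` predicts confidently yet scores highest because both of its neighbors are uncertain, which is exactly the effect of convolving over neighbors.

```python
import math


def entropy(p):
    """Shannon entropy of a probability vector, with 0 * log 0 taken as 0."""
    return -sum(q * math.log(q) for q in p if q > 0)


def cie(v, neighbors, phi, probs):
    """phi-weighted sum of prediction entropies over v and its neighbors."""
    return sum(phi[u] * entropy(probs[u]) for u in neighbors[v] | {v})


neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
phi = {"a": 1.0, "b": 1.0, "c": 1.0}
probs = {"a": [0.5, 0.5], "b": [1.0, 0.0], "c": [0.5, 0.5]}
scores = {v: cie(v, neighbors, phi, probs) for v in neighbors}
print(max(scores, key=scores.get))  # b
```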
Convolutional Information Density (CID). The representativeness of nodes in the embedding space is also crucial for measuring the value of nodes. Due to its high efficiency, we apply $k$-means clustering on the embedding to calculate the information density of nodes. The number of clusters for $k$-means is simply set to the number of class labels. The CID of $v_i$ based on its neighbors is quantified as follows:
$$CID(v_i) = \sum_{v_j \in N_i \cup \{v_i\}} \phi(v_j) \, \frac{1}{1 + d(\mathbf{h}_j, \mathbf{c}_j)} \qquad (9)$$
where $d(\cdot, \cdot)$ is the distance metric (here, the Euclidean distance) in the embedding space, and $\mathbf{c}_j$ is the center vector of the cluster to which $v_j$ belongs. $\mathbf{h}_j$ is the embedding of the $j$-th node; $\mathbf{h}_j$ and $\mathbf{c}_j$ belong to the same space.
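CID can be sketched in the same style. The cluster assignment is assumed to come from a $k$-means run on the current embeddings (the clustering itself is not shown), centers are recomputed as cluster means, and the per-node density term $1/(1 + d(\mathbf{h}_j, \mathbf{c}_j))$ with Euclidean $d$ is our assumption of how distance is turned into density; all names are hypothetical.

```python
import math


def cluster_centers(emb, assign):
    """Mean embedding of each cluster, given a node -> cluster-id assignment."""
    sums, counts = {}, {}
    for v, k in assign.items():
        vec = emb[v]
        acc = sums.setdefault(k, [0.0] * len(vec))
        for idx, x in enumerate(vec):
            acc[idx] += x
        counts[k] = counts.get(k, 0) + 1
    return {k: [x / counts[k] for x in s] for k, s in sums.items()}


def cid(v, neighbors, phi, emb, assign, centers):
    """phi-weighted 1 / (1 + ||h_u - c_u||) summed over v and its neighbors."""
    total = 0.0
    for u in neighbors[v] | {v}:
        dist = math.dist(emb[u], centers[assign[u]])   # Euclidean distance
        total += phi[u] / (1.0 + dist)
    return total


emb = {"a": [0.0, 0.0], "b": [2.0, 0.0], "c": [10.0, 0.0]}
assign = {"a": 0, "b": 0, "c": 1}
centers = cluster_centers(emb, assign)
neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
phi = {"a": 1.0, "b": 1.0, "c": 1.0}
print({v: cid(v, neighbors, phi, emb, assign, centers) for v in neighbors})
```

Node `c` sits exactly on its (singleton) cluster center and therefore has the largest individual density term, which it passes on to its neighbor `b`.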
Multi-Armed Bandit for Active Node Selection
We select the most valuable nodes by leveraging the above three selection strategies. In particular, we study the batch-mode setting, in which we query $b$ nodes in each iteration. First, in each iteration, we select the top $b$ nodes with the highest NC, CIE, and CID scores as the initial candidates of the respective selection strategies. To jointly select the most valuable nodes from all selection strategies, one can score each node by a weighted sum of its scores under each strategy, where the weights capture the importance of the corresponding strategies. The problem of active node selection is then transformed into estimating the importance of each strategy. However, the importance of each strategy is time-sensitive and thus difficult to specify [?; ?]. We introduce a novel method to adaptively learn these dynamic weight parameters based on the multi-armed bandit mechanism. The well-known multi-armed bandit (MAB) problem is a simplified version of the reinforcement learning problem [?; ?]: it asks what a player should do given a bandit machine with several arms and a budget of iterations. In each iteration, the agent plays one of the arms and receives a reward; the objective is to maximize the cumulative reward. Combinatorial MAB (CMAB) [?], an extension of MAB, allows multiple arms to be played in each iteration.
Based on the idea of CMAB, we view each selection strategy as an arm, and approximate the importance of each strategy by estimating the expected reward (i.e., utility) of the corresponding arm. Let $\mathcal{C}_a^t$ be the initial candidate set of arm $a$ in iteration $t$, and $\mathcal{Q}^t$ be the set of nodes actually queried in that iteration. Intuitively, the actual reward of arm $a$ can be defined as:
$$r_a^t = P\big( f_{\mathcal{L}^t \cup \mathcal{Q}_a^t} \big) - P\big( f_{\mathcal{L}^t} \big) \qquad (10)$$
where $\mathcal{L}^t$ is the set of labeled nodes available in iteration $t$, $\mathcal{Q}_a^t = \mathcal{C}_a^t \cap \mathcal{Q}^t$ is the set of queried nodes selected (i.e., dominated) by arm $a$ in iteration $t$, $f_{\mathcal{L}}$ is the classifier trained on $\mathcal{L}$, and $P(f)$ is the classification performance of $f$. Note that $r_a^t$ cannot be computed for the current iteration, since the ground truth of $\mathcal{Q}^t$ is unavailable. The empirical reward is typically used to estimate the expected reward of arms. However, computing the reward of each arm in each iteration in this way is very time-consuming; we therefore estimate the reward of each arm from the local embedding changes of nodes caused by that arm.
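We first define the local embedding changes caused by arm $a$ in iteration $t$ as follows:
$$\Delta_a^t = \sum_{v_i \in \mathcal{Q}_a^t} \sum_{v_j \in N_i \cup \{v_i\}} d\big( \mathbf{h}_j^{t}, \mathbf{h}_j^{t-1} \big) \qquad (11)$$
where $d(\cdot, \cdot)$ is the distance metric (e.g., Euclidean distance), $N_i$ is the set of neighbors of $v_i$, and $\mathbf{h}_j^t$ is the embedding of $v_j$ in iteration $t$. To achieve a fair comparison and avoid bias, the empirical reward of arm $a$ in iteration $t$ is estimated as $\hat{r}_a^t = \Delta_a^t / \Delta^t$, where $\Delta^t$ denotes the local embedding changes caused by all arms (i.e., by $\mathcal{Q}^t$). Note that, in iteration $t$, $\sum_a \hat{r}_a^t \geq 1$, because the sets $\mathcal{Q}_a^t$ of different arms may overlap. Since the importance of each selection strategy changes over time, we use the average of the last two empirical rewards to estimate the current expected reward as follows:
$$\bar{r}_a^t = \frac{1}{2} \big( \hat{r}_a^{t-1} + \hat{r}_a^{t-2} \big) \qquad (12)$$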
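The reward estimate avoids retraining a classifier per arm. The sketch below assumes that the local embedding change of a set of queried nodes is the summed Euclidean shift of each queried node and its neighbors between two consecutive iterations, normalizes each arm's change by the change of the whole batch, and averages the last two empirical rewards; the fallback to a single value when only one reward exists, the zero-change guard, and all names are our assumptions. Overlapping arms make the rewards sum to more than one, as noted in the text.

```python
import math


def local_change(queried, neighbors, emb_now, emb_prev):
    """Summed Euclidean embedding shift of each queried node and its neighbors."""
    total = 0.0
    for v in queried:
        for u in neighbors[v] | {v}:
            total += math.dist(emb_now[u], emb_prev[u])
    return total


def empirical_rewards(arm_queried, batch, neighbors, emb_now, emb_prev):
    """hat r_a = (change caused by arm a) / (change caused by the whole batch)."""
    delta_all = local_change(batch, neighbors, emb_now, emb_prev)
    if delta_all == 0:
        return {a: 0.0 for a in arm_queried}
    return {a: local_change(q, neighbors, emb_now, emb_prev) / delta_all
            for a, q in arm_queried.items()}


def expected_reward(history):
    """Mean of the last two empirical rewards (or of the only one, early on)."""
    recent = history[-2:]
    return sum(recent) / len(recent)


neighbors = {"a": {"b"}, "b": {"a"}, "c": set()}
prev = {"a": [0.0, 0.0], "b": [0.0, 0.0], "c": [0.0, 0.0]}
now = {"a": [1.0, 0.0], "b": [0.0, 0.0], "c": [3.0, 0.0]}
arms = {"NC": {"a"}, "CIE": {"a", "c"}, "CID": {"c"}}   # nodes each arm contributed
print(empirical_rewards(arms, {"a", "c"}, neighbors, now, prev))
```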
To mitigate the exploration-exploitation dilemma of CMAB, the combinatorial upper confidence bound (CUCB) algorithm [?] estimates expected rewards based on the empirical rewards and the number of times an arm has been explored. In the same way, we adjust $\bar{r}_a^t$ as $\tilde{r}_a^t = \bar{r}_a^t + \sqrt{3 \ln t / (2 m_a)}$, where $m_a$ denotes the total number of nodes queried by arm $a$. This adjustment boosts the expected reward of under-explored arms, so that a potentially optimal strategy is not dismissed without sufficient evidence.
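The exploration bonus can be sketched as below. We assume it takes the CUCB form $\sqrt{3 \ln t / (2 m_a)}$, where $t \geq 1$ is the iteration index and $m_a$ the number of nodes queried by arm $a$ so far; treating an arm with no queries yet as $m_a = 1$ is our choice, made to keep the bonus finite so that later weighted sums stay well defined.

```python
import math


def ucb_adjust(r_bar, t, m_a):
    """Add a CUCB-style exploration bonus sqrt(3 ln t / (2 m_a)) to r_bar.

    t is the iteration index (t >= 1); m_a is the number of nodes queried by
    the arm so far. An arm with no queries yet is treated as m_a = 1 (assumption)."""
    m_a = max(m_a, 1)
    return r_bar + math.sqrt(3.0 * math.log(t) / (2.0 * m_a))


# The less-explored arm receives the larger boost.
print(ucb_adjust(0.5, 10, 2), ucb_adjust(0.5, 10, 20))
```

The bonus shrinks as an arm accumulates queries, so the estimate converges toward the averaged empirical reward.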
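After this, to avoid selecting extreme or highly controversial nodes, we estimate the expected reward of the unqueried nodes in iteration $t$ using the weighted Borda count as follows:
$$s^t(v_i) = \sum_{a} \tilde{r}_a^t \, \big( |\mathcal{U}^t| - \pi_a^t(v_i) \big) \qquad (13)$$
where $\mathcal{U}^t$ is the set of unqueried nodes and $\pi_a^t(v_i)$ is the rank of node $v_i$ under arm $a$ in iteration $t$ (nodes sorted in descending order of the arm's scores, starting from rank 1). Finally, the top $b$ nodes (from $\bigcup_a \mathcal{C}_a^t$) with the highest $s^t(v_i)$ are selected as the query batch $\mathcal{Q}^t$ in iteration $t$.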
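The final selection step can be sketched as follows: form the candidate pool as the union of each arm's top-$b$ nodes, award each node Borda points equal to the number of unqueried nodes minus its rank under every arm (rank 1 being the arm's highest score), weight the points by the arm's adjusted expected reward, and keep the $b$ best candidates. The point scheme, the tie-breaking by candidate order, and the names are assumptions for illustration.

```python
def ranked(arm_scores):
    """Nodes sorted by descending score (rank 1 first)."""
    return sorted(arm_scores, key=lambda u: -arm_scores[u])


def select_batch(scores, weights, b):
    """Weighted Borda selection of b nodes.

    scores: arm name -> {node: score} over the unqueried nodes.
    weights: arm name -> adjusted expected reward of the arm.
    """
    # Candidate pool: union of every arm's top-b nodes, in first-seen order.
    candidates = []
    for arm_scores in scores.values():
        for v in ranked(arm_scores)[:b]:
            if v not in candidates:
                candidates.append(v)
    totals = {v: 0.0 for v in candidates}
    for arm, arm_scores in scores.items():
        order = ranked(arm_scores)
        for rank, v in enumerate(order, start=1):
            if v in totals:
                totals[v] += weights[arm] * (len(order) - rank)  # Borda points
    # sorted() is stable, so ties keep candidate order.
    return sorted(candidates, key=lambda u: -totals[u])[:b]


scores = {
    "NC": {"x": 4, "y": 3, "z": 2, "w": 1},
    "CIE": {"x": 0.1, "y": 0.2, "z": 0.9, "w": 0.8},
    "CID": {"x": 1, "y": 5, "z": 4, "w": 0},
}
weights = {"NC": 0.2, "CIE": 1.0, "CID": 0.5}
print(select_batch(scores, weights, b=2))  # ['z', 'y']
```

Node `z` is never ranked first by NC or CID, yet it wins because it ranks well under all three arms; this consensus effect is what the Borda count contributes over picking each arm's top node.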
4 Experiments
4.1 Experimental Setup
Datasets: we evaluate ActiveHNE on three real-world HINs extracted from DBLP (https://dblp.uni-trier.de/db/), Cora [?], and MovieLens (https://grouplens.org/datasets/movielens/). The extracted DBLP dataset consists of 14K papers, 20 conferences, 14K authors, and 9K terms, with a total of 171K links. The extracted MovieLens dataset includes 9.7K movies, 12K writers, 4.9K directors, 0.6K users, and 1.5K tags, with a total of 140K links. The extracted Cora dataset has 25K authors, 19K papers, and 12K terms, with 146K links.
Baselines: we compare ActiveHNE against the following state-of-the-art methods and against a variant of ActiveHNE that randomly selects nodes to query (a naive AL setting):

GCN [?]: a semi-supervised network embedding model that does not consider network heterogeneity. To adapt GCN to the AL setting, nodes are randomly selected for query (naive AL setting).

metapath2vec [?] and HHNE [?]: two unsupervised HNE methods, also adapted to the naive AL setting.

AGE [?] and ANRMAB [?]: two active network embedding methods without considering the dependence between nodes and the heterogeneity of networks.

DHNE: a variant of ActiveHNE that randomly selects nodes to query in the naive AL setting.
For the proposed DHNE, we simply fix the polynomial filter order $K$ of Eq. (2) and leave the investigation of $K$ to future work. We train DHNE using a network with two convolutional layers and one fully connected layer, as described in Section 3.1, for a maximum of 200 epochs (training iterations) using Adam. The dimensionality of the two convolutional filters is 16 and , respectively. We use an $L_2$ regularization factor for all three layers. The remaining parameters are fixed as in GCN [?]. For metapath2vec and HHNE, we apply the commonly used meta-path schemes “APA” and “APCPA” on DBLP and Cora, and “DMTMD” and “DMUMD” on MovieLens, to guide meta-path-based random walks. The walk length and the number of walks per node are set to 80 and 40, respectively, as in HHNE. To evaluate the classification performance of metapath2vec and HHNE, we train a logistic regression classifier on the respective node embeddings.
For a fair comparison, we use the proposed DHNE as the base embedding and classification method for all active learning methods (AGE and ANRMAB) in the experiments. The non-AL methods (i.e., DHNE, GCN, metapath2vec, and HHNE) randomly select the nodes to label in each iteration. Following the experimental settings in [?], we randomly divide the labeled nodes into three parts: the training set (25% of the labeled nodes), the validation set (25% of the labeled nodes, used for hyperparameter optimization in DHNE), and the remaining nodes as the testing set. In the AL settings, the training set is used as the unlabeled pool ($\mathcal{U}$). ActiveHNE can work in the zero-start setting (i.e., with no labeled nodes, $\mathcal{L}^0 = \emptyset$, at the beginning of active learning). AGE and ANRMAB operate in the same manner as ActiveHNE. We run each method ten times and report the average results.
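The data split can be reproduced along these lines. This sketch assumes the split sizes are rounded down (25% training, 25% validation, and the rest for testing) and uses a seeded shuffle so that the ten repeated runs can differ only by seed; the function name and rounding rule are our assumptions. In the AL settings, the returned training part plays the role of the unlabeled pool.

```python
import random


def split_labeled(nodes, seed=0):
    """Shuffle labeled nodes and split them 25% / 25% / remainder."""
    nodes = list(nodes)
    random.Random(seed).shuffle(nodes)   # local RNG: does not touch global state
    n_train = len(nodes) // 4
    n_val = len(nodes) // 4
    train = nodes[:n_train]                      # unlabeled pool for AL
    val = nodes[n_train:n_train + n_val]         # hyperparameter tuning
    test = nodes[n_train + n_val:]               # evaluation
    return train, val, test


for run in range(3):
    train, val, test = split_labeled(range(100), seed=run)
    print(run, len(train), len(val), len(test))
```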
4.2 Comparison against State-of-the-art Methods
Figure 2 shows the accuracy of all the compared methods on the three datasets as a function of the number of iterations. We use one batch size for Cora and MovieLens and another for DBLP, to display the difference in accuracy with respect to the number of iterations.
From Figure 2, we can make the following observations:
(i) Active vs. naive-active. ActiveHNE, an active version of DHNE, significantly outperforms the naive-active methods (DHNE, GCN, HHNE, and metapath2vec), which randomly select nodes for query. This shows that AL helps improve embeddings for classification.
(ii) ActiveHNE vs. other active methods. ActiveHNE outperforms the other AL-assisted methods (ANRMAB and AGE) on MovieLens and Cora, and performs comparably to ANRMAB on DBLP. Since these three methods use the same embedding module and differ only in the active learning strategy, the superior performance of ActiveHNE validates the effectiveness of our active query strategy. Although ANRMAB and AGE are built on the same DHNE, they lose to DHNE in most cases, because they do not consider the heterogeneity and dependency of nodes in HINs. These results demonstrate the effectiveness of our proposed AQHN for DHNE.
(iii) DHNE vs. other network embedding methods. DHNE significantly outperforms the three representative network embedding methods (GCN, HHNE, and metapath2vec) when all of them are in the naive AL setting. This observation shows the superiority of DHNE in embedding HINs, and it also justifies dividing HINs into homogeneous networks and bipartite networks.
4.3 Effectiveness of Individual Selection Strategy
In Section 3.2, we use three node selection strategies: NC, CIE and CID. The latter two are our proposed novel strategies. To validate their effectiveness, we introduce five variants:
The same settings as in Figure 2 are used, and the results are shown in Figure 3. We can draw the following conclusions:
(i) ActiveHNE achieves the best accuracy among its variants. Although ActiveHNEcie matches the best accuracy of ActiveHNE on Cora, it loses significantly to ActiveHNE on MovieLens. These results support the rationality and effectiveness of combining the three active selection strategies in ActiveHNE, since no single strategy fits all datasets.
(ii) ActiveHNEcie and ActiveHNEcid achieve better accuracy than ActiveHNEie and ActiveHNEid, respectively. This result corroborates the effectiveness of our proposed CIE and CID in selecting the most uncertain and the most representative nodes.
To intuitively show the importance of each selection strategy during the AL process, we report the reward changes of the different strategies in Figure 4, where the initial reward of each strategy is one. From Figure 4, we can see that the reward of NC gradually decreases as the number of iterations increases. The reason is that the embedding model does not perform well in the initial stage of AL because labels are scarce, and CIE and CID depend on the outputs of the embedding model while NC does not. At the beginning, NC helps reduce the bias induced by CIE and CID. As the number of iterations increases, CIE and CID become more reliable, and thus the importance of NC decreases. We observe that the sum of the rewards of NC, CIE, and CID is greater than or equal to one, because the nodes selected by the strategies overlap. Thus, a decrease in the reward of NC does not imply an increase in the rewards of CIE and CID.
4.4 Runtime Analysis
ActiveHNE consists of two components, DHNE and AQHN. We separately compare the runtimes of DHNE and AQHN with those of the other methods (GCN, metapath2vec, and HHNE for network embedding; AGE and ANRMAB for active learning).
Table 1 reports the empirical runtimes of the methods on a server with an Intel Xeon E5-2678 v3 CPU, 256GB RAM, and Ubuntu 16.04.4. Table 1 shows that:
– for the NE part, DHNE is slower than GCN, but significantly faster than metapath2vec and HHNE. DHNE divides the original HIN into multiple subnetworks and trains model parameters on each subnetwork; as such, it optimizes more parameters than GCN, even though both use graph convolution-based network embedding.
– for the AL part, AQHN is slower than AGE and ANRMAB, but it actively acquires more valuable nodes and thereby achieves better embedding results, as shown in Figure 2.
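Table 1: Empirical runtimes of the NE and AL components.
Part | Method | MovieLens | Cora | DBLP | Total
NE | DHNE | | | |
NE | GCN | | | |
NE | metapath2vec | | | |
NE | HHNE | | | |
AL | AQHN | | | |
AL | AGE | | | |
AL | ANRMAB | | | |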
5 Conclusion
In this paper, we studied how to achieve active discriminative heterogeneous network embedding by actively acquiring and using the labels of network nodes. The proposed framework ActiveHNE extends graph convolutional networks to heterogeneous networks by dividing the given network into multiple homogeneous and bipartite subnetworks and performing convolutions on these subnetworks. Three query strategies based on convolutions over node neighborhoods are combined to select the nodes whose labels are queried, and these labels are fed back for the next round of discriminative network embedding. ActiveHNE achieves superior performance to other methods in terms of both accuracy and query efficiency. The code of ActiveHNE will be made publicly available.
References
[Cai et al., 2017a] Hongyun Cai, Vincent W. Zheng, and Kevin Chen-Chuan Chang. Active learning for graph embedding. CoRR, abs/1705.05085, 2017.
[Cai et al., 2017b] Hongyun Cai, Vincent W. Zheng, and Kevin Chen-Chuan Chang. A comprehensive survey of graph embedding: Problems, techniques and applications. TKDE, 30(9):1–1, 2017.
[Chang et al., 2015] Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C. Aggarwal, and Thomas S. Huang. Heterogeneous network embedding via deep architectures. In KDD, pages 119–128, 2015.
[Chen et al., 2013] Wei Chen, Yajun Wang, and Yang Yuan. Combinatorial multi-armed bandit: General framework, results and applications. In ICML, pages 1–9, 2013.
[Chen et al., 2018] Hongxu Chen, Hongzhi Yin, Weiqing Wang, Hao Wang, Quoc Viet Hung Nguyen, and Xue Li. PME: Projected metric embedding on heterogeneous networks for link prediction. In KDD, pages 1177–1186, 2018.
[Defferrard et al., 2016] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, pages 3844–3852, 2016.
[Dong et al., 2017] Yuxiao Dong, Nitesh V. Chawla, and Ananthram Swami. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD, pages 135–144, 2017.
[Freeman, 1978] Linton C. Freeman. Centrality in social networks: Conceptual clarification. Social Networks, 1(3):215–239, 1978.
[Fu et al., 2017] Tao-yang Fu, Wang-Chien Lee, and Zhen Lei. HIN2Vec: Explore meta-paths in heterogeneous information networks for representation learning. In CIKM, pages 1797–1806, 2017.
[Gao et al., 2018] Li Gao, Hong Yang, Chuan Zhou, Jia Wu, Shirui Pan, and Yue Hu. Active discriminative network representation learning. In IJCAI, pages 2142–2148, 2018.
[Goyal and Ferrara, 2018] Palash Goyal and Emilio Ferrara. Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems, 151(1):78–94, 2018.
[Gui et al., 2017] Huan Gui, Jialu Liu, Fangbo Tao, Meng Jiang, and Jiawei Han. Large-scale embedding learning in heterogeneous event data. In ICDM, pages 907–912, 2017.
[Huang et al., 2014] Sheng-Jun Huang, Rong Jin, and Zhi-Hua Zhou. Active learning by querying informative and representative examples. TPAMI, 36(10):1936–1949, 2014.
[Kipf and Welling, 2017] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, pages 1–14, 2017.
[Perozzi et al., 2014] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In KDD, pages 701–710, 2014.
[Settles, 2009] Burr Settles. Active learning literature survey. Technical report 1648, University of Wisconsin-Madison, 2009.
[Shi et al., 2017] Chuan Shi, Yitong Li, Jiawei Zhang, Yizhou Sun, and Philip S. Yu. A survey of heterogeneous information network analysis. TKDE, 29(1):17–37, 2017.
[Shi et al., 2018a] Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, and Jiawei Han. AspEm: Embedding learning by aspects in heterogeneous information networks. In SDM, pages 144–152, 2018.
[Shi et al., 2018b] Yu Shi, Qi Zhu, Fang Guo, Chao Zhang, and Jiawei Han. Easing embedding learning by comprehensive transcription of heterogeneous information networks. In KDD, pages 2190–2199, 2018.
[Sun et al., 2009] Yizhou Sun, Jiawei Han, Jing Gao, and Yintao Yu. iTopicModel: Information network-integrated topic modeling. In ICDM, pages 493–502, 2009.
[Sutton and Barto, 1998] R. S. Sutton and A. G. Barto. Reinforcement learning: An introduction. TNN, 9(5):1054–1054, 1998.
[Tang et al., 2015] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale information network embedding. In WWW, pages 1067–1077, 2015.
[Vermorel and Mohri, 2005] Joannès Vermorel and Mehryar Mohri. Multi-armed bandit algorithms and empirical evaluation. In ECML, pages 437–448, 2005.
[Wang et al., 2016] Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. In KDD, pages 1225–1234, 2016.
[Wang et al., 2019] Xiao Wang, Yiding Zhang, and Chuan Shi. Hyperbolic heterogeneous information network embedding. In AAAI, pages 1–8, 2019.
[Wu et al., 2019] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. arXiv:1901.00596v1, 2019.
[Xu et al., 2017] Linchuan Xu, Xiaokai Wei, Jiannong Cao, and Philip S. Yu. Embedding of embedding: Joint embedding for coupled heterogeneous networks. In WSDM, pages 741–749, 2017.
[Zhang et al., 2017] Ye Zhang, Matthew Lease, and Byron C. Wallace. Active discriminative text representation learning. In AAAI, pages 3386–3392, 2017.
[Zhang et al., 2018] Yizhou Zhang, Yun Xiong, Xiangnan Kong, Shanshan Li, Jinhong Mi, and Yangyong Zhu. Deep collective classification in heterogeneous information networks. In WWW, pages 399–408, 2018.
[Zhou et al., 2018] Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, and Maosong Sun. Graph neural networks: A review of methods and applications. arXiv:1812.08434, 2018.