motif2vec: Motif Aware Node Representation Learning for Heterogeneous Networks
Abstract
Recent years have witnessed a surge of interest in machine learning on graphs and networks, with applications ranging from vehicular network design to IoT traffic management to social network recommendations. Supervised machine learning tasks in networks, such as node classification and link prediction, require feature engineering, which is widely regarded as the key to success in applied machine learning. Research on representation learning, especially representation learning using deep learning, has shown us ways to automatically learn relevant features from vast amounts of potentially noisy, raw data. However, most of these methods are not adequate for heterogeneous information networks, which represent most real-world data today. The methods cannot preserve the structure and semantics of multiple types of nodes and links well enough, capture higher-order heterogeneous connectivity patterns, or ensure coverage of the nodes for which representations are generated. In this paper, we propose a novel, efficient algorithm, motif2vec, that learns node representations or embeddings for heterogeneous networks. Specifically, we leverage higher-order, recurring, and statistically significant network connectivity patterns in the form of motifs to transform the original graph to motif graph(s), conduct biased random walks to efficiently explore higher-order neighborhoods, and then employ a heterogeneous skip-gram model to generate the embeddings. Unlike previous efforts that use different graph metastructures to guide the random walk, we use graph motifs to transform the original network and preserve the heterogeneity. We evaluate the proposed algorithm on multiple real-world networks from diverse domains and against existing state-of-the-art methods on multi-class node classification and link prediction tasks, and demonstrate its consistent superiority over prior work.
Author Terms heterogeneous information networks, network embedding, network representation learning, feature learning, motifs
I Introduction
Recent years have witnessed a surge of interest in machine learning on graphs and networks, with applications ranging from vehicular network design to IoT traffic management to drug discovery to social network recommendations. Graph-based data representation enables us to understand objects with respect to the neighboring world instead of just observing them in isolation. Thus, there is an increasing trend of representing data that is not naturally connected as graphs. Examples include an item graph constructed from users' behavior history that is originally sequential in nature [35], a product review graph constructed from reviews written by users for stores [34], a credit card fraud network constructed from fraudulent and non-fraudulent transaction activity data [33], etc.
Supervised machine learning tasks over nodes and links in networks, such as node classification and link prediction, require feature engineering, which is widely regarded as the key to success in applied machine learning. (We use the terms network (nodes, links) and graph (vertices, edges) interchangeably throughout the paper.) However, feature engineering is challenging and tedious since the traditional process relies on domain knowledge, intuition, data manipulation, and manual intervention. Research on representation learning, i.e., learning representations of the data that make it easier to extract useful information when training classifiers or other predictors, has shown us ways to automatically learn relevant features from vast amounts of potentially noisy, raw data. Of particular interest to the academic and industry research community has been representation learning using deep learning [2], in which representations are formed by the composition of multiple nonlinear transformations with the goal of yielding more useful representations. A series of works over the past half decade has focused on graph node representation or graph embedding algorithms [7]. The common goal of these works is to obtain a low-dimensional feature representation of each node of the graph such that the method is scalable and the vector representation preserves some structure and connectivity pattern between individual nodes in the graph. Graph embedding methods are broadly classified into three categories, namely factorization based, random walk based, and deep learning based, with applications in network compression, visualization, clustering, link prediction, and node classification [7].
Among the three categories, random walk based graph embedding techniques have emerged as the most popular since they help approximate many network properties, are useful when a network is too large to measure in its entirety, and can work with a partially observable network. Popular random walk based methods include DeepWalk [18], node2vec [8], LINE [29], HARP [5], etc.
However, most of these methods are designed for homogeneous networks and are inadequate for heterogeneous information networks, i.e., networks with multiple types of nodes and links, which represent most real-world data today. Contemporary information networks like Facebook, DBLP, Yelp, Flickr, etc. contain multi-type interacting components. For example, the social network Facebook has different types of objects (nodes) such as users, posts, and photos, as well as different kinds of associations (links) such as user-user friendship, person-photo tagging relationship, post-post replying relationship, etc. Researchers today acknowledge that heterogeneous networks fuse more information and support richer semantic representation of the real world [22][24]. They also emphasize that data mining approaches designed for homogeneous graphs are not well-suited to heterogeneous graphs. For example, classification in homogeneous networks is traditionally done on objects of the same entity type, makes strong assumptions on the network structure, and assumes that data is independently and identically distributed (i.i.d.). In contrast, classification in heterogeneous networks needs to simultaneously classify multiple types of objects, which may be organized arbitrarily and may violate the i.i.d. assumption. Thus, there is an innate need to develop graph embedding methods for heterogeneous networks.
Dong et al. formally introduced the problem and proposed a novel algorithmic solution, metapath2vec [6], that leverages the metapath, the most popular graph metastructure for heterogeneous network mining [24]. A more recent work proposed metagraph2vec [37], which leverages the metagraph in order to capture richer structural contexts and semantics between distant nodes. Other heterogeneous network embedding methods include PTE [28], a semi-supervised representation learning method for text data; HNE [4], which learns a representation for each modality of the network separately and then unifies them into a common space using linear transformations; LANE [11], which generates embeddings for attributed networks; and ASPEM [23], which captures the incompatibility in heterogeneous networks by decomposing the input graph into multiple aspects and learns embeddings independently for each aspect. None of PTE, HNE, LANE, or ASPEM is aligned to the generic problem of task-independent heterogeneous network embedding learning. The heterogeneity in PTE stems from links in a text network while the raw input belongs to the same object type; HNE works on a heterogeneous graph with image and text where the simultaneous interactions among multi-typed objects are decomposed into several scattered pairwise interactions in a single-typed network; LANE defines heterogeneity as diverse information sources (namely, network topology and node label information) that need to be jointly learnt; while ASPEM models heterogeneous network incompatibility and learns embeddings for each aspect independently. Among the related art, metapath2vec and metagraph2vec consider the general problem of learning node representations for heterogeneous networks. However, these methods cannot preserve the structure and semantics of multi-type nodes and links well enough, capture higher-order heterogeneous connectivity patterns, or ensure coverage of the nodes for which representations are generated, as demonstrated in Section IV.
In this paper, we propose a novel, efficient algorithm, motif2vec, that learns node representations or embeddings for heterogeneous information networks. Specifically, we leverage higher-order, recurring, and statistically significant network connectivity patterns in the form of motifs to learn higher quality embeddings. Motifs are one of the most common higher-order data structures for understanding complex networks and have been widely recognized as fundamental units of networks [3]. They have been used successfully in many network mining tasks such as clustering [32][36], anomaly detection [31], and convolution [21]. However, no prior work has investigated the scope and impact of motifs in learning node embeddings for heterogeneous networks. Rossi et al. introduced the problem of higher-order network representation learning using motifs for homogeneous networks [20], but the method cannot be extended to handle heterogeneous networks. HONE [20] does not combine the best of both worlds, i.e., the random walk based method that accounts for local neighborhood structure and the motif-aware method that accounts for higher-order global network connectivity patterns, as we do. In addition, HONE (as well as other existing methods) does not include the original network in the learning process, as we do. Including the original network ensures higher coverage of connected nodes.
Our algorithm motif2vec transforms the original graph to motif graph(s), conducts biased random walks to efficiently explore higher-order neighborhoods, and then employs a heterogeneous skip-gram model to generate the embeddings. Related efforts in heterogeneous network node embedding, namely metapath2vec [6] and metagraph2vec [37], are limited to exploring only the neighborhoods, nodes, and links participating in the metastructure of interest. motif2vec leverages motifs to transform the original graph to a motif representation and conducts regular random walks on the entire transformed graph. We evaluate our algorithm on multiple real-world networks from diverse domains and against existing state-of-the-art techniques on multiple machine learning tasks, and demonstrate its consistent superiority over prior work. To summarize, we make the following contributions:

We propose motif2vec, an efficient and effective novel algorithm for representation learning in heterogeneous information networks. Specifically, we leverage higher-order, recurring, and statistically significant network connectivity patterns in the form of motifs to learn higher quality embeddings.

Our method preserves both local and global higher-order structural relationships as well as semantic correlations in a heterogeneous network. Unlike existing efforts, our method does not focus on refining the random walk to achieve this goal. Instead, we present a graph transformation method that enables us to capture the significance of subgraph patterns.

We empirically evaluate our algorithm on multiple heterogeneous network mining tasks, namely multi-class classification and link prediction, on multiple real-world datasets from different domains and demonstrate its consistent superiority over state-of-the-art baselines.
II Preliminaries
We introduce our problem definition and related concepts and notations before presenting our framework in Section III.
Definition II.1
Heterogeneous Information Network: A heterogeneous information network is defined as a directed graph $G = (V, E, T_V, T_E)$ in which each node $v \in V$ is associated with mapping function $\phi(v): V \rightarrow T_V$ and each link $e \in E$ is associated with mapping function $\psi(e): E \rightarrow T_E$. $T_V$ and $T_E$ denote the sets of node types and link types in $G$, with $|T_V| > 1$ and $|T_E| > 1$.
Examples of heterogeneous information networks include the popular DBLP bibliographic network and the Yelp social information network. Figure 1 presents the network schema, i.e., the meta-template for an information network, for each of the example instances. In Figure 1(a), multiple types of objects such as authors, papers, conference venues, author organizations, and paper keywords are connected by multiple types of relationships such as authorship (author-paper), affiliation (author-organization), etc. In Figure 1(b), multiple types of objects such as users, businesses, business locations, user reviews, and review terms are connected by multiple types of relationships such as check-in (user-business), etc.
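As a minimal sketch of Definition II.1, a heterogeneous network can be held as a map from nodes to node types plus a set of typed directed links. The class name `HeteroGraph`, its methods, and the toy DBLP-style nodes below are illustrative assumptions and are not taken from the original work.

```python
from collections import defaultdict


class HeteroGraph:
    """Directed graph whose nodes and links each carry a type label."""

    def __init__(self):
        self.node_type = {}                 # node -> node type
        self.out_links = defaultdict(dict)  # source -> {target: link type}

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_link(self, src, dst, ltype):
        # Both endpoints must already be typed nodes.
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("add both endpoints before linking them")
        self.out_links[src][dst] = ltype

    def node_types(self):
        return set(self.node_type.values())

    def link_types(self):
        return {t for nbrs in self.out_links.values() for t in nbrs.values()}

    def is_heterogeneous(self):
        # Assumption: "heterogeneous" means multiple node types and
        # multiple link types, as described in the introduction.
        return len(self.node_types()) > 1 and len(self.link_types()) > 1


# Hypothetical bibliographic toy network: two authors, one paper, one venue.
g = HeteroGraph()
for node, ntype in [("a1", "author"), ("a2", "author"),
                    ("p1", "paper"), ("v1", "venue")]:
    g.add_node(node, ntype)
g.add_link("a1", "p1", "authorship")
g.add_link("a2", "p1", "authorship")
g.add_link("p1", "v1", "published_at")
```

Keeping the type maps separate from the adjacency structure means that type-agnostic procedures such as random walks can ignore the types entirely, while type-aware steps can still look them up.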
We define our representation learning task on such a heterogeneous network.
Definition II.2
Heterogeneous Network Representation Learning: Given a heterogeneous network $G$, the goal of representation learning is to learn a function $f: V \rightarrow \mathbb{R}^d$ that maps nodes in $G$ to $d$-dimensional features in vector space, with $d \ll |V|$, such that network structure and semantic heterogeneity are preserved.
We leverage motifs to design our heterogeneous network representation learning method.
Definition II.3
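Heterogeneous Network Motif: A network motif $M = (V_M, E_M, T_{V_M}, T_{E_M})$ is an isomorphic induced directed subgraph consisting of a subset of $k$ nodes from directed heterogeneous network $G$ with $V_M \subseteq V$, $E_M \subseteq E$, $T_{V_M} \subseteq T_V$, $T_{E_M} \subseteq T_E$, such that:
(i) $|V_M| = k$,
(ii) $E_M$ consists of all of the edges in $E$ that have both endpoints in $V_M$,
(iii) $(u, v) \in E_M$ iff $(g(u), g(v)) \in E$ for mapping function $g: V_M \rightarrow V$, and
(iv) the frequency of appearance of $M$ in $G$ is above a predefined threshold (i.e., statistically significant).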
A recurring pattern is considered statistically significant if the frequency of its appearance in a graph is significantly higher than the frequency of its appearance in any randomized network.
Motifs are one of the most common higher-order data structures for understanding complex networks and have been widely recognized as fundamental units of networks [3]. They have been used successfully in many network mining tasks such as clustering [32], anomaly detection (densest subgraph sparsifiers) [31], and convolution [21]. In this work, we focus on directed motifs and directed heterogeneous networks since they offer greater scope for representing rich semantics. Figure 2(a) presents all possible 3-node network motifs. Figure 2(b) is a toy example showing how to find motif instance(s) in a graph [3].
In the toy example in Figure 2(b), Figure 2(b)(right) depicts the graph and Figure 2(b)(left) depicts the motif. We observe that there are two instances of the motif in the graph: (i) ({a, b, c}, {a, b}) and (ii) ({a, b, e}, {a, b}). ({a, b, d}, {a, b}) is not an instance because the induced subgraph on the nodes a, b, and d is not isomorphic to the motif.
Motifs are distinctly different from other popular graph metastructures such as the metapath and the metagraph. We discuss this in detail in Section IV.
Problem 1
Given a directed unweighted heterogeneous information network and a set of network motifs, learn low-dimensional latent representations for each node in the network such that the higher-order heterogeneous network neighborhood structure and semantics are preserved.
III The motif2vec Framework
We present our general motif2vec framework that learns high-quality node embeddings for heterogeneous networks. Our approach returns representations that help maximize the likelihood of preserving network neighborhoods of multi-type nodes and links.
Skip-gram Model: First, we introduce word2vec [15][16] and discuss its application to network embedding generation tasks. Mikolov et al. introduced the word2vec group of models that learn distributed representations of words in a corpus. Specifically, the skip-gram model learns high-quality vector representations of words from large amounts of unstructured text data. The algorithm scans the words in a document and aims to embed every word such that the word's features can predict nearby context words. DeepWalk [18] and node2vec [8] generalized the idea to a homogeneous network by converting a network into ordered sequences of nodes. For this, both methods sample sequences of nodes from the original network by random walk strategies.
Random Walk: A walk in a graph or directed graph is a sequence of nodes $(v_1, v_2, \ldots, v_k)$, not necessarily distinct, such that $(v_i, v_{i+1}) \in E$ for $i = 1, \ldots, k-1$. When the consecutive nodes in the sequence are selected at random, we generate a random sequence of nodes known as a random walk on the graph. A random walk on a graph is a special case of a Markov chain that is time-reversible. The probability of transition from node $v_i$ to $v_{i+1}$ is a function of the out-degree of node $v_i$. We explore the neighborhood of a node in a graph or a directed graph using random walks. Specifically, we employ a biased random walk procedure that efficiently explores nodes' diverse neighborhoods in both breadth-first and depth-first search fashion [8].
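The biased walk described above can be sketched as follows. This is an illustrative implementation in the style of node2vec [8], not the authors' code; the function name `biased_walk`, the adjacency-dictionary format, and the toy graph are assumptions. The return parameter `p` and in-out parameter `q` rescale each edge weight depending on how the candidate node relates to the previously visited node.

```python
import random


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Second-order biased walk in the style of node2vec [8].

    adj maps node -> {neighbor: weight}. p is the return parameter and
    q the in-out parameter. With p = q = 1 the walk reduces to a plain
    weight-proportional random walk.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = list(adj.get(cur, {}))
        if not nbrs:  # dead end: stop early
            break
        if len(walk) == 1:
            weights = [adj[cur][x] for x in nbrs]
        else:
            prev = walk[-2]
            weights = []
            for x in nbrs:
                if x == prev:                  # step back to the previous node
                    bias = 1.0 / p
                elif x in adj.get(prev, {}):   # stay near the previous node
                    bias = 1.0
                else:                          # move outward
                    bias = 1.0 / q
                weights.append(adj[cur][x] * bias)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Hypothetical toy graph in which every node has at least one neighbor.
adj = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
walk = biased_walk(adj, "a", length=8, p=1.0, q=1.0, rng=random.Random(0))
```

A small `q` favors outward, depth-first-like exploration, while a small `p` keeps the walk close to its previous node, which is closer to breadth-first behavior.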
Such a random walk combined with a skip-gram based embedding method learns a feature representation $f(u)$ for each node $u$ in a homogeneous graph that predicts the node's context neighborhood $N(u)$:
$$\max_{f} \sum_{u \in V} \log \Pr\left(N(u) \mid f(u)\right) \tag{1}$$
Unlike all previous efforts belonging to this family of node embedding algorithms that employ random walk on the original graph [6][8][18][29][37], we conduct random walk on a transformed graph, known as the motif graph.
Motif Graph: The network motif literature has defined several graph features and concepts for motifs such as motif cut, motif volume, motif conductance, etc. [3]. We present one of them, namely the motif graph or motif adjacency matrix, which is used in our algorithmic framework. Given a directed heterogeneous network $G = (V, E, T_V, T_E)$ and a motif set $\mathcal{M}$, we compute the motif adjacency matrices $W_M$, $M \in \mathcal{M}$. The weighted motif adjacency matrix for motif $M$ is defined as:
$$(W_M)_{ij} = \text{number of instances of } M \text{ that contain both nodes } i \text{ and } j \tag{2}$$
The motif adjacency matrix, also known as the motif co-occurrence matrix, differs structurally from the original graph. The motif graph captures pairwise relationships between nodes in the original graph with respect to a motif. The larger the value of $(W_M)_{ij}$, the more significant the relation between nodes $i$ and $j$ is with respect to the motif $M$. The motif adjacency matrix can be either weighted or binary. In the latter case, $(W_M)_{ij}$ is either 1 or 0, indicating whether a relationship exists between nodes $i$ and $j$ for motif $M$. The motif adjacency matrix is symmetric, and thereby undirected. Not every edge in the original graph exists in the motif graph, since a motif may not appear for a given edge. The edges in a motif graph are likely to have different weights than in the original graph, since a motif may appear at a different frequency than another random motif for a given edge. Conversely, the motif graph links every pair of nodes that co-occur in a motif instance, even if the two nodes are not directly connected in the original graph. Thus, the number of edges in a weighted motif graph is usually greater than the number of edges in the original graph.
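To make the construction concrete, the sketch below builds a motif co-occurrence map from a list of already discovered motif instances. It assumes the co-occurrence-count definition of Benson et al. [3], in which an entry counts the motif instances containing both nodes; the function name `motif_adjacency` and the example instances are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def motif_adjacency(instances, binary=False):
    """Build a symmetric motif co-occurrence map from motif instances.

    instances is an iterable of node collections, one per motif instance.
    The result maps node -> {node: count}, where count is the number of
    instances that contain both nodes (or 1 when binary=True).
    """
    w = defaultdict(lambda: defaultdict(int))
    for inst in instances:
        # Every unordered pair of nodes in an instance co-occurs once.
        for u, v in combinations(sorted(set(inst)), 2):
            w[u][v] += 1
            w[v][u] += 1
    if binary:
        return {u: {v: 1 for v in nbrs} for u, nbrs in w.items()}
    return {u: dict(nbrs) for u, nbrs in w.items()}


# Hypothetical instances of an author-paper-author motif.
instances = [("a1", "p1", "a2"), ("a1", "p2", "a2"), ("a2", "p2", "a3")]
W = motif_adjacency(instances)
W_binary = motif_adjacency(instances, binary=True)
```

In this toy case the two authors `a1` and `a2` receive a weight of 2 although no direct author-author link exists in the underlying network, which illustrates how the motif graph can contain edges that the original graph lacks.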
We transform the original graph to a motif graph in order to simultaneously encode the heterogeneity in structure and semantics, and conduct random walks on the motif graph itself. Additionally, we conduct random walks on the original graph to ensure greater coverage of higher-order connected nodes, which may otherwise be missed because they do not participate in popular motifs. This strategy frees our random walk from any dependence on the type of the node or link, unlike prior art for heterogeneous networks [6][37]. Note that metastructure (metapath, metagraph, etc.) driven random walks limit the ability of a walk to explore diverse higher-order neighborhoods. The generated walk sequences are aggregated and shuffled before being fed to skip-gram. Our graph transformation followed by a metastructure-independent biased random walk enables the sequences to carry both higher-order heterogeneous network structural patterns and heterogeneous semantic relationships. We demonstrate the superiority of our novel idea empirically in Section IV.
Figure 3 illustrates our motif2vec framework. Given a heterogeneous information network $G$ and a set of motifs $\mathcal{M}$, the goal of the framework is to output $d$-dimensional embedding vectors for each node in $G$. The steps in the framework presented in Figure 3 include: motif instance discovery, random walk sequence generation, aggregation and shuffling of the generated walk sequences, skip-gram neural net training, and finally embedding generation. We present the three phases of our framework next.
Network Transformation: First, we find instances of the motif(s) under consideration in the original network. This is referred to as the motif discovery task in the literature and is a computationally expensive operation. Many motif discovery algorithms have been proposed over the years, each with the intent of improving the computational aspects of the state-of-the-art [12]. We use the method presented in [9] for motif discovery. Once the motif instances are obtained, we compute the weighted motif adjacency matrix. Thus, we transform the original network to motif graph(s) that encode the heterogeneity in network structure and semantics.
Sequence Generation: Next, we generate random walk sequences on both the motif graph(s) and the original graph, using the method in [8]. We aggregate and shuffle the sequences generated from the original and the motif graph(s) before feeding them to the neural net. Thus, our generated walk sequences encompass both local and global heterogeneous network connectivity.
Embedding Generation: Finally, we input the walk sequences from the previous step and output node embeddings. We use the skip-gram neural net model architecture in [19] for learning the latent feature representations. We minimize our optimization function using SGD with negative sampling, which is known to learn accurate representations efficiently [16]. Following Equation 1, we optimize the embedding $f(u)$ for node $u$ in graph $G$ for random walk co-occurrences according to:
$$\min_{f} \; -\sum_{u \in V} \sum_{v \in N(u)} \log \Pr\left(v \mid f(u)\right) \tag{3}$$
where $N(u)$ is the neighborhood of node $u$ and node $v$ is seen on a random walk starting from node $u$.
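The negative-sampling idea mentioned above can be illustrated with the loss for a single (node, context) pair. This is a simplified sketch and not the authors' training code: it assumes one embedding vector per node (practical skip-gram implementations keep separate node and context vectors), and the helper names and toy vectors are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def neg_sampling_loss(z_u, z_v, z_negs):
    """Loss for one (node, context) pair with sampled negative nodes.

    -log sigmoid(z_u . z_v) - sum over negatives n of log sigmoid(-z_u . z_n)
    """
    loss = -math.log(sigmoid(dot(z_u, z_v)))
    for z_n in z_negs:
        loss -= math.log(sigmoid(-dot(z_u, z_n)))
    return loss


z_u = [0.5, -0.2]                    # embedding of the walk's start node
z_v = [0.4, -0.1]                    # a node seen on a walk from u
z_negs = [[-0.3, 0.6], [0.1, 0.9]]   # randomly drawn non-context nodes
loss = neg_sampling_loss(z_u, z_v, z_negs)
```

The loss shrinks when a node's embedding aligns with nodes seen on its walks and points away from the sampled negatives, so only a handful of nodes need updating per step instead of normalizing over all of $V$.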
The pseudocode for motif2vec is presented in Algorithm 1.
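Since Algorithm 1 is not reproduced in this text, the following is a hedged sketch of the three phases as described above, not the authors' pseudocode. It assumes that motif instances have already been discovered by an external tool, uses plain weight-proportional walks (the case where both walk parameters equal 1), and stops at the shuffled walk corpus; the function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def motif_graph(instances):
    """Weighted undirected graph: weight = number of shared motif instances."""
    w = defaultdict(lambda: defaultdict(float))
    for inst in instances:
        for u, v in combinations(sorted(set(inst)), 2):
            w[u][v] += 1.0
            w[v][u] += 1.0
    return w


def random_walks(adj, num_walks, walk_length, rng):
    """Weight-proportional walks started from every node listed in adj."""
    walks = []
    for _ in range(num_walks):
        for start in list(adj):
            walk = [start]
            # Continue until the length limit or a node without out-links.
            while len(walk) < walk_length and adj.get(walk[-1]):
                nbrs = adj[walk[-1]]
                nxt = rng.choices(list(nbrs), weights=list(nbrs.values()), k=1)[0]
                walk.append(nxt)
            walks.append(walk)
    return walks


def build_corpus(original_adj, motif_instance_sets,
                 num_walks=10, walk_length=80, seed=0):
    """Walk on the original graph and on each motif graph, then shuffle."""
    rng = random.Random(seed)
    corpus = random_walks(original_adj, num_walks, walk_length, rng)
    for instances in motif_instance_sets:
        corpus += random_walks(motif_graph(instances), num_walks, walk_length, rng)
    rng.shuffle(corpus)
    return corpus


# Hypothetical toy network: two authors write one paper published at a venue.
original = {"a1": {"p1": 1.0}, "a2": {"p1": 1.0}, "p1": {"v1": 1.0}, "v1": {}}
instances = [("a1", "p1", "a2")]   # one author-paper-author motif instance
corpus = build_corpus(original, [instances], num_walks=2, walk_length=5)
```

The resulting corpus would then be passed to any skip-gram implementation to obtain the embeddings. In the toy example the venue `v1` takes part in no motif instance, so it reaches the corpus only through the walks on the original graph, which is the coverage argument made earlier.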
IV Experiments
We evaluate the heterogeneous network node embeddings obtained through motif2vec on two standard supervised machine learning tasks: multi-class node classification and link prediction.
IV-A Experimental Setup
We compare motif2vec with several recent network representation learning algorithms on multiple datasets.
IV-A1 Datasets
We use four popular publicly available heterogeneous network datasets from the literature:
DBLP-P Dataset: It is a bibliographic network composed of three types of nodes: author (A), paper (P), and venue (V), connected by three types of links: A-P, P-V, and P-P. We use a subset of the DBLP dataset made available by [21][26] for the paper classification task. The papers are labeled as belonging to 10 classes such as information retrieval, databases, networking, artificial intelligence, operating systems, etc. that are extracted from Cora [14]. There are 17,411 authors (A), 18,059 papers (P), and 300 conferences, i.e., venues (V).
AMiner-CS Dataset: It is another bibliographic network composed of three types of nodes: author (A), paper (P), and venue (V), connected by three types of links: A-P, P-V, and P-P. We use a version of the AMiner Computer Science (CS) dataset made available by [6]. It comprises 1,693,531 authors (A), 3,194,405 papers (P), and 3,883 venues (V). Author research categories are labeled as belonging to 8 classes such as theoretical computer science, computer graphics, human computer interaction, computer vision and pattern recognition, etc. based on the categories in Google Scholar [6]. There are 246,678 labeled authors in this dataset. We use it for the author node classification task.
Yelp-Restaurant Dataset: We consider the data obtained from the 12th round of the Yelp Dataset Challenge. We build a heterogeneous network composed of four types of nodes: users (U), businesses, i.e., restaurants (R), locations (L), and categories (C), connected by three types of links: U-R, R-C, and C-L. The Yelp dataset includes users who have very few reviews. In fact, about 49% of the users have only one review [38], making the dataset very sparse and hence difficult to use for evaluation purposes. Following the common practice in other works (e.g., [38]), we filter out users with fewer than twenty business reviews over fourteen years (2004-2018). There are 36,432 users (U), 18,256 restaurants (R), 5,514 locations (L), and 419 categories (C). We use this data for the U-R link prediction task.
Amazon-Electronics Dataset: We consider the Amazon-200k dataset [10][39], which contains ratings provided by users on electronics items on Amazon. Similar to the Yelp-Restaurant dataset, we build a heterogeneous network composed of four types of nodes: users (U), items (I), brands (B), and categories (C), connected by three types of links: U-I, I-B, and I-C. There are 59,297 users (U), 21,000 items (I), 2,059 brands (B), and 683 categories (C). We use this dataset for U-I link prediction.
| Dataset | #Nodes | #Links |
|---|---|---|
| DBLP-P | 35,770 | 131,636 |
| AMiner-CS | 4,891,819 | 12,506,615 |
| Yelp-Restaurant | 60,621 | 189,423 |
| Amazon-Electronics | 83,039 | 284,650 |
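| Method | DBLP-P (multi-class node classification) | AMiner-CS (multi-class node classification) | Yelp-Restaurant (link prediction) | Amazon-Electronics (link prediction) |
|---|---|---|---|---|
| motif2vec | 78.80 | 91.68 | 58.38 | 58.90 |
| metapath2vec | 60.08 | 73.90 | 43.30 | 50.89 |
| metapath2vec++ | 49.40 | 72.31 | 29.21 | 57.02 |
| metagraph2vec | 64.48 | 82.09 | 29.24 | 55.53 |
| metagraph2vec++ | 53.24 | 35.58 | 39.60 | 60.02 |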
IV-A2 Baseline Methods
We compare motif2vec with recent network representation learning methods focused on heterogeneous networks. Specifically, we focus on the family of node embedding methods to which motif2vec belongs.
metapath2vec, metapath2vec++ [6]: Dong et al. study the problem of representation learning in heterogeneous networks. They propose two models, metapath2vec and metapath2vec++, that first leverage metapath based random walks to construct the heterogeneous neighborhood of a node and then leverage a skip-gram model to generate the embeddings; metapath2vec++ uses the heterogeneous skip-gram model.
metagraph2vec, metagraph2vec++ [37]: Zhang et al. proposed a network embedding learning method for heterogeneous networks that leverages the metagraph to capture richer structural contexts and semantics between distant nodes. The method uses the metagraph to guide the generation of random walks and then a skip-gram model to learn latent embeddings of multi-typed heterogeneous network nodes. metagraph2vec uses the homogeneous skip-gram model while metagraph2vec++ uses the heterogeneous skip-gram model.
The authors in [6] demonstrate that their proposed method beats some of the popular state-of-the-art methods of that time, namely DeepWalk [18], node2vec [8], LINE [29], Spectral Clustering [30], and Graph Factorization [1]. Thus, we exclude them from our experiments. Note that each of these works considers homogeneous networks.
IV-A3 Machine Learning Tasks
We consider two standard supervised machine learning tasks to evaluate our embedding.
Node Classification: Node classification is a downstream machine learning task that classifies nodes in a heterogeneous network into a predefined set of classes. We follow a standard classification setup and use the generated node embeddings as features for the classifier. We conduct paper node multi-class classification for the DBLP-P data and author node classification for the AMiner-CS data. The classifier, parameter values, and train/test data are fixed across the various embedding approaches to avoid any confounding factor. We use a traditional SVM classifier for both the DBLP-P and AMiner-CS datasets, without any parameter tuning.
Link Prediction: Link prediction is a widely popular machine learning task in heterogeneous networks that predicts links that are likely to be added to the network in the near future. We leverage node embedding features to predict links. We partition the links in a network into train and test instances in order to hide a fraction of the existing links during embedding learning. The probability of a link appearing between two nodes in a network is calculated by computing the cosine similarity between the respective embedding vectors. In our experiments, if the embedding-based similarity score between a pair of nodes is higher than a threshold, we infer that an edge could exist between the two nodes. In order to penalize embeddings that generate a high similarity value for any random pair of nodes, we generate an equal number of fake links in the test set. These fake links correspond to links that do not exist in the original network. The intuition is that good embeddings are expected to return a similarity score below the threshold for such pairs. We evaluate our embeddings for link prediction on the Yelp-Restaurant and Amazon-Electronics datasets.
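The thresholded-similarity procedure above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' evaluation code: the threshold value of 0.5, the function names, and the two-dimensional toy embeddings are all hypothetical, since the text does not report the threshold used.

```python
import math


def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def link_prediction_accuracy(emb, true_links, fake_links, threshold=0.5):
    """Fraction of test pairs classified correctly by thresholded cosine."""
    correct = 0
    for u, v in true_links:    # held-out links should score above the threshold
        correct += cosine(emb[u], emb[v]) > threshold
    for u, v in fake_links:    # non-existent links should score at or below it
        correct += cosine(emb[u], emb[v]) <= threshold
    return correct / (len(true_links) + len(fake_links))


# Hypothetical user (u*) and restaurant (r*) embeddings.
emb = {
    "u1": [1.0, 0.1], "r1": [0.9, 0.2],
    "u2": [0.1, 1.0], "r2": [0.2, 0.8],
}
true_links = [("u1", "r1"), ("u2", "r2")]
fake_links = [("u1", "r2"), ("u2", "r1")]
acc = link_prediction_accuracy(emb, true_links, fake_links, threshold=0.5)
```

Because the test set holds as many fake links as true ones, embeddings that score every pair as similar reach only 50% accuracy, which is the penalty the paragraph above describes.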
IV-A4 Evaluation Metric
Accuracy is a popular evaluation metric for traditional classification tasks, and we adopt it. We perform the standard 70:30 split into train and test data, and measure the percentage of correct predictions for the test instances during multi-class classification. For link prediction, we split the links 70:30 such that the links in the test set are removed from the original network on which embeddings are learnt. We measure the percentage of correct predictions, i.e., presence of links, for the test instances. We refer to our link prediction evaluation metric as accuracy too.
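A minimal sketch of the link split and of fake-link sampling is given below. The function names, the fixed seed, and the toy user-restaurant links are assumptions for illustration; the text does not specify how the split or the fake links were drawn.

```python
import random


def split_links(links, test_fraction=0.3, seed=0):
    """Shuffle links and hold out a fraction as the test set."""
    links = list(links)
    rng = random.Random(seed)
    rng.shuffle(links)
    n_test = int(round(len(links) * test_fraction))
    return links[n_test:], links[:n_test]   # (train, test)


def sample_fake_links(sources, targets, existing, n, seed=0):
    """Draw n distinct source-target pairs that are not links in the network."""
    rng = random.Random(seed)
    existing = set(existing)
    candidates = [(s, t) for s in sources for t in targets
                  if (s, t) not in existing]
    return rng.sample(candidates, n)


# Hypothetical user-restaurant links: 5 users, 4 restaurants, 10 links.
users = [f"u{i}" for i in range(5)]
restaurants = [f"r{j}" for j in range(4)]
links = [(u, r) for i, u in enumerate(users)
         for j, r in enumerate(restaurants) if (i + j) % 2 == 0]

train_links, test_links = split_links(links, test_fraction=0.3)
fake_links = sample_fake_links(users, restaurants, links, len(test_links))
```

Embeddings would be learnt on the network containing only `train_links`, and accuracy would then be measured over `test_links` together with `fake_links`. Enumerating all candidate pairs is only practical for small graphs; rejection sampling would be the usual choice at the scale of the datasets used here.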
IV-A5 Settings
For all embedding methods, we use the exact same parameters listed below. The parameter settings used are in line with typical values used in prior art [8].
The embedding vector dimension: 128
The walk length: 80
The number of walks per node: 10
The context size for optimization: 10
Random walk return parameter: 1
Random walk in-out parameter: 1
The optimization is run for a single epoch. Each of our reported numbers is an average of five runs. All code is implemented in Python. All experiments are conducted on a Linux machine with a 2.60GHz Intel processor, 28 CPU cores, and 800GB RAM.
IV-B Results: Accuracy
Table II presents our experimental results for the two machine learning tasks (multi-class node classification and link prediction), the four datasets (DBLP-P, AMiner-CS, Yelp-Restaurant, and Amazon-Electronics), and the five embedding methods (motif2vec, metapath2vec, metapath2vec++, metagraph2vec, and metagraph2vec++) under the same configuration parameter settings, as detailed in Section IV-A. We observe that our algorithm motif2vec outperforms all baseline methods on DBLP-P, AMiner-CS, and Yelp-Restaurant, and is second only to metagraph2vec++ on Amazon-Electronics.
For the paper node classification task on DBLP-P data, motif2vec beats the best baseline by 22%. For the author research category classification task on AMiner-CS data, motif2vec beats the best baseline by 24%. For the link prediction task on Yelp-Restaurant data, motif2vec achieves a 34% improvement, while for link prediction on Amazon-Electronics data, motif2vec achieves a 3% improvement over the second best baseline method. We observe that metagraph2vec++ is the best algorithm for the Amazon-Electronics dataset. However, metagraph2vec and metagraph2vec++ are fairly inconsistent, as is evident from their accuracy numbers for the remaining datasets in the table. The authors in [6] and [37] introduce the "++" versions of metapath2vec and metagraph2vec since the heterogeneous skip-gram model is expected to accommodate the heterogeneity in a network better. However, the results presented by the authors in both works fail to showcase steady benefits of the heterogeneous skip-gram model. We do not propose an extended version of motif2vec in this paper.
| Motif | Accuracy (in %) |
|---|---|
| A 3-node motif | 78.50 |
| A 4-node motif | 78.80 |
| A 5-node motif | 78.43 |
| All 3-node motifs | 78.00 |
| All 4-node motifs | 77.75 |
In summary, our method learns consistently and significantly better (achieving relative improvements as high as 24% and 34% over benchmarks for classification and prediction respectively) heterogeneous network node embeddings than existing stateoftheart methods. This is primarily because transforming the original complex graph to a motif graph helps accommodate heterogeneous network structure and semantic heterogeneity effectively.
IV-C Motif vs. Metapath vs. Metagraph
Figure 5 illustrates example 3-node, 4-node, and 5-node motifs for the machine learning task on the DBLPP and AMinerCS datasets. Figure 6 presents the metapath and metagraph for the node classification task on the DBLPP and AMinerCS datasets.
Motifs are crucial to understanding the structure, semantics, and functions of meaningful patterns in complex networks. Thus, interesting motifs in a heterogeneous bibliographic network may include (motifs in Figure 5): (i) authors collaborating on the same paper ( and ), (ii) papers published at the same venue ( and ), (iii) authors collaborating on a paper that gets published at a venue (), (iv) authors collaborating on papers (), (v) author publishing papers at the same venue (), (vi) authors collaborating on papers that get published at a venue (), and (vii) authors publishing papers at the same venue ().
Table III presents the effectiveness of each motif in generating higher quality embeddings useful for classification. In our framework, we can combine multiple motifs to learn more effective node representations. Some combinations of motifs are more useful than others. In Table III, all 3-node motifs and all 4-node motifs do not return the highest classification accuracy, since at least one non-useful motif in each set pulls the classification accuracy down. Determining the best combination of motifs for higher quality embedding learning is combinatorially expensive and is not the focus of this paper. In our experiments, we consider only one motif in order to ensure a fair comparison with the baseline methods, which consider one metapath and one metagraph, respectively, with the same semantics.
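To make the idea of combining motifs concrete, the following sketch builds one weighted graph per motif and sums them. It assumes, in the spirit of the motif adjacency of [3], that the weight of a node pair is the number of motif instances in which the two nodes co-occur; the per-motif weights, the function names, and the toy instances are hypothetical and do not reproduce our implementation.

```python
from collections import Counter
from itertools import combinations


def motif_graph(instances):
    """Edge weight = number of motif instances in which two nodes co-occur."""
    weights = Counter()
    for nodes in instances:
        for u, v in combinations(sorted(set(nodes)), 2):
            weights[(u, v)] += 1
    return weights


def combine(motif_graphs, motif_weights):
    """Weighted sum of several motif graphs (the weights are hypothetical inputs)."""
    combined = Counter()
    for graph, alpha in zip(motif_graphs, motif_weights):
        for edge, w in graph.items():
            combined[edge] += alpha * w
    return combined


# Toy instances: two author-paper-author instances and one paper-venue-paper instance.
apa = [("a1", "p1", "a2"), ("a1", "p2", "a2")]
pvp = [("p1", "v1", "p2")]
combined = combine([motif_graph(apa), motif_graph(pvp)], [1.0, 0.5])
```

With such a scheme, a non-useful motif contributes edges that dilute the informative ones, which is one way to read the lower accuracy of the "all motifs" rows in Table III.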
Authors in [6] surveyed metapath-related efforts and found that the most popularly used metapath schemes in bibliographic networks are and . denotes coauthorship relationship while represents authors publishing papers at the same venue. We consider as the metapath for experiments involving the DBLPP and AMinerCS datasets since it can be generalized to diverse tasks in a heterogeneous bibliographic network [6]. Authors in [37] extend metapaths to metagraphs in order to better capture rich contexts and semantic relations between nodes. The augmentation of path to the directed metapath helps the metastructure encode semantic relations between distant nodes. Thus, we choose the same metagraph as the one shown in [37] for heterogeneous bibliographic network mining.
Figure 7(a) and Figure 7(b) present the motif, metapath, and metagraph for link prediction on the YelpRestaurant and AmazonElectronics datasets, respectively. Our choice of metapath and metagraph for these two datasets is inspired by [39]. Similar to our setup for the DBLPP and AMinerCS datasets, we choose one motif from the set of possible motifs for motif2vec in order to ensure a fair comparison with metapath2vec and metagraph2vec. Note that each motif, metapath, and metagraph in Figure 7(a) and Figure 7(b) consists of three node types, though the original schema has four node types. This is because metapaths and metagraphs cannot handle four types of nodes, as discussed next. Experimental results obtained by replacing node type location (L) with node type category (C) for the YelpRestaurant dataset, and node type brand (B) with node type category (C) for the AmazonElectronics dataset, are similar to the results in Table II.
Advantages of motif over metapath and metagraph: Motifs are capable of capturing greater context and leveraging richer semantics than both metapaths and metagraphs. This is because both metapaths and metagraphs are commonly used in a symmetric way, thereby providing recursive guidance for random walkers [6][22][24][37]. Thus, they cannot build meaningful metastructures for heterogeneous network schemas, like YelpRestaurant and AmazonElectronics, that have four node types. Figure 5 reveals how many more interesting heterogeneous patterns can be captured by motifs than by metapaths and metagraphs. Figure 8 showcases two example bi-fan motifs, known to occur frequently in complex networks, for the YelpRestaurant dataset that a metapath or a metagraph cannot capture. In addition, metapath2vec and metagraph2vec are designed to operate only on a single metapath and a single metagraph, respectively, unlike motif2vec.
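The recursive guidance that a symmetric metapath provides can be seen in a small sketch. The code below assumes a toy bibliographic graph with author (A), paper (P), and venue (V) nodes and an author-paper-venue-paper-author pattern written as the string "APVPA"; the function name, the graph, and the encoding of node types are hypothetical and are only meant to illustrate the mechanism described in [6].

```python
import random


def metapath_walk(adj, node_type, start, metapath, length, rng):
    """Random walk whose next node type is dictated by a symmetric metapath."""
    if metapath[0] != metapath[-1]:
        raise ValueError("metapath must be symmetric to be followed recursively")
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the metapath's first type")
    cycle = metapath[:-1]  # "APVPA" -> "APVP", which is then repeated
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk


# Toy graph: two authors, two papers, one venue.
adj = {
    "a1": ["p1"], "a2": ["p2"],
    "p1": ["a1", "v1"], "p2": ["a2", "v1"],
    "v1": ["p1", "p2"],
}
node_type = {n: n[0].upper() for n in adj}
walk = metapath_walk(adj, node_type, "a1", "APVPA", 9, random.Random(0))
types = "".join(node_type[n] for n in walk)
```

Because the walker can only restart the pattern when the last node type equals the first, the scheme is tied to symmetric patterns, which is the restriction discussed above.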
IV-D Efficiency
Efficiency matters in today's age of big graph data, so we investigate the efficiency of motif2vec. The computationally expensive steps in the algorithm are motif instance discovery, sequence generation, and embedding learning. Motif instance discovery has been studied for about two decades, and many efficient algorithms exist [12]. In this work, we adopt the widely used formalization of the motif discovery task as subgraph isomorphism and use the fast method available in the NetworkX library [9]. For sequence generation and embedding learning using skip-gram, we employ the parallelization tricks suggested by prior art [8][16].
Figure 9 illustrates the time taken by each of the individual steps (motif instance extraction, weighted graph creation, random walk simulation, and skip-gram neural net training) for two datasets, one from each machine learning task under consideration. As expected, the time to extract the motif instances dominates the end-to-end execution time of motif2vec, followed by the time taken to generate random walk sequences. Thus, our algorithm is limited by the scalability of motif instance extraction in the worst case.
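Motif instance discovery as subgraph matching can be illustrated on a toy graph. The sketch below is a brute-force, standard-library stand-in written for this illustration: it is not the NetworkX routine of [9], and it checks typed, edge-preserving (not necessarily induced) matches by backtracking. The names find_motif_instances, adj, node_type, motif_edges, and motif_types are hypothetical.

```python
def find_motif_instances(adj, node_type, motif_edges, motif_types):
    """Enumerate typed, edge-preserving embeddings of a small motif."""
    motif_nodes = sorted(motif_types)
    instances = set()

    def extend(mapping):
        if len(mapping) == len(motif_nodes):
            # Store the node set so that automorphic matches collapse to one instance.
            instances.add(frozenset(mapping.values()))
            return
        m = motif_nodes[len(mapping)]
        for g in adj:
            if g in mapping.values() or node_type[g] != motif_types[m]:
                continue
            ok = True
            # Every motif edge between m and an already-mapped node must exist in the graph.
            for u, v in motif_edges:
                other = v if u == m else u if v == m else None
                if other in mapping and mapping[other] not in adj[g]:
                    ok = False
                    break
            if ok:
                mapping[m] = g
                extend(mapping)
                del mapping[m]

    extend({})
    return instances


# Toy graph and an "two authors on the same paper" motif.
adj = {
    "a1": {"p1"}, "a2": {"p1"}, "a3": {"p2"},
    "p1": {"a1", "a2", "v1"}, "p2": {"a3"}, "v1": {"p1"},
}
node_type = {n: n[0].upper() for n in adj}
motif_types = {"x": "A", "y": "P", "z": "A"}
motif_edges = [("x", "y"), ("y", "z")]
instances = find_motif_instances(adj, node_type, motif_edges, motif_types)
```

The search space of such a matcher grows quickly with graph and motif size, which is why this step tends to dominate the running time.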
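For concreteness, the random walk simulation step on a weighted motif graph can be sketched as follows. This is a simplified first-order walk in which the next node is drawn with probability proportional to the edge weight; it omits any second-order bias of the kind used in [8], and the function name, parameters, and toy weights are assumptions made for this illustration.

```python
import random


def weighted_walks(weights, walk_length, walks_per_node, seed=0):
    """Generate walks where transitions are proportional to edge weights."""
    rng = random.Random(seed)
    nbrs = {}
    for (u, v), w in weights.items():
        # Treat the weighted motif graph as undirected.
        nbrs.setdefault(u, []).append((v, w))
        nbrs.setdefault(v, []).append((u, w))
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(nbrs):
            walk = [start]
            while len(walk) < walk_length:
                nodes, ws = zip(*nbrs[walk[-1]])
                walk.append(rng.choices(nodes, weights=ws, k=1)[0])
            walks.append(walk)
    return walks


# Hypothetical weighted motif graph with three nodes.
weights = {("a1", "a2"): 2.0, ("a1", "p1"): 1.0, ("a2", "p1"): 1.0}
walks = weighted_walks(weights, walk_length=5, walks_per_node=2)
```

The resulting node sequences play the role of sentences and would then be passed to a skip-gram model for embedding learning.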
motif2vec efficiency on a graph with 4.9M nodes and 12.5M edges: We conduct experiments on the AMinerCS heterogeneous network, consisting of 4.9M nodes and 12.5M edges, to highlight the effectiveness of our method for big graphs in spite of the computational expense associated with motif instance extraction and random walk simulation. Given a heterogeneous network with well-defined semantics and relations (see Figure 4) and a motif of interest (see motif in Figure 5), we implement our own heuristic motif instance extraction method that is guided by the pattern in the motif to identify the matching subgraphs in the overall network. While NetworkX's module for motif instance extraction returns all possible matching subgraphs, which are far more numerous than what is relevant to our task, our heuristic method prunes the candidate space and speeds up the extraction process. We skip the details of our heuristic due to lack of space.
For the AMinerCS dataset, we run experiments with 24 threads, each using one CPU core, under the same parameter settings (Section IV). The random walk sequence generation step uses OpenMP to automatically decide the number of cores, as seen in the high-performance graph analytics SNAP repository [8]. For the AMinerCS dataset, motif2vec took 8 hours 55 minutes for end-to-end execution, of which:
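Since the heuristic itself is not described here, the following is only a hypothetical illustration of what pattern-guided extraction can look like, not the method used in our experiments. It assumes an author-paper-author pattern, chosen purely for illustration, and a mapping from each paper to its authors; the names coauthor_instances and paper_authors are invented for this sketch.

```python
from itertools import combinations


def coauthor_instances(paper_authors):
    """Pattern-guided enumeration of author-paper-author instances.

    Only papers are scanned, and only their own authors are paired, so node
    triples that cannot match the pattern are never generated.
    """
    for paper, authors in paper_authors.items():
        for a, b in combinations(sorted(authors), 2):
            yield (a, paper, b)


# Toy input: paper -> set of authors.
paper_authors = {"p1": {"a1", "a2", "a3"}, "p2": {"a1"}, "p3": {"a2", "a4"}}
instances = list(coauthor_instances(paper_authors))
```

Compared with a generic matcher that considers every typed node triple, this kind of enumeration touches only candidates that already satisfy the pattern, which is the sense in which the candidate space is pruned.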
26 minutes is taken by our heuristic motif instance extraction method,
43 minutes is taken by random walk simulation method,
7 hours 42 minutes is taken by word2vec/skip-gram neural net model training.
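The breakdown above can be checked with a few lines of arithmetic. The three listed steps add up to 8 hours 51 minutes, which is 4 minutes short of the reported end-to-end time; the text does not say where the remainder goes, so attributing it to weighted graph creation or input/output would be an assumption. The variable names below are chosen for this illustration.

```python
from datetime import timedelta

total = timedelta(hours=8, minutes=55)
steps = {
    "motif instance extraction": timedelta(minutes=26),
    "random walk simulation": timedelta(minutes=43),
    "skip-gram training": timedelta(hours=7, minutes=42),
}

listed = sum(steps.values(), timedelta())  # time covered by the three listed steps
remainder = total - listed                 # time not attributed to any listed step
shares = {name: round(100 * t / total, 1) for name, t in steps.items()}

for name, share in shares.items():
    print(f"{name}: {share}% of end-to-end time")
print("unattributed:", remainder)
```

On this large graph, skip-gram training accounts for the bulk of the running time, while the heuristic extraction step takes under 5% of it.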
V Related Work
Network representation learning: Network representation learning is a well-studied research problem owing to the ubiquitous nature of networks in the real world and to applications such as node classification, link prediction, visualization, and clustering. Various approaches have been studied in the literature to address this problem [7]. The early approaches aimed to learn node representations by factorizing the graph adjacency matrix, as performed in recommender systems [1][17], and are computationally expensive. Random-walk based methods have emerged as the most popular and include DeepWalk [18], node2vec [8], LINE [29], and HARP [5]. Most of these efforts are designed for homogeneous networks and are inadequate for heterogeneous networks. Heterogeneous network representation learning methods include metapath2vec [6], metagraph2vec [37], PTE [28], HNE [4], LANE [11], and ASPEM [23]. Except for metapath2vec and metagraph2vec, none of these methods is aligned with our problem of generic, unsupervised, task-independent network embedding learning that preserves the heterogeneity in structure and semantics, as discussed in Section I. Of late, deep learning based approaches have become popular for node representation learning. Further research focused on interpreting the embeddings learned by these models can be useful.
Heterogeneous information network: Heterogeneous information networks are graphs that have various types of nodes and links, fusing more information and containing richer semantics. Heterogeneous information networks are used to model most real-world networks today. In the literature, researchers have studied various tasks related to heterogeneous networks such as similarity search [26], clustering [27], prediction [25], and classification [13]. Each method is designed for a specific heterogeneous network mining application. In our work, we learn node representations that are effective for both classification and prediction. We also demonstrate how motif2vec outperforms existing heterogeneous network representation learning methods [6][37] that cannot capture higher-order heterogeneous connectivity patterns or preserve the structure and semantics of multiple types of nodes and links.
Network motifs: Network motifs are simple, basic building blocks of complex networks. Motifs originated in domains such as biochemistry and ecology, where they are used for studying networks such as gene regulation and neuron synaptic connection networks. They have been successfully used in many computer science network mining tasks such as clustering [32], anomaly detection [31], and convolution [21]. Rossi et al. addressed the problem of higher-order network representation learning using motifs for homogeneous networks with HONE [20], but the method cannot be extended to handle heterogeneous networks. HONE [20] also does not combine the advantages of random-walk based methods and motif-aware methods, as we do.
VI Conclusion
In this paper, we study the problem of node representation learning for heterogeneous information networks. We propose a novel, efficient algorithm, motif2vec, that leverages higher-order, recurring, and statistically significant network connectivity patterns in the form of network motifs to learn latent representations preserving heterogeneity in network structure and semantics. Unlike existing graph embedding methods for heterogeneous networks, which employ some form of graph metastructure to guide heterogeneous-semantics-aware random walks through the network, we employ motifs to transform the graph into a motif graph, which in turn encodes the heterogeneity. Our method preserves both local and global structural relationships in addition to rich semantic correlations in a network. We empirically demonstrate that the proposed algorithm consistently and significantly outperforms state-of-the-art baselines on diverse real-world datasets. An important input to our algorithm is the choice of motif(s) from the set of all possible motifs. In the future, we intend to explore the possibility of automatically learning the motif weights for a network or for a task. It will also be interesting to study how our algorithm extends to handle the dynamics of evolving heterogeneous networks.
References
 [1] Amr Ahmed, Nino Shervashidze, Shravan M. Narayanamurthy, Vanja Josifovski, and Alexander J. Smola. Distributed large-scale natural graph factorization. In Proceedings of the 22nd WWW International Conference on World Wide Web, 2013.
 [2] Yoshua Bengio, Aaron C. Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.
 [3] Austin R. Benson, David F. Gleich, and Jure Leskovec. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016.
 [4] Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C. Aggarwal, and Thomas S. Huang. Heterogeneous network embedding via deep architectures. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
 [5] Haochen Chen, Bryan Perozzi, Yifan Hu, and Steven Skiena. HARP: hierarchical representation learning for networks. In Proceedings of the 32nd AAAI International Conference on Artificial Intelligence.
 [6] Yuxiao Dong, Nitesh V. Chawla, and Ananthram Swami. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
 [7] Palash Goyal and Emilio Ferrara. Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems, 151:78–94, 2018.
 [8] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
 [9] Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. Exploring network structure, dynamics, and function using networkx. In Proceedings of the 7th Python in Science Conference.
 [10] Ruining He and Julian McAuley. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th WWW International Conference on World Wide Web.
 [11] Xiao Huang, Jundong Li, and Xia Hu. Label informed attributed network embedding. In Proceedings of the 10th ACM WSDM International Conference on Web Search and Data Mining.
 [12] Yusuf Kavurucu. A comparative study on network motif discovery algorithms. International Journal of Data Mining and Bioinformatics, 11(2), 2015.
 [13] Xiangnan Kong, Bokai Cao, Philip S. Yu, Ying Ding, and David J. Wild. Meta path-based collective classification in heterogeneous information networks. In Proceedings of the 25th ACM CIKM International Conference on Information and Knowledge Management.
 [14] Andrew McCallum, Kamal Nigam, Jason Rennie, and Kristie Seymore. Automating the construction of internet portals with machine learning. Information Retrieval, 3(2):127–163, 2000.
 [15] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In ICLR Workshop, 2013.
 [16] Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS International Conference on Neural Information Processing Systems.
 [17] Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. Asymmetric transitivity preserving graph embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
 [18] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
 [19] Radim Řehůřek and Petr Sojka. Software framework for topic modelling with large corpora. In Proceedings of the LREC Workshop on New Challenges for NLP Frameworks.
 [20] Ryan A. Rossi, Nesreen K. Ahmed, and Eunyee Koh. Higher-order network representation learning. In Companion of the WWW International Conference on World Wide Web.
 [21] Aravind Sankar, Xinyang Zhang, and Kevin Chen-Chuan Chang. Motif-based convolutional neural network on graphs. CoRR, abs/1711.05697, 2017.
 [22] Chuan Shi, Yitong Li, Jiawei Zhang, Yizhou Sun, and Philip S. Yu. A survey of heterogeneous information network analysis. IEEE Transactions on Knowledge and Data Engineering, 29(1):17–37, 2017.
 [23] Yu Shi, Huan Gui, Qi Zhu, Lance M. Kaplan, and Jiawei Han. Aspem: Embedding learning by aspects in heterogeneous information networks. In Proceedings of the SIAM SDM International Conference on Data Mining.
 [24] Yizhou Sun and Jiawei Han. Mining Heterogeneous Information Networks: Principles and Methodologies. Synthesis Lectures on Data Mining and Knowledge Discovery. Morgan & Claypool Publishers, 2012.
 [25] Yizhou Sun, Jiawei Han, Charu C. Aggarwal, and Nitesh V. Chawla. When will it happen?: relationship prediction in heterogeneous information networks. In Proceedings of the 15th ACM WSDM International Conference on Web Search and Web Data Mining.
 [26] Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, and Tianyi Wu. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. PVLDB, 4(11):992–1003, 2011.
 [27] Yizhou Sun, Brandon Norick, Jiawei Han, Xifeng Yan, Philip S. Yu, and Xiao Yu. Integrating meta-path selection with user-guided object clustering in heterogeneous information networks. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
 [28] Jian Tang, Meng Qu, and Qiaozhu Mei. PTE: predictive text embedding through large-scale heterogeneous text networks. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
 [29] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: large-scale information network embedding. In Proceedings of the 24th WWW International Conference on World Wide Web.
 [30] Lei Tang and Huan Liu. Leveraging social media networks for classification. Journal of Data Mining and Knowledge Discovery, 23(3):447–478, 2011.
 [31] Charalampos E. Tsourakakis. Motif-driven graph analysis. In Proceedings of 54th Annual Allerton Conference on Communication, Control, and Computing, 2016.
 [32] Charalampos E. Tsourakakis, Jakub Pachocki, and Michael Mitzenmacher. Scalable motif-aware graph clustering. In Proceedings of the 26th WWW International Conference on World Wide Web, pages 1451–1460, 2017.
 [33] Véronique Van Vlasselaer, Cristián Bravo, Olivier Caelen, Tina Eliassi-Rad, Leman Akoglu, Monique Snoeck, and Bart Baesens. APATE: A novel approach for automated credit card transaction fraud detection using network-based extensions. Decision Support Systems, 75:38–48, 2015.
 [34] Guan Wang, Sihong Xie, Bing Liu, and Philip S. Yu. Review graph based online store review spammer detection. In Proceedings of the 11th IEEE ICDM International Conference on Data Mining.
 [35] Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. Billion-scale commodity embedding for e-commerce recommendation in alibaba. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
 [36] Hao Yin, Austin R. Benson, Jure Leskovec, and David F. Gleich. Local higher-order graph clustering. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
 [37] Daokun Zhang, Jie Yin, Xingquan Zhu, and Chengqi Zhang. Metagraph2vec: Complex semantic path augmented heterogeneous network embedding. In Advances in Knowledge Discovery and Data Mining - Proceedings of the 22nd PAKDD International Pacific-Asia Conference.
 [38] Yongfeng Zhang, Guokun Lai, Min Zhang, Yi Zhang, Yiqun Liu, and Shaoping Ma. Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In Proceedings of the 37th ACM SIGIR International Conference on Research and Development in Information Retrieval.
 [39] Huan Zhao, Quanming Yao, Jianda Li, Yangqiu Song, and Dik Lun Lee. Metagraph based recommendation fusion over heterogeneous information networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.