HONE: Higher-Order Network Embeddings
Abstract.
Learning a useful representation from graph data lies at the heart of many machine learning tasks such as entity resolution, link prediction, user modeling, anomaly detection, and many others. Recent methods mainly focus on learning graph representations based on random walks. These methods are unable to capture higher-order dependencies and connectivity patterns that are crucial to understanding and modeling complex networks. In this work, we formulate higher-order network representation learning and describe a general framework for learning Higher-Order Network Embeddings (HONE) from graph data based on lower-order subgraph patterns called graphlets (network motifs, orbits). The HONE framework is highly expressive and flexible with many interchangeable components. The experimental results demonstrate the effectiveness of learning higher-order network representations. In all cases, HONE outperforms recent embedding methods that are unable to capture higher-order structures. In particular, HONE achieves a mean relative gain in AUC of (and up to gain) across all methods and over a wide variety of networks from different application domains.
1. Introduction
Roles represent node (or edge (Ahmed et al., 2017a)) connectivity patterns such as hub/star-center nodes, star-edge nodes, near-cliques, or bridge nodes connecting different regions of the graph. Intuitively, two nodes belong to the same role if they are structurally similar (with respect to their general connectivity/subgraph patterns) (Rossi and Ahmed, 2015b). Informally, roles are sets of nodes that are more structurally similar to nodes inside the set than outside, whereas communities are sets of nodes with more connections inside the set than outside. Roles are complementary to, but fundamentally different from, communities. Communities capture cohesive/tightly-knit groups of nodes, and nodes in the same community are close together (small graph distance) (Fortunato, 2010), whereas roles capture nodes that are structurally similar with respect to their general connectivity and subgraph patterns and are independent of the distance/proximity to one another in the graph (Rossi and Ahmed, 2015b). Hence, two nodes that share similar roles can be in different communities and even in two disconnected components of the graph. The goal of role learning in graphs is not only to group structurally similar nodes into sets but also to embed them close together in some $D$-dimensional space (Rossi and Ahmed, 2015b).
Many network representation learning methods attempt to capture the notion of roles (structural similarity) (Rossi and Ahmed, 2015b) using random walks that are fundamentally tied to node identity and not general structural/subgraph patterns (network motifs) of nodes.
As such, two nodes with similar embeddings are guaranteed to be near one another in the graph (a property of communities (Fortunato, 2010)) since they appear near one another in a random walk.
In this work, we propose higher-order network representation learning and describe a general framework called Higher-Order Network Embeddings (HONE) for learning such higher-order embeddings based on network motifs. The term motif is used generally and may refer to graphlets or orbits (graphlet automorphisms) (Pržulj, 2007; Ahmed et al., 2015). The approach leverages all available motif counts (and more generally statistics) by deriving a weighted motif graph from each network motif and uses these as a basis to learn higher-order embeddings that capture the notion of structural similarity (roles) (Rossi and Ahmed, 2015b). The HONE framework expresses a new class of embedding methods based on a set of motif-based matrices and their powers. In this work, we investigate HONE variants based on the weighted motif graph, motif transition matrix, motif Laplacian matrix, as well as other motif-based matrices. The experiments demonstrate the effectiveness of HONE as we achieve a mean relative gain in AUC of across a variety of different networks and embedding methods.
Contributions: This work makes three important contributions. First, we introduce the problem of higher-order network representation learning. Second, we propose a general class of methods for learning higher-order network embeddings based on network motifs. Third, we demonstrate the effectiveness of learning higher-order network representations.
2. Higher-Order Network Embeddings
This section proposes a new class of embedding models called Higher-Order Network Embeddings (HONE) and a general framework for deriving them. The class of higher-order network embedding methods is defined as follows:
Definition 1 (Higher-Order Network Embeddings).
Given a network (graph) $G = (V, E)$ and a set of network motifs $\mathcal{H} = \{H_1, \ldots, H_T\}$, the goal of higher-order network embedding (HONE) is to learn a function $f : V \rightarrow \mathbb{R}^{D}$ that maps nodes to $D$-dimensional embeddings using the network motifs $\mathcal{H}$.
The particular family of higher-order node embeddings presented in this work is based on learning a function $f : V \rightarrow \mathbb{R}^{D}$ that maps nodes to $D$-dimensional embeddings using (powers of) weighted motif graphs derived from a motif matrix function $\Psi$. However, many other families of higher-order node embedding methods exist in the class of higher-order network embeddings (Definition 1). Most importantly, since network motifs lie at the heart of higher-order network embeddings (Definition 1), they are guaranteed to capture the notion of roles (based on general subgraph/connectivity patterns of nodes) (Rossi and Ahmed, 2015b) as opposed to the complementary but fundamentally different notion of communities (based on proximity/small graph distance, and cohesive/tightly-knit/dense groups of nodes) (Fortunato, 2010).
2.1. Network Motifs
The HONE framework can use graphlets or orbits. Recall that the term motif is used generally in this work and may refer to graphlets or orbits (graphlet automorphisms) (Pržulj, 2007; Ahmed et al., 2015).
Definition 2 (Graphlet).
A graphlet $H_t = (V_k, E_k)$ is an induced subgraph consisting of a subset $V_k \subset V$ of $k$ vertices from $G = (V, E)$ together with all edges whose endpoints are both in this subset, $E_k = \{ e \in E \mid e = (u, v) \wedge u, v \in V_k \}$.
A $k$-graphlet is defined as an induced subgraph with $k$ vertices. Alternatively, the nodes of every graphlet can be partitioned into a set of automorphism groups called orbits (Pržulj, 2007). It is important to consider the position of an edge in a graphlet. For instance, an edge in the 4-node path (Figure 1) has two different unique positions, namely, the edge in the center of the path, or an edge on the outside of the 4-node path. Each unique edge position in a graphlet is called an automorphism orbit, or just orbit. More formally,
Definition 3 (Orbit).
An automorphism of a $k$-vertex graphlet $H_t = (V_k, E_k)$ is defined as a permutation of the nodes in $H_t$ that preserves edges and non-edges. The automorphisms of $H_t$ form an automorphism group denoted as $\mathrm{Aut}(H_t)$. A set of nodes $V_k$ of graphlet $H_t$ define an orbit iff (i) for any node $u \in V_k$ and any automorphism $\pi$ of $H_t$, $u \in V_k \Leftrightarrow \pi(u) \in V_k$; and (ii) if $v, u \in V_k$ then there exists an automorphism $\pi$ of $H_t$ and a $\gamma > 0$ such that $\pi^{\gamma}(u) = v$.
In this work, we use all (2-4)-vertex connected edge orbits and denote this set as $\mathcal{H}$. For example, Figure 1 shows the connected edge orbits with up to 4 nodes.
2.2. Weighted Motif Graphs
Given a network $G$ with $N$ nodes, $M$ edges, and a set $\mathcal{H} = \{H_1, \ldots, H_T\}$ of $T$ network motifs, we form the weighted motif adjacency matrices:
(1) $\mathcal{W} = \{ W_1, W_2, \ldots, W_T \}$
where $(W_t)_{ij}$ = number of instances of motif $H_t \in \mathcal{H}$ that contain nodes $i$ and $j$.
The weighted motif graphs differ from the original graph in two important and fundamental ways. First, the edges in each motif graph are likely to be weighted differently. This is straightforward to see as each network motif can appear at a different frequency than another arbitrary motif for a given edge. Intuitively, the edge motif weights when combined with the structure of the graph reveal important structural properties with respect to the weighted motif graph. Second, the motif graphs are likely to be structurally different (Figure 2). For instance, if edge $(i, j)$ exists in the original graph $G$, but $(W_t)_{ij} = 0$ for some arbitrary motif $H_t$, then $(i, j)$ is not an edge of the motif graph of $H_t$. Hence, the motif graphs encode relationships between nodes that have a sufficient number of motifs. To generalize the above weighted motif graph formulation, we replace the edge constraint that says an edge exists between $i$ and $j$ if the number of instances of motif $H_t$ that contain nodes $i$ and $j$ is 1 or larger, by enforcing an edge constraint that requires each edge to have at least $\delta$ motifs. In other words, different motif graphs can arise using the same motif $H_t$ by varying the threshold $\delta$. This is an important property of the above formulation.
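As an illustration of the construction above, the following is a minimal sketch for a single motif, the triangle, where the weight of an edge is the number of triangles that contain both endpoints. The function name `triangle_motif_graph` and the parameter name `delta` (the minimum motif count per edge) are assumptions made for this example; the estimation methods actually used in this work are described in Section 3.1.

```python
def triangle_motif_graph(edges, delta=1):
    """Return {(i, j): count} with i < j, where count is the number of
    triangles containing edge (i, j). Edges whose count is below `delta`
    are dropped, so the motif graph can be sparser than the input graph."""
    nbrs = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    weights = {}
    for u in nbrs:
        for v in nbrs[u]:
            if u < v:
                # Every common neighbor closes one triangle on edge (u, v).
                count = len(nbrs[u] & nbrs[v])
                if count >= delta:
                    weights[(u, v)] = count
    return weights

# Toy graph: two triangles {0,1,2} and {1,2,3} sharing edge (1,2), plus a pendant edge (3,4).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
W1 = triangle_motif_graph(edges, delta=1)
W2 = triangle_motif_graph(edges, delta=2)
```

In this toy graph the edge (3, 4) lies in no triangle and is absent from `W1`, while raising the threshold to `delta=2` keeps only the shared edge (1, 2). This mirrors the two differences noted above: the motif graph is weighted differently and can be structurally different from the original graph.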
2.3. Motif Matrix Functions
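To generalize HONE for any motif-based matrix formulation, we define $\Psi : \mathbb{R}^{N \times N} \rightarrow \mathbb{R}^{N \times N}$ as a function over a weighted motif adjacency matrix $W_t \in \mathcal{W}$. Using $\Psi$ we derive
(2) $S_t = \Psi(W_t)$, for $t = 1, 2, \ldots, T$
The term motif-based matrix refers to any motif matrix $S_t$ derived from $\Psi(W_t)$.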

Weighted Motif Graph: Given a network $G$ and a network motif $H_t \in \mathcal{H}$, form the weighted motif adjacency matrix $W_t$ whose entries $(i, j)$ are the co-occurrence counts of nodes $i$ and $j$ in the motif $H_t$: $(W_t)_{ij}$ = number of instances of $H_t$ that contain nodes $i$ and $j$. In the case of using HONE directly with a weighted motif adjacency matrix $W$, then
(3) $\Psi(W) = W$
The number of paths weighted by motif counts from node $i$ to node $j$ in $k$ steps is given by
(4) $(W^k)_{ij} = (\underbrace{W \cdots W}_{k})_{ij}$

Motif Transition Matrix: The random walk on a graph weighted by motif counts has transition probabilities
(5) $P_{ij} = W_{ij} / w_i$
where $w_i = \sum_j W_{ij}$ is the motif degree of node $i$. The random walk motif transition matrix $P$ for an arbitrary weighted motif graph $W$ is defined as:
(6) $P = D^{-1} W$
where $D = \mathrm{diag}(W e) = \mathrm{diag}(w_1, w_2, \ldots, w_N)$ is a diagonal matrix with the motif degree of each node on the diagonal, called the diagonal motif degree matrix, and $e = [1\ 1 \cdots 1]^T$ is the vector of all ones. $P$ is a row-stochastic matrix with $\sum_j P_{ij} = p_i^T e = 1$ where $p_i \in \mathbb{R}^N$ is a column vector corresponding to the $i$-th row of $P$. For directed graphs, the motif out-degree is used. However, one can also leverage the motif in-degree or total motif degree (among other quantities). The motif transition matrix $P$ represents the transition probabilities of a non-uniform random walk on the graph that selects subsequent nodes with probability proportional to the connecting edge's motif count. Therefore, the probability of transitioning from node $i$ to node $j$ depends on the motif degree of $j$ relative to the total sum of motif degrees of all neighbors of $i$. The probability of transitioning from node $i$ to $j$ in $k$ steps is given by
(7) $(P^k)_{ij}$

Motif Laplacian: The motif Laplacian for a weighted motif graph $W$ is defined as:
(8) $L = D - W$
where $D$ is the diagonal motif degree matrix defined as $D_{ii} = \sum_j W_{ij}$. For directed graphs, we can use either in-motif degree or out-motif degree.

Normalized Motif Laplacian: Given a graph $W$ weighted by the counts of an arbitrary network motif $H_t \in \mathcal{H}$, the normalized motif Laplacian is defined as
(9) $\hat{L} = I - D^{-1/2} W D^{-1/2}$
where $I$ is the identity matrix and $D$ is the diagonal matrix of motif degrees. In other words,
(10) $\hat{L}_{ij} = 1 - W_{ij}/w_j$ if $i = j$ and $w_j \neq 0$; $\hat{L}_{ij} = -W_{ij}/\sqrt{w_i w_j}$ if $i$ and $j$ are adjacent; and $\hat{L}_{ij} = 0$ otherwise,
where $w_i = \sum_j W_{ij}$ is the motif degree of node $i$.

Random Walk Normalized Motif Laplacian: Formally, the random walk normalized motif Laplacian is
(11) $L_{rw} = I - D^{-1} W$
where $I$ is the identity matrix, $D$ is the motif degree diagonal matrix with $D_{ii} = \sum_j W_{ij}$, and $W$ is the weighted motif adjacency matrix for an arbitrary motif $H_t \in \mathcal{H}$. Observe that $L_{rw} = I - P$ where $P = D^{-1} W$ is the motif transition matrix of a random walker on the weighted motif graph.
Notice that all variants are easily formulated as functions $\Psi$ in terms of an arbitrary motif weighted graph $W$.
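The motif matrix functions of this subsection can be sketched in a few lines. The example below is only an illustration on a small dense matrix stored as nested lists, assuming the standard definitions: motif degree as a row sum, transition matrix as the row-normalized weights, Laplacian as degree matrix minus weights, and the two normalized Laplacians. The function names are hypothetical, and nodes with zero motif degree are handled by a convention chosen here (their normalization factor is set to 0).

```python
import math

def motif_degrees(W):
    """Motif degree of each node: the row sums of W."""
    return [sum(row) for row in W]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def transition_matrix(W):
    """Row-normalized weights; rows of zero-degree nodes stay all zero."""
    d = motif_degrees(W)
    return [[w / d[i] if d[i] > 0 else 0.0 for w in row]
            for i, row in enumerate(W)]

def laplacian(W):
    """Diagonal motif degree matrix minus W."""
    d = motif_degrees(W)
    return [[(d[i] if i == j else 0.0) - w for j, w in enumerate(row)]
            for i, row in enumerate(W)]

def normalized_laplacian(W):
    """Identity minus the symmetrically normalized weights."""
    d = motif_degrees(W)
    s = [1.0 / math.sqrt(x) if x > 0 else 0.0 for x in d]
    eye = identity(len(W))
    return [[eye[i][j] - s[i] * w * s[j] for j, w in enumerate(row)]
            for i, row in enumerate(W)]

def rw_laplacian(W):
    """Identity minus the motif transition matrix."""
    P = transition_matrix(W)
    eye = identity(len(W))
    return [[eye[i][j] - p for j, p in enumerate(row)]
            for i, row in enumerate(P)]

# Triangle-count weights for a 4-node graph with triangles {0,1,2} and {1,2,3}.
W = [[0, 1, 1, 0],
     [1, 0, 2, 1],
     [1, 2, 0, 1],
     [0, 1, 1, 0]]
P = transition_matrix(W)
L = laplacian(W)
L_hat = normalized_laplacian(W)
L_rw = rw_laplacian(W)
```

Each row of `P` sums to one and each row of `L` sums to zero, which is a quick sanity check for any implementation of these variants. A practical implementation would use sparse matrices rather than nested lists.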
2.4. Local K-Step Motif-based Embeddings
We describe the local higher-order node embeddings learned for each network motif $H_t \in \mathcal{H}$ and $k$-step where $k \in \{1, \ldots, K\}$. The term local refers to the fact that node embeddings are learned for each individual motif and $k$-step independently. We define $k$-step motif-based matrices for all $T$ motifs and $K$ steps as follows:
(12) $S_t^{(k)} = \Psi(W_t^{k})$, for $k = 1, \ldots, K$ and $t = 1, \ldots, T$
where
(13) $W_t^{k} = \underbrace{W_t \cdots W_t}_{k}$
Note that for the proposed motif Laplacian HONE variants, applying $\Psi$ to the power $W_t^{k}$ (Eq. 12) ensures $S_t^{(k)}$ is a valid motif Laplacian matrix. However, the motif transition probability matrix remains a valid transition matrix when taking powers of it and therefore we can simply use $P_t^{k}$ where $P_t = D_t^{-1} W_t$. Depending on the motif-based matrix formulation (Section 2.3), we renormalize each $k$-step motif matrix appropriately. Alternatively, we can define $S_t^{(k)}$ where we first use the motif matrix function $\Psi$ and then derive powers of the resulting motif-based matrix $\Psi(W_t)$. Hence,
(14) $S_t^{(k)} = \Psi(W_t)^{k}$
These $k$-step motif-based matrices can densify quickly and therefore the space required to store them can grow fast as $K$ increases. For large graphs, it is often impractical to store the $k$-step motif-based matrices for any reasonable $K$. To overcome this issue, we avoid explicitly constructing the $k$-step motif-based matrices entirely. Hence, no additional space is required and we never need to store the actual $k$-step motif-based matrices for $k > 1$. We discuss and show this for any $k$-step motif-based matrix later in this subsection.
Given a $k$-step motif-based matrix $S_t^{(k)}$ for an arbitrary network motif $H_t \in \mathcal{H}$, we find an embedding by solving the following optimization problem:
(15) $\arg\min_{U_t^{(k)},\, V_t^{(k)} \in \mathcal{C}} \; \mathbb{D}\big( S_t^{(k)} \,\|\, \Phi\langle U_t^{(k)} V_t^{(k)} \rangle \big)$, for $k = 1, \ldots, K$ and $t = 1, \ldots, T$
where $\mathbb{D}$ is a generalized Bregman divergence (and quantifies $\approx$ in the HONE embedding model $S_t^{(k)} \approx \Phi\langle U_t^{(k)} V_t^{(k)} \rangle$) with matching linear or nonlinear function $\Phi$, and $\mathcal{C}$ is a set of constraints (e.g., non-negativity constraints $U \geq 0$, $V \geq 0$, or orthogonality constraints $U^T U = I$, $V V^T = I$). The above optimization problem finds low-rank embedding matrices $U_t^{(k)}$ and $V_t^{(k)}$ such that $S_t^{(k)} \approx \Phi\langle U_t^{(k)} V_t^{(k)} \rangle$. The function $\Phi$ allows nonlinear relationships between $U_t^{(k)} V_t^{(k)}$ and $S_t^{(k)}$. Different choices of $\Phi$ and $\mathbb{D}$ yield different HONE embedding models and depend on the distributional assumptions on $S_t^{(k)}$. For instance, minimizing squared loss with an identity link function $\Phi$ yields singular value decomposition corresponding to a Gaussian error model (Golub and Van Loan, 2012). Other choices of $\Phi$ and $\mathbb{D}$ yield other HONE embedding models with different error models such as Poisson, Gamma, or Bernoulli distributions (Collins et al., 2002).
Recall from above that we avoid explicitly computing and storing the kstep motifbased matrices from Eq. 12 as they can densify quickly as increases and therefore are impractical to store for any large graph and reasonable . This is accomplished by defining a linear operator corresponding to the step motifbased matrices that can run in at most times the linear operator corresponding to the (step) motifbased matrix. In particular, many algorithms used to compute lowrank approximations of large sparse matrices (Rokhlin et al., 2009; Halko et al., 2011) do not need access to the explicit matrix, but only the linear operator corresponding to action of the input matrix on vectors. For a matrix , let denote the upper bound on the time required to compute for any vector . We note where always holds and is a useful bound when is sparse. Therefore, the time required to compute a rank approximation of is where .
Now, we can define a linear operator corresponding to the $k$-step motif-based matrices that can run in at most $k$ times the linear operator corresponding to the (1-step) motif-based matrix. We show this for the case of any weighted motif adjacency matrix $W$. Let $M(W)$ be the time required to compute $Wx$, for any vector $x$. Then, to compute $W^k x$, we can do the following. Let $x_0 = x$ and iteratively compute $x_i = W x_{i-1}$ for $i = 1, \ldots, k$. This shows that $M(W^k) \leq k\,M(W)$. This implies that we can compute a rank embedding of the $k$-step motif adjacency matrix in time at most which is at most
(16) 
where . This implies that the time to compute the rank embedding grows only linearly with . Therefore, no additional space is required and we never need to derive/store the actual kstep motifbased matrices for . Moreover, as shown above, the time complexity grows linearly with and is therefore efficient. The time complexity in Eq. 16 is for singular value decomposition/eigendecomposition and hence finds the best rank approximation (Golub and Van Loan, 2012). However, linear operators can also be defined for other optimization techniques that can be used to compute a rank approximation such as stochastic gradient descent, block/cyclic coordinate descent, or alternating least squares. Thus, the time complexity for computing rank embeddings using these optimization techniques will also only increase by a factor of .
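The linear-operator argument above can be made concrete with a short sketch. It assumes the sparse motif matrix is stored as a dictionary of rows (a representation chosen only for this example) and applies the matrix to a vector repeatedly, so the power of the matrix is never formed. The function names `matvec` and `kstep_matvec` are hypothetical.

```python
def matvec(W, x):
    """Multiply a sparse matrix, stored as {row: {col: value}}, by a vector.
    The cost is proportional to the number of stored nonzeros."""
    y = [0.0] * len(x)
    for i, row in W.items():
        y[i] = sum(w * x[j] for j, w in row.items())
    return y

def kstep_matvec(W, x, k):
    """Apply the k-th power of W to x using k sparse products.
    Only vectors are stored; the (possibly dense) power is never built."""
    q = list(x)
    for _ in range(k):
        q = matvec(W, q)
    return q

# Sparse triangle-count weights for a 4-node graph (same toy structure:
# triangles {0,1,2} and {1,2,3}).
W = {0: {1: 1, 2: 1},
     1: {0: 1, 2: 2, 3: 1},
     2: {0: 1, 1: 2, 3: 1},
     3: {1: 1, 2: 1}}
x = [1, 0, 0, 0]
y = kstep_matvec(W, x, 3)  # first column of the 3-step matrix
```

Iterative low-rank solvers that only need matrix-vector products can be handed such an operator in place of the explicit matrix, which is why the cost grows by a factor of the number of steps rather than with the density of the matrix power.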
Afterwards, the columns of $U_t^{(k)}$ are normalized by a function $g$ as follows:
(17) $U_t^{(k)} \leftarrow g(U_t^{(k)})$, for $t = 1, \ldots, T$ and $k = 1, \ldots, K$
In this work, $g$ is a function that normalizes each column of $U_t^{(k)}$ using the Euclidean norm. The HONE framework is flexible for use with other norms as well and the appropriate norm should be chosen based on the data characteristics and application.
2.5. Learning Global Higher-Order Embeddings
How can we learn a higherorder embedding for an arbitrary graph that automatically captures the important motifs? Obviously, simply concatenating the previous motif embeddings into a single matrix and using this for prediction assumes that each motif is equally important. However, it is obvious that some motifs are more important than others and the choice of which motifs to use depends on the graph structure and its properties (Pržulj, 2007; Ahmed et al., 2015). Therefore, instead of assuming all motifs contribute equally to the embedding, we learn a global higherorder embedding that automatically captures the important motifs in the embedding without requiring an expert to hand select the most important motifs to use.
For this, we first concatenate the $k$-step embedding matrices for all $T$ motifs and all $K$ steps:
(18) $Y = \big[\, U_1^{(1)} \cdots U_T^{(1)} \;\; \cdots \;\; U_1^{(K)} \cdots U_T^{(K)} \,\big]$
where $Y$ is an $N \times T K D_\ell$ matrix and $D_\ell$ is the dimensionality of each local motif embedding. Notice that at this stage, we could simply output $Y$ as the final motif-based node embeddings and use it for a downstream prediction task such as classification, link prediction, or regression. However, using $Y$ directly essentially treats all motifs equally, while it is known that some motifs are more important than others and the specific set of important motifs depends heavily on the underlying graph structure. Therefore, by learning node embeddings from $Y$ we automatically capture the important structure in the data pertaining to certain motifs and avoid having to specify the important motifs for a particular graph by hand.
Given $Y$ from Eq. 18, we learn a global higher-order network embedding by solving the following:
(19) $\arg\min_{Z,\, H \in \mathcal{C}} \; \mathbb{D}\big( Y \,\|\, \Phi\langle Z H \rangle \big)$
where $Z$ is an $N \times D$ matrix of higher-order node embeddings and $H$ is a $D \times T K D_\ell$ matrix of the latent $k$-step motif embeddings. Each row of $Z$ is a $D$-dimensional embedding of a node. Similarly, each column of $H$ is an embedding of a latent $k$-step motif feature (i.e., column of $Y$) in the same $D$-dimensional space. In Eq. 19 we use the Frobenius norm, which leads to the following minimization problem:
(20) $\min_{Z, H} \; \frac{1}{2} \| Y - Z H \|_F^2 = \frac{1}{2} \sum_{ij} \big( Y_{ij} - (Z H)_{ij} \big)^2$
A similar minimization problem using the Frobenius norm is solved for Eq. 15. To solve these minimization problems, we use a fast parallel cyclic coordinate descent-based (CCD) optimization scheme (Yu et al., 2012; Rossi and Zhou, 2016). We have also investigated other approaches for solving the above HONE objective function, including an autoencoder (Rumelhart et al., 1986; Hinton and Salakhutdinov, 2006) and alternating least squares (ALS) (Zhou et al., 2008), and found similar results. In addition, it is straightforward to represent the ($k$-step) motif-based matrices as a tensor and derive embeddings jointly using Higher-Order SVD (Tucker decomposition) (Tucker, 1966), among other higher-order tensor factorization schemes (Kolda and Bader, 2009).
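To make the squared-loss factorization concrete, the following is a minimal, sequential sketch of scalar cyclic coordinate descent for approximating a dense matrix by a product of two low-rank factors under the Frobenius norm. It is not the fast parallel scheme cited above; the function names, the random initialization, and the absence of regularization and constraints are all assumptions made for illustration.

```python
import random

def frob_loss(Y, Z, H):
    """Half the squared Frobenius norm of Y - Z H."""
    d = len(H)
    return 0.5 * sum((Y[i][j] - sum(Z[i][k] * H[k][j] for k in range(d))) ** 2
                     for i in range(len(Y)) for j in range(len(Y[0])))

def ccd_factorize(Y, dim, n_iters=50, seed=0):
    """Approximate Y (n x f) by Z (n x dim) times H (dim x f).
    Each scalar update is the exact minimizer for that coordinate,
    so the loss never increases."""
    rng = random.Random(seed)
    n, f = len(Y), len(Y[0])
    Z = [[rng.random() for _ in range(dim)] for _ in range(n)]
    H = [[rng.random() for _ in range(f)] for _ in range(dim)]
    # Residual R = Y - Z H, kept up to date after every scalar update.
    R = [[Y[i][j] - sum(Z[i][k] * H[k][j] for k in range(dim))
          for j in range(f)] for i in range(n)]
    for _ in range(n_iters):
        for k in range(dim):
            h_sq = sum(h * h for h in H[k])
            if h_sq > 0:
                for i in range(n):
                    new = sum((R[i][j] + Z[i][k] * H[k][j]) * H[k][j]
                              for j in range(f)) / h_sq
                    step = new - Z[i][k]
                    for j in range(f):
                        R[i][j] -= step * H[k][j]
                    Z[i][k] = new
            z_sq = sum(Z[i][k] ** 2 for i in range(n))
            if z_sq > 0:
                for j in range(f):
                    new = sum((R[i][j] + Z[i][k] * H[k][j]) * Z[i][k]
                              for i in range(n)) / z_sq
                    step = new - H[k][j]
                    for i in range(n):
                        R[i][j] -= step * Z[i][k]
                    H[k][j] = new
    return Z, H

# A rank-1 matrix is recovered (up to rounding) with a single latent dimension.
Y = [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [3.0, 0.0, 6.0]]
Z, H = ccd_factorize(Y, dim=1, n_iters=5)
loss = frob_loss(Y, Z, H)
```

In the setting of this section, the input would be the concatenated local embedding matrix and the rows of the first factor would serve as the global node embeddings.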
2.6. Attribute Diffusion
Attributes can also be diffused and incorporated into the higherorder node embeddings. One approach is to use the motif transition probability matrix as follows:
(21) 
where is an attribute matrix and is the diffused feature matrix after steps. Here can be replaced by any of the previous motifbased matrices derived from any motif matrix formulation in Section 2.3. More generally, we define linear attribute diffusion for HONE as:
(22) 
More complex attribute diffusion processes can also be formulated such as the normalized motif Laplacian attribute diffusion defined as
(23) 
where is the normalized motif Laplacian:
(24) 
The resulting diffused attribute vectors are effectively smoothed by the attributes of related nodes governed by the particular diffusion process.
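A minimal sketch of linear attribute diffusion with a motif transition matrix follows. It assumes the update is a repeated left-multiplication of the attribute matrix by a row-stochastic transition matrix, starting from the initial attributes; this reading of the diffusion step, as well as the function name `diffuse`, are assumptions made for illustration.

```python
def diffuse(P, X, steps):
    """Repeatedly replace each node's attribute vector with the
    transition-weighted average of its neighbors' attribute vectors."""
    n, m = len(X), len(X[0])
    for _ in range(steps):
        X = [[sum(P[i][j] * X[j][c] for j in range(n)) for c in range(m)]
             for i in range(n)]
    return X

# Row-stochastic motif transition matrix for a 4-node toy graph.
P = [[0.0, 0.5, 0.5, 0.0],
     [0.25, 0.0, 0.5, 0.25],
     [0.25, 0.5, 0.0, 0.25],
     [0.0, 0.5, 0.5, 0.0]]
# Two attributes per node: a constant and an indicator for node 2.
X0 = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
X1 = diffuse(P, X0, steps=1)
X3 = diffuse(P, X0, steps=3)
```

Because the matrix is row-stochastic, a constant attribute is unchanged by diffusion, while the indicator attribute spreads from node 2 to the nodes that share motifs with it. This is the smoothing effect described above.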
2.7. Accumulation Motif Variants
There are also summationbased motif variants.
(26) 
where is a weighted graph that counts the number of paths of length up to . More interestingly, let
(27) 
which for instance when =2, indicates the probability of randomly walking from node to node in 2 steps. Alternatively, we can generalize the above as follows:
(28)  
where is a decay factor that penalizes more distant connections.
3. Analysis
Define $\rho(A)$ as the density of $A$.
Claim 3.1.
Let $A$ denote the adjacency matrix of $G$ and let $W$ denote an arbitrary $k$-vertex motif adjacency matrix where $k > 2$; then $\rho(A) \geq \rho(W)$.
This is straightforward to see as the motif adjacency matrix constructed from the edge frequency of any motif with more than $k = 2$ nodes can be viewed as an additional constraint over the initial adjacency matrix $A$. Therefore, in the extreme case, if every edge contains at least one occurrence of motif $H$ then $\rho(A) = \rho(W)$. However, if there exists at least one edge that does not contain an instance of $H$ then $\rho(A) > \rho(W)$. Therefore, $\rho(A) \geq \rho(W)$.
3.1. Time Complexity
Let $N = |V|$, $M = |E|$, $\Delta$ = the maximum degree, $T$ = the number of motifs, $K$ = the number of steps, $D_\ell$ = number of dimensions for each local motif embedding (Section 2.4), and $D$ = dimensionality of the final node embeddings (Section 2.5).
Lemma 3.1.
The total time complexity of HONE is
(29) 
Proof. The time complexity of each step is provided below. For the specific HONE embedding model, we assume is squared loss, is the identity link function, and no hard constraints are imposed on the objective function in Eq. 15 and Eq. 19.
Weighted motif graphs: To derive the network motif frequencies, we use recent provably accurate estimation methods (Rossi et al., 2018b; Ahmed et al., 2016). As shown in (Rossi et al., 2018b; Ahmed et al., 2016), we can achieve estimates within a guaranteed level of accuracy and time by setting a few simple parameters in the estimation algorithm. The time complexity to estimate the frequency of all network motifs up to size 4 is in the worst case where is a small constant. Hence, represents the maximum sampled degree and can be set by the user (Rossi et al., 2018b; Ahmed et al., 2016).
After obtaining the frequencies of the network motifs, we derive a sparse weighted motif adjacency matrix for each of the network motifs. The time complexity for each weighted motif adjacency matrix is at most and this is repeated times for a total time complexity of where is a small constant. This gives a total time complexity of for this step and thus linear in the number of edges.
Motif matrix functions: The time complexity of all motif matrix functions in Section 2.3 is . Since for , the total time complexity is in the worst case. By Claim 3.1, , where and thus the actual time is likely to be much smaller especially given the rarity of some network motifs in sparse networks such as 4cliques and 4cycles.
Embedding each kstep motif graph: For a single weighted motifbased matrix, the time complexity per iteration of cyclic/block coordinate descent (Kim et al., 2014; Rossi and Zhou, 2016) and stochastic gradient descent (Yun et al., 2014; Oh et al., 2015) is at most where . Recall from Section 2.4 that we avoid explicitly computing and storing the kstep motifbased matrices by defining a linear operator corresponding to the step motifbased matrices with a time complexity that is at most times the linear operator corresponding to the step motifbased matrix. Therefore, the total time complexity for learning node embeddings for all step motifbased matrices is:
(30) 
Global higherorder node embeddings: Afterwards, all step motif embedding matrices are horizontally concatenated to obtain (Eq. 18). Each node embedding matrix is and there are of them. Thus, it takes time to concatenate them to obtain . Notice that and therefore this step is linear in the number of nodes . Furthermore, the time complexity for normalizing all columns of is for any normalization function where each column of is a dimensional vector.
Given a dense tallandskinny matrix of size where , the next step is to learn the higherorder node embedding matrix and the latent motif embedding matrix . Notice that unlike the higherorder node embeddings above that were derived for each sparse motifbased matrix (for all steps and motifs), the matrix is dense with . The time complexity per iteration of cyclic/block coordinate descent (Kim et al., 2014; Rossi and Zhou, 2016) and stochastic gradient descent (Yun et al., 2014; Oh et al., 2015) is and therefore linear in the number of nodes.
3.2. Space Complexity
Lemma 3.2.
The total space complexity of HONE is
(31) 
Proof. The weighted motif adjacency matrices take at most space. Similarly, the space complexity of the motifbased matrices derived from any motif matrix function is at most . Recall that the space required for some motifbased matrices where the motif being encoded is rare will be much less than (Claim 3.1). The space complexity of each step motif embedding is and therefore it takes space for all and embedding matrices. Storing the higherorder node embedding matrix takes space and the step motif embedding matrix is . Therefore, the total space complexity for and is .
4. Experiments
We investigate five methods from the proposed higherorder network representation learning framework.
4.1. Experimental Setup
We compare the proposed HONE variants to five state-of-the-art methods including node2vec (Grover and Leskovec, 2016), DeepWalk (Perozzi et al., 2014), LINE (Tang et al., 2015), GraRep (Cao et al., 2015), and Spectral clustering (Tang and Liu, 2011). All methods output dimensional node embeddings where . For LINE, we use 2nd-order proximity and set the number of samples to 60 million (Tang et al., 2015). For GraRep, we set and perform a grid search over (Cao et al., 2015). For DeepWalk, we use , , and (Perozzi et al., 2014). For node2vec, we use the same hyperparameters (, , , ) and grid search over as mentioned in (Grover and Leskovec, 2016). For the HONE variants, we set and select the number of steps automatically via a grid search over using of the labeled data. We use all edge orbits (graphlet automorphisms) (Pržulj, 2007) that contain 2-4 nodes and set for the local motif embeddings unless otherwise mentioned. All methods use logistic regression (LR) with an L2 penalty. The model is selected using 10-fold cross-validation on of the labeled data. Experiments are repeated for 10 random seed initializations. All data was obtained from NetworkRepository (Rossi and Ahmed, 2015a) and is publicly available for download at http://networkrepository.com.
4.2. Comparison
We compare methods from the proposed higher-order network embedding (HONE) framework to other recent embedding methods. Given a partially observed graph with a fraction of missing edges, the link prediction task is to predict these missing edges. We generate a labeled dataset of edges. Positive examples are obtained by removing of edges randomly, whereas negative examples are generated by randomly sampling an equal number of node pairs $(i, j)$ that are not connected by an edge. For each method, we learn embeddings using the remaining graph, which consists of only positive examples. Using the embeddings from each method, we then learn a model to predict whether a given edge in the test set exists in the original graph or not. To construct edge features from the node embeddings, we use the mean operator defined as $(z_i + z_j)/2$, where $z_i$ and $z_j$ are the embeddings of nodes $i$ and $j$. For the experiments, we selected networks from a wide range of domains with fundamentally different structural characteristics. This makes the key findings observed in this work more general, since they apply to networks from a wide variety of domains with different structural characteristics (Canning et al., 2018).
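The construction of the labeled dataset can be sketched as follows. The mean operator averages the two node embeddings element-wise, and negative examples are node pairs sampled uniformly among pairs that are not edges. The function names, the fixed seed, and the rejection-sampling strategy are assumptions made for this example; the input edge list is assumed to contain no self-loops.

```python
import random

def mean_edge_feature(z_i, z_j):
    """Edge feature for (i, j): element-wise mean of the two node embeddings."""
    return [(a + b) / 2.0 for a, b in zip(z_i, z_j)]

def sample_negative_pairs(nodes, edges, n_samples, seed=0):
    """Sample distinct unordered node pairs that are not edges.
    `nodes` must be a list; `edges` is a list of pairs without self-loops."""
    rng = random.Random(seed)
    existing = {frozenset(e) for e in edges}
    # Never ask for more non-edges than the graph actually has.
    max_pairs = len(nodes) * (len(nodes) - 1) // 2 - len(existing)
    n_samples = min(n_samples, max_pairs)
    negatives = set()
    while len(negatives) < n_samples:
        u, v = rng.sample(nodes, 2)  # two distinct nodes
        pair = frozenset((u, v))
        if pair not in existing:
            negatives.add(pair)
    return [tuple(sorted(p)) for p in negatives]

nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
negatives = sample_negative_pairs(nodes, edges, n_samples=3)
feature = mean_edge_feature([1.0, 3.0], [3.0, 5.0])
```

The resulting edge features, one per positive or negative pair, are what a classifier such as logistic regression would be trained on.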
Table 1. AUC results for link prediction on sochamster, rttwittercop, socwikiVote, techroutersrf, facebookPU, infopenflights, and socbitcoinA, with the overall rank of each method in the last column.

Method                                    Rank
HONE (Eq. 3)                              1
HONE (Eq. 6)                              2
HONE (Eq. 8)                              3
HONE (Eq. 9)                              5
HONE (Eq. 11)                             4
Node2Vec (Grover and Leskovec, 2016)      6
DeepWalk (Perozzi et al., 2014)           7
LINE (Tang et al., 2015)                  8
GraRep (Cao et al., 2015)                 9
Spectral (Tang and Liu, 2011)             10
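Table 2. Mean gain in AUC of each HONE method (rows) relative to each baseline method (columns), averaged over all graphs.

Method    Node2Vec    DeepWalk    LINE      GraRep    Spectral
HONE      12.91%      14.14%      17.52%    42.43%    19.61%
HONE      12.86%      14.10%      17.49%    42.39%    19.56%
HONE      12.29%      13.51%      16.89%    41.60%    18.94%
HONE      12.19%      13.42%      16.80%    41.53%    18.85%
HONE      12.33%      13.57%      16.94%    41.74%    19.01%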
The AUC results are provided in Table 1. In all cases, the HONE methods outperform the other embedding methods with an overall mean gain of (and up to gain) across a wide variety of graphs with different characteristics. Overall, the HONE variants achieve an average gain of over node2vec, over DeepWalk, over LINE, over GraRep, and over Spectral clustering across all networks. In all cases, the gain achieved by the proposed HONE variants is significant at . We also derive a total ranking of the embedding methods over all graph problems based on mean relative gain. Results are provided in the last column of Table 1. Overall, the HONE variants always outperform the five baseline methods across all networks from a wide variety of domains with fundamentally different structural characteristics. Among the five HONE variants in Table 1, we find that HONE and HONE perform the best overall. We also note that GraRep outperforms the other baseline methods on 5 of the 7 graphs and when socbitcoinA is removed GraRep is ranked outperforming all other baseline methods. Furthermore, we also provide the mean gain of the HONE methods over each baseline averaged over all graphs in Table 2. In other words, an entry in Table 2 represents the mean gain of a HONE method (row of Table 2) relative to a baseline method (column) averaged over all graphs used for evaluation.
We also investigated using the concatenated kstep embedding matrix directly for link prediction without the additional step described in Section 2.5. The results were removed for brevity, however, we summarize the findings below. In most cases, we observed the performance to be better when global higherorder node embeddings (Section 2.5) are used as opposed to using the local node embeddings from Section 2.4 directly for prediction. We also explored using different optimization schemes for learning the embeddings. However, we found only minor differences in AUC on most of the graphs investigated. For instance, on rttwittercopen with HONEP, we found alternating least squares (ALS) and CCD to perform best with AUC followed by using an autoencoder with .
4.3. Diffusion Variants
This subsection investigates HONE variants that use attribute diffusion (Section 2.6). These methods perform attribute diffusion using the k-step motif matrices (Section 2.6) and concatenate the resulting diffused features. Unless otherwise mentioned, we use linear diffusion defined in Eq. 22 with the default hyperparameters (Section 4.1). Note that the initial attribute matrix described in Section 2.6 represents node motif counts derived by applying relational aggregates (sum, mean, and max) over each node's local neighborhood, which are then scaled using the Euclidean norm. We compare the HONE methods with attribute diffusion to the HONE methods without diffusion. Results are reported in Table 3. The relative gain between each pair of HONE methods is computed for each graph, and Table 3 reports the mean gain for each pair of HONE methods. Overall, we observe that HONE with attribute diffusion generally improves predictive performance. We also investigated other attribute diffusion variants from Section 2.6 and noticed similar results on the few graphs tested.
Table 3. Mean relative gain for each pair of HONE methods with and without attribute diffusion.

          HONE     HONE     HONE     HONE     HONE
HONE      0.76%    1.30%    1.38%    1.24%    1.08%
HONE      1.58%    2.12%    2.20%    2.06%    1.90%
HONE      0.62%    1.15%    1.23%    1.09%    0.93%
HONE      1.37%    1.91%    1.99%    1.85%    1.69%
HONE      1.27%    1.81%    1.88%    1.74%    1.58%
4.4. Runtime & Scalability
To evaluate the runtime performance and scalability of the proposed framework, we learn node embeddings for Erdős-Rényi graphs of increasing size (from 100 to 10 million nodes) such that each graph has an average degree of 10. In Figure 3, we observe that HONE is fast and scales linearly as the number of nodes increases. In addition, we also compare the runtime performance of HONE against node2vec (Grover and Leskovec, 2016) since it performed best among the baselines (Table 1). For the HONE variant, we use HONE-P with . Default parameters are used for each method. In Figure 3, HONE is shown to be significantly faster and more scalable than node2vec as the number of nodes increases. In particular, node2vec takes about 1.9 days (45.3 hours) for 10 million nodes, while HONE finishes in only 19 minutes as shown in Figure 3. Strikingly, this is roughly 143 times faster than node2vec.
5. Related Work
Related research is categorized below.
Higher-order network analysis: This paper introduces the problem of higher-order network embedding and proposes a general computational framework for learning such higher-order node embeddings. One recent approach used network motifs as base features for network representation learning (Rossi et al., 2018a). However, that approach is fundamentally different from the proposed framework, as it focuses on learning inductive relational functions that represent compositions of relational operators applied to a base feature. Other methods use high-order network properties (such as graphlet frequencies) as features for graph classification (Vishwanathan et al., 2010), community detection (Arenas et al., 2008; Benson et al., 2016), and visualization and exploratory analysis (Ahmed et al., 2015). In contrast, this work focuses on network representation learning using network motifs (e.g., orbit frequencies). In particular, the goal of this work is to learn higher-order node embeddings from the graph for use in a downstream prediction task.
Node embeddings: There has been considerable recent interest in automatically learning node (and edge (Ahmed et al., 2017a)) embeddings from large-scale networks (Grover and Leskovec, 2016; Perozzi et al., 2014; Tang et al., 2015; Rossi et al., 2018a; Ahmed et al., 2018; Cavallari et al., 2017; Ribeiro et al., 2017). See (Rossi et al., 2012) for an early survey on graph representation learning. Recent node embedding methods (Perozzi et al., 2014; Grover and Leskovec, 2016; Tang et al., 2015; Ribeiro et al., 2017; Cavallari et al., 2017) have largely been based on the popular skip-gram model (Mikolov et al., 2013; Cheng et al., 2006) originally introduced for learning vector representations of words in text. These methods all use random walks to gather sequences of node ids, which are then used to learn node embeddings (Perozzi et al., 2014; Grover and Leskovec, 2016; Tang et al., 2015; Ribeiro et al., 2017; Cavallari et al., 2017). In particular, DeepWalk (Perozzi et al., 2014) applied the successful word embedding framework word2vec (Mikolov et al., 2013) to embed the nodes such that the co-occurrence frequencies of node pairs in short random walks are preserved. More recently, node2vec (Grover and Leskovec, 2016) adapted DeepWalk (Perozzi et al., 2014) by introducing hyperparameters that tune the depth and breadth of the random walks. GraRep (Cao et al., 2015) is a generalization of LINE (Tang et al., 2015) that incorporates node neighborhood information beyond 2 hops. These approaches are becoming increasingly popular and have been shown to outperform a number of existing methods. Graph Convolutional Networks (GCNs) adapt CNNs to graphs using the simple Laplacian and spectral convolutions with a form of aggregation over the neighbors (Henaff et al., 2015; Defferrard et al., 2016; Kipf and Welling, 2017; Niepert et al., 2016).
These node embedding methods may also benefit from ideas developed in this work including the weighted motif Laplacian matrices described in Section 2.3. Other work has focused on incremental methods for spectral clustering (Chen et al., 2015). Similar techniques can be used to derive incremental methods for updating HONE; however, this is outside the scope of this paper.
There is also a related body of work focused on attributed graphs. Recently, Huang et al. (2017b) proposed a label-informed embedding method for attributed networks. This approach assumes the graph is labeled and uses this information to improve predictive performance. However, this paper does not focus on attribute-based embeddings and is therefore significantly different. First and foremost, while the class of HONE models is able to support attributed graphs via attribute diffusion, this work does not focus on such graphs. Moreover, HONE does not require attributes or class labels on the nodes.
Heterogeneous networks (Shi et al., 2014) have also been recently considered (Chang et al., 2015; Dong et al., 2017), as well as attributed networks with labels (Huang et al., 2017b, a). Huang et al. (2017b) proposed an approach for attributed networks with labels, whereas Yang et al. (2015) used text features to learn node representations. Liang et al. (2017) proposed a semi-supervised approach for networks with outliers. Bojchevski and Günnemann (2017) proposed an unsupervised rank-based approach. There has also been some recent work on semi-supervised network embeddings (Yang et al., 2016; Kipf and Welling, 2017) and methods for improving the learned representations (Weston et al., 2008; Scarselli et al., 2009; Wang et al., 2016). A few works have begun to explore the problem of learning node embeddings from temporal networks (Rossi et al., 2013; Saha et al., 2018; Rahman et al., 2018). All of these approaches approximate the dynamic network as a sequence of discrete static snapshot graphs. More recently, methods have been proposed that use temporal random walks to avoid the information loss of previous discrete approximation methods (Nguyen et al., 2018). This work is different from the problem discussed in this paper.
Role-based embeddings: Many recent node embedding methods have attempted to capture roles (Rossi and Ahmed, 2015b) by preserving the notion of structural equivalence (Everett, 1985) or the relaxed notion of structural similarity (Rossi and Ahmed, 2015b). Examples of node roles include nodes acting as hubs, bridges (acting as gatekeepers), near-cliques, and star-edges. Over the previous decade, there have been many role-based embedding (role discovery) methods that automatically learn node embeddings from graphs; see (Rossi and Ahmed, 2015b) for a survey. These are some of the earliest embedding (representation learning) methods for graphs. More recently, most approaches have been based on traditional random walks and are thus unable to capture roles (structural equivalence or structural similarity); instead, they capture the notion of communities (Perozzi et al., 2014; Grover and Leskovec, 2016; Ribeiro et al., 2017; Cavallari et al., 2017). In particular, these methods assign similar embeddings to nodes that are close to one another in the graph and therefore largely capture the notion of communities as opposed to roles. Instead, nodes that are structurally similar (i.e., share similar general connectivity/subgraph patterns) should be embedded in a similar way, independent of their proximity to one another in the graph. Recently, an approach called role2vec was proposed that learns role-based node embeddings by first mapping each node to a type via a function and then using the notion of attributed (typed) random walks to derive role-based embeddings for the nodes that capture structural similarity (Ahmed et al., 2018). This approach was shown to generalize many existing random walk-based methods.
Graph embeddings: Methods such as DeepWalk and node2vec learn embeddings for nodes in a graph. On the other hand, there have recently been methods that learn embeddings for entire graphs (Duvenaud et al., 2015; Lee et al., 2018). These methods can be used for graph-level tasks such as graph classification. In particular, methods such as Random Walk Kernel (Vishwanathan et al., 2010), Deep Graph Kernel (Yanardag and Vishwanathan, 2015), and SkipGraph (Lee and Kong, 2017) make use of random walks to learn embeddings for entire graphs. More recently, a graph attention model was proposed in (Lee et al., 2018) and used for graph classification. Other work has also focused on developing graph embedding methods for attributed molecular graphs (Coley et al., 2017).
Improving autocorrelation: This work is also related to recent methods for improving autocorrelation and classification performance by creating new links (Gallagher et al., 2008) and even relevance in search engines (Lassez et al., 2008). Intuitively, HONE naturally estimates weights for new, previously unobserved edges based on k-step motif patterns. In particular, new links are explicitly created between nodes in the k-step motif matrices.
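The distinction between proximity and structural similarity can be made concrete with a toy example. The sketch below uses plain uniform random walks (not the biased walks of node2vec) and node degree as a crude stand-in for structural features; neither is the method of any paper cited here, and the graph is hypothetical. The two hub nodes play the same role in different connected components, yet they can never appear in the same walk.

```python
import random

def random_walk(adj, start, length, rng):
    """Uniform random walk of the given length starting from `start`."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

# Two disconnected stars: hubs 0 and 4 play the same role in different components.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0],
       4: [5, 6, 7], 5: [4], 6: [4], 7: [4]}

rng = random.Random(0)
walks = [random_walk(adj, v, 10, rng) for v in adj for _ in range(20)]

# Do the two hubs ever co-occur in a walk? They cannot: walks never leave a component.
cooccur = any(0 in w and 4 in w for w in walks)

# A simple structural feature (degree) is identical for the two hubs.
same_degree = len(adj[0]) == len(adj[4])
```

Here `cooccur` is `False` while `same_degree` is `True`, so a method driven purely by walk co-occurrence has no signal for placing the two hubs near each other, whereas a method driven by structural features does.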
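The creation of new links can be illustrated on a small graph. The sketch assumes that the k-step motif matrix is the k-th power of the weighted motif adjacency matrix and that each edge is weighted by the number of triangles containing it; any normalization applied in the framework is omitted. The helper names and the toy graph are hypothetical.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def triangle_motif_matrix(edges, n):
    """W[i][j] = number of triangles that contain the edge (i, j)."""
    nbrs = [set() for _ in range(n)]
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    W = [[0] * n for _ in range(n)]
    for u, v in edges:
        # Common neighbors of u and v close a triangle on the edge (u, v).
        W[u][v] = W[v][u] = len(nbrs[u] & nbrs[v])
    return W

# Two triangles, (0, 1, 2) and (2, 3, 4), sharing node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
W = triangle_motif_matrix(edges, 5)
W2 = matmul(W, W)  # assumed 2-step motif matrix
```

Nodes 0 and 3 are not adjacent, so `W[0][3]` is 0, but `W2[0][3]` is 1 because the two nodes are joined by a 2-step path through the shared node 2. The 2-step matrix thus assigns a weight to a previously unobserved pair.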
6. Conclusion
In this work, we proposed Higher-Order Network Embeddings (HONE), a new class of embedding methods that use network motifs to learn embeddings based on higher-order connectivity patterns. We describe a general computational framework for learning such higher-order network embeddings that is flexible with many interchangeable components. The experimental results demonstrated the effectiveness of learning higher-order network representations as HONE achieves a mean relative gain in AUC of across all other methods and networks from a wide variety of application domains. Future work will investigate the framework using other useful motif-based matrix formulations.
Footnotes
 Nodes in different disconnected components will never appear in a random walk together and therefore will not be assigned similar embeddings, despite the fact that these nodes may play the same roles with respect to general structural patterns such as network motifs.
 For convenience, denotes a weighted adjacency matrix for an arbitrary motif.
References
 Nesreen K. Ahmed, Jennifer Neville, Ryan A. Rossi, and Nick Duffield. 2015. Efficient Graphlet Counting for Large Networks. In ICDM. 10.
 Nesreen K. Ahmed, Ryan A. Rossi, Theodore L. Willke, and Rong Zhou. 2017a. Edge Role Discovery via Higher-Order Structures. In PAKDD. 291–303.
 Nesreen K. Ahmed, Ryan A. Rossi, Rong Zhou, John Boaz Lee, Xiangnan Kong, Theodore L. Willke, and Hoda Eldardiry. 2017b. A Framework for Generalizing Graph-based Representation Learning Methods. In arXiv:1709.04596. 8.
 Nesreen K. Ahmed, Ryan A. Rossi, Rong Zhou, John Boaz Lee, Xiangnan Kong, Theodore L. Willke, and Hoda Eldardiry. 2018. Learning Role-based Graph Embeddings. In arXiv:1802.02896.
 Nesreen K. Ahmed, Theodore L. Willke, and Ryan A. Rossi. 2016. Estimation of Local Subgraph Counts. In BigData. 586–595.
 Alex Arenas, Alberto Fernandez, Santo Fortunato, and Sergio Gomez. 2008. Motif-based communities in complex networks. Journal of Physics A: Mathematical and Theoretical 41, 22 (2008), 224001.
 Austin R Benson, David F Gleich, and Jure Leskovec. 2016. Higher-order organization of complex networks. Science 353, 6295 (2016), 163–166.
 Aleksandar Bojchevski and Stephan Günnemann. 2017. Deep Gaussian Embedding of Attributed Graphs: Unsupervised Inductive Learning via Ranking. arXiv:1707.03815 (2017).
 James P. Canning, Emma E. Ingram, Sammantha Nowak-Wolff, Adriana M. Ortiz, Nesreen K. Ahmed, Ryan A. Rossi, Karl R. B. Schmitt, and Sucheta Soundarajan. 2018. Network Classification and Categorization. In International Conference on Complex Networks (CompleNet).
 Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2015. GraRep: Learning graph representations with global structural information. In CIKM. ACM, 891–900.
 Sandro Cavallari, Vincent W Zheng, Hongyun Cai, Kevin Chen-Chuan Chang, and Erik Cambria. 2017. Learning community embedding with community detection and node embedding on graphs. In CIKM. 377–386.
 Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang. 2015. Heterogeneous network embedding via deep architectures. In SIGKDD. 119–128.
 Pin-Yu Chen, Baichuan Zhang, Mohammad Al Hasan, and Alfred O Hero. 2015. Incremental method for spectral clustering of increasing orders. In arXiv:1512.07349.
 Winnie Cheng, Chris Greaves, and Martin Warren. 2006. From n-gram to skipgram to concgram. Int. J. of Corp. Linguistics 11, 4 (2006), 411–433.
 Connor W Coley, Regina Barzilay, William H Green, Tommi S Jaakkola, and Klavs F Jensen. 2017. Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction. J. Chem. Info. & Mod. (2017).
 Michael Collins, Sanjoy Dasgupta, and Robert E Schapire. 2002. A generalization of principal components analysis to the exponential family. In NIPS. 617–624.
 Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems. 3844–3852.
 Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In SIGKDD.
 David K. Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alan Aspuru-Guzik, and Ryan P. Adams. 2015. Convolutional networks on graphs for learning molecular fingerprints. In NIPS.
 M.G. Everett. 1985. Role similarity and complexity in social networks. Social Networks 7, 4 (1985), 353–359.
 S. Fortunato. 2010. Community detection in graphs. Phy. Rep. 486, 3–5 (2010).
 Brian Gallagher, Hanghang Tong, Tina Eliassi-Rad, and Christos Faloutsos. 2008. Using ghost edges for classification in sparsely labeled networks. In SIGKDD.
 Gene H Golub and Charles F Van Loan. 2012. Matrix computations. JHU Press.
 Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In SIGKDD. 855–864.
 Nathan Halko, Per-Gunnar Martinsson, and Joel A Tropp. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review 53, 2 (2011), 217–288.
 Mikael Henaff, Joan Bruna, and Yann LeCun. 2015. Deep convolutional networks on graph-structured data. arXiv:1506.05163 (2015).
 Geoffrey E Hinton and Ruslan R Salakhutdinov. 2006. Reducing the dimensionality of data with neural networks. Science 313, 5786 (2006), 504–507.
 Xiao Huang, Jundong Li, and Xia Hu. 2017a. Accelerated attributed network embedding. In SDM.
 Xiao Huang, Jundong Li, and Xia Hu. 2017b. Label informed attributed network embedding. In WSDM.
 Jingu Kim, Yunlong He, and Haesun Park. 2014. Algorithms for nonnegative matrix and tensor factorizations: A unified view based on block coordinate descent framework. Journal of Global Optimization 58, 2 (2014), 285–319.
 Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.
 Tamara G Kolda and Brett W Bader. 2009. Tensor decompositions and applications. SIAM review 51, 3 (2009), 455–500.
 Jean-Louis Lassez, Ryan Rossi, and Kumar Jeev. 2008. Ranking Links on the Web: Search and Surf Engines. IEA/AIE (2008), 199–208.
 John Boaz Lee and X. Kong. 2017. SkipGraph: Learning graph embeddings with an encoder-decoder model. In ICLR OpenReview.
 John Boaz Lee, Ryan Rossi, and Xiangnan Kong. 2018. Graph Classification using Structural Attention. In SIGKDD.
 Jiongqian Liang, Peter Jacobs, and Srinivasan Parthasarathy. 2017. SEANO: Semi-supervised Embedding in Attributed Networks with Outliers. In arXiv:1703.08100.
 Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In ICLR Workshop. 10.
 Giang Hoang Nguyen, John Boaz Lee, Ryan A. Rossi, Nesreen K. Ahmed, Eunyee Koh, and Sungchul Kim. 2018. Continuous-Time Dynamic Network Embeddings. In WWW BigNet.
 Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning Convolutional Neural Networks for Graphs. In arXiv:1605.05273.
 Jinoh Oh, Wook-Shin Han, Hwanjo Yu, and Xiaoqian Jiang. 2015. Fast and robust parallel SGD matrix factorization. In SIGKDD. ACM, 865–874.
 Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In SIGKDD. 701–710.
 Nataša Pržulj. 2007. Biological network comparison using graphlet degree distribution. Bioinfo. 23, 2 (2007), e177–e183.
 Mahmudur Rahman, Tanay Kumar Saha, Mohammad Al Hasan, Kevin S Xu, and Chandan K Reddy. 2018. DyLink2Vec: Effective Feature Representation for Link Prediction in Dynamic Networks. arXiv:1804.05755 (2018).
 Leonardo F.R. Ribeiro, Pedro H.P. Saverese, and Daniel R. Figueiredo. 2017. Struc2Vec: Learning Node Representations from Structural Identity. In SIGKDD.
 Vladimir Rokhlin, Arthur Szlam, and Mark Tygert. 2009. A randomized algorithm for principal component analysis. SIAM J. Matrix Anal. Appl. 31, 3 (2009).
 Ryan A. Rossi and Nesreen K. Ahmed. 2015a. The Network Data Repository with Interactive Graph Analytics and Visualization. In AAAI. 4292–4293. http://networkrepository.com
 Ryan A. Rossi and Nesreen K. Ahmed. 2015b. Role Discovery in Networks. Transactions on Knowledge and Data Engineering 27, 4 (April 2015), 1112–1131.
 Ryan A. Rossi, Brian Gallagher, Jennifer Neville, and Keith Henderson. 2013. Modeling Dynamic Behavior in Large Evolving Graphs. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. 667–676.
 Ryan A. Rossi, Luke K. McDowell, David W. Aha, and Jennifer Neville. 2012. Transforming graph data for statistical relational learning. Journal of Artificial Intelligence Research 45, 1 (2012), 363–441.
 Ryan A. Rossi and Rong Zhou. 2016. Parallel Collective Factorization for Modeling Large Heterogeneous Networks. In Social Network Analysis and Mining. 30.
 Ryan A. Rossi, Rong Zhou, and Nesreen K. Ahmed. 2018a. Deep Inductive Network Representation Learning. In Proceedings of the 3rd International Workshop on Learning Representations for Big Networks (WWW BigNet).
 Ryan A. Rossi, Rong Zhou, and Nesreen K. Ahmed. 2018b. Estimation of Graphlet Counts in Massive Networks. In IEEE Transactions on Neural Networks and Learning Systems (TNNLS). 1–14.
 David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 1986. Learning internal representations by back-propagating errors. Nature 323 (1986), 533–536.
 Tanay Kumar Saha, Thomas Williams, Mohammad Al Hasan, Shafiq Joty, and Nicholas K Varberg. 2018. Models for Capturing Temporal Smoothness in Evolving Networks for Learning Latent Representation of Nodes. In arXiv:1804.05816.
 Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2009. The graph neural network model. IEEE Transactions on Neural Networks 20, 1 (2009), 61–80.
 Chuan Shi, Xiangnan Kong, Yue Huang, Philip S. Yu, and Bin Wu. 2014. HeteSim: A General Framework for Relevance Measure in Heterogeneous Networks. TKDE 26, 10 (2014), 2479–2492.
 Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale Information Network Embedding. In WWW. 1067–1077.
 Lei Tang and Huan Liu. 2011. Leveraging social media networks for classification. Data Mining and Knowledge Discovery 23, 3 (2011), 447–478.
 Ledyard R Tucker. 1966. Some mathematical notes on three-mode factor analysis. Psychometrika 31, 3 (1966), 279–311.
 S Vichy N Vishwanathan, Nicol N Schraudolph, Risi Kondor, and Karsten M Borgwardt. 2010. Graph kernels. JMLR 11 (2010), 1201–1242.
 Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In SIGKDD. 1225–1234.
 Jason Weston, Frédéric Ratle, and Ronan Collobert. 2008. Deep learning via semi-supervised embedding. In ICML. 1168–1175.
 Pinar Yanardag and S. V. N. Vishwanathan. 2015. Deep Graph Kernels. In SIGKDD.
 Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Y Chang. 2015. Network Representation Learning with Rich Text Information. In IJCAI.
 Zhilin Yang, William W Cohen, and Ruslan Salakhutdinov. 2016. Revisiting semi-supervised learning with graph embeddings. arXiv:1603.08861 (2016).
 Hsiang-Fu Yu, Cho-Jui Hsieh, Si Si, and Inderjit S. Dhillon. 2012. Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems. In IEEE International Conference of Data Mining.
 Hyokun Yun, Hsiang-Fu Yu, Cho-Jui Hsieh, SVN Vishwanathan, and Inderjit Dhillon. 2014. NOMAD: Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion. VLDB 7, 11 (2014), 975–986.
 Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan. 2008. Large-scale parallel collaborative filtering for the netflix prize. LNCS 5034 (2008).