On Spectral Graph Embedding: A Non-Backtracking Perspective and Graph Approximation
Abstract
Graph embedding has proven to be efficient and effective in facilitating graph analysis. In this paper, we present a novel spectral framework called NOn-Backtracking Embedding (NOBE), which offers a new perspective that organizes graph data at a deep level by tracking the flow traversing the edges with backtracking prohibited. Further, by analyzing the non-backtracking process, we devise a technique called graph approximation, which provides a channel to transform the spectral decomposition of an edge-to-edge matrix into that of a node-to-node matrix. Theoretical guarantees are provided by bounding the difference between the corresponding eigenvalues of the original graph and its graph approximation. Extensive experiments conducted on various real-world networks demonstrate the efficacy of our methods on both macroscopic and microscopic levels, including clustering and structural hole spanner detection.
1 Introduction
Graph representations, which describe and store entities in a node-interrelated way [25] (such as the adjacency matrix, Laplacian matrix, incidence matrix, etc.), provide abundant information for mining hidden patterns. However, this approach poses two principal challenges: 1) off-the-shelf machine learning algorithms designed for general data with vector representations can hardly be applied or adapted to graph representations, and 2) it is intractable for large graphs due to space and time constraints. Graph embedding can address these challenges by representing nodes using meaningful low-dimensional latent vectors.
Due to its capability for assisting network analysis, graph embedding has attracted researchers’ attention in recent years [26, 21, 9, 27]. A good graph embedding algorithm should preserve both macroscopic structures (e.g., community structure) and microscopic structures (e.g., structural hole spanners) simultaneously. However, a naive graph embedding algorithm will lead to unsatisfactory low-dimensional embeddings in which meaningful information may be lost or become indistinguishable. For example, the pioneering work [26], which mainly focuses on locally preserving the pairwise distance between nodes, can lose dissimilarity information. As a result, it may fail to preserve community membership, as shown later in the experimental results in Table 3. Some works [9, 21] attempt to preserve high-order proximity between nodes by considering truncated random walks. Since a conventional truncated random walk is a Markov chain that does not examine the special structure of networks, information about key nodes (e.g., structural hole spanners, outliers) will be unrecognizable. As far as we know, present approaches cannot achieve the graph embedding goal well.
In this paper, to fulfill the goal of preserving macroscopic and microscopic structures, we propose a novel graph embedding framework NOBE and its graph approximation algorithm NOBE-GA. The main contributions of this paper are summarized as follows:

We develop an elegant framework, NOn-Backtracking Embedding (NOBE), which jointly exploits a non-backtracking random walk and a spectral graph embedding technique (in Section 3). The benefits of NOBE are: 1) From an edge perspective, we encode the graph data structure into an oriented line graph by mapping each edge to a node, which facilitates tracking the flow traversing the edges with backtracking prohibited. 2) By deriving node embeddings from the oriented line graph instead of the original graph, community structure, structural holes and outliers can be well distinguished (as shown in Figures 1 and 2).

The graph approximation technique NOBE-GA is devised by analyzing the pattern of the non-backtracking random walk and switching the order of spectral decomposition and summation (in Section 3.3). It reduces the complexity of NOBE with a theoretical guarantee. Specifically, by applying this technique, we find that the conventional spectral method is just a reduced version of our NOBE method.

In Section 4, we also design a metric, RDS, based on the embedded community structure to evaluate nodes’ topological importance in connecting communities, which facilitates the discovery of structural hole (SH) spanners. Extensive experiments conducted on various networks demonstrate the efficacy of our methods in both macroscopic and microscopic tasks.
2 Related Work
Our work is mainly related to graph embedding and non-backtracking random walks. We briefly discuss them in this section.
2.1 Graph Embedding
Several approaches aim at preserving first-order and second-order proximity in nodes’ neighborhoods. [27] attempts to optimize it using a semi-supervised deep model, and [26] focuses on large-scale graphs by introducing an edge-sampling strategy. To further preserve the global structure of the graph, [22] explores the spectrum of the commute time matrix and [21] combines truncated random walks with deep learning techniques. Spectral methods and singular value decomposition are also applied to directed graphs by exploring the directed Laplacian matrix [7] or by finding a general form of different proximity measurements [19]. Several works also consider joint embedding of both node and edge representations [28, 1] to give more detailed results. By contrast, our work addresses graph embedding using a more expressive and comprehensive spectral method, which gives more accurate vector representations in a more explainable way, with provable theoretical guarantees.
2.2 Non-backtracking Random Walk
The non-backtracking strategy is closely related to Ihara’s zeta function, which plays a central role in several graph-theoretic theorems [13, 3]. Recently, in the machine learning and data mining fields, some important works have focused on developing non-backtracking walk theory. [14] and [24] demonstrate the efficacy of the spectrum of the non-backtracking operator in detecting communities, which overcomes the theoretical limit of the classic spectral clustering algorithm and is robust to sparse networks. [17] utilizes the non-backtracking strategy in influence maximization, and the nice property of locally tree-like graphs is fully exploited to complete the optimality proof. The study of the eigenvalues of the non-backtracking matrix of random graphs in [3] further confirms the spectral redemption conjecture proposed in [14]: above the feasibility threshold, the community structure of a graph generated from the stochastic block model can be accurately discovered using the leading eigenvectors of the non-backtracking operator. However, to the best of our knowledge, no work has analyzed the theory of non-backtracking random walks for graph embedding purposes.
3 Methodology
In this section, we first define the problem. Then, our NOBE framework is given in detail. Finally, we present the graph approximation technique, followed by a discussion. To facilitate the distinction, scalars are denoted by lowercase letters (e.g., $d$), vectors by bold lowercase letters (e.g., $\mathbf{y}$, $\boldsymbol{\phi}$), matrices by bold uppercase letters (e.g., $\mathbf{W}$, $\mathbf{P}$) and graphs by calligraphic letters (e.g., $\mathcal{G}$). The basic symbols used in this paper are also described in Table 1.
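Table 1: Basic symbols used in this paper.
Symbol  Definition
$\mathcal{G} = (V, E, \mathbf{W})$  Original graph (edge set omitted as $\mathcal{G} = (V, \mathbf{W})$)
$V$, $E$, $vol(\mathcal{G})$  Node set, edge set and the corresponding volume of $\mathcal{G}$
$d_v^{\mathcal{G}}$  Degree of node $v$ in $\mathcal{G}$ (without ambiguity denoted as $d_v$)
$N(v)$  The neighbor set of node $v$ in $\mathcal{G}$
$\mathbf{A}$, $\mathbf{W}$, $\mathbf{D}$  Adjacency, weighted adjacency, diagonal degree matrices
$\mathcal{H}$  Oriented line graph
$\mathbf{P}$  Non-backtracking transition matrix
$\mathbf{L}$  Directed Laplacian matrix
$\boldsymbol{\phi}$  Perron vector
$\boldsymbol{\Phi}$  Diagonal matrix with entries $\boldsymbol{\Phi}_{ii} = \phi_i$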
3.1 Problem Formulation
Generally, a graph is represented as $\mathcal{G} = (V, E, \mathbf{W})$, where $V$ is the set of nodes and $E$ is the set of edges ($n = |V|$, $m = |E|$). When the weighted adjacency matrix $\mathbf{W}$ (representing the strength of connections between nodes) is given, the edge set can be omitted and the graph written as $\mathcal{G} = (V, \mathbf{W})$. Note that when $\mathcal{G}$ is undirected and unweighted, we use $\mathbf{A}$ instead of $\mathbf{W}$. Since most machine learning algorithms cannot operate on this matrix effectively, our goal is to learn low-dimensional vectors to which they can be applied. Specifically, we focus on graph embedding to learn low-dimensional vectors while simultaneously achieving two objectives: decoupling nodes’ relations and dimension reduction. The graph embedding problem is formulated as:
(Graph Embedding) Given a graph $\mathcal{G} = (V, \mathbf{W})$, for a fixed embedding dimension $k \ll n$, the purpose of graph embedding is to learn a mapping function $f: i \mapsto \mathbf{y}_i \in \mathbb{R}^k$, for $i \in V$.
3.2 Non-Backtracking Graph Embedding
We proceed to present NOBE. Inspired by the idea of analyzing flow dynamics on edges, we first embed the graph into an intermediate space from a non-backtracking edge perspective. Then, the edge embeddings are summed in the intermediate space to generate accurate node embeddings. In the following, we only elaborate on embedding undirected unweighted graphs; the case of weighted graphs follows accordingly. We first define the concept of a non-backtracking transition matrix, which specifies the probabilities of moving from one directed edge to the next with backtracking prohibited.
(Non-Backtracking Transition Matrix) Given an undirected unweighted graph $\mathcal{G} = (V, \mathbf{A})$, we define its non-backtracking transition matrix $\mathbf{P}$ as a $2|E| \times 2|E|$ matrix, which can be regarded as a random walk on the directed edges of $\mathcal{G}$ with backtracking prohibited. Mathematically,
$$\mathbf{P}_{[(u \to v),(x \to y)]} = \begin{cases} \dfrac{1}{d_v - 1}, & \text{if } v = x \text{ and } u \neq y, \\ 0, & \text{otherwise,} \end{cases} \qquad (1)$$
where $u, v, x, y \in V$, $d_v$ is the degree of node $v$, and $(u \to v)$, $(x \to y)$ are edges with directions taken into consideration. Encoding a graph into a non-backtracking transition matrix allows the diffusion dynamics to be considered. Notice that, different from the original non-backtracking operator [14], we also take the diffusion probability of the edges into account through the definition of $\mathbf{P}$. This enables us to capture more information about the complex topological structure of the graph.
Further, the non-backtracking random walk is a non-Markovian chain, which uses the non-backtracking transition matrix as its transition probability matrix. To make the analysis more tractable, we transform the non-Markovian process into a Markovian process by introducing an oriented line graph.
(Oriented Line Graph) Given an undirected unweighted graph $\mathcal{G}$, its oriented line graph $\mathcal{H}$ is a directed weighted graph whose node set is the set of oriented edges of $\mathcal{G}$ and whose weighted adjacency matrix is the non-backtracking transition matrix $\mathbf{P}$.
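The two definitions above can be illustrated with a minimal sketch, assuming each entry of the transition matrix equals one over (degree of the shared node minus one) whenever one directed edge can follow another without backtracking. The function name `nonbacktracking_transition` and the bow-tie example graph are illustrative choices, not part of the original work; the dense list-of-lists matrix is only suitable for tiny graphs.

```python
from collections import defaultdict


def nonbacktracking_transition(edges):
    """Return (directed_edges, P) for an undirected, unweighted graph.

    Assumed convention: P[i][j] = 1 / (deg(v) - 1) when directed edge
    i = (u -> v) is followed by j = (v -> y) with y != u, else 0.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    if any(len(nbrs) < 2 for nbrs in neighbors.values()):
        raise ValueError("every node needs degree >= 2")

    # Each undirected edge yields two directed edges: the nodes of the
    # oriented line graph.
    directed = sorted((u, v) for u in neighbors for v in neighbors[u])
    index = {e: i for i, e in enumerate(directed)}
    size = len(directed)
    P = [[0.0] * size for _ in range(size)]
    for (u, v), i in index.items():
        weight = 1.0 / (len(neighbors[v]) - 1)
        for y in neighbors[v]:
            if y != u:  # stepping back along v -> u is prohibited
                P[i][index[(v, y)]] = weight
    return directed, P


# Two triangles sharing node 2 (a "bow tie"); every node has degree >= 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
directed, P = nonbacktracking_transition(edges)
row_sums = [sum(row) for row in P]
col_sums = [sum(P[i][j] for i in range(len(P))) for j in range(len(P))]
```

On this example both `row_sums` and `col_sums` come out as all ones up to floating-point error, so the matrix is a valid transition matrix on the 12 directed edges.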
Figure 2 illustrates the intuition behind the oriented line graph. It can be seen that the oriented line graph has the potential to characterize community boundaries and emphasize structural hole spanners. An intuitive graph embedding approach is to perform spectral decomposition on the non-backtracking transition matrix $\mathbf{P}$. However, $\mathbf{P}$ is an asymmetric matrix, so it is not guaranteed to have real eigenvalues and eigenvectors. Also, from the definition of $\mathbf{P}$, some entries are undefined for nodes of degree one, where the denominator $d_v - 1$ vanishes. We propose Proposition 3.1 to make full use of $\mathbf{P}$ in spectral graph embedding.
Proposition 3.1. If the minimum degree of the connected graph $\mathcal{G}$ is at least 2, then the oriented line graph $\mathcal{H}$ is valid and strongly connected.
Proof. The proof is given in Appendix A.1.
Proposition 3.1 also means that under this condition the non-backtracking transition matrix $\mathbf{P}$ is irreducible and aperiodic. In particular, according to the Perron-Frobenius Theorem [11], it implies that for a strongly connected oriented line graph with nonnegative weights, the matrix $\mathbf{P}$ has a unique left eigenvector $\boldsymbol{\phi}$ with all entries positive. Let us denote by $\rho$ the largest real eigenvalue of $\mathbf{P}$. Then,
$$\boldsymbol{\phi}^{\top}\mathbf{P} = \rho\,\boldsymbol{\phi}^{\top}.$$
For directed weighted graphs, a node’s importance in the topological structure is not determined by its degree as in an undirected graph, since directed edges coming into or going out of a node may be blocked or rerouted back immediately in the next step. Hence, as discussed in the literature [8], we use the Perron vector $\boldsymbol{\phi}$ to denote node importance in the oriented line graph. Our objective is to ensure that linked nodes in the oriented line graph are embedded close to each other in the embedding space.
Suppose we want to embed the nodes of the oriented line graph into a one-dimensional vector $\mathbf{y}$. For each edge $(i, j)$ in the oriented line graph $\mathcal{H}$, by considering its weight, our goal is to minimize $(y_i - y_j)^2 \mathbf{P}_{ij}$. Taking the source node’s importance indicated by $\phi_i$ into consideration and summing the loss over all edges, we define our loss function as
$$\ell(\mathbf{y}) = \sum_{(i,j) \in E(\mathcal{H})} \phi_i \, (y_i - y_j)^2 \, \mathbf{P}_{ij}. \qquad (2)$$
Specifically, Eq. (2) can be written in matrix form by the following proposition.
Proposition 3.2. Eq. (2) has the following form
$$\ell(\mathbf{y}) = 2\,\mathbf{y}^{\top}\mathbf{L}\,\mathbf{y}, \qquad (3)$$
where $\mathbf{L} = \boldsymbol{\Phi} - \frac{\boldsymbol{\Phi}\mathbf{P} + \mathbf{P}^{\top}\boldsymbol{\Phi}}{2}$ is called the combinatorial Laplacian for directed graphs, and $\boldsymbol{\Phi}$ is a diagonal matrix with $\boldsymbol{\Phi}_{ii} = \phi_i$.
Proof. The proof is given in Appendix A.2.
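As a numerical sanity check of the matrix form, the sketch below compares an edge-wise loss with a quadratic form on a small graph. It assumes the loss is $\sum_{i,j} \phi_i \mathbf{P}_{ij} (y_i - y_j)^2$ and the combinatorial Laplacian is $\mathbf{L} = \boldsymbol{\Phi} - (\boldsymbol{\Phi}\mathbf{P} + \mathbf{P}^{\top}\boldsymbol{\Phi})/2$, following the directed-graph construction of [8]; the bow-tie graph, the random test vector and all variable names are illustrative assumptions.

```python
import random

# Bow-tie graph: two triangles sharing node 2 (minimum degree 2).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [2, 3]}

# Directed edges and the non-backtracking transition matrix P.
directed = [(u, v) for u in adj for v in adj[u]]
index = {e: i for i, e in enumerate(directed)}
n = len(directed)
P = [[0.0] * n for _ in range(n)]
for (u, v), i in index.items():
    for w in adj[v]:
        if w != u:  # no immediate backtracking
            P[i][index[(v, w)]] = 1.0 / (len(adj[v]) - 1)

# P is doubly stochastic on this graph, so the uniform vector is stationary.
phi = [1.0 / n] * n

random.seed(0)
y = [random.uniform(-1.0, 1.0) for _ in range(n)]

# Edge-wise loss: sum_i sum_j phi_i * P_ij * (y_i - y_j)^2.
edge_loss = sum(phi[i] * P[i][j] * (y[i] - y[j]) ** 2
                for i in range(n) for j in range(n))


def laplacian_entry(i, j):
    """Entry (i, j) of L = Phi - (Phi P + P^T Phi) / 2."""
    diag = phi[i] if i == j else 0.0
    return diag - 0.5 * (phi[i] * P[i][j] + P[j][i] * phi[j])


quadratic_form = sum(y[i] * laplacian_entry(i, j) * y[j]
                     for i in range(n) for j in range(n))

gap = abs(edge_loss - 2.0 * quadratic_form)
```

Under these assumptions `gap` is zero up to rounding, i.e. the edge-wise loss equals twice the quadratic form.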
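Following the idea of [8], we consider the Rayleigh quotient for directed graphs as follows:
$$R(\mathbf{y}) = \frac{\mathbf{y}^{\top}\mathbf{L}\,\mathbf{y}}{\mathbf{y}^{\top}\boldsymbol{\Phi}\,\mathbf{y}}.$$
The denominator of the Rayleigh quotient takes the weight distribution in the directed graph indicated by $\boldsymbol{\Phi}$ into account. Therefore, we add $\mathbf{y}^{\top}\boldsymbol{\Phi}\,\mathbf{y} = 1$ as a constraint, which eliminates the arbitrary rescaling caused by $\mathbf{y}$ and $\boldsymbol{\Phi}$. By minimizing Eq. (3) under this constraint, we get the following eigenvector problem:
$$\mathbf{L}\,\mathbf{y} = \lambda\,\boldsymbol{\Phi}\,\mathbf{y}. \qquad (4)$$
It is now clear that our task at this stage becomes selecting the $k$ smallest eigenvectors of $\boldsymbol{\Phi}^{-1}\mathbf{L}$ to form a vector representation for directed edges from the non-backtracking perspective. By using the following proposition, we can further reduce the matrix to a more concise and elegant form.
Proposition 3.3. Both the row sums and the column sums of the non-backtracking transition matrix $\mathbf{P}$ equal one. That is, $\mathbf{P}\mathbf{1} = \mathbf{1}$ and $\mathbf{1}^{\top}\mathbf{P} = \mathbf{1}^{\top}$, where $\mathbf{1}$ is a column vector of ones.
Proof. The proof is given in Appendix A.3.
From Proposition 3.3, we know that $\mathbf{1}$ is a Perron vector of $\mathbf{P}$. By normalizing $\boldsymbol{\phi}$ (subject to $\sum_i \phi_i = 1$), we further have $\phi_i = \frac{1}{2|E|}$ for each node $i$ of the oriented line graph. Then, we have
$$\Bigl(\mathbf{I} - \frac{\mathbf{P} + \mathbf{P}^{\top}}{2}\Bigr)\mathbf{y} = \lambda\,\mathbf{y}, \qquad (5)$$
where $\hat{\mathbf{P}} = \frac{\mathbf{P} + \mathbf{P}^{\top}}{2}$, and $\mathbf{I} - \hat{\mathbf{P}}$ can be thought of as a normalized Laplacian matrix for oriented line graphs, analogous to the traditional Laplacian matrix for undirected graphs. $\hat{\mathbf{P}}$ can be regarded as a symmetrization of $\mathbf{P}$. Specifically, this process is equivalent to averaging each weight $\mathbf{P}_{ij}$ with zero whenever $\mathbf{P}_{ij}$ is nonzero, since $\mathbf{P}_{ij}$ and $\mathbf{P}_{ji}$ cannot be nonzero at the same time.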
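The reduction from the generalized eigenproblem to the symmetrized form can be spelled out in a short worked derivation. It is a sketch under the assumptions that the combinatorial Laplacian is $\mathbf{L} = \boldsymbol{\Phi} - (\boldsymbol{\Phi}\mathbf{P} + \mathbf{P}^{\top}\boldsymbol{\Phi})/2$ as in [8] and that the Perron vector is uniform over the $2|E|$ directed edges; the symbols are notational choices made here.

```latex
\begin{align*}
\boldsymbol{\Phi} &= \frac{1}{2|E|}\,\mathbf{I}, \\
\mathbf{L} &= \boldsymbol{\Phi} - \frac{\boldsymbol{\Phi}\mathbf{P} + \mathbf{P}^{\top}\boldsymbol{\Phi}}{2}
            = \frac{1}{2|E|}\Bigl(\mathbf{I} - \frac{\mathbf{P} + \mathbf{P}^{\top}}{2}\Bigr), \\
\mathbf{L}\,\mathbf{y} = \lambda\,\boldsymbol{\Phi}\,\mathbf{y}
  &\iff \frac{1}{2|E|}\Bigl(\mathbf{I} - \frac{\mathbf{P} + \mathbf{P}^{\top}}{2}\Bigr)\mathbf{y}
        = \frac{\lambda}{2|E|}\,\mathbf{y}
  \iff \Bigl(\mathbf{I} - \frac{\mathbf{P} + \mathbf{P}^{\top}}{2}\Bigr)\mathbf{y} = \lambda\,\mathbf{y}.
\end{align*}
```

The factor $1/(2|E|)$ cancels on both sides, so the eigenvalues and eigenvectors are unchanged and only a symmetric matrix needs to be decomposed.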
According to Eq. (4) and Eq. (5), we can obtain $k$-dimensional embedding vectors of directed edges by computing the $k$ smallest nontrivial eigenvectors of the symmetrized Laplacian in Eq. (5). By summing these edge embeddings over the edges related to each node, we obtain the node embeddings of the original graph. Here we introduce two sum rules: insum and outsum. Suppose we have a one-dimensional embedding of edges denoted by $\mathbf{y}$. For any node $v$ with neighbor set $N(v)$, we define the insum rule by $y^{in}_v = \sum_{u \in N(v)} y_{(u \to v)}$, which sums the embeddings of all incoming edges of $v$. We define the outsum rule by $y^{out}_v = \sum_{u \in N(v)} y_{(v \to u)}$, which sums the embeddings of all outgoing edges of $v$. Our graph embedding algorithm is described in Algorithm 1.
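The procedure described above can be sketched end to end as follows. This is a minimal dense illustration, not the authors' implementation: it assumes the symmetrized matrix is the identity minus the average of the transition matrix and its transpose, skips the single trivial eigenvector, and uses the insum rule. The function name `nobe_embedding` and the example graph are hypothetical, and NumPy is an assumed dependency.

```python
import numpy as np


def nobe_embedding(edges, k):
    """Sketch: k-dimensional node embeddings via edge-space eigenvectors."""
    nodes = sorted({u for e in edges for u in e})
    nbrs = {u: set() for u in nodes}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    # Directed edges are the nodes of the oriented line graph.
    directed = [(u, v) for u in nodes for v in sorted(nbrs[u])]
    index = {e: i for i, e in enumerate(directed)}
    size = len(directed)

    # Non-backtracking transition matrix (assumes minimum degree >= 2).
    P = np.zeros((size, size))
    for (u, v), i in index.items():
        for w in nbrs[v]:
            if w != u:
                P[i, index[(v, w)]] = 1.0 / (len(nbrs[v]) - 1)

    # Symmetrized Laplacian; eigh returns eigenvalues in ascending order.
    lap = np.eye(size) - (P + P.T) / 2.0
    _, eigvecs = np.linalg.eigh(lap)
    edge_emb = eigvecs[:, 1:k + 1]  # drop the trivial constant eigenvector

    # insum: add the embedding of each edge (u -> v) to its target node v.
    position = {u: r for r, u in enumerate(nodes)}
    node_emb = np.zeros((len(nodes), k))
    for (u, v), i in index.items():
        node_emb[position[v]] += edge_emb[i]
    return nodes, node_emb


# Two 4-cliques joined by the bridge (3, 4); every node has degree >= 3.
clique_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
clique_b = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
nodes, emb = nobe_embedding(clique_a + clique_b + [(3, 4)], k=2)
```

A dense eigendecomposition of a $2|E| \times 2|E|$ matrix is only practical for small graphs; a sparse eigensolver would be the natural substitute at scale.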
3.3 Graph Approximation
In the previous part, we presented a spectral graph embedding algorithm, NOBE, which can preserve both macroscopic and microscopic structures of the original graph. The main procedure of NOBE performs two steps sequentially: eigenvector decomposition and summation of incoming edge embeddings. The first step is conducted on a matrix , which is equivalent to computing the several largest eigenvectors of , denoted as . In this section, we show how to speed up the algorithm by reversing these two steps. With the graph approximation technique, we present an eigenvector decomposition algorithm acting on a matrix with provable approximation guarantees.
Suppose that is an eigenvector of of dimensions on directed edges; then, based on the definition of insum and outsum, and are vectors of dimensions after performing the insum and outsum operations. Suppose further that there exist a matrix and a matrix , such that
(6) 
This implies that if matrix adequately approximates , then without operating on matrix , one can perform spectral decomposition directly on , which is much smaller than , to get and . We can view as an aggregated version of matrix , which means that contains almost the same amount of information as for our embedding purpose. Next, to compose the matrix, for any node , we consider its outsum operation when applying matrix to it.
There exists a matrix , such that for arbitrary node , , , if , otherwise 0. Moreover, can be approximated as .
Proof. The proof is given in Appendix A.5.
Likewise, for the insum operation, . After removing constant factors and transforming this formula into matrix form, we have
(7) 
where is the identity matrix and . By switching the order of spectral decomposition and summation, the approximation target is achieved. Our graph approximation algorithm NOBE-GA then directly selects the second to the largest eigenvectors of as the embedding vectors. As these eigenvectors of dimensions have insum and outsum embedding parts, consistent with NOBE, we simply choose the insum part as the final node embeddings.
To prove the approximation guarantee of NOBE-GA, we first introduce some basic notations from spectral graph theory. For a matrix $\mathbf{M}$, we write $\mathbf{M} \succeq 0$ if $\mathbf{M}$ is positive semidefinite. Similarly, we write $\mathbf{M} \succeq \mathbf{N}$ if $\mathbf{M} - \mathbf{N} \succeq 0$, which is also equivalent to $\mathbf{x}^{\top}\mathbf{M}\mathbf{x} \geq \mathbf{x}^{\top}\mathbf{N}\mathbf{x}$ for all $\mathbf{x}$. For two graphs $\mathcal{G}$ and $\tilde{\mathcal{G}}$ with the same node set, we denote $\mathcal{G} \succeq \tilde{\mathcal{G}}$ if their Laplacian matrices satisfy $\mathbf{L}_{\mathcal{G}} \succeq \mathbf{L}_{\tilde{\mathcal{G}}}$. Recall that $\mathbf{x}^{\top}\mathbf{L}_{\mathcal{G}}\mathbf{x} = \sum_{(i,j) \in E} w_{ij}(x_i - x_j)^2$, where $w_{ij}$ denotes an entry of the weighted adjacency matrix of $\mathcal{G}$. It is clear that dropping edges will decrease the value of this quadratic form. Now, we define the approximation between two graphs based on the difference of their Laplacian matrices.
Definition ($c$-approximation graph). For some , a graph is called a $c$-approximation graph of graph , if
Based on Definition 3.3, we present Theorem 3.3, which further shows the relationship between and its approximation graph in terms of their eigenvalues.
Theorem. If is a $c$-approximation graph of graph , then
where is the $k$-th smallest eigenvalue of the corresponding graph.
Proof. The proof is given in Appendix A.4.
To relax the strict conditions in Definition 3.3, we define a probabilistic version of the approximation graph by using element-wise constraints.
Definition (approximation graph). For some , a graph is called a approximation graph of graph , if is satisfied with probability at least .
A probabilistic version of Theorem 3.3 follows accordingly. Finally, we claim that matrix approximates matrix well by the following theorem, which means the approximation of NOBE-GA is adequate.
Theorem. Suppose that the degree of the original graph obeys a Poisson distribution with parameter , i.e., . Then, for some small , graph is a approximation graph of the graph , where is , and is .
Proof. The proof is given in Appendix A.6.
3.4 Time Complexity and Discussion
A sparse implementation of our algorithm in Matlab is publicly available.
The classic spectral method based on is just a reduced version of NOBE (proof in Appendix A.7). For sparse and degree-skewed networks, nodes with a large degree will affect many related eigenvectors. Therefore, the leading eigenvector that corresponds to community structure will be lost in the bulk of uninformative eigenvectors, and the method will fail to preserve meaningful structures [14]. However, our spectral framework with the non-backtracking strategy can overcome this issue.
4 Experimental Results
In this section, we first introduce datasets and compared methods used in the experiments. After that, we present empirical evaluations on clustering and structural hole spanner detection in detail.
4.1 Dataset Description
All networks used here are undirected and publicly available on the SNAP dataset platform [15]. They vary widely in characteristics such as network type, network size and community profile. They include three social networks: karate (real), youtube (online), enronemail (communication); three collaboration networks: cahepth, dblp, cacondmat (bipartite of authors and publications); three entity networks: dolphins (animals), usfootball (organizations), polblogs (hyperlinks). The summary of the datasets is shown in Table 2. Specifically, we apply a community detection algorithm, i.e., RankCom [12], to show the detected community number and maximum community size.
4.2 Compared Methods
We compare our methods with state-of-the-art algorithms. The first three baselines are graph embedding methods. The others are SH spanner detection methods. We summarize them as follows:

NOBE, NOBE-GA: Our spectral graph embedding method and its graph approximation version.

node2vec [9]: A feature learning framework extending the Skip-gram architecture to networks.

LINE [26]: The version combining first-order and second-order proximity is used here.

Deepwalk [21]: Truncated random walks and language modeling techniques are utilized.

HAM [10]: A harmonic modularity function is proposed to tackle SH spanner detection.

Constraint [5]: A constraint is introduced to prune candidate nodes with certain connectivity.

Pagerank [20]: Nodes with the highest pagerank scores are selected as SH spanners.

Betweenness Centrality (BC) [4]: Nodes with the highest BC are selected as SH spanners.

HIS [16]: A two-stage information flow model is designed to optimize the provided objective function.

AP_BICC [23]: Approximate inverse closeness centralities and articulation points are exploited.
Table 2: Summary of the datasets.
Datasets  #Nodes  #Edges  #Communities (RankCom)  #Max members (RankCom)
karate  34  78  2  18
dolphins  62  159  3  29
usfootball  115  613  11  17
polblogs  1,224  19,090  7  675
cahepth  9,877  25,998  995  446
cacondmat  23,133  93,497  2,456  797
enronemail  36,692  183,831  3,888  3,914
youtube  334,863  925,872  15,863  37,255
dblp  317,080  1,049,866  25,633  1,099
4.3 Performance on Clustering
Table 3: Modularity (Mod.) and permanence (Perm.) of the embedding methods under k-means and AM, with ranks in parentheses.
Datasets  Clustering method  NOBE (Mod.)  NOBE-GA (Mod.)  node2vec (Mod.)  LINE (Mod.)  Deepwalk (Mod.)  NOBE (Perm.)  NOBE-GA (Perm.)  node2vec (Perm.)  LINE (Perm.)  Deepwalk (Perm.)
karate  k-means  0.449(1)  0.449(1)  0.335(5)  0.403(3)  0.396(4)  0.350(1)  0.350(1)  0.335(4)  0.182(5)  0.350(1)
karate  AM  0.449(1)  0.449(1)  0.335(4)  0.239(5)  0.430(3)  0.356(1)  0.350(2)  0.205(5)  0.232(4)  0.311(3)
dolphins  k-means  0.510(2)  0.522(1)  0.460(3)  0.187(5)  0.401(4)  0.250(2)  0.268(1)  0.196(3)  0.166(5)  0.187(4)
dolphins  AM  0.514(2)  0.522(1)  0.458(3)  0.271(5)  0.393(4)  0.233(2)  0.249(1)  0.132(4)  -0.189(5)  0.189(3)
usfootball  k-means  0.610(2)  0.611(1)  0.605(3)  0.562(4)  0.464(5)  0.321(1)  0.321(1)  0.304(3)  0.311(2)  0.039(5)
usfootball  AM  0.612(1)  0.609(2)  0.589(3)  0.492(4)  0.464(5)  0.330(1)  0.330(1)  0.279(4)  0.307(3)  0.039(5)
cahepTh  k-means  0.639(1)  0.609(2)  0.597(3)  0.01(5)  0.424(4)  0.412(1)  0.337(3)  0.379(2)  -0.948(5)  0.261(4)
cahepTh  AM  0.635(1)  0.614(2)  0.606(3)  0.05(5)  0.453(4)  0.435(1)  0.416(2)  0.406(3)  -0.949(5)  0.338(4)
condmat  k-means  0.515(1)  0.495(3)  0.515(1)  0(5)  0.357(4)  0.330(1)  0.288(3)  0.330(1)  -0.984(5)  0.197(4)
condmat  AM  0.528(1)  0.502(3)  0.520(2)  0(5)  0.370(4)  0.391(1)  0.327(3)  0.388(2)  -0.994(5)  0.249(4)
enronemail  k-means  0.219(2)  0.221(1)  0.213(3)  0(5)  0.178(4)  0.096(2)  0.153(1)  0.080(3)  -0.985(5)  0.049(4)
enronemail  AM  0.215(3)  0.220(1)  0.218(2)  0(5)  0.207(4)  0.120(3)  0.194(1)  0.180(2)  -0.996(5)  0.108(4)
polblogs  k-means  0.428(1)  0.428(1)  0.357(3)  0.200(4)  0.084(5)  0.138(1)  0.136(2)  0.066(3)  -0.569(5)  -0.187(4)
polblogs  AM  0.428(1)  0.427(2)  0.376(3)  0.266(4)  0.065(5)  0.138(1)  0.132(2)  0.096(3)  -0.509(5)  -0.176(4)
Clustering is an important unsupervised application used for automatically separating data points into clusters. Our graph embedding method embeds the nodes of a graph into vectors, on which clustering methods can be directly employed. The two evaluation metrics considered are summarized as follows:

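Modularity [18]: Modularity is a widely used quantitative metric that measures the likelihood of nodes’ community membership under the perturbation of the null model. Mathematically, $Q = \frac{1}{2m}\sum_{i,j}\bigl[\mathbf{A}_{ij} - \frac{d_i d_j}{2m}\bigr]\,\delta(c_i, c_j)$, where $m$ is the number of edges, $d_i$ is the degree of node $i$, $\delta$ is the indicator function, and $c_i$ indicates the community node $i$ belongs to. In practice, we add a penalty if a clearly wrong membership is predicted.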
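For concreteness, the standard modularity of [18] can be computed as in the sketch below. It uses the equivalent per-community form, the fraction of internal edges minus the squared fraction of degree mass, and does not include the extra penalty mentioned above, whose exact form is not specified here. The function name and the toy graph are illustrative.

```python
def modularity(edges, community):
    """Newman modularity of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    internal = {}      # number of edges with both ends in the community
    degree_mass = {}   # sum of node degrees per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_mass[cu] = degree_mass.get(cu, 0) + 1
        degree_mass[cv] = degree_mass.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (mass / (2 * m)) ** 2
               for c, mass in degree_mass.items())


# Two triangles joined by a single edge, split into their natural halves.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}
q_split = modularity(edges, two_groups)
q_merged = modularity(edges, one_group)
```

Here the natural split scores 5/14 (about 0.357), while placing every node in one community scores 0.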

Permanence [6]: It is a vertex-based metric, which depends on two factors: internal clustering coefficient and maximum external degree to other communities. The permanence of a node $v$ that belongs to community $c$ is defined as follows: $Perm(v) = \frac{I(v)}{E_{max}(v)} \times \frac{1}{D(v)} - \bigl(1 - c_{in}(v)\bigr)$, where $I(v)$ is the internal degree, $E_{max}(v)$ is the maximum degree that node $v$ links to another community, $D(v)$ is the degree of $v$, and $c_{in}(v)$ is the internal clustering coefficient. Generally, positive permanence indicates a good community structure. To penalize apparently wrong community assignment, is set to , if .
For the clustering application, we summarize the performance of our methods, i.e., NOBE and NOBE-GA, against three state-of-the-art embedding methods on seven datasets in terms of modularity and permanence in Table 3. Two types of classic clustering methods are used, i.e., k-means and the agglomerative method (AM). From these results, we have the following observations:

In terms of modularity and permanence, NOBE and NOBE-GA outperform the other graph embedding methods over all datasets under both k-means and AM. Positive permanence scores on all datasets indicate that meaningful community structure is discovered. Specifically, node2vec obtains competitive results on condmat under k-means. As for LINE, it fails to predict useful community structure on most large datasets except karate and usfootball. Deepwalk gives mediocre results on most datasets and bad results on usfootball, polblogs and enronemail under k-means.

Figure 3 reports the overall performance. NOBE and NOBE-GA achieve superior embedding performance for the clustering application. Moreover, it demonstrates in practice that NOBE-GA approximates NOBE very well on various kinds of networks despite the differences in their link density, node preferences and community profiles. To our surprise, on some datasets NOBE-GA even achieves slightly better performance than NOBE. We conjecture that this improvement arises from the introduction of randomization and the preferences of the evaluation metrics. Specifically, in terms of modularity, the percentage of improvement margin of NOBE is over node2vec, over LINE and over Deepwalk. Regarding permanence, the percentage of the improvement margin of NOBE is over node2vec and over Deepwalk.
4.4 Performance on Structural Hole Spanner Detection
Table 4: Comparison of NOBE with SH spanner detection algorithms.
Datasets  #SH Spanners  Influence Model  NOBE  HAM  Constraint  PageRank  BC  HIS  AP_BICC
karate  3  LT  0.595  0.343  0.295  0.159  0.159  0.132  0.295
karate  3  IC  0.003  0.002  0.002  0.001  0.001  0.001  0.002
karate  3  SH spanners  [3 20 14]  [3 20 9]  [1 34 3]  [34 1 33]  [1 34 33]  [32 9 14]  [1 3 34]
youtube  78  LT  4.664  3.951  2.447  1.236  1.226  3.198  1.630
youtube  78  IC  4.375  2.452  1.254  0.662  0.791  2.148  0.799
dblp  42  LT  8.734  5.384  0.404  0.357  0.958  0.718  0.550
dblp  42  IC  7.221  3.578  0.229  0.190  0.821  0.304  0.495
Generally speaking, in a network, structural hole (SH) spanners are the nodes bridging different communities, which are crucial for many applications such as diffusion control, viral marketing and brain functional analysis [2, 16, 5]. Detecting these bridging nodes is a nontrivial task. To exhibit the power of our embedding method in placing key nodes at accurate positions, we first employ our method to embed the graph into low-dimensional vectors and then detect structural hole spanners in that subspace. We compare our method with SH spanner detection algorithms that are applied directly on graphs. To evaluate the quality of the selected SH spanners quantitatively, we use an evaluation metric called the Structural Hole Influence Index (SHII) proposed in [10]. This metric is designed by simulating information diffusion processes under certain information diffusion models in the given network.

Structural Hole Influence Index (SHII) [10]: For an SH spanner candidate , we compute its SHII score by performing the influence maximization process several times. Each time, to activate the influence diffusion process, we randomly select a set of nodes from the community that belongs to. Node and node set are combined as the seed set to propagate the influence. After the propagation, the SHII score is obtained by computing the relative difference between the number of activated nodes in the community and in other communities: where is the set of communities. is the indicator function, which equals one if node is influenced and zero otherwise.
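The diffusion step underlying this metric can be illustrated with a minimal independent cascade (IC) simulation. This sketch shows only the propagation from a seed set, not the SHII formula itself; the uniform activation probability `prob`, the function names and the toy graph are assumptions made for illustration.

```python
import random


def independent_cascade(neighbors, seeds, prob, rng):
    """Run one IC diffusion and return the set of activated nodes.

    Each newly activated node gets a single chance to activate every
    inactive neighbor, succeeding independently with probability prob.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in neighbors.get(u, ()):
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def average_spread(neighbors, seeds, prob, runs, seed=0):
    """Average number of activated nodes over repeated simulations."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(neighbors, seeds, prob, rng))
                for _ in range(runs))
    return total / runs


# A path 0-1-2-3 plus a separate edge 4-5.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
always = independent_cascade(neighbors, [0], 1.0, random.Random(1))
never = independent_cascade(neighbors, [0], 0.0, random.Random(1))
spread = average_spread(neighbors, [0], 1.0, runs=5)
```

With probability one the cascade reaches the whole connected component of the seed, and with probability zero it stays at the seed, which gives two easy sanity checks before counting activations per community.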
For each SH spanner candidate, we run the information diffusion under the linear threshold (LT) model and the independent cascade (IC) model 10000 times to get the average SHII score. To generate SH spanner candidates from the embedded subspace in which our embedding vectors lie, we devise a metric for ranking nodes:

Relative Deviation Score (RDS): Suppose that for each node , its low-dimensional embedding vector is represented as . We apply k-means to separate nodes into appropriate clusters, with denoting the cluster set. For a cluster , the mean of its points is . The Relative Deviation Score, which measures how far a data point deviates from its own community, attracted by another community, is defined as:
where denotes the cluster belongs to. And indicates the radius of cluster .
In our low-dimensional space, the nodes with the highest RDS are selected as SH spanner candidates. We summarize our embedding method against other SH spanner detection algorithms in Table 4. Due to space limits, we omit the results of other embedding methods, as they fail completely on this task. The number of SH spanners shown in the second column is chosen based on the network size and community profile. In fact, too many SH spanners will lead to the propagation activating the entire network. We outperform all SH spanner detection algorithms under the LT and IC models on all three datasets. Specifically, on the karate network, we identify three SH spanners, i.e., 3, 20 and 14, which can be regarded as a perfect group that can influence both clusters, as seen in Figure 1. On average, our method NOBE achieves a significant improvement over the state-of-the-art algorithm HAM, which shows the power of our method in accurate embedding.
4.5 Parameter Analysis
Dimension is usually considered an intrinsic characteristic, and often needs to be predefined manually. With varying dimension, we report the clustering quality under AM on two datasets in Figure 4. On the football network with 11 ground truth communities, NOBE, NOBE-GA and node2vec achieve reasonable results on dimension or . After , NOBE, NOBE-GA, node2vec and Deepwalk begin to drop. After a sudden drop, node2vec still increases gradually. As reported by RankCom, the cahepth network has 995 communities. Nevertheless, prior to dimension , NOBE, NOBE-GA, node2vec and Deepwalk have already obtained community structure of good quality. The performance increases slightly afterwards. Consistent with studies on spectral analysis of graph matrices [25], the community number is a good choice for dimension. However, it is also rather conservative, since good embedding methods can preserve the great majority of graph information in much shorter vectors. A dimension much larger than the community number should be chosen with caution, since the added redundant information may deteriorate the embedding results.
5 Conclusion and Outlook
This paper proposes NOBE, a novel framework leveraging the non-backtracking strategy for graph embedding. It exploits the highly nonlinear structure of graphs by considering non-Markovian dynamics. As a result, it can handle both macroscopic and microscopic tasks. Experiments demonstrate the advantage of our algorithm over state-of-the-art baselines. In addition, we develop a graph approximation technique with theoretical guarantees for reducing the complexity and for analyzing the different flows on graphs. To our surprise, NOBE-GA achieves excellent performance at the same level as NOBE.
We hope that our work will shed light on the analysis of algorithms based on flow dynamics of graphs, especially on spectral algorithms. Graph approximation can be further investigated by considering the perturbation of eigenvectors. We leave it for future work.
Acknowledgement
This work is supported in part by the National Key R&D Program of China through grant 2016YFB0800700, by NSF through grants IIS-1526499 and CNS-1626432, by NSFC through grants 61672313, 61672051 and 61503253, and by the NSF of Guangdong Province through grant 2017A030313339.
Appendix A
A.1
Proposition 3.1 If the minimum degree of the connected graph is at least 2, then the oriented line graph is valid and strongly connected.
Proof. Assume that and are two arbitrary nodes in the oriented line graph . The proposition is equivalent to proving that can be reached from . Three situations should be considered:
1) if and , then is directly linked to ;
2) if and , which means there is a directed edge from to . We delete the node , i.e., node , in the original graph . Since the minimum degree of is at least two, node and node are still mutually reachable in graph . A simple path from node to node can be selected, passing through each other existing node at most once, which satisfies the nonbacktracking condition. Adding node , i.e., node , back into the graph will generate a nonbacktracking path , which means is reachable from in the oriented line graph ;
3) if are mutually unequal. Assume that we delete edges and in graph ; graph is then still connected. There exists a simple path connecting node and . Thus, satisfying the nonbacktracking condition, there exists a directed path connecting node and node in the oriented line graph .
Overall, every valid node in graph can be reached if graph has a minimum degree of at least two.
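The statement can be checked numerically on a small example. The sketch below builds the oriented line graph under the usual convention that a directed edge (u, v) links to (v, w) whenever w differs from u, and tests strong connectivity with two breadth-first searches. This convention, the helper names and the example graphs are assumptions made for illustration; the check covers one instance and is not a substitute for the proof.

```python
from collections import deque

def oriented_line_graph(edges):
    """Nodes are directed edges (u, v); (u, v) -> (v, w) is an arc iff w != u."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return {(u, v): [(v, w) for w in nbrs[v] if w != u]
            for u in nbrs for v in nbrs[u]}

def reachable(graph, start):
    """Set of nodes reachable from `start` by breadth-first search."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_strongly_connected(graph):
    """Strongly connected iff one node reaches all nodes in the graph
    and in its reverse."""
    nodes = list(graph)
    reverse = {n: [] for n in nodes}
    for n, outs in graph.items():
        for m in outs:
            reverse[m].append(n)
    root = nodes[0]
    return (len(reachable(graph, root)) == len(nodes)
            and len(reachable(reverse, root)) == len(nodes))

# Complete graph on four nodes: every node has degree 3.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
line_graph = oriented_line_graph(k4_edges)
print(is_strongly_connected(line_graph))

# A path on three nodes has degree-1 endpoints, so some directed edges
# have no nonbacktracking successor.
path_graph = oriented_line_graph([(0, 1), (1, 2)])
print(is_strongly_connected(path_graph))
```

The path example illustrates why the degree condition is needed: a degree-one endpoint produces a directed edge with no outgoing arc in the oriented line graph.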
A.2
Proposition 3.2 Our loss function is
To be concise, we use , to denote the node set and the edge set of , respectively. , denote nodes in . Considering every node pair, we have the following loss function
Dividing the term into two parts by regarding each node pair as ordered and expanding the formula, we get
(8)  
We now derive the first part; a similar argument applies to the second part. Enumerating each node in the first part, we get
(9)  
By Proposition 3.3, the sum of each row is equal to one, i.e.,
The first part of the above equation becomes
Here, is a diagonal matrix with Since , the second sum in the second term of equation 9 becomes
Then, the matrix form of the second term in equation 9 becomes
Arranging the terms in a particular order, we can easily see the third part in equation 9
Adding up all terms and removing the constant factor, we get our loss function
A.3
Proposition 3.3 Both the row sums and the column sums of the nonbacktracking transition matrix equal one.
Proof. The proof is simple, since within each row or column the nonzero entries are equal. For an arbitrary row related to node , the sum of this row in the nonbacktracking matrix is
Similarly, we can get the same result for each column.
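The row and column sums can be verified directly on a small graph. The sketch below assumes the standard form of the nonbacktracking transition matrix, in which the walk on directed edge (u, v) moves to (v, w) with probability 1/(d_v - 1) for every neighbor w of v other than u. This definition, the function name `nonbacktracking_transition` and the example graph are assumptions made for illustration.

```python
import numpy as np

def nonbacktracking_transition(edges):
    """Transition matrix on directed edges: (u, v) -> (v, w) with
    probability 1 / (deg(v) - 1) whenever w != u (assumed definition)."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    arcs = [(u, v) for u in sorted(nbrs) for v in sorted(nbrs[u])]
    index = {arc: i for i, arc in enumerate(arcs)}
    P = np.zeros((len(arcs), len(arcs)))
    for (u, v), i in index.items():
        for w in nbrs[v]:
            if w != u:
                P[i, index[(v, w)]] = 1.0 / (len(nbrs[v]) - 1)
    return P, arcs

# Hypothetical example with minimum degree 2 (degrees 2, 3, 3, 2).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
P, arcs = nonbacktracking_transition(edges)

row_sums = P.sum(axis=1)
col_sums = P.sum(axis=0)
print(np.allclose(row_sums, 1.0), np.allclose(col_sums, 1.0))
```

Both checks rely on the same counting argument as the proof: row (u, v) and column (v, w) each contain d_v - 1 nonzero entries, all equal to 1/(d_v - 1).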
A.4
Theorem 3.1 If and are graphs such that is a approximation graph of graph , then
where is the kth smallest eigenvalue of corresponding graph.
Applying the Courant-Fischer Theorem, we have
Since is a approximation graph of , we have . So, we have
Then, it becomes
(10)  
Similarly, we can get . In other words, . A short calculation then gives the final result.
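For readers who want the steps spelled out, the following is a sketch of the argument under an assumed definition: we take a $c$-approximation (with $c \ge 1$) to mean that the quadratic forms of the two graph matrices are within a factor $c$ of each other for every vector $x$. The symbols $L$, $\tilde{L}$, $G$, $\tilde{G}$ and $c$ are notation chosen here for illustration and may differ from the notation of Theorem 3.1.

```latex
% Assumed definition of a c-approximation, c >= 1:
\frac{1}{c}\, x^{\top} L x \;\le\; x^{\top} \tilde{L} x \;\le\; c\, x^{\top} L x
\qquad \text{for all } x .

% Courant-Fischer characterization of the k-th smallest eigenvalue:
\lambda_k(G) \;=\; \min_{\dim S = k} \; \max_{x \in S,\, x \neq 0}
\frac{x^{\top} L x}{x^{\top} x} .

% Applying the upper and lower bounds inside the min-max gives
\frac{1}{c}\, \lambda_k(G) \;\le\; \lambda_k(\tilde{G}) \;\le\; c\, \lambda_k(G) ,

% and therefore
\bigl| \lambda_k(\tilde{G}) - \lambda_k(G) \bigr|
\;\le\; \max\!\Bigl\{ c - 1,\; 1 - \tfrac{1}{c} \Bigr\}\, \lambda_k(G)
\;=\; (c - 1)\, \lambda_k(G) ,
```

where the last equality uses $c - 1 - (1 - 1/c) = (c-1)^2 / c \ge 0$.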
A.5
Lemma 3.1 There exists a matrix , such that for arbitrary node , , , if , then otherwise 0. Moreover, can be approximated as .
Considering the vector as the information contained on each directed edge in graph , from the definition of the out operation and the nonbacktracking transition matrix, is equivalent to a process that first applies one step of the random walk with probability transition matrix to update the vector , and then conducts the operation on node in graph . So, we can get
(11)  
For the first part in the second equation of equation 11, by separating the nonbacktracking part and switching the summation we have
(12)  
For the second part of equation 11, we have
(13)  
The approximation in the third step adopts an idea from mean field theory: we assume that every incoming edge to a fixed node carries the same amount of probability. Thus, to give an unbiased estimation, we use the mean of the other edges coming into node , i.e., , to approximate the when going through is prohibited by the nonbacktracking strategy. Note that for a neighbor , . By the above approximation, matrix is obtained with , and the bound of the approximation error is . Summing equations 12 and 13, we have
The lemma holds.
A.6
Theorem 3.2 Suppose that the degree of the original graph obeys a Poisson distribution with parameter , i.e., . Then, for arbitrarily small , graph is a approximation graph of the graph , where is , and is .
To investigate the relationship between and , we first consider the relative difference between and . According to lemma 3.1, for an arbitrary nonzero item in , we have
The maximum value of the corresponding item in is
So, the relative difference between and is
Due to the arbitrary choice of node and , we regard the values concerning and as random variables. Thus, after applying Markov's inequality, we get
Note that due to the convexity of the reciprocal function, applying Jensen's inequality only gives us the lower bound. To get an upper bound of , we set the random variables and and use the Taylor series expansion around :
(14)  
So the upper bound on the relative difference between corresponding elements in and , which is caused by the approximation, is as follows:
Set . Then, graph is approximation graph of graph .
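The two generic tools used in this proof can be written out as follows. This is a sketch of the standard statements only, with $X$, $Y$, $\mu$, $\sigma^2$ and $\lambda$ as notation assumed here; it does not reproduce equation 14, and the random variable in the proof may be a shifted version of the degree.

```latex
% Markov's inequality for a nonnegative random variable Y and a > 0:
\Pr\left( Y \ge a \right) \;\le\; \frac{\mathbb{E}[Y]}{a} .

% Second-order Taylor expansion of 1/X around \mu = \mathbb{E}[X], for X > 0:
\frac{1}{X} \;\approx\; \frac{1}{\mu} - \frac{X - \mu}{\mu^{2}}
+ \frac{(X - \mu)^{2}}{\mu^{3}} ,

% so that, taking expectations with \sigma^{2} = \operatorname{Var}(X),
\mathbb{E}\!\left[ \frac{1}{X} \right] \;\approx\;
\frac{1}{\mu} + \frac{\sigma^{2}}{\mu^{3}} .

% If the mean and variance both equal \lambda, as for a Poisson variable,
% the right-hand side becomes
\frac{1}{\lambda} + \frac{1}{\lambda^{2}} .
```

The correction term $\sigma^{2}/\mu^{3}$ is nonnegative, which is consistent with Jensen's inequality giving $1/\mu$ as a lower bound on $\mathbb{E}[1/X]$.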
A.7
Claim 1 The spectral method based on the lazy random walk is a reduced version of our proposed algorithm NOBE.
Proof. The proof is similar to that of Lemma 3.1, but the details of the approximation strategy used here are different. Again, regarding the vector as the information contained on each directed edge in graph , is equivalent to a process that first applies one step of the random walk with probability transition matrix to update the vector , and then conducts the operation on node in graph . So, we can get
(15) 