
# On Spectral Graph Embedding: A Non-Backtracking Perspective and Graph Approximation

## Abstract

Graph embedding has been proven to be efficient and effective in facilitating graph analysis. In this paper, we present a novel spectral framework called NOn-Backtracking Embedding (NOBE), which offers a new perspective that organizes graph data at a deep level by tracking the flow traversing on the edges with backtracking prohibited. Further, by analyzing the non-backtracking process, a technique called graph approximation is devised, which provides a channel to transform the spectral decomposition on an edge-to-edge matrix to that on a node-to-node matrix. Theoretical guarantees are provided by bounding the difference between the corresponding eigenvalues of the original graph and its graph approximation. Extensive experiments conducted on various real-world networks demonstrate the efficacy of our methods on both macroscopic and microscopic levels, including clustering and structural hole spanner detection.


## 1 Introduction

Graph representations, which describe and store entities in a node-interrelated way [25] (such as the adjacency matrix, Laplacian matrix, and incidence matrix), provide abundant information for mining hidden patterns. However, this approach poses two principal challenges: 1) off-the-shelf machine learning algorithms designed for general data with vector representations can hardly be applied or adapted to graph representations, and 2) it is intractable for large graphs because of space and time constraints. Graph embedding can address these challenges by representing nodes using meaningful low-dimensional latent vectors.

Due to its capability for assisting network analysis, graph embedding has attracted researchers’ attention in recent years [26, 21, 9, 27]. A good graph embedding algorithm should preserve both macroscopic structures (e.g., community structure) and microscopic structures (e.g., structural hole spanners) simultaneously. However, a naive graph embedding algorithm will lead to unsatisfactory low-dimensional embeddings in which meaningful information may be lost or become indistinguishable. For example, the pioneering work [26], which mainly focuses on locally preserving the pairwise distance between nodes, can lose dissimilarity information. As a result, it may fail to preserve community membership, as shown later in the experimental results in Table 3. Some works [9, 21] attempt to preserve high-order proximity between nodes by considering truncated random walks. Since a conventional truncated random walk is a Markov chain that does not examine the special structure of networks, key node information (e.g., structural hole spanners, outliers) becomes unrecognizable. As far as we know, present approaches cannot achieve the graph embedding goal well.

In this paper, to fulfill the goal of preserving macroscopic and microscopic structures, we propose a novel graph embedding framework NOBE and its graph approximation algorithm NOBE-GA. The main contributions of this paper are summarized as follows:

• We develop an elegant framework, NOn-Backtracking Embedding (NOBE), which jointly exploits a non-backtracking random walk and a spectral graph embedding technique (in Section 3). The benefits of NOBE are: 1) From an edge perspective, we encode the graph data structure into an oriented line graph by mapping each edge to a node, which makes it possible to track the flow traversing the edges with backtracking prohibited. 2) By deriving node embeddings from the oriented line graph instead of the original graph, community structure, structural holes and outliers can be well distinguished (as shown in Figures 3 and 4).

• The graph approximation technique NOBE-GA is devised by analyzing the pattern of the non-backtracking random walk and switching the order of spectral decomposition and summation (in Section 3.3). It reduces the complexity of NOBE with a theoretical guarantee. Specifically, by applying this technique, we find that the conventional spectral method is just a reduced version of our NOBE method.

• In Section 4, we also design a metric, RDS, based on the embedded community structure to evaluate nodes’ topological importance in connecting communities, which facilitates the discovery of structural hole (SH) spanners. Extensive experiments conducted on various networks demonstrate the efficacy of our methods in both macroscopic and microscopic tasks.

## 2 Related Work

Our work is mainly related to graph embedding and non-backtracking random walk. We briefly discuss them in this section.

### 2.1 Graph Embedding

Several approaches aim at preserving first-order and second-order proximity in nodes’ neighborhood. [27] attempts to optimize it using semi-supervised deep model and [26] focuses on large-scale graphs by introducing the edge-sampling strategy. To further preserve global structure of the graph, [22] explores the spectrum of the commute time matrix and [21] treats the truncated random walk with deep learning technique. Spectral method and singular value decomposition are also applied to directed graphs by exploring the directed Laplacian matrix [7] or by finding the general form of different proximity measurements [19]. Several works also consider joint embedding of both node and edge representations [28, 1] to give more detailed results. By contrast, our work can address graph embedding using a more expressive and comprehensive spectral method, which gives more accurate vector representations in a more explainable way yet with provable theoretical guarantees.

### 2.2 Non-backtracking Random Walk

The non-backtracking strategy is closely related to Ihara’s zeta function, which plays a central role in several graph-theoretic theorems [13, 3]. Recently, in the machine learning and data mining fields, some important works have focused on developing non-backtracking walk theory. [14] and [24] demonstrate the efficacy of the spectrum of the non-backtracking operator in detecting communities, which overcomes the theoretical limit of the classic spectral clustering algorithm and is robust to sparse networks. [17] utilizes the non-backtracking strategy in influence maximization, and the nice property of locally tree-like graphs is fully exploited to complete the optimality proof. The study of eigenvalues of the non-backtracking matrix of random graphs in [3] further confirms the spectral redemption conjecture proposed in [14]: above the feasibility threshold, the community structure of a graph generated from the stochastic block model can be accurately discovered using the leading eigenvectors of the non-backtracking operator. However, to the best of our knowledge, no work has analyzed the theory of non-backtracking random walks for graph embedding purposes.

## 3 Methodology

In this section, we first define the problem. Then, our NOBE framework is given in detail. Finally, we present the graph approximation technique, followed by a discussion. For clarity, scalars are denoted by lowercase letters, vectors by bold lowercase letters, matrices by bold uppercase letters, and graphs by calligraphic letters. The basic symbols used in this paper are also described in Table 1.

### 3.1 Problem Formulation

Generally, a graph is represented as $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges ($n = |V|$, $m = |E|$). When the weighted adjacency matrix $W$ (representing the strength of connections between nodes) is given, the edge set can be omitted and the graph written as $G = (V, W)$. Note that when $G$ is undirected and unweighted, we use $A$ instead of $W$. Since most machine learning algorithms cannot operate on this matrix effectively, our goal is to learn low-dimensional vectors to which they can be applied. Specifically, we focus on graph embedding to learn low-dimensional vectors and simultaneously achieve two objectives: decoupling nodes’ relations and dimension reduction. The graph embedding problem is formulated as:

{Definition}

(Graph Embedding) Given a graph , for a fixed embedding dimension , the purpose of graph embedding is to learn a mapping function , for .

### 3.2 Non-Backtracking Graph Embedding

We proceed to present NOBE. Inspired by the idea of analyzing flow dynamics on edges, we first embed the graph into an intermediate space from a non-backtracking edge perspective. Then, summation over the edge embeddings is performed in the intermediate space to generate accurate node embeddings. In the following, we only elaborate on the embedding of undirected unweighted graphs; the case of weighted graphs follows accordingly. We first define the concept of a non-backtracking transition matrix, which specifies the probabilities of moving from one directed edge to the next with backtracking prohibited.

{Definition}

(Non-Backtracking Transition Matrix) Given an undirected unweighted graph $G$, we define its non-backtracking transition matrix $P$ as a $2|E| \times 2|E|$ matrix (one row and one column per directed edge), which can be regarded as a random walk on the directed edges of $G$ with backtracking prohibited. Mathematically,

$$P_{[(u \to v),(x \to y)]} = \begin{cases} \dfrac{1}{d_G(v) - 1}, & \text{if } v = x \text{ and } u \neq y, \\ 0, & \text{otherwise,} \end{cases} \tag{1}$$

where $u, v, x, y \in V$, and $(u \to v)$, $(x \to y)$ are edges with directions taken into consideration. Encoding a graph into a non-backtracking transition matrix allows the diffusion dynamics to be considered. Notice that, different from the original non-backtracking operator [14], we also take the diffusion probability of the edges into account through the definition of $P$. This enables us to capture more information about the complex topological structure of the graph.
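As an illustration of Eq. (1), the following minimal sketch builds the non-backtracking transition matrix for a small graph. The function name `non_backtracking_transition_matrix`, the dense NumPy representation, and the toy edge list are assumptions made for this example only; they are not taken from the authors' implementation.

```python
import numpy as np

def non_backtracking_transition_matrix(edges):
    """Build P from Eq. (1) for a simple undirected, unweighted graph.

    `edges` is a list of undirected pairs (u, v). Every node is assumed
    to have degree at least 2, so that d(v) - 1 is never zero.
    """
    # Each undirected edge yields two directed edges.
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: i for i, e in enumerate(directed)}

    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    P = np.zeros((len(directed), len(directed)))
    for (u, v) in directed:
        for (x, y) in directed:
            # A step from (u -> v) to (x -> y) is allowed only if the walk
            # continues from v (v == x) and does not go back to u (u != y).
            if v == x and u != y:
                P[index[(u, v)], index[(x, y)]] = 1.0 / (degree[v] - 1)
    return P, directed

# Toy graph (an assumption): two triangles sharing node 2, minimum degree 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
P, directed = non_backtracking_transition_matrix(edges)
```

The double loop is quadratic in the number of directed edges and is only meant to mirror the definition; a sparse construction would be preferable for real networks.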

Further, non-backtracking random walk is a non-Markovian chain, which uses non-backtracking transition matrix as its transition probability matrix. To make the analysis more tractable, we transform the non-Markovian process to a Markovian process by introducing an oriented line graph.

{Definition}

(Oriented Line Graph) Given an undirected unweighted graph $G$, its oriented line graph $H$ is a directed weighted graph whose node set is the set of oriented edges in $G$ and whose weighted adjacency matrix is the non-backtracking transition matrix $P$.

Figure 4 illustrates the intuition behind the oriented line graph. It can be seen that the oriented line graph has the potential to characterize community boundaries and emphasize structural hole spanners. An intuitive graph embedding approach is to perform spectral decomposition on the non-backtracking transition matrix $P$. However, $P$ is an asymmetric matrix, so it is not guaranteed to have real eigenvalues and eigenvectors. Also, from the definition of $P$, some entries of $P$ are undefined when $d_G(v) = 1$ for some node $v$. Proposition 3.2 below allows us to make full use of $P$ in spectral graph embedding.

{proposition}

If the minimum degree of the connected graph $G$ is at least 2, then the oriented line graph $H$ is valid and strongly connected.

{proof}

The proof is given in Appendix A.1.

Proposition 3.2 also means that under this condition the non-backtracking transition matrix $P$ is irreducible and aperiodic. In particular, according to the Perron-Frobenius Theorem [11], for a strongly connected oriented line graph with non-negative weights, the matrix $P$ has a unique left eigenvector $\phi$ with all entries positive. Let $r$ denote the largest real eigenvalue of $P$. Then,

$$\phi^T P = r \phi^T.$$

For directed weighted graphs, a node’s importance in the topological structure is not determined by its degree as in an undirected graph, since directed edges coming into or going out of a node may be blocked or rerouted back immediately in the next step. Hence, as discussed in the literature [8], we use the Perron vector $\phi$ to denote node importance in the oriented line graph. Our objective is to ensure that linked nodes in the oriented line graph are embedded close to each other in the embedding space.

Suppose we want to embed the nodes of the oriented line graph into a one-dimensional vector $y$. For each edge $(e_1, e_2)$ in the oriented line graph $H$, taking its weight into account, our goal is to minimize $(y(e_1) - y(e_2))^2 P(e_1, e_2)$. Taking the source node’s importance, indicated by $\phi(e_1)$, into consideration and summing the loss over all edges, we define our loss function as

$$\min_y \sum_{(e_1, e_2) \in E(H)} \phi(e_1) \left[ (y(e_1) - y(e_2))^2 P(e_1, e_2) \right]. \tag{2}$$

Specifically, Eq. (2) can be written in matrix form by the following proposition.

{proposition}

Eq. (2) has the following form

$$\min_y y^T L y, \tag{3}$$

where $L = \Phi - \frac{\Phi P + P^T \Phi}{2}$ is called the combinatorial Laplacian for directed graphs, and $\Phi$ is a diagonal matrix with $\Phi(u, u) = \phi(u)$.

{proof}

The proof is given in Appendix A.2.
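The relation between the edge-wise loss in Eq. (2) and the quadratic form in Eq. (3) can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the Laplacian $L = \Phi - (\Phi P + P^T \Phi)/2$ from Appendix A.2, a uniform Perron vector (valid because the rows and columns of $P$ sum to one), and a hypothetical helper `build_P` with a toy graph. The two quantities agree up to the constant factor 2 that the proof removes.

```python
import numpy as np

def build_P(edges):
    """Non-backtracking transition matrix of Eq. (1) (dense, for illustration)."""
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    degree = {}
    for u, _ in directed:
        # Counting tails over both orientations gives the node degree.
        degree[u] = degree.get(u, 0) + 1
    P = np.zeros((len(directed), len(directed)))
    for i, (u, v) in enumerate(directed):
        for j, (x, y) in enumerate(directed):
            if v == x and u != y:
                P[i, j] = 1.0 / (degree[v] - 1)
    return P

# Toy graph (an assumption): a 4-cycle with one chord, minimum degree 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
P = build_P(edges)
n_dir = P.shape[0]

# Uniform left Perron vector, since the columns of P sum to one.
phi = np.full(n_dir, 1.0 / n_dir)
Phi = np.diag(phi)
L = Phi - (Phi @ P + P.T @ Phi) / 2.0

rng = np.random.default_rng(0)
y = rng.normal(size=n_dir)

# Edge-wise loss of Eq. (2); zero entries of P contribute nothing.
loss = sum(
    phi[i] * P[i, j] * (y[i] - y[j]) ** 2
    for i in range(n_dir)
    for j in range(n_dir)
)
quad = y @ L @ y  # quadratic form of Eq. (3)
```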

Following the idea of [8], we consider the Rayleigh quotient for directed graphs as follows:

$$R(y) = \frac{y^T L y}{y^T \Phi y}.$$

The denominator of the Rayleigh quotient takes into account the weight distribution in the directed graph indicated by $\Phi$. Therefore, we add $y^T \Phi y = 1$ as a constraint, which eliminates the arbitrary rescaling of the solution. By solving Eq. (3) with this constraint, we get the following eigenvector problem:

$$(\Phi^{-1} L) y = \lambda y. \tag{4}$$

It is now clear that our task at this stage becomes selecting the smallest eigenvectors of $\Phi^{-1} L$ to form a vector representation for directed edges from the non-backtracking perspective. By using the following proposition, we can further reduce the matrix $\Phi^{-1} L$ to a more concise and elegant form.

{proposition}

Both the sums of rows and columns of the non-backtracking transition matrix $P$ equal one. That is, $P \mathbf{1} = \mathbf{1}$ and $\mathbf{1}^T P = \mathbf{1}^T$, where $\mathbf{1}$ is a column vector of ones.

{proof}

The proof is given in Appendix A.3.

From Proposition 4, we know that $\mathbf{1}$ is a Perron vector of $P$. By normalizing $\phi$ (subject to $\sum_e \phi(e) = 1$), we further have $\phi(e) = \frac{1}{2|E|}$ for each node $e$ of the oriented line graph. Then, we have

$$\Phi^{-1} L = \tilde{L}, \tag{5}$$

where $\tilde{L} = I - \frac{P + P^T}{2}$, and $\tilde{L}$ can be thought of as a normalized Laplacian matrix for oriented line graphs, compared to the traditional Laplacian matrix for undirected graphs. The term $\frac{P + P^T}{2}$ can be regarded as a symmetrization of the matrix $P$. Specifically, this process is equivalent to averaging the weights zero and $P(e_1, e_2)$ whenever $P(e_1, e_2)$ is nonzero, since $P(e_1, e_2)$ and $P(e_2, e_1)$ cannot be nonzero at the same time.

According to Eq. (4) and Eq. (5), we obtain $k$-dimensional embedding vectors of directed edges by computing the $k$ smallest non-trivial eigenvectors of $\tilde{L}$. By summing these results over related embedding vectors, we obtain node embeddings of graph $G$. Here we introduce two sum rules: in-sum and out-sum. Suppose we have a one-dimensional vector of edge embeddings denoted by $g$. For any node $v$, we define the in-sum rule by $g^{in}(v) = \sum_{u \in N(v)} g(u \to v)$, which sums the embeddings of all incoming edges associated with $v$. We define the out-sum rule by $g^{out}(v) = \sum_{u \in N(v)} g(v \to u)$, which sums the embeddings of all outgoing edges associated with $v$. Our graph embedding algorithm is described in Algorithm 1.
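The pipeline described in this subsection can be sketched as follows. This is a minimal illustration rather than the authors' Algorithm 1: it assumes the uniform Perron vector, so that $\Phi^{-1} L = I - (P + P^T)/2$, uses a dense eigensolver, and applies the in-sum rule. The function name `nobe_embedding`, the symbol `k` for the embedding dimension, and the toy graph are assumptions. Nodes are assumed to be labeled `0..num_nodes-1`.

```python
import numpy as np

def nobe_embedding(edges, num_nodes, k):
    """Sketch of NOBE: edge-level spectral embedding followed by in-sum."""
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    degree = np.zeros(num_nodes)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Non-backtracking transition matrix of Eq. (1).
    n_dir = len(directed)
    P = np.zeros((n_dir, n_dir))
    for i, (u, v) in enumerate(directed):
        for j, (x, y) in enumerate(directed):
            if v == x and u != y:
                P[i, j] = 1.0 / (degree[v] - 1)

    # Symmetric normalized Laplacian of the oriented line graph.
    L_tilde = np.eye(n_dir) - (P + P.T) / 2.0

    # eigh returns eigenvalues in ascending order; the first eigenvector
    # is the trivial constant one (eigenvalue 0) and is skipped.
    eigvals, eigvecs = np.linalg.eigh(L_tilde)
    edge_emb = eigvecs[:, 1:k + 1]

    # In-sum: add the embedding of each directed edge (u -> v) to node v.
    node_emb = np.zeros((num_nodes, k))
    for i, (_, v) in enumerate(directed):
        node_emb[v] += edge_emb[i]
    return node_emb, eigvals

# Toy graph (an assumption): two triangles sharing node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
Y, eigvals = nobe_embedding(edges, num_nodes=5, k=2)
```

Each row of `Y` is the embedding of one node and can be passed to a standard clustering method such as k-means.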

### 3.3 Graph Approximation

In the previous part, we presented a spectral graph embedding algorithm, NOBE, which can preserve both macroscopic and microscopic structures of the original graph. The main procedure of NOBE applies two steps sequentially: eigenvector decomposition and summation of incoming edge embeddings. The first step is conducted on the $2|E| \times 2|E|$ matrix $\tilde{L}$, which is equivalent to computing the several largest eigenvectors of $\frac{P + P^T}{2}$, denoted as $\bar{P}$. In this section, we show how to speed up the algorithm by reversing these two steps. Using the graph approximation technique, we present an eigenvector decomposition algorithm acting on a $2|V| \times 2|V|$ matrix with provable approximation guarantees.

Suppose that $g$ is an eigenvector of $\bar{P}$ of $2|E|$ dimensions on directed edges; then, based on the definition of in-sum and out-sum, $g^{in}$ and $g^{out}$ are vectors of $|V|$ dimensions after performing the in-sum and out-sum operations. Suppose there exist a $2|E| \times 2|E|$ matrix $Q$ and a $2|V| \times 2|V|$ matrix $T$ such that

$$\begin{pmatrix} (g^T \bar{P})^{in} \\ (g^T \bar{P})^{out} \end{pmatrix} \approx \begin{pmatrix} (g^T Q)^{in} \\ (g^T Q)^{out} \end{pmatrix} = T \begin{pmatrix} g^{in} \\ g^{out} \end{pmatrix}. \tag{6}$$

This implies that if the matrix $Q$ adequately approximates $\bar{P}$, then, without operating on $\bar{P}$, one can perform spectral decomposition directly on $T$, which is much smaller than $\bar{P}$, to get $g^{in}$ and $g^{out}$. We can view $T$ as an aggregated version of $\bar{P}$, which means that $T$ contains almost the same amount of information as $\bar{P}$ for our embedding purpose. Next, to compose the matrix $T$, for any node $v$, we consider the out-sum operation at $v$ after the edge-level matrix is applied.

{lemma}

There exists a matrix , such that for arbitrary node , , , if , otherwise 0. Moreover, can be approximated as . {proof} The proof is given in the Appendix A.5. Likewise, for in-sum operation, . After removing constant factors and transforming this formula into a matrix form, we have

 (7) Misplaced &

where is the identity matrix and . By switching the order of spectral decomposition and summation, the approximation target is achieved. Now, our graph approximation algorithm NOBE-GA is just directly selecting the second to the largest eigenvectors of as our embedding vectors. As these eigenvectors of dimensions have in-sum and out-sum embedding parts, consistent with NOBE, we simply choose in-sum part as the final node embeddings.

To prove the approximation guarantee of NOBE-GA, we first introduce some basic notation from spectral graph theory. For a matrix $M$, we write $M \succeq 0$ if $M$ is positive semi-definite. Similarly, we write $A \succeq B$ if $A - B \succeq 0$, which is also equivalent to $x^T A x \geq x^T B x$ for all $x$. For two graphs $G$ and $H$ with the same node set, we write $G \succeq H$ if their Laplacian matrices satisfy $L_G \succeq L_H$. Recall that $x^T L_G x = \sum_{(u,v) \in E} W(u,v) (x(u) - x(v))^2$, where $W(u,v)$ denotes an entry of the weighted adjacency matrix of $G$. It is clear that dropping edges will decrease the value of this quadratic form. Now, we define the approximation between two graphs based on the difference of their Laplacian matrices.

{Definition}

(c-approximation graph) For some $c \geq 1$, a graph $H$ is called a $c$-approximation graph of graph $G$ if $\frac{1}{c} L_H \preceq L_G \preceq c L_H$.

Based on Definition 3.3, we present Theorem 3.3, which further shows the relationship between $G$ and its $c$-approximation graph in terms of their eigenvalues.

{theorem}

If $H$ is a $c$-approximation graph of graph $G$, then

$$|\lambda_k(G) - \lambda_k(H)| \leq \max\left\{ (c - 1), \left(1 - \frac{1}{c}\right) \right\} \lambda_k(G),$$

where $\lambda_k(\cdot)$ is the k-th smallest eigenvalue of the corresponding graph.

{proof}

The proof is given in Appendix A.4.

To relax the strict conditions in Definition 3.3, we define a probabilistic version of the $c$-approximation graph by using element-wise constraints.

{Definition}

(-approximation graph) For some , a graph is called a -approximation graph of graph , if is satisfied with probability at least .

A probabilistic version of Theorem 3.3 follows accordingly. Finally, we claim that the matrix $Q$ approximates the matrix $\bar{P}$ well by the following theorem, which means the approximation of NOBE-GA is adequate.

{theorem}

Suppose that the degree of the original graph obeys a Poisson distribution with parameter , i.e., . Then, for some small , graph is a -approximation graph of the graph , where is , and is .

{proof}

The proof is given in Appendix A.6.
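The eigenvalue bound for $c$-approximation graphs can be illustrated numerically. The sketch below is an assumption-laden toy experiment, not part of the paper's evaluation: it builds a random weighted graph $G$, rescales every edge weight by a factor in $[1/c, c]$ to obtain $H$ (which guarantees $\frac{1}{c} L_H \preceq L_G \preceq c L_H$), and compares the Laplacian spectra. The sizes, weight ranges and seed are arbitrary choices.

```python
import numpy as np

def laplacian(W):
    """Combinatorial Laplacian D - W of a symmetric weight matrix."""
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
n, c = 8, 1.5

# Random symmetric weight matrix of G with zero diagonal.
upper = np.triu(rng.uniform(0.5, 1.5, size=(n, n)), k=1)
W_G = upper + upper.T

# H rescales every edge weight of G by a factor in [1/c, c], so that
# x^T L_H x lies between (1/c) x^T L_G x and c x^T L_G x for all x.
factors = np.triu(rng.uniform(1.0 / c, c, size=(n, n)), k=1)
factors = factors + factors.T
W_H = W_G * factors

# eigvalsh returns eigenvalues in ascending order (k-th smallest first).
lam_G = np.linalg.eigvalsh(laplacian(W_G))
lam_H = np.linalg.eigvalsh(laplacian(W_H))

gap = np.abs(lam_G - lam_H)
bound = max(c - 1.0, 1.0 - 1.0 / c) * lam_G
```

Here `gap[k]` stays below `bound[k]` for every index, up to floating-point error on the zero eigenvalue.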

### 3.4 Time Complexity and Discussion

A sparse implementation of our algorithm in Matlab is publicly available. For -dimensional embedding, the summation operation of NOBE requires time. Eigenvector computation, which utilizes a variant of the Lanczos algorithm, requires time, where . In total, the time complexity of NOBE is . The time complexity of NOBE-GA is , where is the average degree.

Classic spectral method based on is just a reduced version of NOBE (proof in Appendix A.7). For sparse and degree-skewed networks, nodes with a large degree will affect many related eigenvectors. Therefore, the previous leading eigenvector that corresponds to community structure will be lost in the bulk of worthless eigenvectors, and hence fail to preserve meaningful structures [14]. However, our spectral framework with non-backtracking strategy can overcome this issue.

## 4 Experimental Results

In this section, we first introduce datasets and compared methods used in the experiments. After that, we present empirical evaluations on clustering and structural hole spanner detection in detail.

### 4.1 Dataset Description

All networks used here are undirected and publicly available on the SNAP dataset platform [15]. They vary widely across a range of characteristics such as network type, network size and community profile. They include three social networks: karate (real), youtube (online), enron-email (communication); three collaboration networks: ca-hepth, dblp, ca-condmat (bipartite of authors and publications); three entity networks: dolphins (animals), us-football (organizations), polblogs (hyperlinks). The summary of the datasets is shown in Table 2. Specifically, we apply a community detection algorithm, i.e., RankCom [12], to show the detected community number and maximum community size.

### 4.2 Compared Methods

We compare our methods with the state-of-the-art algorithms. The first three are graph embedding methods. The others are SH spanner detection methods. We summarize them as follows:

• NOBE, NOBE-GA: Our spectral graph embedding method and its graph approximation version.

• node2vec [9]: A feature learning framework extending Skip-gram architecture to networks.

• LINE [26]: The version of combining first-order and second-order proximity is used here.

• Deepwalk [21]: Truncated random walk and language modeling techniques are utilized.

• HAM [10]: A harmonic modularity function is proposed to tackle the SH spanner detection.

• Constraint [5]: A constraint introduced to prune nodes with certain connectivity being candidates.

• Pagerank [20]: Nodes with highest pagerank score will be selected as SH spanners.

• Betweenness Centrality (BC) [4]: Nodes with highest BC will be selected as SH spanners.

• HIS [16]: Designing a two-stage information flow model to optimize the provided objective function.

• AP_BICC [23]: Approximate inverse closeness centralities and articulation points are exploited.

### 4.3 Performance on Clustering

Clustering is an important unsupervised application used for automatically separating data points into clusters. Our graph embedding method is used for embedding nodes of a graph into vectors, on which clustering method can be directly employed. Two evaluation metrics considered are summarized as follows:

• Modularity [18]: Modularity is a widely used quantitative metric that measures the likelihood of nodes’ community membership under the perturbation of the Null model. Mathematically, where is the indicator function. indicates the community node belongs to. In practice, we add a penalty if a clearly wrong membership is predicted.

• Permanence [6]: It is a vertex-based metric, which depends on two factors: internal clustering coefficient and maximum external degree to other communities. The permanence of a node that belongs to community is defined as follows: where is the internal degree. is the maximum degree that node links to another community. is the internal clustering coefficient. Generally, positive permanence indicates a good community structure. To penalize apparently wrong community assignment, is set to , if .
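As a concrete illustration of the first metric, the sketch below computes modularity under the standard definition associated with [18], $Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{d_i d_j}{2m} \right] \mathbb{1}(c_i = c_j)$. The symbols, the function name `modularity`, and the toy graph are assumptions for this example, and the penalty for clearly wrong memberships mentioned above is not included.

```python
import numpy as np

def modularity(A, labels):
    """Standard modularity of a partition, given a symmetric adjacency matrix."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    degrees = A.sum(axis=1)
    two_m = degrees.sum()  # twice the number of edges
    # Indicator matrix: True where two nodes share a community.
    same = labels[:, None] == labels[None, :]
    expected = np.outer(degrees, degrees) / two_m  # null-model term
    return float(((A - expected) * same).sum() / two_m)

# Toy graph (an assumption): two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

q_good = modularity(A, [0, 0, 0, 1, 1, 1])  # one community per triangle
q_single = modularity(A, [0, 0, 0, 0, 0, 0])  # everything in one community
```

Splitting the graph into its two triangles gives a clearly positive score, while placing all nodes in a single community gives zero.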

For the clustering application, we summarize the performance of our methods, i.e., NOBE and NOBE-GA, against three state-of-the-art embedding methods on seven datasets in terms of modularity and permanence in Table 3. Two types of classic clustering methods are used, i.e., k-means and the agglomerative method (AM). From these results, we have the following observations:

• In terms of modularity and permanence, NOBE and NOBE-GA outperform other graph embedding methods over all datasets under both k-means and AM. Positive permanence scores on all datasets indicate that meaningful community structure is discovered. Specifically, node2vec obtains competitive results on condmat under k-means. As for LINE, it fails to predict useful community structure on most large datasets except karate and us-football. Deepwalk gives mediocre results on most datasets and bad results on us-football, polblogs and enron-email under k-means.

• Figure 5 reports the overall performance. NOBE and NOBE-GA achieve superior embedding performance for the clustering application. Moreover, it practically demonstrates that NOBE-GA approximates NOBE very well on various kinds of networks despite differences in their link density, node preferences and community profiles. To our surprise, on some datasets NOBE-GA even achieves slightly better performance than NOBE. We conjecture that this improvement arises from the introduced randomization and the preferences of the evaluation metrics. Specifically, in terms of modularity, the percentage of improvement margin of NOBE is over node2vec, over LINE and over Deepwalk. Regarding permanence, the percentage of the improvement margin of NOBE is over node2vec and over Deepwalk.

### 4.4 Performance on Structural Hole Spanner Detection

Generally speaking, in a network, structural hole (SH) spanners are the nodes bridging different communities, which are crucial for many applications such as diffusion control, viral marketing and brain functional analysis [2, 16, 5]. Detecting these bridging nodes is a non-trivial task. To exhibit the power of our embedding method in placing key nodes into accurate positions, we first employ our method to embed the graph into low-dimensional vectors and then detect structural hole spanners in that subspace. We compare our method with SH spanner detection algorithms that are directly applied on graphs. To evaluate the quality of the selected SH spanners quantitatively, we use an evaluation metric called Structural Hole Influence Index (SHII) proposed in [10]. This metric is designed by simulating information diffusion processes under certain information diffusion models in the given network.

• Structural Hole Influence Index (SHII) [10]: Regarding a SH spanner candidate , we compute its SHII score by performing the influence maximization process several times. For each time, to activate the influence diffusion process, we randomly select a set of nodes from the community that belongs to. Node and node set is combined as seed set to propagate the influence. After the propagation, SHII score is obtained by computing the relative difference between the number of activated nodes in the community and in other communities: where is the set of communities. is the indicator function which equals one if node is influenced, otherwise .

For each SH spanner candidate, we run the information diffusion under linear threshold model (LT) and independent cascade model (IC) 10000 times to get average SHII score. To generate SH spanner candidates from embedded subspace, in which our embedding vectors lie, we devise a metric for ranking nodes:

• Relative Deviation Score (RDS): Suppose that for each node , its low-dimensional embedding vector is represented as . We apply k-means to separate nodes into appropriate clusters with denoting cluster set. For a cluster , the mean of its points is . The Relative Deviation Score, which measures how far a data point is deviating from its own community attracted by other community, is defined as:

$$\mathrm{RDS}(v) = \max_{C \in \mathcal{C}} \frac{\|y_v - u_{C_v}\|_2 / R_{C_v}}{\|y_v - u_C\|_2 / R_C}$$

where $C_v$ denotes the cluster $v$ belongs to, and $R_C$ indicates the radius of cluster $C$.
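The RDS ranking can be sketched as below. Two choices here are assumptions, not statements from the text: the radius of a cluster is taken to be the largest distance from its mean to any of its members, and the maximum is taken over clusters other than the node's own (including the own cluster would make every score at least 1). Cluster labels are passed in directly instead of running k-means, every cluster is assumed to contain at least two distinct points, and no point is assumed to coincide with another cluster's mean. The function name and the toy data are hypothetical.

```python
import numpy as np

def relative_deviation_scores(Y, labels):
    """RDS for every row of Y, given precomputed cluster labels."""
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)

    # Cluster means and radii (assumed: max distance from mean to a member).
    means = {c: Y[labels == c].mean(axis=0) for c in clusters}
    radii = {
        c: np.linalg.norm(Y[labels == c] - means[c], axis=1).max()
        for c in clusters
    }

    scores = np.zeros(len(Y))
    for i, (y, own) in enumerate(zip(Y, labels)):
        # Normalized distance to the node's own cluster mean.
        own_term = np.linalg.norm(y - means[own]) / radii[own]
        # Ratio against every other cluster; keep the largest.
        scores[i] = max(
            own_term / (np.linalg.norm(y - means[c]) / radii[c])
            for c in clusters
            if c != own
        )
    return scores

# Toy embedding (an assumption): node 4 sits between the two clusters.
Y = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 0.5],
    [4.0, 0.0], [4.0, 1.0], [5.0, 0.0], [5.0, 1.0],
])
labels = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1])
rds = relative_deviation_scores(Y, labels)
```

Within the first cluster, the bridging node 4 receives a higher score than the points that lie far from the second cluster.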

In our low-dimensional space, nodes with the highest RDS are selected as candidate SH spanners. We compare our embedding method against other SH spanner detection algorithms in Table 4. Due to space limits, we omit the results of other embedding methods, as they totally fail on this task. The number of SH spanners shown in the second column is chosen based on the network size and community profile. In practice, too many SH spanners will lead to the propagation activating the entire network. We outperform all SH spanner detection algorithms under the LT and IC models on all three datasets. Specifically, on the karate network, we identify three SH spanners, i.e., 3, 20 and 14, which can be regarded as a perfect group that can influence both clusters, as can be seen in Figure 3. On average, our method NOBE achieves a significant improvement over the state-of-the-art algorithm HAM, which shows the power of our method in accurate embedding.

### 4.5 Parameter Analysis

Dimension is usually considered an intrinsic characteristic, and often needs to be artificially predefined. With varying dimension, we report the clustering quality under AM on two datasets in Figure 8. On the football network with 11 ground truth communities, NOBE, NOBE-GA and node2vec achieve reasonable results on dimension or . After , NOBE, NOBE-GA, node2vec and deepwalk begin to drop. After a sudden drop, node2vec still increases gradually. As reported by RankCom, the ca-hepth network has 995 communities. Nevertheless, prior to dimension , NOBE, NOBE-GA, node2vec and deepwalk have already obtained community structure with good quality. The performance increases slightly afterwards. Consistent with studies on spectral analysis of graph matrices [25], the community number is a good choice for the dimension. However, it is also rather conservative, since good embedding methods can preserve the great majority of graph information in much shorter vectors. A dimension much larger than the community number should be chosen with caution, since the added redundant information may deteriorate embedding results.

## 5 Conclusion and Outlook

This paper proposes NOBE, a novel framework leveraging the non-backtracking strategy for graph embedding. It exploits highly nonlinear structure of graphs by considering a non-Markovian dynamics. As a result, it can handle both macroscopic and microscopic tasks. Experiments demonstrate the superior advantage of our algorithm over state-of-the-art baselines. In addition, we carry out a graph approximation technique with theoretical guarantees for reducing the complexity and also for analyzing the different flows on graphs. To our surprise, NOBE-GA achieves excellent performance at the same level as NOBE.

We hope that our work will shed light on the analysis of algorithms based on flow dynamics of graphs, especially on spectral algorithms. Graph approximation can be further investigated by considering the perturbation of eigenvectors. We leave it for future work.

## Acknowledgement

This work is supported in part by National Key R&D Program of China through grants 2016YFB0800700, and NSF through grants IIS-1526499, and CNS-1626432, and NSFC 61672313, 61672051, 61503253, and NSF of Guangdong Province 2017A030313339.

## Appendix A

### A.1

Proposition 3.1 If the minimum degree of the connected graph is at least 2, then the oriented line graph is valid and strongly connected. {proof} Assume that and are two arbitrary nodes in the oriented line graph . The proposition is equivalent to prove that can be reached from . Three situations should be considered:

1) if and , then is directly linked to ;

2) if and which means there is a directed edge from to . We delete the node , i.e., node , in the original graph . Since the minimum degree of is at least two. Therefore, node and node are still mutually reachable in graph . A Hamilton Path from node to node can be selected with passing through other existing nodes only once, which satisfies the non-backtracking condition. Adding node , i.e., node , back into the graph will generate a non-backtracking path , which means is reachable from in the oriented line graph ;

3) if are mutually unequal. Assume that we delete edges and in graph , then graph is still connected. There exists a Hamilton path connecting node and . Thus, with satisfying the non-backtracking condition, there exists a directed path connecting node and node in the oriented line graph .

Overall, every valid node in graph can be reached, if graph has a minimum degree at least two.

### A.2

Proposition 3.2 Our loss function is

$$\min_y y^T \left( \Phi - \frac{\Phi P + P^T \Phi}{2} \right) y.$$

{proof}

To be concise, we use , denote the node set and the edge set of separately. , denote nodes in . Considering every node pair, we have the following loss function

 ∑u,v∈V(H){ϕ(u)[(y(u)−y(v))2P(u,v)]+ϕ(v)[(y(v)−y(u))2P(v,u)]}.

Dividing the term into two parts by regarding each node pair as ordered and expanding the formula, we get

 (8) 12∑u∈V(H)∑(u,v)∈E(H)[ϕ(u)(y(u)2+y(v)2−2y(u)y(v))P(u,v)] + 12∑u∈V(H)∑(v,u)∈E(H)[ϕ(v)(y(u)2+y(v)2−2y(u)y(v))P(v,u)]

Then, we do the deduction for the first part. A similar proof can be applied to the second part. Enumerating each node in the first part, we get

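$$\begin{aligned}
&\frac{1}{2}\sum_{u\in V(H)}\phi(u)\,y(u)^{2}\sum_{v:(u,v)\in E(H)}P(u,v)
+\frac{1}{2}\sum_{v\in V(H)}y(v)^{2}\sum_{u:(u,v)\in E(H)}\big(\phi(u)P(u,v)\big)\\
&-\frac{1}{2}\sum_{u\in V(H)}\sum_{v:(u,v)\in E(H)}\phi(u)\big(2y(u)y(v)P(u,v)\big)
\end{aligned}\tag{9}$$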

By Proposition 3.3, the sum of each row is equal to one, i.e.,

$$\sum_{v:(u,v)\in E(H)}P(u,v)=1.$$

The first part of Equation 9 becomes

$$\frac{1}{2}\,y^{T}\Phi y.$$

Here, $\Phi$ is a diagonal matrix with $\Phi(u,u)=\phi(u)$. Since $\phi^{T}P=\phi^{T}$, the second sum in the second term of Equation 9 becomes

$$\sum_{u:(u,v)\in E(H)}\big(\phi(u)P(u,v)\big)=\phi(v).$$

Then, the matrix form of the second term in Equation 9 becomes

$$\frac{1}{2}\,y^{T}\Phi y.$$

Rearranging the terms, the third part of Equation 9 becomes

$$-\frac{1}{2}\sum_{u\in V(H)}\sum_{v:(u,v)\in E(H)}2\,y(u)\,\phi(u)P(u,v)\,y(v)=-\,y^{T}\Phi P\,y.$$

Adding up all terms and removing the constant factor, we get our loss function

$$\min_{y}\; y^{T}\left(\Phi-\frac{\Phi P+P^{T}\Phi}{2}\right)y.$$
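The step from the pairwise sum to the quadratic form can be sanity-checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's implementation: it uses a random, strictly positive row-stochastic matrix as a hypothetical `P`, takes `phi` to be its stationary distribution, and compares the one-directional sum of phi(u) P(u,v) (y(u) - y(v))^2 over ordered pairs with twice the quadratic form. The factor 2 is the constant that is dropped in the final loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Hypothetical row-stochastic transition matrix with strictly positive entries.
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)

# Stationary distribution phi: left eigenvector of P for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
phi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
phi /= phi.sum()
Phi = np.diag(phi)

y = rng.standard_normal(n)

# Pairwise form: sum over ordered pairs of phi(u) * P(u, v) * (y(u) - y(v))^2.
diff = y[:, None] - y[None, :]
pairwise = np.sum(phi[:, None] * P * diff**2)

# Matrix form: 2 * y^T (Phi - (Phi P + P^T Phi) / 2) y.
M = Phi - (Phi @ P + P.T @ Phi) / 2
quadratic = 2 * y @ M @ y

print(pairwise, quadratic)
```

The two printed values agree up to floating-point error only because `phi` is stationary and the rows of `P` sum to one, which are exactly the two facts the derivation uses.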

### a.3

Proposition 3.3 Both the sums of rows and columns of the non-backtracking transition matrix equal one.

Proof. The proof is simple since, within each row or column, the nonzero entries are all equal. For an arbitrary row corresponding to node $(u\to v)$, the sum of this row in the non-backtracking transition matrix is

$$\sum_{\substack{x\in N(v)\\ x\neq u}}P[(u\to v),(v\to x)]
=\sum_{\substack{x\in N(v)\\ x\neq u}}\frac{1}{d(v)-1}
=\frac{1}{d(v)-1}\sum_{\substack{x\in N(v)\\ x\neq u}}1=1.$$

Similarly, we can get the same result for each column.
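As a concrete check, the row and column sums can be computed exactly on a small graph. This is a minimal sketch: the graph (a 4-cycle with one chord) and the function name `nonbacktracking_transition` are hypothetical choices made for illustration, and exact rational arithmetic is used so that the sums can be compared with one without rounding.

```python
from fractions import Fraction


def nonbacktracking_transition(adj):
    """Return P as nested dicts: P[(u, v)][(v, x)] = 1 / (d(v) - 1) for x != u."""
    P = {}
    for u in adj:
        for v in adj[u]:
            P[(u, v)] = {(v, x): Fraction(1, len(adj[v]) - 1)
                         for x in adj[v] if x != u}
    return P


# Hypothetical test graph with mixed degrees: the 4-cycle 0-1-2-3 plus the chord 0-2.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
P = nonbacktracking_transition(adj)

# Row sum for each directed edge, and column sum collected across all rows.
row_sums = {e: sum(row.values()) for e, row in P.items()}
col_sums = {e: sum(row.get(e, 0) for row in P.values()) for e in P}

print(set(row_sums.values()), set(col_sums.values()))
```

The column case works for the same reason as the row case: the column of (v, x) receives one entry of 1/(d(v) - 1) from each of the d(v) - 1 edges (u, v) with u different from x.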

### a.4

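Theorem 3.1 If $G$ and $H$ are graphs such that $H$ is a $c$-approximation graph of graph $G$, then

$$|\lambda_k(G)-\lambda_k(H)|\le\max\left\{(c-1),\left(1-\frac{1}{c}\right)\right\}\lambda_k(G),$$

where $\lambda_k(\cdot)$ is the $k$-th smallest eigenvalue of the corresponding graph.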

Proof.

Applying the Courant-Fischer Theorem, we have

$$\lambda_k(G)=\min_{\substack{S\subseteq\mathbb{R}^{n}\\ \dim(S)=k}}\;\max_{x\in S}\frac{x^{T}L_Gx}{x^{T}x}.$$

As $H$ is a $c$-approximation graph of $G$, we have $\frac{1}{c}L_H\preceq L_G\preceq c\,L_H$. So, for every $x$, we have

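$$x^{T}L_Gx\ge\frac{1}{c}\,x^{T}L_Hx.$$

It follows that

$$\begin{aligned}
\lambda_k(G)&=\min_{\substack{S\subseteq\mathbb{R}^{n}\\ \dim(S)=k}}\;\max_{x\in S}\frac{x^{T}L_Gx}{x^{T}x}
\ge\min_{\substack{S\subseteq\mathbb{R}^{n}\\ \dim(S)=k}}\;\max_{x\in S}\frac{1}{c}\,\frac{x^{T}L_Hx}{x^{T}x}\\
&=\frac{1}{c}\min_{\substack{S\subseteq\mathbb{R}^{n}\\ \dim(S)=k}}\;\max_{x\in S}\frac{x^{T}L_Hx}{x^{T}x}
=\frac{1}{c}\,\lambda_k(H).
\end{aligned}\tag{10}$$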

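Similarly, we can get $\lambda_k(G)\le c\,\lambda_k(H)$. In other words, $\frac{1}{c}\lambda_k(H)\le\lambda_k(G)\le c\,\lambda_k(H)$. Equivalently, $\frac{1}{c}\lambda_k(G)\le\lambda_k(H)\le c\,\lambda_k(G)$, so $\lambda_k(H)-\lambda_k(G)\le(c-1)\lambda_k(G)$ and $\lambda_k(G)-\lambda_k(H)\le\left(1-\frac{1}{c}\right)\lambda_k(G)$, which gives the final result.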
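The eigenvalue bound can be illustrated numerically. The sketch below makes several assumptions that are not taken from the paper: it uses the combinatorial Laplacian L = D - W, a hypothetical weighted complete graph as G, and constructs H by rescaling each edge weight of G by a factor in [1/c, c], which guarantees the two-sided quadratic-form inequality used in the proof.

```python
import numpy as np


def laplacian(W):
    """Combinatorial Laplacian L = D - W of a symmetric weight matrix."""
    return np.diag(W.sum(axis=1)) - W


rng = np.random.default_rng(1)
n, c = 8, 1.5

# Hypothetical graph G: complete graph with random positive edge weights.
upper = np.triu(rng.uniform(0.5, 2.0, size=(n, n)), k=1)
W_G = upper + upper.T

# H rescales every edge weight by a factor in [1/c, c], so that
# (1/c) x^T L_H x <= x^T L_G x <= c x^T L_H x holds for every x.
scale = np.triu(rng.uniform(1 / c, c, size=(n, n)), k=1)
W_H = W_G * (scale + scale.T)

lam_G = np.linalg.eigvalsh(laplacian(W_G))  # eigenvalues in ascending order
lam_H = np.linalg.eigvalsh(laplacian(W_H))

bound = max(c - 1, 1 - 1 / c) * lam_G
gap = np.abs(lam_G - lam_H)
print(np.all(gap <= bound + 1e-9))
```

The small tolerance only absorbs floating-point noise at the zero eigenvalue, where both sides of the bound vanish.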

### a.5

Lemma 3.1 There exists a matrix , such that for arbitrary node , , , if , then otherwise 0. Moreover, can be approximated as .

Proof.

Considering the vector $g$ as the information contained on each directed edge of the graph, from the definition of the out operation and the non-backtracking transition matrix, $(g^{T}\bar{P})^{out}_{u}$ is equivalent to a process that first applies one step of a random walk with transition matrix $\bar{P}$ to update the vector $g$, and then conducts the out operation on node $u$. So, we can get

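$$\begin{aligned}
(g^{T}\bar{P})^{out}_{u}&=\sum_{v\in N(u)}(g^{T}\bar{P})_{u\to v}\\
&=\sum_{v\in N(u)}\left(\sum_{\substack{x\in N(u)\\ x\neq v}}\frac{1}{2(d(u)-1)}\,g_{x\to u}
+\sum_{\substack{y\in N(v)\\ y\neq u}}\frac{1}{2(d(v)-1)}\,g_{v\to y}\right)
\end{aligned}\tag{11}$$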

For the first part in the second equation of equation 11, by separating the non-backtracking part and switching the summation we have

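$$\begin{aligned}
\sum_{v\in N(u)}\sum_{\substack{x\in N(u)\\ x\neq v}}\frac{1}{2(d(u)-1)}\,g_{x\to u}
&=\sum_{v\in N(u)}\left(\sum_{x\in N(u)}\frac{1}{2(d(u)-1)}\,g_{x\to u}-\frac{1}{2(d(u)-1)}\,g_{v\to u}\right)\\
&=\left(\sum_{v\in N(u)}\frac{1}{2(d(u)-1)}\right)\sum_{x\in N(u)}g_{x\to u}-\frac{1}{2(d(u)-1)}\sum_{v\in N(u)}g_{v\to u}\\
&=\left(\frac{d(u)}{2(d(u)-1)}-\frac{1}{2(d(u)-1)}\right)g^{in}_{u}\\
&=\frac{1}{2}\,g^{in}_{u}
\end{aligned}\tag{12}$$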

For the second part of equation 11, we have

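$$\begin{aligned}
\sum_{v\in N(u)}\sum_{\substack{y\in N(v)\\ y\neq u}}\frac{1}{2(d(v)-1)}\,g_{v\to y}
&=\sum_{v\in N(u)}\frac{1}{2(d(v)-1)}\sum_{\substack{y\in N(v)\\ y\neq u}}g_{v\to y}\\
&=\sum_{v\in N(u)}\frac{1}{2(d(v)-1)}\left(g^{out}_{v}-g_{v\to u}\right)\\
&\approx\sum_{v\in N(u)}\frac{1}{2(d(v)-1)}\,g^{out}_{v}-\sum_{v\in N(u)}\frac{1}{2(d(v)-1)}\,\frac{1}{d(u)}\,g^{in}_{u}
\end{aligned}\tag{13}$$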

The approximation in the third step adopts an idea from mean field theory that we assume that every incoming edge to a fixed node has the same amount of probability. Thus, to give an unbiased estimation, we use the mean of other edges coming into node , i.e., , to approximate the when going through is prohibited by non-backtracking strategy. Note that for a neighbor , . By above approximation, matrix is obtained with , and the bound of the approximation error is . To sum up equation 12 and 13, we have

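$$(g^{T}\bar{P})^{out}_{u}\approx(g^{T}Q)^{out}_{u}
=\left(\frac{1}{2}-\sum_{v\in N(u)}\frac{1}{2(d(v)-1)}\,\frac{1}{d(u)}\right)g^{in}_{u}
+\sum_{v\in N(u)}\frac{1}{2(d(v)-1)}\,g^{out}_{v}$$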

The lemma holds.
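The exact part of this derivation and the mean-field step can be compared on a small graph. The following is a sketch under stated assumptions: it takes the symmetrized matrix to be (P + P^T)/2, which is inferred from the factor 1/2 in the entries above and is not confirmed elsewhere in this section, and it uses a hypothetical 4-cycle with one chord as the test graph. The function names are illustrative.

```python
import numpy as np

# Hypothetical test graph: the 4-cycle 0-1-2-3 plus the chord 0-2.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
deg = {u: len(nbrs) for u, nbrs in adj.items()}
edges = [(u, v) for u in adj for v in adj[u]]
idx = {e: i for i, e in enumerate(edges)}

# Non-backtracking transition matrix P and its symmetrization (assumed form of P_bar).
P = np.zeros((len(edges), len(edges)))
for (u, v) in edges:
    for x in adj[v]:
        if x != u:
            P[idx[(u, v)], idx[(v, x)]] = 1.0 / (deg[v] - 1)
P_bar = (P + P.T) / 2


def out_sum(vec, u):
    """Sum of the entries of vec on edges leaving u."""
    return sum(vec[idx[(u, v)]] for v in adj[u])


def in_sum(vec, u):
    """Sum of the entries of vec on edges entering u."""
    return sum(vec[idx[(v, u)]] for v in adj[u])


def exact_rhs(g, u):
    """0.5 * g_in(u) + sum_v (g_out(v) - g_{v->u}) / (2 (d(v) - 1))."""
    second = sum((out_sum(g, v) - g[idx[(v, u)]]) / (2 * (deg[v] - 1))
                 for v in adj[u])
    return 0.5 * in_sum(g, u) + second


def mean_field_rhs(g, u):
    """Same expression with g_{v->u} replaced by the mean g_in(u) / d(u)."""
    coef = sum(1.0 / (2 * (deg[v] - 1)) for v in adj[u])
    second = sum(out_sum(g, v) / (2 * (deg[v] - 1)) for v in adj[u])
    return (0.5 - coef / deg[u]) * in_sum(g, u) + second


rng = np.random.default_rng(2)
g = rng.random(len(edges))
h = g @ P_bar  # one step of the symmetrized walk: g^T P_bar

for u in adj:
    print(u, out_sum(h, u), exact_rhs(g, u), mean_field_rhs(g, u))
```

The first two printed columns coincide for every node, while the third differs by the error of the mean-field replacement; when `g` is constant on all edges the replacement is exact.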

### a.6

Theorem 3.2 Suppose that the degree of the original graph obeys a Poisson distribution with parameter , i.e., . Then, for arbitrary small , graph is a -approximation graph of the graph , where is , and is .

Proof.

To investigate the relationship between and , we first consider the relative difference between and . According to lemma 3.1, for an arbitrary nonzero item in , we have

$$\bar{P}[(x\to u),(u\to v)]=\frac{1}{2(d(u)-1)}.$$

The maximum value of the corresponding item in is

$$\big|\Delta[(x\to u),(u\to v)]\big|=\frac{1}{2(d(v)-1)(d(u)-1)}.$$

So, the relative difference between and is

$$\frac{\big|\Delta[(x\to u),(u\to v)]\big|}{\bar{P}[(x\to u),(u\to v)]}=\frac{1}{d(v)-1}.$$

Due to the arbitrary choice of node and , we regard the values concerning and as random variables. Thus, after applying Markov inequality, we get

$$\Pr\left[\frac{1}{d(v)-1}\ge\delta\right]\le\frac{1}{\delta}\,E\left[\frac{1}{d(v)-1}\right].$$

Note that, due to the convexity of the reciprocal function, applying Jensen’s inequality only gives us a lower bound. To get an upper bound of $E\left[\frac{1}{d(v)-1}\right]$, we set the random variables $X=d(v)-1$ and $Y=\frac{1}{X}$, and use the Taylor series expansion of $Y$ around $E[X]$:

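$$\begin{aligned}
E[Y]=E\left[\frac{1}{X}\right]
&\le E\left\{\frac{1}{E[X]}-\frac{1}{E^{2}[X]}\,(X-E[X])+\frac{1}{E^{3}[X]}\,(X-E[X])^{2}\right\}\\
&=\frac{E^{2}[X]+\mathrm{Var}[X]}{E^{3}[X]}\\
&=\frac{(\lambda-1)^{2}+\lambda}{(\lambda-1)^{3}}\quad\text{(due to the Poisson distribution)}\\
&=\frac{1}{\lambda-1}+\frac{\lambda}{(\lambda-1)^{3}}
\end{aligned}\tag{14}$$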

So the upper bound that the relative difference ratio between corresponding elements in and , which is caused by the approximation, is as follows:

$$\Pr\left[\frac{1}{d(v)-1}\ge\delta\right]\le\frac{1}{\delta}\left(\frac{1}{\lambda-1}+\frac{\lambda}{(\lambda-1)^{3}}\right).$$
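To get a feel for the size of this bound, it can simply be evaluated for a few parameter values. The sketch below only computes the right-hand side of the inequality above; the function name `relative_error_tail_bound` and the chosen values of the Poisson parameter and of the threshold are hypothetical and serve only as illustration.

```python
def relative_error_tail_bound(lam, delta):
    """Evaluate (1 / delta) * (1 / (lam - 1) + lam / (lam - 1)**3)."""
    if lam <= 1 or delta <= 0:
        raise ValueError("requires lam > 1 and delta > 0")
    expectation_bound = 1.0 / (lam - 1) + lam / (lam - 1) ** 3
    return expectation_bound / delta


# Hypothetical parameter values chosen only for illustration.
for lam in (5.0, 10.0, 50.0):
    print(lam, relative_error_tail_bound(lam, delta=0.5))
```

For a fixed threshold the bound shrinks as the average degree grows, which matches the intuition that the mean-field replacement perturbs each entry less on denser graphs.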

Set . Then, graph