A Bag-of-Paths Framework for Network Data Analysis
(ArXiv preprint manuscript submitted for publication)
Abstract
This work develops a generic framework, called the bag-of-paths (BoP), for link and network data analysis. The central idea is to assign a probability distribution on the set of all paths in a network. More precisely, a Gibbs-Boltzmann distribution is defined over a bag of paths in a network, that is, on a representation that considers all paths independently. We show that, under this distribution, the probability of drawing a path connecting two nodes can easily be computed in closed form by simple matrix inversion. This probability captures a notion of relatedness between nodes of the graph: two nodes are considered as highly related when they are connected by many, preferably low-cost, paths. As an application, two families of distances between nodes are derived from the BoP probabilities. Interestingly, the second distance family interpolates between the shortest path distance and the resistance distance. In addition, it extends the Bellman-Ford formula for computing the shortest path distance in order to integrate sub-optimal paths by simply replacing the minimum operator by the soft minimum operator. Experimental results on semi-supervised classification show that both of the new distance families are competitive with other state-of-the-art approaches. In addition to the distance measures studied in this paper, the bag-of-paths framework enables straightforward computation of many other relevant network measures.
keywords:
Network science, link analysis, distance and similarity on a graph, shortest path distance, resistance distance, semi-supervised classification.
1 Introduction
1.1 General introduction
Network and link analysis is a highly studied field, subject of much recent work in various areas of science: applied mathematics, computer science, social science, physics, chemistry, pattern recognition, applied statistics, data mining & machine learning, to name a few Barabasi2015 (); chung06 (); Estrada2012 (); Kolaczyk2009 (); Lewis09 (); Newman2010 (); Thelwall04 (); Wasserman1994 (). Within this context, one key issue is the proper quantification of the structural relatedness between nodes of a network by taking both direct and indirect connections into account. This problem is faced in all disciplines involving networks in various types of problems such as link prediction, community detection, node classification, and network visualization to name a few popular ones.
The main contribution of this paper is in presenting in detail the bag-of-paths (BoP) framework and defining relatedness as well as distance measures between nodes from this framework. The BoP builds on and extends previous work dedicated to the exploratory analysis of network data Kivimaki2012 (); Kivimaki2014 (); Mantrach2009 (); Yen08K (). The introduced distances are constructed to capture the global structure of the graph by using paths on the graph as a building block. In addition to relatedness/distance measures, various other quantities of interest can be derived within the probabilistic BoP framework in a principled way, such as betweenness measures quantifying to which extent a node is in between two sets of nodes Lebichot2014 (), extensions of the modularity criterion for, e.g., community detection Devooght2014 (), measures capturing the criticality of the nodes or robustness of the network, graph cuts based on BoP probabilities, and so on.
1.2 The bag-of-paths framework
More precisely, we assume given a weighted directed, strongly connected, graph or network $G$ where a cost is associated to each edge. Within this context, we consider a bag containing all the possible (either absorbing or non-absorbing) paths (also called walks in the literature) between pairs of nodes in $G$. In a first step, following Akamatsu1996 (); Mantrach2009 (); Saerens2008 (); Yen08K (), a probability distribution on this countable set of paths can be defined by minimizing the total expected cost between all pairs of nodes while fixing the total relative entropy spread in the graph. This results in a Gibbs-Boltzmann distribution, depending on a temperature parameter $T$, on the set of paths such that long (high-cost) paths have a low probability of being sampled from the bag, while short (low-cost) paths have a high probability of being sampled.
In this probabilistic framework, the BoP probabilities, $\mathrm{P}(s = i, e = j)$, that a sampled path has node $i$ as its starting node and node $j$ as its ending node can easily be computed in closed form by a simple $n \times n$ matrix inversion, where $n$ is the number of nodes in the graph. These BoP probabilities play a crucial role in our framework in that they capture the relatedness between two nodes $i$ and $j$: the BoP probability will be high when the two nodes are connected by many short paths. In summary, the BoP framework has several interesting properties:

It has a clear, intuitive, interpretation.

The temperature parameter $T$ makes it possible to tune the level of randomness by controlling the balance between exploitation and exploration.

The introduction of independent costs results in a large degree of customization of the model, according to the problem requirements: some paths could be penalized because they visit undesirable nodes having adverse features.

The framework is rich. Many useful quantities of interest can be defined according to the BoP probabilistic framework: distance measures, betweenness measures, etc. This is discussed in the conclusion.

The quantities of interest are easy to compute.
It, however, also suffers from a drawback: the different quantities are computed by solving a system of linear equations, or by matrix inversion. More precisely, the distance between a particular node and all the other nodes can be computed by solving a system of $n$ linear equations, while all pairwise distances can be computed at once by inverting an $n \times n$ square matrix. This results in $O(n^3)$ computational complexity. Even more importantly, the matrix of distances necessitates $O(n^2)$ storage, although this can be alleviated by using, e.g., incomplete matrix factorization techniques.
This means that the different quantities can only be computed reasonably on small to medium size graphs (containing a few tens of thousands of nodes). However, in specific applications like classification or extraction of top eigenvectors, we can avoid computing the matrix inverse explicitly (see PageRank and the power method Langville2006 (), or large scale semi-supervised classification on graphs Mantrach2011 ()). In addition, it is also possible to restrict the set of paths to “efficient paths”, that is, paths that do not backtrack (always getting further from the starting node), and compute efficiently the distances from the starting node by a recurrence formula, as proposed in transportation theory Dial71 ().
1.3 Deriving node distances from the BoP framework
The paper first introduces the BoP framework in detail. After that, the two families of distances between nodes are defined, and are coined the surprisal distance and the potential distance. Both distance measures satisfy the triangle inequality, and thus satisfy the axioms of a metric. Moreover, the potential distance has the interesting property of generalizing the shortest path and the commute cost distances by computing an intermediate distance, depending on the temperature parameter $T$. When $T$ is close to zero, the distance reduces to the standard shortest path distance (emphasizing exploitation) while for $T \to \infty$, it reduces to the commute cost distance (focusing on exploration). The commute cost distance is closely related to the resistance distance FoussKDE2005 (); Klein1993 (), as the two functions are proportional to each other (as well as to the commute time distance) Chandra1989 (); Kivimaki2012 ().
This is of primary interest as it has been shown that both the shortest path distance and the resistance distance suffer from some significant flaws. While relevant in many applications, the shortest path distance cannot always be considered as a good candidate distance in network data. Indeed, this measure only depends on the shortest paths and thus does not integrate the “degree of connectivity” between the two nodes. In many applications, for a constant shortest path distance, nodes connected by many indirect paths should be considered as “closer” than nodes connected by only a few paths. This is especially relevant when considering relatedness of nodes based on communication, movement, etc, in a network which do not always happen optimally, nor completely randomly.
While the shortest path distance fails to take the whole structure of the graph into account, it has also been shown that the resistance distance converges to a useless value, only depending on the degrees of the two nodes, when the size of the graph increases (the random walker is getting “lost in space” because the Markov chain mixes too fast, see vonLuxburg2010 ()). Moreover, the resistance distance, which is proportional to the commute cost distance, assumes a completely random movement or communication in the network, which is also unrealistic.
In short, shortest paths do not integrate the amount of connectivity between the two nodes whereas random walks quickly lose the notion of proximity to the initial node when the graph becomes larger vonLuxburg2010 ().
There is therefore a need for introducing distances interpolating between the shortest path distance and the resistance distance, thus hopefully avoiding the drawbacks appearing at the ends of the spectrum. These quantities capture the notion of relative accessibility between nodes, a combination of both proximity in the network and amount of connectivity.
Furthermore, and interestingly, a simple local recurrence expression, extending the Bellman-Ford formula for computing the potential distances from one node of interest to all the other nodes, is also derived. It relies on the use of the so-called soft minimum operator Cook2011 () instead of the usual minimum. Finally, our experiments show that these distance families provide competitive results in semi-supervised learning.
1.4 Contributions and organization of the paper
Thus, in summary, this work has several contributions:

It introduces a well-founded bag-of-paths framework capturing the global structure of the graph by using network paths as a building block.

It is shown that the bag-of-hitting-paths probabilities can easily be computed in closed form. This fundamental quantity defines an intuitive relatedness measure between nodes.

It defines two families of distances capturing the structural dissimilarity between the nodes in terms of relative accessibility. The distances between all pairs of nodes can be computed conveniently by inverting a matrix.

It is shown that one of these distance measures has some interesting properties; for instance it is graph-geodetic and it interpolates between the shortest path distance and the resistance distance (up to a scaling factor).

The framework is extended to the case where non-uniform priors are defined on the nodes.

We prove that this distance generalizes the Bellman-Ford formula computing shortest path distances, by simply replacing the $\min$ operator by the $\operatorname{softmin}$ operator.

The distances obtain promising empirical results in semi-supervised classification tasks when compared to other, kernel-based, methods.
Section 2 develops related work and introduces the necessary background and notation. Section 3 introduces the BoP framework, defines BoP probabilities and shows how it can be computed in closed form. Section 4 extends the framework to hitting, or absorbing, paths. In Section 5, the two families of distances as well as their properties are derived. Section 6 generalizes the framework to nonuniform priors on the nodes. An experimental study of the BoP framework with application to semisupervised classification is presented in Section 7. Concluding remarks and extensions are discussed in Section 8.
2 Related work, background, and notation
2.1 Related work
This work is related to similarity measures on graphs, for which some background is presented in this section. The presented BoP framework also has applications in semi-supervised classification, which is the focus of the experimental study in Section 7. A short survey related to this problem can be found in subsection 7.1.
Similarity measures on a graph determine to what extent two nodes in a graph resemble each other, either based on the information contained in the node attributes or based on the graph structure. In this work, only measures based on the graph structure will be investigated. Structural similarity measures can be categorized into two groups: local and global Lu2011 (). On the one hand, local similarity measures between nodes consider the direct links from a node to the other nodes as features and use these features in various ways to provide similarities. Examples include the cosine coefficient Dunham2003 () and the standard correlation Wasserman1994 (). On the other hand, global similarity measures consider the whole graph structure to compute similarities. Our short review of similarity measures is largely inspired by the surveys appearing in FoussKernelNN2011 (); Mantrach2009 (); Yen2008 (); Yen08K ().
Certainly the most popular and useful distance between nodes of a graph is the shortest path distance. However, as discussed in the introduction, it is not always relevant for quantifying the similarity of nodes in a network.
Alternatively, similarity measures can be based on random walk models on the graph, seen as a Markov chain. As an example, the commute time (CT) kernel has been introduced in FoussKDE2005 (); Saerens04PCA () as the Moore-Penrose pseudoinverse, $\mathbf{L}^{+}$, of the Laplacian matrix. The CT kernel was inspired by the work of Klein & Randic Klein1993 () and Chandra et al. Chandra1989 (). More precisely, Klein & Randic Klein1993 () suggested to use the effective resistance between two nodes as a meaningful distance measure, called the resistance distance. Chandra et al. Chandra1989 () then showed that the resistance distance equals the commute time distance, up to a constant factor. The CT distance is defined as the average number of steps that a random walker, starting in a given node, will take before entering another node for the first time (this is called the average first-passage time Norris1997 ()) and going back to the initial node.
It was then shown Saerens04PCA () that the elements of $\mathbf{L}^{+}$ are inner products of the node vectors in the Euclidean space where these node vectors are exactly separated by the square root of the CT distance. The square root of the CT distance is therein called the Euclidean CT distance. The relationships between the Laplacian matrix and the commute cost distance (the expected cost, and not the number of steps as for the CT, of reaching a destination node from a starting node and going back to the starting node) were studied in FoussKDE2005 (). Finally, an electrical interpretation of the elements of $\mathbf{L}^{+}$ can be found in Yen2008 (). However, we saw in the introduction that these random-walk based distances suffer from some drawbacks (e.g., the so-called “lost in space” problem vonLuxburg2010 ()).
Sarkar et al. Sarkar2007 () suggested a fast method for computing truncated commute time neighbors. At the same time, several authors defined an embedding that preserves the commute time distance with applications in various fields such as clustering Luh2005 (), collaborative filtering FoussKDE2005 (); Brand05 (), dimensionality reduction of manifolds Ham2004 () and image segmentation Qiu2005 ().
Instead of taking the pseudoinverse of the Laplacian matrix, a simple regularization leads to a kernel called the regularized commute time kernel Ito2005 (); Chebotarev1997 (); Chebotarev1998a (). Ito et al. Ito2005 (), further propose the modified regularized Laplacian kernel by introducing another parameter controlling the importance of nodes. This modified regularized Laplacian kernel is also closely related to a graph regularization framework introduced by Zhou & Scholkopf in Zhou04 (), extended to directed graphs in Zhou05 ().
The exponential diffusion kernel, introduced by Kondor & Lafferty Kondor2002 () and the Neumann diffusion kernel, introduced in Scholkopf2002 () are similar and based on power series of the adjacency matrix. A meaningful alternative to the exponential diffusion kernel, called the Laplacian exponential diffusion kernel (see Kondor2002 (); Smola03 ()) is a diffusion model that substitutes the adjacency matrix with the Laplacian matrix.
Random walk with restart kernels, inspired by the PageRank algorithm and adapted to provide relative similarities between nodes, appeared relatively recently in Gori2006WebKDD (); Pan2004 (); Tong2007 (). Nadler et al. Nadler2005 (); Nadler2006 () and Pons et al. Pons2005 (); Pons2006 () suggested a distance measure between nodes of a graph based on a diffusion process, called the diffusion distance. The Markov diffusion kernel has been derived from this distance measure in FoussKernelNN2011 () and Yen2011 (). The natural embedding induced by the diffusion distance was called diffusion map by Nadler et al. Nadler2005 (); Nadler2006 () and is related to correspondence analysis Yen2011 ().
More recently, Mantrach et al. Mantrach2009 (), inspired by Akamatsu1996 (); Bell1995 () and subsequently by Saerens2008 (), introduced a link-based covariance measure between nodes of a weighted directed graph, called the sum-over-paths (SoP) covariance. They consider, in a similar manner as in this paper, a Gibbs-Boltzmann distribution on the set of paths such that high-cost paths occur with low probability whereas low-cost paths occur with a high probability. Two nodes are then considered as highly similar if they often co-occur on the same, preferably short, path. A related co-betweenness measure between nodes has been defined in Kolaczyk2009c ().
Moreover, as both the shortest path distance and the resistance distance show some issues, there were several attempts to define families of distances interpolating between the shortest path and more “global” distances, such as the resistance distance. In this context, inspired by Akamatsu1996 (); Bell1995 (); Saerens2008 (), a parametrized family of dissimilarity measures, called the randomized shortest path (RSP) dissimilarity, reducing to the shortest path distance at one end of the parameter range, and to the resistance distance (up to a constant scaling factor) at the other end, was proposed in Yen08K () and extended in Kivimaki2012 (). Similar ideas appeared at the same time in Chebotarev2011 (); Chebotarev2012 (), based on considering the co-occurrences of nodes in forests of a graph, and in Herbster2009 (); vonLuxburg2011 (), based on a generalization of the effective resistance in electric circuits. These two last families are metrics while the RSP dissimilarity does not satisfy the triangle inequality. The potential and the surprisal distances introduced in this work fall under the same catalogue of distance families. See also Kivimaki2012 (); Guex2015 (); Guex2016 () for other, closely related, formulations of families of distances based on free energy and network flows.
2.2 Background and notation
We now introduce the necessary notation for the bag-of-paths (BoP) framework, providing both a relatedness index and a distance measure between nodes of a network. First, note that, in the sequel, column vectors are written in bold lowercase while matrices are in bold uppercase.
Consider a weighted directed graph or network, $G$, assumed strongly connected, with a set $\mathcal{V}$ of $n$ nodes (or vertices) and a set $\mathcal{E}$ of edges (or arcs, links). An edge between node $i$ and node $j$ is denoted by $i \to j$ or $(i, j)$. Furthermore, it is assumed that we are given an adjacency matrix $\mathbf{A}$ with elements $a_{ij} \ge 0$ quantifying in some way the affinity between node $i$ and node $j$. When $a_{ij} > 0$, node $i$ and node $j$ are said to be adjacent, that is, connected by an edge. Conversely, $a_{ij} = 0$ means that $i$ and $j$ are not connected. We further assume that there are no self-loops, that is, $a_{ii} = 0$ for all $i$. From this adjacency matrix, a standard random walk on the graph is defined in the usual way. The transition probabilities associated to each node are simply proportional to the affinities and then normalized:
$$p^{\text{ref}}_{ij} = \frac{a_{ij}}{\sum_{j'=1}^{n} a_{ij'}} \tag{1}$$
Note that these transition probabilities will be used as reference probabilities later; hence the superscript “ref”. The matrix $\mathbf{P}^{\text{ref}}$, containing elements $p^{\text{ref}}_{ij}$, is stochastic and called the transition matrix of the natural or reference random walk on the graph.
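As a small illustration of Equation (1), the following sketch row-normalizes an adjacency matrix with NumPy. The function name `reference_transition_matrix` and the 4-node example graph are assumptions made for this example only; they do not come from the paper.

```python
import numpy as np

def reference_transition_matrix(A):
    """Row-normalize a nonnegative adjacency matrix, as in Equation (1)."""
    A = np.asarray(A, dtype=float)
    row_sums = A.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        # A strongly connected graph has at least one outgoing edge per node.
        raise ValueError("every node needs at least one outgoing edge")
    return A / row_sums

# Hypothetical undirected 4-node graph without self-loops (illustration only).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

P_ref = reference_transition_matrix(A)
print(P_ref)
```

Each row of `P_ref` sums to one, which is the stochasticity property stated above.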
In addition, we assume that a transition cost, $c_{ij} \ge 0$, is associated to each link $i \to j$ of the graph $G$. If there is no edge between $i$ and $j$, the cost is assumed to take an infinite value, $c_{ij} = \infty$. For consistency, $a_{ij} = 0$ if and only if $c_{ij} = \infty$. The cost matrix $\mathbf{C}$ is the matrix containing the immediate costs $c_{ij}$ as elements. We will assume that at least one element of $\mathbf{C}$ is strictly positive. A path $\wp$ is a finite sequence of jumps to adjacent nodes on $G$ (including loops), initiated from a starting node $s$, and stopping in an ending node $e$. The total cost $\tilde{c}(\wp)$ of a path $\wp$ is simply the sum of the local costs $c_{ij}$ along $\wp$, while the length of a path is the number of steps, or jumps, needed for following that path.
The costs are set independently of the adjacency matrix; they quantify the cost of a transition, depending on the problem at hand. They can, e.g., be defined according to some properties, or features, of the nodes or the edges in order to bias the probability distribution of choosing a path. In the case of a social network, we may, for instance, want to bias the paths in favor of domain experts. In that case, the cost of jumping to a node could be set inversely proportional to the degree of expertise of the corresponding person. Therefore, walks visiting a large proportion of persons with a low degree of expertise would be penalized versus walks visiting persons with a high degree. Another example aims to favor hub-avoiding paths by penalizing paths visiting hubs. Then, the cost can be simply set to the degree of the node. If there is no reason to bias the paths with respect to some features, costs are simply set equal to $c_{ij} = 1$ (paths are penalized by their length) or equal to $c_{ij} = 1/a_{ij}$ (the elements of the adjacency matrix can then be considered as conductances and the costs as resistances).
3 The basic bag-of-paths framework
Roughly speaking, the BoP model will be based on the probability that a path drawn from a “bag of paths” has nodes $i$ and $j$ as its starting and ending nodes, respectively. According to this model, the probability of drawing a path starting in node $i$ and ending in node $j$ from the bag-of-paths can easily be computed in closed form. This probability distribution then serves as a building block for several extensions.
The bag-of-paths framework is introduced by first considering bounded paths and then paths of arbitrary length. For simplicity, we discuss non-hitting (or non-absorbing) paths first and then develop the more interesting bag-of-hitting-paths framework in the next section.
3.1 Sampling bounded paths according to a Gibbs-Boltzmann distribution
The present section describes how the probability distribution on the set of paths is assigned. In order to make the presentation rigorous, we first consider paths of bounded length $t$. Later, we will extend the results to paths of arbitrary length. Let us first choose two nodes, a starting node $i$ and an ending node $j$, and define the set of paths (including cycles) of length $t$ from $i$ to $j$ as $\mathcal{P}_{ij}(t)$. Thus, $\mathcal{P}_{ij}(t)$ contains all the paths reaching node $j$ from node $i$ in exactly $t$ steps.
Let us further denote as $\tilde{c}(\wp)$ the total cost associated to path $\wp \in \mathcal{P}_{ij}(t)$. Here, we assume that $\wp$ is a valid path from node $i$ to node $j$, that is, it consists of a sequence of nodes $(k_0 = i) \to k_1 \to \dots \to (k_t = j)$ where $c_{k_{\tau-1} k_\tau} < \infty$ for all $\tau \in [1, t]$. As already mentioned, we assume that the total cost associated to a path is additive, i.e. $\tilde{c}(\wp) = \sum_{\tau=1}^{t} c_{k_{\tau-1} k_\tau}$. Then, let us define the set of all length-$t$ paths through the graph between all pairs of nodes as $\mathcal{P}(t) = \bigcup_{i,j} \mathcal{P}_{ij}(t)$.
Finally, the set of all bounded paths up to length $t$ is denoted by $\mathcal{P}(\le t) = \bigcup_{\tau=0}^{t} \mathcal{P}(\tau)$. Note that, by convention, for $i = j$ and $t = 0$, zero-length paths are allowed with zero associated cost. Other types of paths will be introduced later; a summary of the mathematical notation appears in Table 1.
Now, we consider a probability distribution on this finite set $\mathcal{P}(\le t)$, representing the probability of drawing a path $\wp$ from a bag containing all paths up to length $t$. We search for the distribution of paths $\mathrm{P}(\cdot)$ minimizing the expected total cost-to-go, $\mathbb{E}[\tilde{c}(\wp)]$, among all the distributions having a fixed relative entropy $J_0$ with respect to a reference distribution, here the natural random walk on the graph (see Equation (1)). This choice naturally defines a probability distribution on the set of paths of maximal length $t$ such that high-cost paths occur with a low probability while short paths occur with a high probability. In other words, we are seeking path probabilities, $\mathrm{P}(\wp)$, $\wp \in \mathcal{P}(\le t)$, minimizing the expected total cost subject to a constant relative entropy constraint (in theory, nonnegativity constraints should be added, but this is not necessary as the resulting probabilities are automatically nonnegative):
$$\begin{array}{ll} \underset{\{\mathrm{P}(\wp)\}}{\text{minimize}} & \sum_{\wp \in \mathcal{P}(\le t)} \mathrm{P}(\wp)\, \tilde{c}(\wp) \\ \text{subject to} & \sum_{\wp \in \mathcal{P}(\le t)} \mathrm{P}(\wp) \log\!\left( \mathrm{P}(\wp) / \tilde{\pi}^{\text{ref}}(\wp) \right) = J_0 \\ & \sum_{\wp \in \mathcal{P}(\le t)} \mathrm{P}(\wp) = 1 \end{array} \tag{2}$$
where $J_0 > 0$ is provided a priori by the user, according to the desired degree of randomness, and $\tilde{\pi}^{\text{ref}}(\wp)$ represents the probability of following the path $\wp$ when walking according to the reference transition probabilities $p^{\text{ref}}_{ij}$ of the natural random walk on $G$ (see Equation (1)).
More precisely, we define $\pi^{\text{ref}}(\wp) = \prod_{\tau=1}^{t} p^{\text{ref}}_{k_{\tau-1} k_\tau}$, that is, the product of the transition probabilities along path $\wp$: the likelihood of the path when the starting and ending nodes are known. Now, if we assume a uniform (non-uniform priors are considered in Section 6), independent, a priori probability, $1/n$, for choosing both the starting and the ending node, then we set $\tilde{\pi}^{\text{ref}}(\wp) = \pi^{\text{ref}}(\wp)/n^2$, which ensures that the reference probability is properly normalized (we will see later that the path likelihoods $\pi^{\text{ref}}(\wp)$ are already properly normalized in the case of hitting, or absorbing, paths; see Appendix A).
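The problem (2) can be solved by introducing the following Lagrange function
$$\mathscr{L} = \sum_{\wp \in \mathcal{P}(\le t)} \mathrm{P}(\wp)\, \tilde{c}(\wp) + \lambda \left[ \sum_{\wp \in \mathcal{P}(\le t)} \mathrm{P}(\wp) \log \frac{\mathrm{P}(\wp)}{\tilde{\pi}^{\text{ref}}(\wp)} - J_0 \right] + \mu \left[ \sum_{\wp \in \mathcal{P}(\le t)} \mathrm{P}(\wp) - 1 \right] \tag{3}$$
and optimizing over the set of path probabilities $\{\mathrm{P}(\wp)\}$. As could be expected, setting its partial derivative with respect to $\mathrm{P}(\wp)$ to zero and solving the equation yields a Gibbs-Boltzmann probability distribution on the set of paths up to length $t$ Mantrach2009 (),
$$\mathrm{P}(\wp) = \frac{\tilde{\pi}^{\text{ref}}(\wp) \exp[-\theta\, \tilde{c}(\wp)]}{\sum_{\wp' \in \mathcal{P}(\le t)} \tilde{\pi}^{\text{ref}}(\wp') \exp[-\theta\, \tilde{c}(\wp')]} \tag{4}$$
where the Lagrange parameter $\lambda$ plays the role of a temperature $T$ and $\theta = 1/\lambda$ is the inverse temperature.
Thus, as desired, short paths $\wp$ (having a low cost $\tilde{c}(\wp)$) are favored in that they have a large probability of being followed. From Equation (4), we clearly observe that when $\theta \to 0$, the path probabilities reduce to the probabilities generated by the natural random walk on the graph (characterized by the transition probabilities $p^{\text{ref}}_{ij}$ as defined in Equation (1)). In this case, $J_0 \to 0$ as well. But when $\theta$ is large, the probability distribution defined by Equation (4) is biased towards low-cost paths (the most likely paths are the shortest ones). Note that, in the sequel, it will be assumed that the user provides the value of the parameter $\theta$ instead of $J_0$, with $\theta > 0$. Also notice that the model could be derived thanks to a maximum entropy principle instead Jaynes1957 (); Kapur1992 ().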
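The two limiting behaviors of the Gibbs-Boltzmann distribution can be checked numerically on a toy bag. The sketch below is only an illustration: the four "paths", their reference probabilities and their costs are made-up numbers, and `gibbs_boltzmann` is a hypothetical helper name.

```python
import numpy as np

def gibbs_boltzmann(ref_probs, costs, theta):
    """Reweight reference probabilities by exp(-theta * cost), then renormalize."""
    ref_probs = np.asarray(ref_probs, dtype=float)
    costs = np.asarray(costs, dtype=float)
    # Shifting all costs by their minimum does not change the normalized
    # result but avoids numerical underflow when theta is large.
    weights = ref_probs * np.exp(-theta * (costs - costs.min()))
    return weights / weights.sum()

# Hypothetical bag of four paths (values chosen for illustration only).
ref_probs = np.array([0.4, 0.3, 0.2, 0.1])
costs = np.array([3.0, 1.0, 2.0, 5.0])

p_small_theta = gibbs_boltzmann(ref_probs, costs, theta=1e-9)  # high temperature
p_large_theta = gibbs_boltzmann(ref_probs, costs, theta=20.0)  # low temperature

print(p_small_theta)  # close to the reference probabilities
print(p_large_theta)  # mass concentrated on the lowest-cost path
```

With a very small inverse temperature the reference distribution is recovered, while a large one puts almost all the mass on the cheapest path, regardless of its reference probability.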
3.2 The bag-of-paths probabilities
Our BoP framework will be based on the computation of another important quantity derived from Equation (4): the probability of drawing a path starting in some node $i$ and ending in some other node $j$ from the bag of paths. For paths up to length $t$, this is provided by
$$\mathrm{P}_{\le t}(s = i, e = j) = \frac{\sum_{\wp \in \mathcal{P}_{ij}(\le t)} \tilde{\pi}^{\text{ref}}(\wp) \exp[-\theta\, \tilde{c}(\wp)]}{\sum_{\wp' \in \mathcal{P}(\le t)} \tilde{\pi}^{\text{ref}}(\wp') \exp[-\theta\, \tilde{c}(\wp')]} \tag{5}$$
where $\mathcal{P}_{ij}(\le t)$ is the set of paths connecting node $i$ and node $j$ up to length $t$. From (4), this quantity simply computes the probability mass of drawing a path connecting $i$ to $j$. The paths in $\mathcal{P}_{ij}(\le t)$ can contain loops and could visit nodes $i$ and $j$ several times during the trajectory (another interesting class of paths, the hitting, or absorbing, paths, which allow only one single visit to the ending node, will be considered in Section 4).
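Table 1: Summary of the mathematical notation.

| Symbol | Meaning |
| --- | --- |
| $\wp$ | a particular path |
| $\mathrm{P}(\wp)$ | the probability of drawing path $\wp$ |
| $\mathcal{P}_{ij}(t)$ | set of paths connecting $i$ to $j$ in exactly $t$ steps |
| $\mathcal{P}_{ij}(\le t)$ | set of paths connecting $i$ to $j$ in at most $t$ steps |
| $\mathcal{P}(\le t)$ | set of all paths of at most $t$ steps |
| $\mathcal{P}_{ij}$ | set of paths of arbitrary length connecting $i$ to $j$ |
| $\mathcal{P}$ | set of all paths of arbitrary length |
| $\mathbf{P}^{\text{ref}}$ | transition probability matrix with elements $p^{\text{ref}}_{ij}$ |
| $\mathbf{C}$ | cost matrix with elements $c_{ij}$ |
| $\pi^{\text{ref}}(\wp)$ | likelihood of following path $\wp$ according to $\mathbf{P}^{\text{ref}}$ |
| $\tilde{c}(\wp)$ | total cumulated cost when following path $\wp$ |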
3.2.1 Computation of the bag-of-paths probabilities for bounded paths
The analytical expression allowing to compute the quantity defined by Equation (5) will be derived in this subsection. Then, in the following subsection, its definition will be extended to the set of paths of arbitrary length (unbounded paths) by taking the limit $t \to \infty$.
We start from the cost matrix, $\mathbf{C}$, from which we build a new matrix, $\mathbf{W}$, as
$$\mathbf{W} = \mathbf{P}^{\text{ref}} \circ \exp[-\theta \mathbf{C}] = \exp[-\theta \mathbf{C} + \log \mathbf{P}^{\text{ref}}] \tag{6}$$
where $\mathbf{P}^{\text{ref}}$ is the transition probability matrix of the natural random walk on the graph containing the elements $p^{\text{ref}}_{ij}$ (do not confuse the matrix $\mathbf{P}^{\text{ref}}$, in bold, with $\tilde{\pi}^{\text{ref}}(\wp)$, the reference probability of path $\wp$; a summary of the notation appears in Table 1), and the logarithm/exponential functions are taken elementwise. Moreover, $\circ$ is the elementwise (Hadamard) matrix product. Note that the matrix $\mathbf{W}$ is not symmetric in general.
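A minimal NumPy sketch of Equation (6) follows. The helper name `bop_weight_matrix`, the example graph, the choice of costs $c_{ij} = 1/a_{ij}$ and the value of the inverse temperature are assumptions made for illustration. Missing edges are handled explicitly so that an infinite cost always maps to a zero weight.

```python
import numpy as np

def bop_weight_matrix(P_ref, C, theta):
    """Elementwise product of P_ref and exp(-theta * C); zero where no edge exists."""
    P_ref = np.asarray(P_ref, dtype=float)
    C = np.asarray(C, dtype=float)
    W = np.zeros_like(P_ref)
    edges = P_ref > 0  # only existing edges carry a nonzero weight
    W[edges] = P_ref[edges] * np.exp(-theta * C[edges])
    return W

# Hypothetical 4-node graph (illustration only).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
P_ref = A / A.sum(axis=1, keepdims=True)

# Costs as resistances: c_ij = 1 / a_ij on edges, infinity elsewhere.
C = np.full_like(A, np.inf)
C[A > 0] = 1.0 / A[A > 0]

theta = 1.0
W = bop_weight_matrix(P_ref, C, theta)

# Second form of Equation (6); log(0) = -inf is intended here, so the
# corresponding NumPy warning is silenced.
with np.errstate(divide="ignore"):
    W_log_form = np.exp(-theta * C + np.log(P_ref))

print(np.allclose(W, W_log_form))
```

The masked form is used as the default because it stays well defined for an inverse temperature of zero, where multiplying zero by an infinite cost would otherwise produce undefined values.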
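Then, let us first compute the numerator of Equation (5). Both the log-likelihood and the cost are summed along a path, $\log \pi^{\text{ref}}(\wp) = \sum_{\tau=1}^{t} \log p^{\text{ref}}_{k_{\tau-1} k_\tau}$ and $\tilde{c}(\wp) = \sum_{\tau=1}^{t} c_{k_{\tau-1} k_\tau}$, where each link $k_{\tau-1} \to k_\tau$ lies on path $\wp$, so that $\pi^{\text{ref}}(\wp) \exp[-\theta\, \tilde{c}(\wp)]$ is the product of the elements of $\mathbf{W}$ along the path. We therefore immediately observe that element $i, j$ of the matrix $\mathbf{W}^{\tau}$ ($\mathbf{W}$ to the power $\tau$) is $[\mathbf{W}^{\tau}]_{ij} = \sum_{\wp \in \mathcal{P}_{ij}(\tau)} \pi^{\text{ref}}(\wp) \exp[-\theta\, \tilde{c}(\wp)]$ where $\mathcal{P}_{ij}(\tau)$ is the set of paths connecting the starting node $i$ to the ending node $j$ in exactly $\tau$ steps.
Consequently, the sum in the numerator of Equation (5), written in terms of the path likelihoods $\pi^{\text{ref}}(\wp)$, is
$$\sum_{\wp \in \mathcal{P}_{ij}(\le t)} \pi^{\text{ref}}(\wp) \exp[-\theta\, \tilde{c}(\wp)] = \sum_{\tau=0}^{t} [\mathbf{W}^{\tau}]_{ij} = \mathbf{e}_i^{\mathsf{T}} \left( \sum_{\tau=0}^{t} \mathbf{W}^{\tau} \right) \mathbf{e}_j \tag{7}$$
where $\mathbf{e}_i$ is a column vector full of 0's, except in position $i$ where it contains a 1. By convention, at time step 0, the random walker appears in node $i$ with probability one and a zero cost: $\mathbf{W}^{0} = \mathbf{I}$. This means that zero-length paths (without any transition step) are allowed in $\mathcal{P}_{ij}(\le t)$. If, on the contrary, we want to dismiss zero-length paths, we could redefine $\mathcal{P}_{ij}(\le t)$ as the set of paths of length at least one (the summation starts at $\tau = 1$ instead of $\tau = 0$) and proceed in the same manner.
Equation (7) allows us to derive the analytical form of the probability of drawing a bounded path (up to length $t$) starting in node $i$ and ending in $j$. Indeed, replacing Equation (7) in Equation (5), and recalling that $\tilde{\pi}^{\text{ref}}(\wp) = \pi^{\text{ref}}(\wp)/n^2$, the factor $1/n^2$ cancels between numerator and denominator and we obtain
$$\mathrm{P}_{\le t}(s = i, e = j) = \frac{\mathbf{e}_i^{\mathsf{T}} \left( \sum_{\tau=0}^{t} \mathbf{W}^{\tau} \right) \mathbf{e}_j}{\mathbf{e}^{\mathsf{T}} \left( \sum_{\tau=0}^{t} \mathbf{W}^{\tau} \right) \mathbf{e}} \tag{8}$$
where $\mathbf{e} = [1, 1, \dots, 1]^{\mathsf{T}}$ is a vector of 1's. Of course, there is no a priori reason to choose a particular path length; we will therefore consider paths of arbitrary length in the next section.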
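The link between sums over paths and powers of the weight matrix can be verified by brute force on a tiny graph. The sketch below enumerates every path of at most a given number of steps and compares the accumulated mass with the truncated matrix series. The graph, the unit costs, the inverse temperature and the function names `mass_by_enumeration` and `truncated_series` are all assumptions chosen for illustration.

```python
import numpy as np

def mass_by_enumeration(P_ref, C, theta, i, j, t):
    """Sum likelihood * exp(-theta * cost) over all i -> j paths of at most t steps."""
    n = P_ref.shape[0]
    total = 0.0
    # Each stack entry: (current node, steps taken, path likelihood, path cost).
    stack = [(i, 0, 1.0, 0.0)]
    while stack:
        node, steps, likelihood, cost = stack.pop()
        if node == j:
            # The walk so far is itself an i -> j path (possibly of length zero).
            total += likelihood * np.exp(-theta * cost)
        if steps < t:
            for k in range(n):
                if P_ref[node, k] > 0:
                    stack.append((k, steps + 1,
                                  likelihood * P_ref[node, k],
                                  cost + C[node, k]))
    return total

def truncated_series(W, t):
    """Return I + W + W^2 + ... + W^t."""
    S = np.zeros_like(W)
    power = np.eye(W.shape[0])
    for _ in range(t + 1):
        S += power
        power = power @ W
    return S

# Hypothetical 4-node graph with unit costs on edges (illustration only).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
n = A.shape[0]
P_ref = A / A.sum(axis=1, keepdims=True)
C = np.where(A > 0, 1.0, np.inf)
theta, t = 0.7, 4
W = np.where(A > 0, P_ref * np.exp(-theta * C), 0.0)

S = truncated_series(W, t)
brute = np.array([[mass_by_enumeration(P_ref, C, theta, i, j, t)
                   for j in range(n)] for i in range(n)])
P_bounded = S / S.sum()  # bounded bag-of-paths probabilities

print(np.allclose(S, brute))
```

Enumeration grows exponentially with the path length, so it only serves as a sanity check; the matrix series gives the same quantities at polynomial cost.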
3.2.2 Proceeding with paths of arbitrary length
Let us now consider the problem of computing the probability of drawing a path starting in $i$ and ending in $j$ from a bag containing paths of arbitrary length, and therefore usually containing an infinite number of paths. Following the definition in the bounded case (Equation (5)), this quantity will be denoted as $\mathrm{P}(s = i, e = j)$ and defined by
$$\mathrm{P}(s = i, e = j) = \frac{\sum_{\wp \in \mathcal{P}_{ij}} \tilde{\pi}^{\text{ref}}(\wp) \exp[-\theta\, \tilde{c}(\wp)]}{\sum_{\wp' \in \mathcal{P}} \tilde{\pi}^{\text{ref}}(\wp') \exp[-\theta\, \tilde{c}(\wp')]} \tag{9}$$
where $\mathcal{P}_{ij}$ is the set of paths (of all lengths) connecting $i$ to $j$ in the graph and the denominator is called the partition function of the bag-of-paths system,
$$\mathcal{Z} = \sum_{\wp \in \mathcal{P}} \tilde{\pi}^{\text{ref}}(\wp) \exp[-\theta\, \tilde{c}(\wp)] \tag{10}$$
The quantity in Equation (9) will be called the bag-of-paths probability of drawing a path of arbitrary length starting from node $i$ and ending in node $j$. As already stated, this key quantity captures a notion of relatedness, or similarity, between nodes of $G$. From Equation (9), we observe that two nodes are considered as highly related (high probability of sampling them) when they are connected by many, preferably low-cost, paths, that is, when they are highly accessible. The quantity therefore integrates the concept of (indirect) connectivity, in addition to proximity (low-cost paths).
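Now, from Equation (8), we need to compute
$$\mathrm{P}(s = i, e = j) = \lim_{t \to \infty} \mathrm{P}_{\le t}(s = i, e = j) = \frac{\mathbf{e}_i^{\mathsf{T}} \left( \sum_{\tau=0}^{\infty} \mathbf{W}^{\tau} \right) \mathbf{e}_j}{\mathbf{e}^{\mathsf{T}} \left( \sum_{\tau=0}^{\infty} \mathbf{W}^{\tau} \right) \mathbf{e}} \tag{11}$$
We thus need to compute the well-known power series of $\mathbf{W}$
$$\sum_{\tau=0}^{\infty} \mathbf{W}^{\tau} = (\mathbf{I} - \mathbf{W})^{-1} \tag{12}$$
which converges if the spectral radius of $\mathbf{W}$ is less than $1$, $\rho(\mathbf{W}) < 1$. Because the matrix $\mathbf{W}$ only contains nonnegative elements and $G$ is strongly connected, a sufficient condition for $\rho(\mathbf{W}) < 1$ is that it is substochastic Meyer2000 (), which is always achieved for $\theta > 0$ as $c_{ij} \ge 0$ for all $i, j$ and we assume that at least one element of $\mathbf{C}$ is strictly positive. We therefore assume $\theta > 0$.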
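The convergence condition can be inspected numerically. The sketch below computes the spectral radius of the weight matrix for a zero and for a positive inverse temperature, and compares a long partial sum of the power series with the matrix inverse. The graph, the unit costs and the helper names `spectral_radius` and `weight_matrix` are assumptions made for this example.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue modulus of a square matrix."""
    return np.max(np.abs(np.linalg.eigvals(M)))

# Hypothetical 4-node graph with unit costs on edges (illustration only).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
n = A.shape[0]
P_ref = A / A.sum(axis=1, keepdims=True)
C = np.where(A > 0, 1.0, np.inf)

def weight_matrix(theta):
    """Weights p_ref * exp(-theta * cost) on edges, zero elsewhere."""
    W = np.zeros_like(P_ref)
    edges = A > 0
    W[edges] = P_ref[edges] * np.exp(-theta * C[edges])
    return W

rho_zero = spectral_radius(weight_matrix(0.0))  # stochastic matrix: radius 1
rho_pos = spectral_radius(weight_matrix(0.5))   # substochastic: radius below 1

# With radius below one, the partial sums approach the inverse of (I - W).
W = weight_matrix(0.5)
partial_sum = sum(np.linalg.matrix_power(W, k) for k in range(200))
Z = np.linalg.inv(np.eye(n) - W)

print(rho_zero, rho_pos, np.allclose(partial_sum, Z))
```

With unit costs the weight matrix is simply the reference transition matrix scaled by a constant factor, so its spectral radius equals that factor; this is a property of this particular example rather than of the general model.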
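Now, if we pose
(13) $\mathbf{Z} = (\mathbf{I} - \mathbf{W})^{-1}$
with $\mathbf{W}$ given by Equation (6), we can pursue the computation of the numerator of Equation (11),
(14) $\mathbf{e}_i^{\mathsf{T}} \left( \sum_{t=0}^{\infty} \mathbf{W}^{t} \right) \mathbf{e}_j = \mathbf{e}_i^{\mathsf{T}} (\mathbf{I} - \mathbf{W})^{-1} \mathbf{e}_j = \mathbf{e}_i^{\mathsf{T}} \mathbf{Z} \mathbf{e}_j = z_{ij}$
where $z_{ij}$ is element $i, j$ of $\mathbf{Z}$. By analogy with Markov chain theory, $\mathbf{Z}$ is called the fundamental matrix Kemeny1960 (). Elementwise, following Equations (7)-(14), we have that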
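(15) $z_{ij} = \sum_{\wp \in \mathcal{P}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]$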
which is actually related to the potential of a Markov chain Cinlar1975 (); Norris1997 (). From the previous equation, $z_{ij}$ can be interpreted as
(16) 
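For the denominator of Equations (9) and (11), we immediately find
(17) $\mathcal{Z} = \mathbf{e}^{\mathsf{T}} (\mathbf{I} - \mathbf{W})^{-1} \mathbf{e} = \mathbf{e}^{\mathsf{T}} \mathbf{Z} \mathbf{e} = z_{\bullet\bullet}$
where $z_{\bullet\bullet} = \sum_{i,j=1}^{n} z_{ij}$ is the value of the partition function $\mathcal{Z}$. Therefore, from Equation (11), the probability of drawing a path starting in $i$ and ending in $j$ in our bag-of-paths model is simply
(18) $\mathrm{P}(s=i, e=j) = \dfrac{z_{ij}}{z_{\bullet\bullet}}$, with $\mathbf{Z} = (\mathbf{I} - \mathbf{W})^{-1}$
or, in matrix form,
(19) $\boldsymbol{\Pi} = \dfrac{\mathbf{Z}}{z_{\bullet\bullet}}$, with $\mathbf{Z} = (\mathbf{I} - \mathbf{W})^{-1}$
where $\boldsymbol{\Pi}$, called the bag-of-paths probability matrix, contains the probabilities for each starting-ending pair of nodes. Note that this matrix is not symmetric in general; therefore, in the case of an undirected graph, we might instead compute the probability of drawing a path $i \leadsto j$ or $j \leadsto i$. The result is a symmetric matrix,
(20) $\boldsymbol{\Pi}_{\mathrm{sym}} = \boldsymbol{\Pi} + \boldsymbol{\Pi}^{\mathsf{T}}$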
and only the upper (or lower) triangular part of the matrix is relevant.
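The closed form of this section amounts to one matrix inversion followed by a normalization. The following NumPy sketch illustrates it under stated assumptions: the function name `bag_of_paths_probabilities`, the toy graph and the value of $\theta$ are hypothetical, and the symmetric version is taken to be the sum of the probability matrix and its transpose, as suggested by the "path from $i$ to $j$ or from $j$ to $i$" reading above.

```python
import numpy as np

def bag_of_paths_probabilities(A, C, theta):
    """Bag-of-paths probability matrix: fundamental matrix divided by its total sum."""
    P_ref = A / A.sum(axis=1, keepdims=True)      # reference (natural) random walk
    W = P_ref * np.exp(-theta * C)                # Equation (6), elementwise product
    Z = np.linalg.inv(np.eye(len(A)) - W)         # fundamental matrix, Equation (13)
    return Z / Z.sum()                            # divide by the partition function

# Hypothetical toy graph with unit cost on every edge.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C = A.copy()

Pi = bag_of_paths_probabilities(A, C, theta=1.0)
Pi_sym = Pi + Pi.T        # assumed form of the symmetric version (Equation (20))
```

On this example the probability matrix is not symmetric although the graph is undirected, because the reference transition probabilities depend on the node degrees.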
3.2.3 An intuitive interpretation of the $z_{ij}$
An intuitive interpretation of the elements $z_{ij}$ of the $\mathbf{Z}$ matrix can be provided as follows Saerens2008 (); Mantrach2009 (). Consider a special random walk defined by the transition probability matrix $\mathbf{W}$ whose elements are $w_{ij} = p^{\mathrm{ref}}_{ij} \exp[-\theta c_{ij}]$. As $\mathbf{W}$ has some row sums less than one (the rows of $\mathbf{C}$ containing at least one strictly positive cost $c_{ij}$), the random walker has a nonzero probability of disappearing in each of these nodes, which is equal to $(1 - \sum_{j=1}^{n} w_{ij})$ at each time step. Indeed, from Equation (6), it can be observed that the probability of surviving during a transition $i \to j$ is proportional to $\exp[-\theta c_{ij}]$, which makes sense: there is a smaller probability to survive edges with a high cost. In this case, the elements of the $\mathbf{Z}$ matrix, $z_{ij} = [\mathbf{Z}]_{ij}$, can be interpreted as the expected number of times that an “evaporating”, or “killed”, random walk starting from node $i$ visits node $j$ (see for instance Snell1984 (); Kemeny1960 ()) before being killed.
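This interpretation can be illustrated by simulation. The sketch below is a hypothetical Monte Carlo experiment, not part of the original method: the toy graph, $\theta = 0.5$, the starting node, the random seed and the number of walks are arbitrary choices. An extra "killed" state collects the probability mass missing from each row of the substochastic transition matrix, and the average number of visits to each node is compared with the corresponding row of the fundamental matrix.

```python
import numpy as np

# Hypothetical toy graph with unit cost on every edge.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C = A.copy()
theta = 0.5

P_ref = A / A.sum(axis=1, keepdims=True)
W = P_ref * np.exp(-theta * C)
Z = np.linalg.inv(np.eye(len(A)) - W)      # expected numbers of visits (exact)

n = len(A)
kill = 1.0 - W.sum(axis=1)                 # probability of being killed at each node
T = np.hstack([W, kill[:, None]])          # column n is the extra "killed" state

rng = np.random.default_rng(0)
start, n_walks = 0, 20000
visits = np.zeros(n)
for _ in range(n_walks):
    node = start
    while node != n:                       # walk until the walker is killed
        visits[node] += 1                  # the starting node counts as a visit
        node = rng.choice(n + 1, p=T[node])

z_hat = visits / n_walks                   # Monte Carlo estimate of row `start` of Z
```

The estimate `z_hat` fluctuates around `Z[start]` with a sampling error that shrinks as the number of simulated walks grows.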
4 Working with hitting/absorbing paths: the bag of hitting paths
The bag-of-hitting-paths model described in this section is a restriction of the previously introduced bag-of-paths model in which the ending node of each path only appears once, at the end of the path. In other words, no intermediate node on the path is allowed to be the ending node $j$, thus prohibiting looping on this node $j$. Technically this constraint will be enforced by making the ending node absorbing (and killing, see later), as in the case of an absorbing Markov chain Snell1984 (); Isaacson1976 (); Kemeny1960 (); Norris1997 (). We will see later in this section that this model has some nice properties.
4.1 Definition of the bag-of-hitting-paths probabilities
Let $\mathcal{P}^{h}_{ij}$ be the set of hitting paths starting from $i$ and stopping once node $j$ has been reached for the first time ($j$ is made absorbing). Let $\mathcal{P}^{h} = \cup_{ij} \mathcal{P}^{h}_{ij}$ be the complete set of such hitting paths. Following the same reasoning as in the previous subsection, from Equation (9), when putting a Gibbs-Boltzmann distribution on $\mathcal{P}^{h}$, the probability of drawing a hitting path starting in $i$ and ending in $j$ is
(21) $\mathrm{P}_h(s=i, e=j) = \dfrac{\sum_{\wp \in \mathcal{P}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]}{\sum_{\wp' \in \mathcal{P}^{h}} \tilde{\pi}^{\mathrm{ref}}(\wp') \exp[-\theta \tilde{c}(\wp')]}$
and the denominator of this expression is also called the partition function, $\mathcal{Z}_h$, for the hitting paths system this time. The quantity $\mathrm{P}_h(s=i, e=j)$ will be called the bag-of-hitting-paths probability of drawing a hitting path starting in $i$ and ending in $j$. Note that in the case of unbounded hitting paths, the reference path probabilities can be simply defined by assuming a uniform reference probability for drawing the starting and ending nodes. With this definition, it is shown in Appendix A that the reference probability is properly normalized, i.e., it sums to one over the set of hitting paths $\mathcal{P}^{h}$.
Obviously, for hitting paths, if we adopt the convention that zero-length paths are allowed, paths of length greater than 0 starting in node $i$ and ending in the same node $i$ are prohibited; in that case, the zero-length path is the only allowed path starting and ending in $i$ and we set its $\tilde{\pi}^{\mathrm{ref}}$ equal to 1.
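Now, following the same reasoning as in the previous section, the numerator of Equation (21) is
(22) $\sum_{\wp \in \mathcal{P}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)] = \mathbf{e}_i^{\mathsf{T}} (\mathbf{I} - \mathbf{W}^{(-j)})^{-1} \mathbf{e}_j = \mathbf{e}_i^{\mathsf{T}} \mathbf{Z}^{(-j)} \mathbf{e}_j = z^{(-j)}_{ij}$
where $\mathbf{W}^{(-j)}$ is now matrix $\mathbf{W}$ of Equation (6) where the $j$th row has been set to $\mathbf{0}^{\mathsf{T}}$ (node $j$ is absorbing and killing, meaning that the $j$th row of the transition matrix, $\mathbf{P}^{\mathrm{ref}}$, is equal to zero) and $\mathbf{Z}^{(-j)} = (\mathbf{I} - \mathbf{W}^{(-j)})^{-1}$. This means that when the random walker reaches node $j$, he immediately stops his walk there. This matrix is given by $\mathbf{W}^{(-j)} = \mathbf{W} - \mathbf{e}_j (\mathbf{w}^{r}_{j})^{\mathsf{T}}$ with $\mathbf{w}^{r}_{j} = \mathrm{col}_j(\mathbf{W}^{\mathsf{T}})$ being a column vector containing the $j$th row of $\mathbf{W}$.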
4.2 Computation of the bagofhittingpaths probabilities
In Appendix B, it is shown from a bag-of-paths framework point of view that the elements $z^{(-j)}_{ij}$ of $\mathbf{Z}^{(-j)}$ can be computed simply and efficiently by
(23) $z^{(-j)}_{ij} = \dfrac{z_{ij}}{z_{jj}}$
which is a noteworthy result by itself. Note that this result has been rederived in a more conventional, but also more tedious, way through the Sherman-Morrison formula by Kivimaki2012 () in the context of computing randomized shortest paths dissimilarities in closed form.
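The identity can be verified numerically by comparing the direct computation, which inverts a modified matrix for one absorbing node, with the closed form, which only needs the fundamental matrix of the unconstrained model. The sketch below is an illustration under stated assumptions: the toy graph, $\theta = 1$ and the absorbing node index `j = 2` are hypothetical.

```python
import numpy as np

# Hypothetical toy graph with unit cost on every edge.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C = A.copy()
theta = 1.0
n = len(A)

P_ref = A / A.sum(axis=1, keepdims=True)
W = P_ref * np.exp(-theta * C)
Z = np.linalg.inv(np.eye(n) - W)                 # fundamental matrix of the bag of paths

j = 2                                            # node made absorbing and killing
W_minus_j = W.copy()
W_minus_j[j, :] = 0.0                            # row j of W set to zero
Z_minus_j = np.linalg.inv(np.eye(n) - W_minus_j)

hitting_direct = Z_minus_j[:, j]                 # column j of the modified fundamental matrix
hitting_closed_form = Z[:, j] / Z[j, j]          # z_ij / z_jj for every starting node i
```

The closed form avoids one matrix inversion per ending node: a single inversion gives all hitting quantities at once.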
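Using this result, Equation (22) can be developed as
(24) $\sum_{\wp \in \mathcal{P}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)] = z^{(-j)}_{ij} = \dfrac{z_{ij}}{z_{jj}}$
where we define the matrix containing the elements $z^{(-j)}_{ij}$ as $\mathbf{Z}_h$, the fundamental matrix for hitting paths. The elements of the matrix $\mathbf{Z}_h$ are denoted by $z^{h}_{ij}$. From Equation (24), this matrix can be computed as $\mathbf{Z}_h = \mathbf{Z} \mathbf{D}_h^{-1}$ with $\mathbf{D}_h = \mathbf{Diag}(\mathbf{Z})$. Note that the diagonal elements of $\mathbf{Z}_h$ are equal to 1, $z^{h}_{ii} = 1$. Moreover, in the limit $\theta \to \infty$, only shortest paths, without loops, are considered.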
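We immediately deduce the bag-of-hitting-paths probability including zero-length paths (Equation (21)),
(25) $\mathrm{P}_h(s=i, e=j) = \dfrac{z^{h}_{ij}}{\mathcal{Z}_h} = \dfrac{z_{ij}/z_{jj}}{\sum_{i',j'=1}^{n} (z_{i'j'}/z_{j'j'})}$
where the denominator of Equation (25) is the partition function of the hitting paths model,
(26) $\mathcal{Z}_h = \sum_{i,j=1}^{n} \dfrac{z_{ij}}{z_{jj}}$
In matrix form, denoting by $\boldsymbol{\Pi}_h$ the matrix of bag-of-hitting-paths probabilities $\mathrm{P}_h(s=i, e=j)$,
(27) $\boldsymbol{\Pi}_h = \dfrac{\mathbf{Z} \mathbf{D}_h^{-1}}{\mathbf{e}^{\mathsf{T}} \mathbf{Z} \mathbf{D}_h^{-1} \mathbf{e}}$, with $\mathbf{Z} = (\mathbf{I} - \mathbf{W})^{-1}$ and $\mathbf{D}_h = \mathbf{Diag}(\mathbf{Z})$
The algorithm for computing the matrix $\boldsymbol{\Pi}_h$ is shown in Algorithm 1. The symmetric version for hitting paths is obtained by applying Equation (20) after the computation of $\boldsymbol{\Pi}_h$. An interesting application would be to investigate graph cuts based on bag-of-hitting-paths probabilities instead of the standard adjacency matrix.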
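The matrix computation of this section can be sketched in a few lines of NumPy. This is a hedged illustration and not a transcription of Algorithm 1: the function name `bag_of_hitting_paths_probabilities`, the toy graph and the value of $\theta$ are hypothetical. Each column of the fundamental matrix is divided by its diagonal element, and the result is normalized by its total sum.

```python
import numpy as np

def bag_of_hitting_paths_probabilities(A, C, theta):
    """Return (Pi_h, Z_h): hitting-path probabilities and hitting fundamental matrix."""
    P_ref = A / A.sum(axis=1, keepdims=True)      # reference (natural) random walk
    W = P_ref * np.exp(-theta * C)                # Equation (6), elementwise product
    Z = np.linalg.inv(np.eye(len(A)) - W)         # fundamental matrix, Equation (13)
    Z_h = Z / np.diag(Z)                          # column j divided by z_jj (broadcasting)
    return Z_h / Z_h.sum(), Z_h                   # normalize by the partition function

# Hypothetical toy graph with unit cost on every edge.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C = A.copy()

Pi_h, Z_h = bag_of_hitting_paths_probabilities(A, C, theta=1.0)
```

Dividing by `np.diag(Z)` relies on broadcasting along the last axis, which scales columns; this avoids forming the diagonal matrix and its inverse explicitly.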
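4.3 An intuitive interpretation of the $z^{h}_{ij}$
In this section, we provide an intuitive description of the elements of the hitting paths fundamental matrix, $\mathbf{Z}_h$. Let us consider a particular killed random walk with absorbing state $\alpha$ on the graph $G$ whose transition probabilities are given by the elements of $\mathbf{W}^{(-\alpha)}$, that is, $w^{(-\alpha)}_{ij} = w_{ij}$ when $i \neq \alpha$ and $w^{(-\alpha)}_{ij} = 0$ otherwise. In other words, the node $\alpha$ is made absorbing and killing; it corresponds to hitting paths with node $\alpha$ as hitting node. When the walker reaches this node, he stops his walk and disappears. Moreover, as $\sum_{j=1}^{n} w_{ij} \leq 1$ for all $i \neq \alpha$, the matrix of transition probabilities $\mathbf{W}^{(-\alpha)}$ is substochastic and the random walker has also a nonzero probability of disappearing at each step of its random walk and in each node $i$ for which $\sum_{j=1}^{n} w_{ij} < 1$. This stochastic process has been called an “evaporating random walk” in Saerens2008 () or an “exponentially killed random walk” in Steele2001 ().
Now, let us consider column $\alpha$ (corresponding to the hitting, or absorbing, node) of the fundamental matrix of non-hitting paths, $\mathrm{col}_{\alpha}(\mathbf{Z}) = \mathbf{Z} \mathbf{e}_{\alpha}$. Because the fundamental matrix is $\mathbf{Z} = (\mathbf{I} - \mathbf{W})^{-1}$ (Equation (13)), we easily obtain $(\mathbf{I} - \mathbf{W}) (\mathbf{Z} \mathbf{e}_{\alpha}) = \mathbf{e}_{\alpha}$. Or, in elementwise form,
(28) $\begin{cases} z_{i\alpha} = \sum_{j=1}^{n} w_{ij} z_{j\alpha} & \text{for each } i \neq \alpha \\ z_{\alpha\alpha} = \sum_{j=1}^{n} w_{\alpha j} z_{j\alpha} + 1 & \text{for } i = \alpha \end{cases}$
When considering hitting paths instead, $z^{h}_{\alpha\alpha} = 1$ (see Equation (24)) because $w^{(-\alpha)}_{\alpha j} = 0$ for all $j$ (node $\alpha$ is made absorbing and killing) so that the second line of Equation (28) (the boundary condition) becomes simply $z^{h}_{\alpha\alpha} = 1$ for hitting paths. Moreover, we know that $z^{h}_{i\alpha} = z_{i\alpha}/z_{\alpha\alpha}$ for any $i \neq \alpha$. Thus, dividing the first line of Equation (28) by $z_{\alpha\alpha}$ provides
(29) $\begin{cases} z^{h}_{i\alpha} = \sum_{j=1}^{n} w_{ij} z^{h}_{j\alpha} & \text{for each } i \neq \alpha \\ z^{h}_{\alpha\alpha} = 1 & \text{for } i = \alpha \end{cases}$
Interestingly, this is exactly the set of recurrence equations computing the probability of hitting node $\alpha$ when starting from node $i$ (see, e.g., Kemeny1960 (); Ross2000 (); Taylor1998 ()). Therefore, the $z^{h}_{i\alpha}$ represent the probability of surviving during the killed random walk from $i$ to $\alpha$ with transition probabilities $w_{ij}$ and node $\alpha$ made absorbing. Said differently, it corresponds to the probability of reaching absorbing node $\alpha$ without being killed during the walk.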
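The recurrence can be solved by simple fixed-point iteration and compared with the closed form. The following sketch is an illustration only: the toy graph, $\theta = 1$, the absorbing node index `alpha = 3` and the number of iterations are hypothetical choices. The boundary condition is re-imposed after every sweep.

```python
import numpy as np

# Hypothetical toy graph with unit cost on every edge.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C = A.copy()
theta = 1.0
n = len(A)

P_ref = A / A.sum(axis=1, keepdims=True)
W = P_ref * np.exp(-theta * C)
Z = np.linalg.inv(np.eye(n) - W)

alpha = 3                                   # absorbing (hitting) node
z_h = Z[:, alpha] / Z[alpha, alpha]         # closed form: z_i,alpha / z_alpha,alpha

# Fixed-point iteration on the recurrence with boundary condition x[alpha] = 1.
x = np.zeros(n)
x[alpha] = 1.0
for _ in range(200):
    x = W @ x                               # x_i <- sum_j w_ij x_j for every i
    x[alpha] = 1.0                          # re-impose the boundary condition
```

Since each entry is a probability of reaching the absorbing node before being killed, all values lie between 0 and 1.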
5 Two novel families of distances based on hitting path probabilities
In this section, two families of distance measures are derived from the hitting path probabilities including zero-length paths (the results do not hold for a bag of paths excluding zero-length paths). The second one benefits from some nice properties that will be detailed.
5.1 A first distance measure
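The first distance measure is directly derived from the bag-of-paths probabilities introduced in the previous section.
5.1.1 Definition of the distance
This section shows that the associated surprisal measure, $-\log \mathrm{P}_h(s=i, e=j)$, quantifying the “surprise” generated by the outcome $(s=i) \wedge (e=j)$, when symmetrized, is a distance measure. This distance associated with the bag of hitting paths is defined as follows
(30) $\Delta^{\mathrm{sur}}_{ij} = \begin{cases} -\dfrac{\log \mathrm{P}_h(s=i, e=j) + \log \mathrm{P}_h(s=j, e=i)}{2} & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases}$
where $\mathrm{P}_h(s=i, e=j)$ and $\mathrm{P}_h(s=j, e=i)$ are computed according to Equation (25) or (27) for the matrix form. Obviously, $\Delta^{\mathrm{sur}}_{ij} \geq 0$ and $\Delta^{\mathrm{sur}}_{ij}$ is symmetric. Moreover, $\Delta^{\mathrm{sur}}_{ij}$ is equal to zero if and only if $i = j$.
It is shown in Appendix C that this quantity is a distance measure since it satisfies the triangle inequality, in addition to the other mentioned properties. This distance will be called the bag-of-hitting-paths surprisal distance.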
5.1.2 Computation of the distance
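It can be computed by adding the following matrix operations to Algorithm 1:
$\boldsymbol{\Delta}^{\mathrm{sur}} \leftarrow -\left( \log(\boldsymbol{\Pi}_h) + \log(\boldsymbol{\Pi}_h)^{\mathsf{T}} \right)/2$
take elementwise logarithm for computing the surprisals, and symmetrize
$\boldsymbol{\Delta}^{\mathrm{sur}} \leftarrow \boldsymbol{\Delta}^{\mathrm{sur}} - \mathbf{Diag}(\boldsymbol{\Delta}^{\mathrm{sur}})$
put diagonal to zero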
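The computation of the surprisal distance can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the function name `surprisal_distance`, the toy graph and the value of $\theta$ are hypothetical, and the hitting-path probabilities are recomputed inline so that the block is self-contained.

```python
import numpy as np

def surprisal_distance(A, C, theta):
    """Symmetrized surprisal of the bag-of-hitting-paths probabilities."""
    P_ref = A / A.sum(axis=1, keepdims=True)      # reference (natural) random walk
    W = P_ref * np.exp(-theta * C)                # Equation (6), elementwise product
    Z = np.linalg.inv(np.eye(len(A)) - W)         # fundamental matrix
    Z_h = Z / np.diag(Z)                          # hitting-paths fundamental matrix
    Pi_h = Z_h / Z_h.sum()                        # bag-of-hitting-paths probabilities
    S = -np.log(Pi_h)                             # elementwise surprisal
    D = (S + S.T) / 2.0                           # symmetrize
    np.fill_diagonal(D, 0.0)                      # put diagonal to zero
    return D

# Hypothetical toy graph with unit cost on every edge.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C = A.copy()

D_sur = surprisal_distance(A, C, theta=1.0)
```

The logarithm is finite for every pair because the graph is strongly connected, so every hitting-path probability is strictly positive.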
We now turn to the development of the second distance measure.
5.2 A second distance measure
This subsection introduces a second measure enjoying some nice properties, based on the same ideas.
5.2.1 Definition of the distance
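The second distance measure automatically follows from Inequality (55) in Appendix C and is based on the quantity $z^{h}_{ij}$. For convenience, let us recall this inequality,
$\mathrm{P}_h(s=i, e=k) \geq \mathcal{Z}_h \, \mathrm{P}_h(s=i, e=j) \, \mathrm{P}_h(s=j, e=k)$
Then, from $z^{h}_{ij} = \mathcal{Z}_h \, \mathrm{P}_h(s=i, e=j)$ (Equation (25)), we directly obtain $z^{h}_{ik} \geq z^{h}_{ij} z^{h}_{jk}$. Taking $-\frac{1}{\theta} \log$ of both sides provides $-\frac{1}{\theta} \log z^{h}_{ik} \leq -\frac{1}{\theta} \log z^{h}_{ij} - \frac{1}{\theta} \log z^{h}_{jk}$, or,
(31) $\phi(i, k) \leq \phi(i, j) + \phi(j, k)$
where we defined
(32) $\phi(i, j) = -\dfrac{1}{\theta} \log z^{h}_{ij}$
and, from (31), the $\phi(i, j)$ obviously verify the triangle inequality.
The quantity $\phi(i, j)$ will be called the potential Cinlar1975 () of node $i$ with respect to node $j$. Indeed, it has been shown GarciaDiez2011b () that when computing the continuous-state continuous-time equivalent of the randomized shortest paths framework Saerens2008 (), this quantity plays the role of a potential inducing a drift (external force) in the corresponding diffusion equation. From the properties and the probabilistic interpretation of the $z^{h}_{ij}$, both $\phi(i, j) \geq 0$ (as $z^{h}_{ij} \leq 1$) and $\phi(i, i) = 0$ (as $z^{h}_{ii} = 1$) hold.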
This directed distance measure has three intuitive interpretations.

First, let us recall from Equation (24) that $\phi(i, j)$ is given by $\phi(i, j) = -\frac{1}{\theta} \log (z_{ij}/z_{jj})$ where $z_{ij}$ is element $i, j$ of the fundamental matrix $\mathbf{Z}$ (see Equation (13)). From this last expression, $\phi(i, j)$ can be interpreted (up to a scaling factor) as the logarithm of the expectation of the reward $\exp[-\theta \tilde{c}(\wp)]$ with respect to the path likelihoods, when considering absorbing random walks starting from node $i$ and ending in node $j$.

In addition, from Equation (29), it also corresponds to minus the log-likelihood of surviving during the killed, absorbing, random walk from $i$ to $j$.

Finally, it was shown in Kivimaki2012 (), investigating further developments of the randomized shortest paths (RSP) dissimilarity, that the potential distance also corresponds to the minimal free energy of the system of hitting paths from $i$ to $j$. Indeed, the RSP dissimilarity, defined as the expected total cost between $i$ and $j$, is not a distance measure as it does not satisfy the triangle inequality. However, subtracting the entropy from the expected total cost (that is, computing the free energy) leads to a distance measure that was shown to be equivalent to the potential distance. Therefore the potential distance was called the free energy distance in Kivimaki2012 (), which provides still another interpretation of the potential distance.
Inequality (31) suggests defining the distance $\Delta^{\phi}_{ij} = (\phi(i, j) + \phi(j, i))/2$. It has all the properties of a distance measure, including the triangle inequality, which is verified thanks to Inequality (31). Note that this distance measure can be expressed as a function of the surprisal distance (see Equation (30)) as $\Delta^{\phi}_{ij} = (\Delta^{\mathrm{sur}}_{ij} - \log \mathcal{Z}_h)/\theta$ for $i \neq j$. This shows that the newly introduced distance is equivalent to the previous one, up to the addition of a constant and a rescaling.
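The link between the two distances follows in a few lines from the definitions. The derivation below is a sketch that assumes the following notation: $z^{h}_{ij}$ for the elements of the hitting-paths fundamental matrix, $\mathcal{Z}_h$ for the partition function of Equation (26), $\phi(i, j) = -\frac{1}{\theta} \log z^{h}_{ij}$ for the potential of Equation (32), $\Delta^{\mathrm{sur}}_{ij}$ for the surprisal distance of Equation (30), and $\Delta^{\phi}_{ij} = (\phi(i, j) + \phi(j, i))/2$ for the potential distance.

```latex
\begin{align*}
\Delta^{\mathrm{sur}}_{ij}
  &= -\tfrac{1}{2} \left[ \log \mathrm{P}_h(s=i, e=j) + \log \mathrm{P}_h(s=j, e=i) \right] \\
  &= -\tfrac{1}{2} \left[ \log z^{h}_{ij} + \log z^{h}_{ji} \right] + \log \mathcal{Z}_h
     && \text{since } \mathrm{P}_h(s=i, e=j) = z^{h}_{ij} / \mathcal{Z}_h \\
  &= \tfrac{\theta}{2} \left[ \phi(i, j) + \phi(j, i) \right] + \log \mathcal{Z}_h
     && \text{since } \log z^{h}_{ij} = -\theta \, \phi(i, j) \\
  &= \theta \, \Delta^{\phi}_{ij} + \log \mathcal{Z}_h ,
     \qquad i \neq j .
\end{align*}
```

Solving for the potential distance gives $\Delta^{\phi}_{ij} = (\Delta^{\mathrm{sur}}_{ij} - \log \mathcal{Z}_h)/\theta$ for $i \neq j$, that is, a shift by a constant followed by a rescaling.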
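The definition of the bag-of-hitting-paths potential distance is therefore
(33) $\Delta^{\phi}_{ij} = \begin{cases} \dfrac{\phi(i, j) + \phi(j, i)}{2} & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases}$, where $\phi(i, j) = -\dfrac{1}{\theta} \log \left( \dfrac{z_{ij}}{z_{jj}} \right)$
and $z_{ij}$ is element $i, j$ of the fundamental matrix $\mathbf{Z} = (\mathbf{I} - \mathbf{W})^{-1}$ (see Equation (13)).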
5.2.2 Computation of the distance
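From Equation (27), it can be easily seen that the matrix $\mathbf{Z}_h$ containing the $z^{h}_{ij}$ can be computed thanks to Algorithm 1 without the normalization steps 7 and 8. The distance matrix with elements $\Delta^{\phi}_{ij}$ is denoted as $\boldsymbol{\Delta}^{\phi}$ and can easily be obtained by adding the following matrix operations to Algorithm 1:
$\boldsymbol{\Phi} \leftarrow -\log(\mathbf{Z}_h)/\theta$
take elementwise logarithm for computing the potentials
$\boldsymbol{\Delta}^{\phi} \leftarrow (\boldsymbol{\Phi} + \boldsymbol{\Phi}^{\mathsf{T}})/2$
symmetrize the matrix
$\boldsymbol{\Delta}^{\phi} \leftarrow \boldsymbol{\Delta}^{\phi} - \mathbf{Diag}(\boldsymbol{\Delta}^{\phi})$
put diagonal to zero
Note that both the surprisal and the potential distances are well-defined as we assumed that $G$ is strongly connected.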
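The potential distance can be computed along the same lines as the surprisal distance. The sketch below is a hedged NumPy illustration, not the authors' implementation: the function name `potential_distance`, the toy graph and the value of $\theta$ are hypothetical. It returns the hitting-paths fundamental matrix as well, so that the result can be compared with other quantities of the framework.

```python
import numpy as np

def potential_distance(A, C, theta):
    """Return (D_phi, Z_h): symmetrized potentials and hitting fundamental matrix."""
    P_ref = A / A.sum(axis=1, keepdims=True)      # reference (natural) random walk
    W = P_ref * np.exp(-theta * C)                # Equation (6), elementwise product
    Z = np.linalg.inv(np.eye(len(A)) - W)         # fundamental matrix
    Z_h = Z / np.diag(Z)                          # column j divided by z_jj
    Phi = -np.log(Z_h) / theta                    # directed potentials phi(i, j)
    D = (Phi + Phi.T) / 2.0                       # symmetrize the matrix
    np.fill_diagonal(D, 0.0)                      # put diagonal to zero
    return D, Z_h

# Hypothetical toy graph with unit cost on every edge.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C = A.copy()
theta = 1.0

D_phi, Z_h = potential_distance(A, C, theta)
```

The normalization by the partition function is not needed here, since the potentials depend only on the hitting-paths fundamental matrix.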
5.3 Some properties of the potential and surprisal distances
The potential distance benefits from some interesting properties proved in the appendix:

The potential distance is graph-geodetic, meaning that $\Delta^{\phi}_{ik} = \Delta^{\phi}_{ij} + \Delta^{\phi}_{jk}$ if and only if every path from $i$ to $k$ passes through $j$ Chebotarev2011 () (see Appendix D for the proof).

For an undirected graph $G$, the distance $\Delta^{\phi}_{ij}$ approaches the shortest path distance when $\theta$ becomes large, $\theta \to \infty$. In that case, Equation (33) reduces to the Bellman-Ford formula (see, e.g., Bertsekas2000 (); Christofides_1975 (); Cormen2009 ()) for computing the shortest path distance,