Randomized Shortest Paths withNet Flows and Capacity Constraints Draft manuscript subject to changes

# Randomized Shortest Paths with Net Flows and Capacity Constraints   Draft manuscript subject to changes

## Abstract

This work extends the randomized shortest paths model (RSP) by investigating the net flow RSP and adding capacity constraints on edge flows. The standard RSP is a model of movement, or spread, through a network interpolating between a random walk and a shortest path behavior [Kivimaki-2012, Saerens-2008, Yen-08K]. The framework assumes a unit flow injected into a source node and collected from a target node with flows minimizing the expected transportation cost together with a relative entropy regularization term. In this context, the present work first develops the net flow RSP model considering that edge flows in opposite directions neutralize each other (as in electrical networks) and proposes an algorithm for computing the expected routing costs between all pairs of nodes. This quantity is called the net flow RSP dissimilarity measure between nodes. Experimental comparisons on node clustering tasks show that the net flow RSP dissimilarity is competitive with other state-of-the-art dissimilarities. In the second part of the paper, it is shown how to introduce capacity constraints on edge flows and a procedure solving this constrained problem by using Lagrangian duality is developed. These two extensions should improve significantly the scope of applications of the RSP framework.

## 1 Introduction

Link analysis and network science are devoted to the analysis of network data and are currently studied in a large number of different fields (see, e.g., [Barabasi-2015, Brandes-2005, chung06, Estrada-2012, Fouss-2016, Kolaczyk-2009, Lewis-2009, Newman-2018, Silva-2016, Thelwall04, Wasserman-1994]). One recurring problem in this context is the definition of meaningful measures capturing interesting properties of the network and its nodes, like distance measures between nodes or centrality measures. These quantities usually take the structure of the whole network into account.

However, many such measures are based on some strong assumptions about the movement, or communication, occurring in the network whose two extreme cases are the optimal behavior and the random behavior. Indeed, the two most popular distance measures in this context are the least-cost distance (also called shortest-path distance) and the resistance distance [Klein-1993] (equal to the effective resistance and proportional to the commute-time distance when considering a random walk on the graph [FoussKDE-2005]). The same holds with the betweenness centrality; popular measures are the shortest path betweenness introduced by Freeman [Freeman-1977] (based on shortest paths) and the random-walk betweenness (also called the current-flow betweenness) introduced both by Newman [Newman-05] and by Brandes and Fleischer [Brandes-2005b], independently. Still another example is provided by the measures of compactness of a network, the Wiener index (based on shortest paths; see, e.g., [Brandes-2005]) and the Kirchhoff index (based on random walks or electrical networks; see, e.g., [Klein-1993]).

But, in reality, behavior is seldom based on complete randomness or optimality. Therefore, in the last years, there has been a large effort invested in defining models interpolating between shortest-path and random-walk behaviors, especially in the context of defining distance measures between nodes taking both proximity and high connectivity into account [Fouss-2016]. Some of these models are based on extensions of electrical networks [vonLuxburg-2011, Herbster-2009, Nguyen-2016, vonLuxburg-2014], some on combinatorial analysis arguments [Chebotarev-2011, Chebotarev-2012, Chebotarev-2013], some on mixed L1-L2 regularization [Li-2011, Li-2013], some on entropy-regularized network flow models [Bavaud-2012, Guex-2015], and still others on entropy-regularized least-cost paths ([Francoisse-2017, Kivimaki-2012, Saerens-2008, Yen-08K], inspired by the transportation model [Akamatsu-1996] and related to [Todorov-2007]) – the randomized shortest paths (RSP) framework which is the main subject of this paper. Recently, the RSP has been successfully used in, e.g., defining betweenness measures interpolating between a variant of the shortest-path betweenness and the random-walk betweenness [Kivimaki-2016], modelling the behavior of animals [Panzacchi-2016], and fraud detection [Courtain-2019]. For a more thorough discussion of families of distances between nodes, see, e.g., [Fouss-2016, Francoisse-2017, Kivimaki-2012].

This effort is pursued here by proposing two extensions of this RSP model, considering

• a new dissimilarity measure extending the RSP dissimilarity [Kivimaki-2012, Saerens-2008, Yen-08K], namely the net-flow RSP dissimilarity. This new dissimilarity is – like the standard RSP dissimilarity – the expected cost needed for reaching the target node from the source node, but now considering that edge flows in two opposite directions cancel out, as for electrical currents. Therefore, this model of movement based on net flows resembles more to electric flows when the temperature of the system is high. An algorithm is proposed for computing the net-flow RSP dissimilarity matrix between all pairs of nodes.

• the introduction of capacity constraints in the model. Capacity constraints on edge flows are very common in practice [Ahuja-1993, Dolan-1993] and it would certainly increase significantly the applicability of the RSP model if such constraints could be integrated. Therefore, the main contributions related to capacity constraints are (1) to show how the model can accommodate such constraints, for both raw edge flows and net flows, and (2) to provide an algorithm for solving the constrained RSP model in the case of a single pair of source/target nodes.

Capacity constraints are common in network flow problems and there is a vast literature on the subject (see, e.g., [Ahuja-1993, Bertsekas-1998, Dolan-1993, Ford-1962, Gondran-1984, Jungnickel-2005, Korte-2018] and references therein). They appear for instance in the standard minimum cost flow problem which aims at finding the minimum cost source-target flow subject to some capacity constraints. As discussed in [Ahuja-1993], this task frequently appears in real-world problems; see this reference for concrete applications. Capacity constraints are present for, e.g., avoiding congestion, spreading the traffic, or simply because the flow is restricted. Various algorithms for solving the standard minimum cost flow problem have been proposed: minimum mean cycle-cancelling, successive shortest paths, network simplex, primal-dual, to name a few (see the above mentioned references). The difference with our work is that, in the RSP, the model is expressed in terms of full paths and a Kullback-Leibler regularization term is introduced, inducing randomized policies that can be computed by standard matrix operations. Because of their practical usefulness, we found that it would be valuable to introduce such capacity constraints within the RSP framework, which is developed in Section 4.

The content of the paper are as follows. Section 2 summarises the standard RSP framework. Section 3 introduces the net flow RSP dissimilarity. Then, Section 4 develops new algorithms for constraining the flow capacity on edges, while Section 5 deals with net flow capacity constraints. Illustrative examples and experiments on node clustering tasks are described in Section 6. Finally, Section 7 is the conclusion.

## 2 The standard randomized shortest paths framework

As already discussed, the main contributions of the paper are based on the RSP framework, interpolating between a least-cost and a random-walk behavior, and allowing to define dissimilarity measures between nodes [Fouss-2016, Kivimaki-2012, Saerens-2008, Yen-08K]. The formalism, inspired by models developed in transportation science [Akamatsu-1996], is based on full paths instead of standard “local” edge flows [Ahuja-1993, Dolan-1993] and is briefly described in this section for completeness. We start by providing a short account of the RSP formalism before introducing, in the next sections, the net flow RSP dissimilarity as well as the algorithm for solving the flow-capacity constrained problem on a directed graph.

### 2.1 Background and notation

Let us first introduce some necessary notation [Fouss-2016, Francoisse-2017]. First, notice that column vectors are written in bold lowercase and matrices are in bold uppercase. Moreover, in this work, we consider a weighted directed1, strongly connected, graph, , with a set of nodes and a set of edges. An edge connecting node to node is denoted by or .

Furthermore, we are given an adjacency matrix quantifying the directed, local, affinity between pairs of nodes , . We further assume that there are no self-loops in the network, that is, for all . From this adjacency matrix, the standard reference random walk on the graph is defined in the usual way: the transition probabilities associated to each node are set proportionally to the affinities and then normalized in order to sum to one,

 prefij=aij∑nj′=1aij′=aijdi (1)

where is the (out)degree of node . The matrix is stochastic and is called the transition matrix of the natural, reference, random walk on the graph.

In addition, a transition cost, , is associated to each edge of . If there is no edge linking to , the cost is assumed to take an infinite value, . For consistency, we assume that if . The cost matrix is defined accordingly, . Costs are usually set independently of the adjacency matrix; they quantify the immediate cost of a transition, depending on the application at hand (see [Francoisse-2017] for a discussion). In electrical networks, the costs are resistances and the affinities are conductances; in this context, they are linked by .

A path or walk is a finite sequence of hops to adjacent nodes on (including cycles), initiated from a source node and stopping in some ending, target, node . A hitting path is a path where the last node does not appear as an intermediate node. In other words, a hitting path to node stops when it reaches for the first time. We consider hitting paths to the fixed target node by setting this target node as absorbing (or “killing”). Computationally this is achieved by setting the corresponding row of the adjacency matrix and of the transition matrix to zero.

The node along path at position on the path is denoted by . The total cost of a path, , is simply the sum of the edge costs along while its length is the number of steps, or hops, needed for following that path.

### 2.2 The standard randomized shortest paths formalism

The main idea behind the RSP model is the following. Let us consider the set of all hitting paths, or walks, from a node (source) to an absorbing, killing, node (target) on . Then, we assign a probability distribution on the discrete set of paths by minimizing the free energy [Bavaud-2012, Jaynes-1957, Kivimaki-2012, Peliti-2011, Reichl-1998],

 \vlineminimize{P(℘)}℘∈Pstϕ(P)=∑℘∈PstP(℘)~c(℘)+T∑℘∈PstP(℘)log(P(℘)~π(℘))subjectto∑℘∈PstP(℘)=1 (2)

where is the total cumulated cost along path when visiting the nodes in the sequential order. Furthermore, is the product of the reference transition probabilities along path , i.e., the random walk probability of path .

The objective function (Eq. 2) is a mixture of two dissimilarity terms, with the temperature balancing the trade-off between them. The first term is the expected cost for reaching target node from source node (favoring shorter paths – exploitation). The second term corresponds to the relative entropy, or Kullback-Leibler divergence, between the path probability distribution and the reference (random walk) paths probability distribution (introducing randomness and diversity [Pioro-2004]exploration). For a low temperature , low-cost paths are favoured whereas when is large, paths are chosen according to their likelihood in the reference random walk on .

The problem (2) corresponds to a standard minimum cost flow problem, as discussed in the introduction2, with a Kullback-Leibler regularization term expressed in terms of full paths in the RSP formalism. There are, however, some subtle differences, like for instance the fact that in the standard minimum cost flow problem, the flows are unidirectional whereas the RSP defines a Markov chain for which flows are generally bi-directional.

This argument, akin to maximum entropy [Cover-2006, Jaynes-1957, Kapur-1989], leads to a Gibbs-Boltzmann distribution on the set of paths (see, e.g., [Francoisse-2017]),

 P∗(℘)=~π(℘)exp[−θ~c(℘)]∑℘′∈Pst~π(℘′)exp[−θ~c(℘′)]=~π(℘)exp[−θ~c(℘)]Z (3)

where is the inverse temperature and the denominator is the partition function of the system. This provides the randomized routing policy in terms of paths.

Interestingly, if we replace the probability distribution by the optimal distribution provided by Eq. 3 in the objective function (Eq. 2), we obtain

 ϕ∗=ϕ(P∗) =∑℘∈PstP∗(℘)~c(℘)+T∑℘∈PstP∗(℘)log(P∗(℘)~π(℘)) =∑℘∈PstP∗(℘)~c(℘)+T∑℘∈PstP∗(℘)(−1T~c(℘)−logZ) =−TlogZ (4)

which is called the directed free energy distance [Kivimaki-2012], playing the role of a potential in the equivalent continuous state-space diffusion process [Garcia-Diez-2011b].

### 2.3 Computing quantities of interest

The quantities of interest can be computed by taking the partial derivative of the optimal free energy provided by Eq. 4 [Fouss-2016, Francoisse-2017, Kivimaki-2012, Saerens-2008, Yen-08K]. Here, we only introduce the quantities that are needed in order to derive the algorithms developed later.

Fundamental matrix and partition function It turns out that the partition function can be computed in closed form from an auxiliary matrix3 . First, the fundamental matrix of the RSP system is defined as

 Z=I+W+W2+⋯=(I−W)−1withW=Pref∘exp[−θC] (5)

where is the cost matrix and is the elementwise (Hadamard) product. The equation sums up contributions of different paths lengths, starting from zero-length paths (identity matrix ). The partition function is then provided by [Francoisse-2017, Kivimaki-2012, Saerens-2008, Yen-08K].

Computation of flows and numbers of visits The directed flow on edge (the expected number of passages through when going from to ) can be obtained from Eq. 4 and Eq. 5 for a given inverse temperature (see [Kivimaki-2016, Saerens-2008] for details) by

 ¯nij≜∑℘∈PstP(℘)η((i,j)∈℘)=−1θ∂logZ∂cij=zsiwijzjtzst (6)

where counts the number of times edge appears on path . Now, as only the row and the column of fundamental matrix are needed, two systems of linear equations can be solved instead of matrix inversion in Eq. 5. Let us further define the matrix containing the expected number of passages through edges by . From the last Eq. 6, the expected number of visits to a node can also be defined, and easily computed, as

 ¯nj=∑i∈Pred(j)¯nij+δsj=n∑i=1¯nij+δsj=zsjzjtzst+δsj. (7)

because a unit flow of is injected in node . Note that is the set of predecessor nodes of node and we used .

Optimal transition probabilities Finally, the optimal transition probabilities of following any edge (the policy) induced by the set of paths and their probability mass (Eq. 3) are [Saerens-2008]

 p∗ij=¯nij∑nj′=1¯nij′=zjtzitwijfor all i≠t (8)

which defines a biased random walk (a Markov chain) on – the random walker is “attracted” by the target node . The lower the temperature, the larger the attraction. Interestingly, these transition probabilities do not depend on the source node and correspond to the optimal randomized strategy, or policy, for reaching target node, minimizing free energy (Eq. 2).

## 3 The net flow randomized shortest paths dissimilarity

In this section, we introduce the net flow RSP dissimilarity extending the standard RSP dissimilarity developed in [Kivimaki-2012, Saerens-2008, Yen-08K]. As for the standard RSP dissimilarity, it corresponds to the expected cost for reaching target node from source node but with the important difference that net flows are considered instead of raw flows. These measures are now introduced in this section.

### 3.1 Definition of the net edge flow

In some situations, for instance electrical networks [Bollobas-1998, Snell-1984], only net flows matter. Intuitively, it means that the edge flows in opposite directions and compensate each other so that only the positive net flow, defined as , is taken into account, where edge flows are given by Eq. 6. In many situations, net flows look intuitively more natural because of the flow compensation mechanism common to electricity. Net flows have already been used in the RSP framework in order to define node betweenness measures [Kivimaki-2016], generalizing two previous models based on electrical currents [Brandes-2005b, Newman-05]. They are further investigated in this section in order to define a new dissimilarity measure between nodes of a graph.

As for electrical currents [Bollobas-1998, Snell-1984], the non-negative net flow in each edge , denoted here as , is given in the RSP formalism by

 jij=max((¯nij−¯nji),0) (9)

or, in matrix form,

 J=max((N−NT),% \large 0) (10)

where the maximum is taken elementwise. This means that, for each edge, the net flow is defined (that is, positive) in only one direction4, and is equal to zero in the other direction.

Interestingly, because the flow is equal to zero in one of the two edge directions, the net flow defines a directed graph from the source to the destination node, even if the original graph is undirected.

### 3.2 Expected net cost and net RSP dissimilarity measure

The expected cost until absorption by target node at temperature can easily be computed in closed form from the RSP formalism. This expected cost spread in the network has been used as a dissimilarity measure between nodes [Kivimaki-2012, Saerens-2008, Yen-08K] and was called the directed RSP dissimilarity. More formally, in the standard RSP framework, the expected cost spread in the network [Garcia-Diez-2011] is given by

 ⟨~c⟩=∑℘∈PstP(℘)~c(℘) (11)

Let us now express the cost along path as where we saw that is the number of times edge appears on path and is the set of edges. Injecting this last result in Eq. 11 provides

 ⟨~c⟩=∑(i,j)∈E(\underbracket∑℘∈PstP(℘)η((i,j)∈℘)expected number of passages ¯nij)cij=∑(i,j)∈E¯nijcij (11)

or, in matrix form,

 ⟨~c⟩=eT(N∘C)e (12)

where is the elementwise matrix product and is the matrix of expected number of passages through edges defined in Eq. 6. This quantity is just the cumulative sum of the expected number of passages through each edge times the cost of following the edge.

When dealing with net flows instead, the Eq. 12, now computing the expected net cost, becomes

 ⟨~cnet⟩=eT[max((N−NT),\large 0)∘C]e=eT(J∘C)e (13)

This quantity can be interpreted as the net cost needed to reach the target node from the source node in a biased random walk (defined by Eq. 3 or Eq. 8) attracting the walker toward the target node . It is therefore the equivalent of the expected first passage cost defined in Markov chain theory [Norris-1997, Taylor-1998], translated in the RSP formalism and for net flows. It can be seen as a directed dissimilarity between node and node taking both proximity and amount of connectivity between and into account.

When the temperature is large, , the directed dissimilarity reduces to the least-cost distance between and , while when , it tends to the expected net cost for a random walker moving according to the reference random walk (and thus electrical current). This quantity is in fact equivalent to the so-called distance introduced in [Nguyen-2016], i.e., the weighted-by-costs sum of the net flows.

Therefore, the net flow RSP dissimilarity (nRSP, the counterpart of the standard RSP dissimilarity [Kivimaki-2012, Saerens-2008, Yen-08K]) between node and node is defined as the symmetrized quantity

 Δn\textscrspst=⟨~cnet⟩st+⟨~cnet⟩ts (14)

where the starting and ending node are specified again. This is an equivalent of the symmetric commute-cost quantity appearing in Markov chains [FoussKDE-2005].

Notice the difference between this quantity and the energy spread in an electrical network. Indeed, if the costs are viewed as resistances, then, in the context of a resistive network, the energy weights the costs by the squared net flows – instead of the simple net flows in Eq. 13 [Bollobas-1998, Snell-1984].

### 3.3 Net flows define a directed acyclic graph

Let us now show that the RSP net flows to a fixed target node define a directed acyclic graph (DAG) when the reference probabilities are defined by Eq. 1 on the weighted undirected graph . This comes from the fact that the net flows provided by Eq. 9 can be considered as an electrical current generated from a new graph derived from by redefining its edge weights. Moreover, it is well-known that electrical current defines a DAG because current always follows edges in the direction of decreasing potential (voltage). This potential therefore defines a topological ordering [Cormen-2009] of the nodes from higher potential to lower one.

More precisely, for a fixed target , let us define the graph by considering the following weights on edges (conductances in electrical circuits)

 ^aij≜zitwijzjtdi=zitprefijexp(−θcij)zjtdi=zitaijexp(−θcij)zjt (15)

which is symmetric when and are symmetric. Accordingly, we define immediate costs (resistances in electrical circuits). The natural transition probabilities on this new graph are provided by Eq. 1 where we replace by ,

 ^pij=^aij∑nj′=1^aij′=wijzjt∑nj′=1wij′zj′t=zjtzitwijfor all i≠t (16)

which, from Eq. 8, are exactly the optimal RSP transition probabilities. Note that we used the relation , which can be easily derived from the definition of the fundamental matrix (Eq. 5) (see also, e.g., [Fouss-2016, Kivimaki-2012, Saerens-2008, Yen-08K]).

This shows that the net flows resulting from the optimal biased random walk provided by Eq. 8 are generated by a random walk on (a Markov chain). From the equivalence between random walks on an undirected graph and electrical current on [Snell-1984], this current defines a DAG5.

### 3.4 Computation of the net flow randomized shortest paths dissimilarity

This subsection shows how the net flow RSP dissimilarity between all pairs of nodes (Eq. 14) can be computed in matrix form from previous work [Kivimaki-2016]. Unfortunately, the computation of the net flow RSP dissimilarities is more time-consuming than computing the standard RSP dissimilarities, where it suffices to perform a matrix inversion [Kivimaki-2012]. This is because before being able to compute the dissimilarities, we need to find the net flows, which involves a non-linear function (max). It is, however, still feasible for small- to medium-size networks.

The algorithm for computing the net flow RSP dissimilarity is detailed in Algorithm 1. It uses a trick introduced in [34] for calculating the net flows between all sources-destinations , in a particular edge without having to explicitly turn node into a killing, absorbing, node. More specifically, the procedure is an adaptation of Algorithm 2 in [Kivimaki-2016] (following Eq. 12 in this work, providing net flows) to the case of an undirected graph and the computation of net flow dissimilarities, instead of betweenness centrality. It is also optimized in order to loop over (undirected) edges only once: on line 10, the contributions of the two directions of edge (one is necessarily equal to and the other is equal to ) are summed together.

Following [Kivimaki-2016], the time complexity is where is the number of edges and the number of nodes, or overall because for an undirected, connected, graph. Indeed, the algorithm contains a matrix inversion, which is . Thereafter, there is a loop over all edges repeating some standard matrix operations of order , which finally provides . Therefore the algorithm does not scale well for large graphs; in its present form, it can only be applied on medium-size graphs.

## 4 Dealing with edge flow capacity constraints

In this section, an algorithm computing the optimal policy (Eq. 8) under capacity flow constraints on edges is derived. For convenience, we assume a weighted, undirected, connected, network with a single source node (node ) and one single target node (node ). An input flow is injected in node and absorbed in node , but the model can easily be generalized to multiple sources and destinations as well as directed graphs. As before, it is assumed that the target node is killing and absorbing, meaning that the transition probabilities for all nodes , including node .

The idea now is to constrain the flow visiting some edges belonging to a set of constrained edges . The expected number of passages through those edges (see Eq. 6) is therefore forced to not exceed some predefined values (upper bound),

 ¯nij≤σij for edges (i,j)∈C (17)

which ensures that the flows on edges in are limited by the capacity and thus must remain in the interval . In this section, we consider that, although the graph is undirected, the capacity constraints are directed and thus active in only one direction of an edge. Therefore, each undirected edge of possibly leads to two directed edges and in , with different capacity constraints. Thus the set of constrained edges contains directed edges, limiting the directional flow through them.

Moreover, we assume that the constraints are feasible6. The solution will minimize the free energy objective function (Eq. 2) while satisfying these inequality constraints.

### 4.1 The Lagrange function in case of capacity constraints

From standard nonlinear optimization theory [Bertsekas-1999, Culioli-2012, Griva-2008, Minoux-2008], the Lagrange function is

 L(P,λ) =∑℘∈PstP(℘)~c(℘)+T∑℘∈PstP(℘)log(P(℘)~π(℘))+μ(∑℘∈PstP(℘)−1) +∑(i,j)∈Cλij(¯nij−σij) =\underbracket∑℘∈PstP(℘)~c(℘)+T∑℘∈PstP(℘)log(P(℘)~π(℘))free % energy, ϕ(P)+μ(∑℘∈PstP(℘)−1) +∑(i,j)∈Cλij(\underbracket∑℘∈PstP(℘)η((i,j)∈℘)¯nij(see Eq.\ ???)−σij) (18)

where there is a Lagrange parameter associated to the sum to one constrain and a Lagrange parameter associated to each constrained edge. The Lagrange parameters , , are all non-negative in the case of inequality constraints and are stacked into the parameter vector .

Note that, by inspecting (18) and from the convexity of the Kullback-Leibler divergence, the primal objective function to be minimized with respect to the discrete probabilities (which is similar to Eq. 2) is convex and the equality constraints are all linear. Therefore, the duality gap between the primal and dual problems is zero, which will be exploited for solving the problem (see, e.g., [Bertsekas-1999, Culioli-2012, Griva-2008, Minoux-2008]).

Now, the Lagrange function in Eq. 18 can be rearranged as

 L(P,λ) =∑℘∈PstP(℘)\underbracket(~c(℘)+∑(i,j)∈Cλijη((i,j)∈℘))augmented cost ~c′(℘) cumulated on path ℘ +T∑℘∈PstP(℘)log(P(℘)~π(℘))+μ(∑℘∈PstP(℘)−1)−∑(i,j)∈Cλijσij =∑℘∈PstP(℘)∑(i,j)∈Eη((i,j)∈℘)\underbracket(cij+δ((i,j)∈C)λij)augmented costs c′ij +T∑℘∈PstP(℘)log(P(℘)~π(℘))+μ(∑℘∈PstP(℘)−1)−∑(i,j)∈Cλijσij =\underbracket∑℘∈PstP(℘)~c′(℘)+T∑℘∈PstP(℘)log(P(℘)~π(℘))free energy based on augmented costs, ϕ′(P)+μ(∑℘∈PstP(℘)−1) −∑(i,j)∈Cλijσij (19)

where the symbol is defined as when edge and otherwise. Note that we used (Eq. 11) computing the total cost along path .

During this derivation, we observed that the costs can be redefined into augmented costs integrating the additional “virtual” costs needed for satisfying the constraints, and provided by the Lagrange parameters,

 c′ij={cij+λijwhen edge (i,j)∈Ccijotherwise (20)

and will be the matrix containing these augmented costs. Thus the Lagrange parameters have an interpretation similar to the dual variables in linear programming; they represent the extra cost to pay associated to each edge in order to satisfy the constraints [Bertsekas-1999, Culioli-2012, Griva-2008, Minoux-2008, Vajda-1961]. This is also common in many network flow problems [Ahuja-1993, Dolan-1993].

Let be the free energy obtained in Eq. 19 from these augmented costs (Eq. 20). We now turn to the problem of finding the Lagrange parameters by exploiting Lagrangian duality.

### 4.2 Exploiting Lagrangian duality

In this subsection, we will take advantage of the fact that, in the formulation of the problem, the Lagrange dual function and its gradient are easy to compute; see for instance [Jebara-2004] for similar arguments in the context of supervised classification. Indeed, as the objective function is strictly convex, all the equality constraints are linear and the support set for the path probabilities is convex, it is known that there is only one global minimum and the duality gap is zero provided that the problem is feasible7 [Bertsekas-1999, Boyd-2004, Culioli-2012, Griva-2008, Minoux-2008]. The optimum is a saddle point of the Lagrange function and a common optimization procedure (often called the Arrow-Hurwicz-Uzawa algorithm [Arrow-1958, Minoux-2008]) consists in sequentially (i) solving the primal (finding the optimal probability distribution) while considering the Lagrange parameters as fixed and then (ii) maximizing the dual (which is concave) with respect to the Lagrange parameters, until convergence. This procedure converges to the global minimum – see the mentioned references.

In our context, this provides the following steps [Griva-2008], which are iterated until convergence,

 ⎧⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪⎩L(λ)=L(P∗(λ),λ)=minimize{P(℘)}℘∈PstL(P,λ)(compute the dual% function)λ∗=argmaxλL(λ)(maximize the dual function)λ=λ∗ (21)

and this is the procedure that will be followed, where the dual function will be maximized through a simple gradient ascent procedure.

#### Computing the dual function

For computing the dual function in Eq. 21, we first have to find the optimal probability distribution in terms of the Lagrange parameters. We thus have to compute the minimum of Eq. 19 for a constant . But this Lagrange function (Eq. 19) is identical to the Lagrange function associated to the standard RSP optimization problem (Eq. 2), except that the costs are replaced by the augmented costs , and the introduction of the last, additional, term, which does not depend on the probability distribution . Therefore, the probability distribution minimizing Eq. 19 is a Gibbs-Boltzmann distribution of the form of Eq. 3, but now depending on the augmented costs instead of the original costs.

Then, we replace the probability distribution in by the optimal Gibbs-Boltzmann distribution depending on the augmented costs and thus also on the Lagrange parameters. From the result of Eq. 4, the obtained dual function (Eq. 21) is

 L(λ)=L(P∗(λ),λ)=−TlogZ′−∑(i,j)∈Cλijσij (22)

In this equation, is the partition function computed from the augmented costs and thus depends on . We now need to maximize this dual function in function of these Lagrange parameters.

#### Maximizing the dual function

Let us now compute the gradient of the dual function with respect to the non-negative for edges . This maximization of the dual function can be done, e.g., by using the simple method developed by Rockafellar (see [Rockafellar-1973], Eqs. 10 and 12). From (Eq. 6), the gradient of the dual Lagrange function (Eq. 22), for , is simply , where we used the definition of the augmented costs in Eq. 20 as well as Eq. 6. It can be observed that we simply recover the expressions for the capacity constraints; this is actually a standard result when dealing with maximum entropy problems (see, e.g., [Jebara-2004, Kapur-1989]).

For computing the Lagrange parameters, we thus follow [Rockafellar-1973] who proposed the following gradient-based updating rule8

 λij←max(λij+α(¯nij−σij),0) for all (i,j)∈C (23)

which is guaranteed to converge in the concave case, as far as is positive, is not too large, and the problem is feasible [Rockafellar-1973]. The update is iterated until convergence. Of course, we could have used other, more sophisticated and more efficient, optimization techniques (see, e.g., [Bertsekas-1999, Boyd-2004, Griva-2008, Nocedal-2006]), but this simple procedure worked well for all our tests on small to medium graphs. The parameter had to be tuned manually for each different data set, but this did not create any difficulty.

### 4.3 The resulting algorithm and discussion

The resulting algorithm is presented in Algorithm 2. It computes the optimal policy (Eq. 8) minimizing the objective function (Eq. 2) while satisfying the inequality constraints (Eq. 17). This optimal policy guides the random walker to the target state with a trade-off between exploitation and exploration monitored by temperature parameter . The different steps of the procedure are the following:

• Initialize the Lagrange parameters to and the augmented costs to the original edge costs .

• Then, iterate the following steps until convergence:

• The elements of the fundamental matrix that are needed are recomputed from the current augmented costs (Eq. 5) by solving two systems of linear equations.

• The expected number of passages through each edge (edge flows) is computed (Eq. 6).

• The Lagrange parameters and the augmented costs are updated (Eqs. 23, 20).

• Compute and return the optimal policy (transition probabilities) according to Eq. 8.

Because two systems of linear equations need to be solved at each iteration, its complexity is of order , depending on the number of iterations needed for convergence, which is unknown in advance. However, for sparse graphs, the complexity could be reduced by taking advantage of the recent effort for developing special numerical methods for solving sparse systems of linear equations [Spielman-2004]. Indeed, for undirected graphs with a symmetric cost matrix, the auxiliary matrix is similar to a symmetric diagonally dominant matrix for which fast solvers might be applied. This interesting avenue is left for further work.

Note also that lower bounds on the expected flow have also been considered. However, in this case, the resulting virtual costs (Lagrange parameters) can become negative in order to augment the flow in the edge. This sometimes lead to numerical instabilities when some costs become negative and negative cycles appear. One quick fix is to prohibit negative costs but then, in some situations, the problem becomes unfeasible. Therefore, we decided to leave the study of lower-bounded capacities on flows for further work.

Moreover, it was assumed that the problem is feasible, which can be difficult to check in practice. Indeed, the model injects a unit flow in the network so that the max flow through the network must at least be equal to one. This means that we cannot blindly assign capacities because, in the case where the problem is not feasible, the algorithm does not converge. One way to check the overall capacity of the network is to run a standard max-flow algorithm [Ahuja-1993, Bertsekas-1998, Dolan-1993, Ford-1962, Gondran-1984, Jungnickel-2005, Korte-2018].

## 5 Dealing with net flow capacity constraints

Let us now consider the case where the capacities with are defined on net flows instead of raw flows. It is therefore assumed in this section that the original graph is undirected and that the adjacency matrix as well as the cost matrix are symmetric. In this situation (see Eq. 9 and its discussion), the constraints operate on the net flows instead of the raw flows,

 jij= max((¯nij−¯nji),0)≤σij, or equivalently both{¯nij−¯nji≤σij¯nji−¯nij≤σijfor each edge (i,j)∈C (24)

In the net flows setting, we further consider that is always defined when there exists a capacity constraint and that (symmetry). Then, in this last equation, only one of the two net flows is positive so that the constraint only operates in this direction of the flow: the second constraint is automatically satisfied. However, we formulate the constraints in both directions since we do not know a priori which one is potentially active. To summarize, if the set of constraints nodes contains edge , it also necessarily contains its reciprocal (they come by pair) with the same capacity value, . We now turn to the derivation of the Lagrange function and the algorithm.

### 5.1 The Lagrange function in case of net flow capacity constraints

The Lagrange function then becomes, after re-arranging the terms like in Eq. 19 and using ,

 L(P,λ) =∑℘∈PstP(℘)~c(℘)+T∑℘∈PstP(℘)log(P(℘)~π(℘))+μ(∑℘∈PstP(℘)−1) +∑(i,j)∈Cλij[¯nij−¯nji−σij] +T∑℘∈PstP(℘)log(P(℘)~π(℘))+μ(∑℘∈PstP(℘)−1)−∑(i,j)∈Cλijσij

Now, from the symmetry of edges (edges are present by pairs; for each : ), we deduce . Injecting this result in the previous equation and proceeding in the same way as in Eq. 19 provides

 L(P,λ) =∑℘∈PstP(℘)\underbracket(~c(℘)+∑(i,j)∈C(λij−λji)η((i,j)∈℘))augmented cost ~c′(℘) cumulated on path ℘ +T∑℘∈PstP(℘)log(P(℘)~π(℘))+μ(∑℘∈PstP(℘)−1)−∑(i,j)∈Cλijσij =∑℘∈PstP(℘)∑(i,j)∈Eη((i,j)∈℘)\underbracket(cij+δ((i,j)∈C)(λij−λji))augmented costs c′ij +T∑℘∈PstP(℘)log(P(℘)~π(℘))+μ(∑℘∈PstP(℘)−1)−∑(i,j)∈Cλijσij =\underbracket∑℘∈PstP(℘)~c′(℘)+T∑℘∈PstP(℘)log(P(℘)~π(℘))free energy based on augmented costs, ϕ′(P)+μ(∑℘∈PstP(℘)−1) −∑(i,j)∈Cλijσij (25)

which has exactly the same form as in the raw flow case (see Eq. 19) – only the definition of the augmented costs differs in the two expressions. Indeed, as before, the costs can be redefined into augmented costs,

 c′ij={cij+λij−λjiwhen edge % (i,j)∈Ccijotherwise (26)

and we must stress that we require that the constraints in Eq. 24 are symmetric and come by pair.

By following the same reasoning as in the previous section (see Eq. 22 and 23), it can immediately be observed that the dual function has the same form and is provided by Eq. 22.

### 5.2 The resulting algorithm and discussion

In the net flow context, by proceedings in the same way as in previous section (see Subsection 4.2), the gradient of the dual Lagrange function (Eq. 25) for augmented costs provided by Eq. 26 is . From this last result, we can easily derive the update of the Lagrange parameters,

 λij←max(λij+α(¯nij−¯nji−σij),0) for all (i,j)∈C (27)

The Algorithm 2 is easy to adapt in order to consider symmetric net flow capacity constraints (): only lines 10 and 11 must be modified according to Eqs. 26 and 27.

In practice, we observed that the net flow constraint algorithm is slower to converge, and more sensitive to the gradient step, than simple flow constraints; in other words, the problems looks harder to solve. This is especially the case for small values of , where the process behaves more like a random walker. Indeed, in that situation, the addition of capacity constraints seems to be partly in conflict with the objective of walking randomly and moving back and forth. One way to handle this issue [Dial71] would be to first define a directed acyclic graph (DAG, deduced, e.g., from the electrical potential on nodes when imposing a +1 potential at source node and a 0 potential at target node, followed by a topological sort). Then, a RSP problem with simple capacity constraints is solved on this DAG. This has the additional advantage that it is much more efficient (we avoid cycles and can use dynamic programming for computing the RSP solution) and it should scale to larger graphs. This extension is left for further work.

## 6 Experiments

In this section, we first present two illustrative examples of the use of capacity constraints on the edge flows of a graph as well as a brief study of the scalability of the net flow Algorithm 1. Then, we evaluate the net flow RSP on unsupervised classification tasks and compare its results to other state-of-the-art graph node distances. We have to stress that our goal here is not to propose new node clustering algorithms outperforming state-of-the-art techniques. Rather, the aim is to investigate if the net flow RSP model is able to capture the community structure of networks in an accurate way, compared to other dissimilarity measures between nodes.

### 6.1 Illustrative examples

First example The first example illustrates the expected number of visits to nodes (see Eq. 7) over the RSP paths distribution between one source node and one target node on a grid, in two different situations obtained after running the Algorithm 2. Nodes are linked to their neighbors9 with a unit affinity and a unit cost. In the first situation, we do not set any capacity constraint and the expected number of visits is represented in Fig. 1(a). As expected, it follows a trajectory close to the diagonal of the grid, representing the shortest paths between the source node and the target node.

In the second situation, we placed two obstacles (walls) by constraining the capacities of all the edges linking the nodes represented in red in Fig. 1(b) to . As can be observed in 1(c), this time, the expected number of visits to nodes no longer concentrates around the diagonal, but follows as close as possible the obstacles. This trajectory represents the least-cost paths between the source node and the target node avoiding the low-capacity obstacles.

Second example Our second illustrative example is taken from [Price-1971] and makes a link between the RSP with net flow capacity constraints and the maximum flow problem. Its aim is to show that the Algorithm 2 could be used to compute the maximum flow (which takes a value of 12 in this example) between the source node and the target node in the undirected graph presented in Fig. 3. Each edge of this graph has a unit cost and affinity. Note that, in practice, because the RSP model assumes a unit input flow, all capacities are scaled10 so that the value of the maximum flow after scaling lies between and . The maximum flow of is then obtained by the reverse transformation.

To this end, we add a directed edge from to with infinite capacity and an edge cost of 10 to the original graph. This new edge introduces a “shortcut” allowing the passage of all the overflow of the graph, at the price of a higher cost. Theoretically, our algorithm will avoid going trough this shortcut as much as possible because it has a high cost compared to the other edges in the graph. Therefore, it should try to maximize the flow that travels through the original network before using this shortcut edge.

Fig. 3 shows the evolution of the flow between the nodes and (the flow through the original graph), provided by Algorithm 2, and thus satisfying the capacity constraints, in function of the value of the parameter. As observed in Fig. 3, this flow through the original network reaches almost exactly the maximum flow value of 12 for all larger than 1. However, when is low (close to zero), the flows become more and more random, without considering costs any more (see Eq. 2). For that reason, part of the total flow goes through the shortcut, even if this is not optimal in terms of expected cost. This explains the reduction of the flow through the original network when is close to zero.

### 6.2 Scalability experiment

In this subsection, we study the scalability of the Algorithm 1, which computes the full net flow randomized shortest paths dissimilarity matrix, through a small experiment. To evaluate the time-complexity, we generate 120 graphs with the benchmark algorithm of Lancichinetti, Fortunato and Radicchi (LFR) [lancichinetti-2008]. More precisely, we create 10 graphs for each of the following sizes of {50, 100, 150, 200, 250, 300, 500, 1000, 1500, 2000, 2500, 3000} nodes. Moreover, for each size, among the 10 graphs, we changed the mixing parameter value () each 2 graphs between all theses values {0.1, 0.2, 0.3, 0.4, 0.5}.

We then run the Algorithm 1 on each of theses graphs and report the average computation time in seconds over the 10 graphs for each different size (in terms of number of nodes and number of computed distances) on the Fig. 4. All results were obtained with Matlab (version R2017a) running on an Intel Core i7-8750H CPU@4.10GHz and 32 GB of RAM.

As previously mentioned, the time-complexity of this algorithm is with being the number of nodes and being the number of edges. Clearly, although applicable on medium-size graphs, it can be observed that the algorithm does not scale well on large graphs in its present form.

### 6.3 Nodes clustering experiments

In this subsection, we present an application of the Net Flow Randomized Shortest Paths (nRSP) in a graph nodes clustering context. Note that a methodology very close to [Sommer-2016] is used – see this paper for details.

#### Experimental Setup

Baselines As part of the node clustering experiment, four dissimilarity matrices between nodes as well as five kernels on a graph will be used as baselines to assess our nRSP method.

Baseline dissimilarities

• The Free Energy distance (FE) and the Randomized Shortest Paths Dissimilarity (RSP) depending on an inverse temperature parameter . Already presented earlier, these methods have been shown to perform well in a node clustering context [Sommer-2016] as well as in semi-supervised classification tasks [Francoisse-2017].

• The Logarithmic Forest distance (LF). Introduced in [Chebotarev-2011], it relies on the matrix forest theorem [Chebotarev-1997] and defines a family of distances interpolating (up to a scaling factor) between the shortest-path distance and the resistance distance [Klein-1993], depending on a parameter .

• The Shortest Path distance (SP). The distance corresponds to the cost along the least cost path between two nodes and , derived from the cost matrix .

These dissimilarity matrices are transformed into inner products (kernel matrices) by classical multidimensional scaling (see later).

Baseline kernels on a graph

• The Neumann kernel [Scholkopf-2002] (Katz), initially proposed in [Katz-1953] as a method of computing similarities, it is defined as . The parameter has to be chosen positive and smaller than the inverse of the spectral radium of , .

• The Logarithmic Communicability kernel (lCom) proposed in [ivashkin-2016] as the logarithmic version of the exponential diffusion kernel [Kondor-2002], and also called the communicability measure [Estrada-2008], , where is the matrix exponential.

• The Sigmoid Commute Time kernel (SCT). Proposed in [Yen-2007], it is obtained by applying a sigmoid transform [Scholkopf-2002] on the commute time kernel [FoussKDE-2005]. An alternative version, the Sigmoid corrected Commute Time kernel (SCCT), based on the correction of the commute time suggested in [vonLuxburg-2010], is also used as part of the experiments. For both methods, the parameter controls the sharpness of the sigmoid.

In addition, the Modularity matrix is used as last baseline (Q), and is computed by where contains the node degrees and the constant is the volume of the graph (see, e.g., [Newman-2018]). Recall that modularity is a measure of the quality of a partition of the nodes (a set of communities). Here, the kernel -means is executed directly on matrix .

Datasets A collection of 17 datasets is investigated for the experimental comparisons of the dissimilarity measures. For each dataset, costs are computed as the reciprocal of affinities, , like in electrical networks. The collection includes Zachary’s karate club [Zachary1977], the Dolphin datasets [Lusseau-2003-emergent, Lusseau-2003-bottlenose], the Football dataset [Newman2002], the Political books11, three LFR benchmarks [lancichinetti-2008] and 9 Newsgroup datasets [Lang-1995, Yen-2009]. The list of datasets along with their main characteristics are presented in Table 1.