
# Efficient-robust routing for single commodity network flows††thanks: This project was supported by AFOSR grants (FA9550-15-1-0045 and FA9550-17-1-0435), grants from the National Center for Research Resources (P41-RR-013218) and the National Institute of Biomedical Imaging and Bioengineering (P41-EB-015902), National Science Foundation (ECCS-1509387), by the University of Padova Research Project CPDA 140897 and a postdoctoral fellowship through Memorial Sloan Kettering Cancer Center.

Yongxin Chen, Tryphon T. Georgiou, Fellow IEEE, Michele Pavon, and Allen Tannenbaum, Fellow IEEE

Y. Chen is with the Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY; email: chen2468@umn.edu. T. T. Georgiou is with the Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA; email: tryphon@uci.edu. M. Pavon is with the Dipartimento di Matematica “Tullio Levi Civita”, Università di Padova, 35121 Padova, Italy; email: pavon@math.unipd.it. A. Tannenbaum is with the Departments of Computer Science and Applied Mathematics & Statistics, Stony Brook University, NY; email: allen.tannenbaum@stonybrook.edu.
###### Abstract

We study single commodity network flows with suitable robustness and efficiency specifications. An original use of a maximum entropy problem for distributions on the paths of the graph turns this problem into a steering problem for Markov chains with prescribed initial and final marginals. From a computational standpoint, viewing scheduling this way is especially attractive in light of the existence of an iterative algorithm to compute the solution. The present paper builds on [13] by introducing an index of efficiency of a transportation plan and points, accordingly, to efficient-robust transport policies. In developing the theory, we establish two new invariance properties of the solution (called bridge): an iterated bridge invariance property and the invariance of the most probable paths. These properties, which were only tangentially mentioned in our previous work, are fully developed here. We also show that the distribution on paths of the optimal transport policy, which depends on a “temperature” parameter, tends to the solution of the “most economical” but possibly less robust optimal mass transport problem as the temperature goes to zero. The relevance of all of these properties for transport over networks is illustrated in an example.

Index Terms— Transport over networks, maximum entropy problem, most probable path, temperature parameter

## I Introduction

Consider a company owning a factory and a warehouse. The company wants to ship a certain quantity of goods from the factory so that they reach the warehouse in at most N time units. The flow must occur on the available road network connecting the two facilities. On the one hand, it is desirable that the transport plan utilize as many different routes as possible, so that most of the goods arrive within the prescribed time horizon even in the presence of road congestion, roadwork, etc. On the other hand, it is also important that shorter paths be used so as to keep the vehicles’ fuel consumption within a budgetary constraint.

In this paper, continuing the research initiated in [13], we provide a precise mathematical formulation of the above single commodity network flow problem. Normalizing the mass of goods to one, we formulate a maximum entropy problem for Markovian distributions on the paths of the network. The optimal feedback control suitably modifies a prior transition mechanism, thereby achieving robustness while limiting the cost. This is accomplished through an appropriate choice of the prior transition mechanism involving the adjacency matrix of the graph. The optimal scheduling, while spreading the mass over all feasible paths, assigns maximum probability to all minimum cost paths.

In contrast to the standard literature on controlled Markov chains, the optimal policy (Schrödinger bridge) is not computed through dynamic programming. The constraint on the final marginal (all the goods should be in the warehouse by the final time N) dictates a different approach. The solution is computed by iteratively solving a Schrödinger-Bernstein linear system with nonlinear coupling at the initial and final times. This algorithm, whose convergence was established in [22], is related to recent work in entropy regularization [16] and equilibrium assignment in economics [23], as well as to classical work in statistics [26].

Our straightforward approach also avoids altogether the modelling of cascading failures, which is a complex and controversial task [42]. It is also worth remarking that maximum entropy problems [14], which constitute a powerful inference method, find here an alternative use as a tool to produce a desired flow in a network by exploiting the properties of the prior transition mechanism.

Our intuitive notion of robustness of the routing policy should not be confused with other notions of robustness concerning networks which have been put forward and studied, see e.g. [1, 5, 6, 7, 19, 42]. In particular, in [7, 19], robustness has been defined through a fluctuation-dissipation relation involving the entropy rate. This latter notion captures relaxation of a process back to equilibrium after a perturbation and has been used to study both financial and biological networks [40, 41]. Our study, inspired by transportation and data networks, does not concern equilibrium or near equilibrium cases.

This paper features the following novel contributions: a) it introduces an explicit index of efficiency of a transportation plan; b) the choice of the adjacency matrix as prior transition mechanism, which was justified in [13] on an intuitive basis, is here further motivated through a specific optimization problem; c) we derive an iterated bridge invariance property; d) we establish the invariance of the most probable paths; these two invariance properties, which were only briefly mentioned in [13] in some special cases, are here fully investigated, and their relevance for transport over networks is also illustrated; e) we study the dependence of the optimal transport on a temperature parameter. The possibility of employing the solution for near-zero temperature as an approximation of the solution to Optimal Mass Transport (OMT) is also discussed and illustrated through examples.

The outline of the paper is as follows. In Section II we introduce generalized maximum entropy problems. In Section II-A we establish the iterated bridge property, and in Section II-B the invariance of the most probable paths. Efficiency of a transport policy is introduced in Section III-A. In Section III-B, we introduce robust transport with fixed average path length. Section IV deals with efficient-robust transportation. In Section V, the dependence of the optimal transport on the temperature parameter is investigated. The results are then illustrated through academic examples in Section VI.

## II Generalized maximum entropy problems

We are given a directed, strongly connected (i.e., with a path in each direction between each pair of vertices), aperiodic graph G = (X, E) with vertex set X and edge set E ⊆ X × X. We let time vary in T = {0, 1, …, N}, and let FP0N denote the family of length-N feasible paths x = (x0, …, xN), namely paths such that (xt, xt+1) ∈ E for t = 0, 1, …, N−1.

We seek a probability distribution P on FP0N with prescribed initial and final marginal probability distributions ν0 and νN, respectively, and such that the resulting random evolution is closest to a “prior” measure M on FP0N in a suitable sense. The prior law is induced by the Markovian evolution

 μt+1(xt+1)=∑xt∈Xμt(xt)mxtxt+1(t) (1)

with nonnegative distributions μt over X, t ∈ T, and nonnegative weights mij(t) for all indices i, j ∈ X and all times. Moreover, to respect the topology of the graph, mij(t) = 0 for all t whenever (i, j) ∉ E. Often, but not always, the matrix

 M(t)=[mij(t)]ni,j=1 (2)

does not depend on t. The rows of the transition matrix M(t) do not necessarily sum up to one, so that the “total transported mass” is not necessarily preserved. This occurs, for instance, when M(t) simply encodes the topological structure of the network, with mij being zero or one depending on whether a certain link exists. The evolution (1), together with the measure μ0, which we assume positive on X, i.e.,

 μ0(x)>0 for all x∈X, (3)

induces a measure M on FP0N as follows. It assigns to a path x = (x0, x1, …, xN) the value

 M(x0,x1,…,xN)=μ0(x0)mx0x1⋯mxN−1xN, (4)

and gives rise to a flow of one-time marginals

 μt(xt) = ∑_{xℓ, ℓ≠t} M(x0,x1,…,xN),  t ∈ T.
###### Definition 1

We denote by P(ν0, νN) the family of probability distributions on FP0N having the prescribed marginals ν0 and νN.

We seek a distribution in this set which is closest to the prior M in relative entropy where, for measures P and Q on the same discrete space, the relative entropy (divergence, Kullback-Leibler index) is

 D(P∥Q) := { ∑x P(x) log [P(x)/Q(x)],  if Supp(P) ⊆ Supp(Q),
             +∞,                        if Supp(P) ⊈ Supp(Q).

Here, by definition, 0 · log 0 = 0. Naturally, while the value of D(P∥Q) may turn out to be negative due to a mismatch of scaling (in case Q is not a probability measure), the relative entropy is always jointly convex. We consider the Schrödinger Bridge Problem (SBP):

###### Problem 1

Determine

 M∗[ν0,νN]:=argmin{D(P∥M)∣P∈P(ν0,νN)}. (5)
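As a quick numerical aside (the function name and toy vectors below are ours, not from the paper), the divergence entering (5), with its support convention and its possible negativity against an unnormalized Q, can be sketched as:

```python
import numpy as np

def relative_entropy(P, Q):
    """D(P||Q) for nonnegative measures on a finite set, with the
    conventions 0*log 0 = 0 and D = +inf when Supp(P) is not
    contained in Supp(Q).  Q need not be a probability measure,
    so the value may be negative."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    if np.any((P > 0) & (Q == 0)):      # Supp(P) not contained in Supp(Q)
        return np.inf
    m = P > 0                            # 0*log 0 = 0: skip zero-mass atoms
    return float(np.sum(P[m] * np.log(P[m] / Q[m])))
```

For instance, relative_entropy([0.5, 0.5, 0.0], [0.25, 0.25, 0.5]) equals log 2, while a point mass against the unnormalized measure [2.0] gives the negative value −log 2.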

The following result is a slight generalization (to time inhomogeneous prior) of [13, Theorem 2.3].

###### Theorem 1

Assume that the product M(0)M(1)⋯M(N−1) has all entries positive. Then there exist nonnegative functions φ and ^φ on [0, N] × X satisfying

 φ(t,i) = ∑j mij(t) φ(t+1,j), (6a)
 ^φ(t+1,j) = ∑i mij(t) ^φ(t,i), (6b)

for t ∈ [0, N−1], along with the (nonlinear) boundary conditions

 φ(0,x0) ^φ(0,x0) = ν0(x0), (6c)
 φ(N,xN) ^φ(N,xN) = νN(xN), (6d)

for x0, xN ∈ X. Moreover, the solution M∗[ν0,νN] to Problem 1 is unique and obtained by

 M∗(x0,…,xN)=ν0(x0)πx0x1(0)⋯πxN−1xN(N−1),

where the one-step transition probabilities

 πij(t):=mij(t)φ(t+1,j)φ(t,i) (7)

are well defined.

The factors φ and ^φ are unique up to multiplication of φ by a positive constant and division of ^φ by the same constant. Let φ(t) and ^φ(t) denote the column vectors with components φ(t,i) and ^φ(t,i), respectively, with i ∈ X. In matricial form, (6a), (6b) and (7) read

 φ(t)=M(t)φ(t+1),  ^φ(t+1)=M(t)T^φ(t), (8)

and

 Π(t):=[πij(t)]=diag(φ(t))−1M(t)diag(φ(t+1)). (9)
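The matrix form (8)-(9) suggests the iterative scheme of [22]: sweep φ backward, enforce boundary condition (6c), sweep ^φ forward, enforce (6d), and repeat until convergence. A minimal numerical sketch, with our own function name and conventions (not the authors' code):

```python
import numpy as np

def schroedinger_bridge(M, nu0, nuN, iters=200):
    """Iterate the Schroedinger system (6)/(8).  M is the list of prior
    matrices M(0), ..., M(N-1) (nonnegative; rows need not sum to one);
    nu0, nuN are the prescribed marginals.  Returns the optimal
    transition matrices Pi(t) of (9)."""
    N, n = len(M), M[0].shape[0]
    phi = [np.ones(n) for _ in range(N + 1)]
    hat = [np.ones(n) for _ in range(N + 1)]
    for _ in range(iters):
        for t in range(N - 1, -1, -1):       # backward sweep: phi(t) = M(t) phi(t+1)
            phi[t] = M[t] @ phi[t + 1]
        hat[0] = nu0 / phi[0]                # boundary condition (6c)
        for t in range(N):                   # forward sweep: hat(t+1) = M(t)^T hat(t)
            hat[t + 1] = M[t].T @ hat[t]
        phi[N] = nuN / hat[N]                # boundary condition (6d)
    for t in range(N - 1, -1, -1):           # final backward sweep for consistency
        phi[t] = M[t] @ phi[t + 1]
    # transition matrices (9): Pi(t) = diag(phi(t))^{-1} M(t) diag(phi(t+1))
    return [np.diag(1 / phi[t]) @ M[t] @ np.diag(phi[t + 1]) for t in range(N)]
```

Propagating ν0 through the returned Π(t) recovers (approximately, at convergence) the prescribed final marginal νN, and each Π(t) is row-stochastic.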

Historically, the SBP was posed in 1931 by Erwin Schrödinger for Brownian particles, motivated by a large deviations problem for the empirical distribution [43]; see [30] for a survey. The problem was considered in the context of Markov chains and studied in [36, 22], and some generalizations have been discussed in [13]. Important connections between SBP and OMT [45, 3, 46] have been discovered and developed in [32, 33, 29, 30, 10, 11, 12].

### II-A Iterated Bridges

In this section we explain a rather interesting property of Schrödinger bridges, which is the following. Suppose that, after solving an SBP for a given set of marginals (ν0, νN) and a Markovian prior M to obtain the bridge M∗ = M∗[ν0,νN], we decide to update the data to another set of marginals (π0, πN). Then, whether we use M or M∗ as prior for the SBP with the new marginals π0 and πN, we obtain precisely the same solution. The significance of this property will be discussed later on in the context of robust transportation.

Indeed, take M∗ as prior and consider the corresponding new Schrödinger system (in matrix form)

 ψ(t)=Π(t)ψ(t+1),^ψ(t+1)=Π(t)T^ψ(t),

with boundary conditions

 ψ(0,x0)^ψ(0,x0) = π0(x0), (10a) ψ(N,xN)^ψ(N,xN) = πN(xN). (10b)

Note that, in the above, Π(t) = diag(φ(t))−1 M(t) diag(φ(t+1)) by (9); therefore, the system can be written as

 diag(φ(t))ψ(t) = M(t) diag(φ(t+1))ψ(t+1), (11a)
 diag(φ(t+1))−1^ψ(t+1) = M(t)T diag(φ(t))−1^ψ(t). (11b)

The new transition matrix is given by

 Q∗(t) =diag(ψ(t))−1Π(t)diag(ψ(t+1)) =diag(ψ(t))−1diag(φ(t))−1 ×M(t)diag(φ(t+1))diag(ψ(t+1)).

Let ψ1(t) := diag(φ(t))ψ(t) and ^ψ1(t) := diag(φ(t))−1^ψ(t); then

 Q∗(t)=diag(ψ1(t))−1M(t)diag(ψ1(t+1)).

By (11), ψ1(t) and ^ψ1(t) are vectors with positive components satisfying

 ψ1(t)=M(t)ψ1(t+1),  ^ψ1(t+1)=M(t)T^ψ1(t).

Moreover, they satisfy the boundary conditions

 ψ1(0,x0)^ψ1(0,x0) = π0(x0) (12a) ψ1(N,xN)^ψ1(N,xN) = πN(xN). (12b)

Thus, ψ1 and ^ψ1 provide the solution to Problem 1 when M is taken as prior.

Alternatively, observe that the transition matrix resulting from the two problems is the same, and so is the initial marginal. Hence, the solutions of the SBP with marginals π0 and πN and prior transitions M(t) and Π(t) are identical.

Thus, “the bridge over a bridge over a prior” is the same as the “bridge over the prior,” i.e., iterated bridges produce the same result. It should be observed that this result is not surprising for probability distributions, since the solution is in the same reciprocal class as the prior (namely, it has the same three-time transition probability), cf. [27, 31, 49]. It could then be described as the fact that only the reciprocal class of the prior matters; this can be seen from Schrödinger’s original construction [43], and also [22, Section III-B] for the case of Markov chains. This result, however, is more general since the prior is not necessarily a probability measure.

In information theoretic terms, the bridge (i.e., probability law on path space) corresponding to the marginals (π0, πN) is the I-projection in the sense of Csiszár [15] of the prior onto the set P(π0, πN) of measures that are consistent with the initial-final marginals. The above result, however, is not simply an “iterated information-projection” property, since M∗ is the I-projection of M onto P(ν0, νN), which does not contain P(π0, πN), being in fact disjoint from it.

### II-B Invariance of most probable paths

Building on the logarithmic transformation of Fleming, Holland, Mitter and others, the connection between SBP and stochastic control was developed from the early nineties on [17, 8, 18, 37]. More recently, Brockett studied steering of the Liouville equation [9]. In [17, Section 5], Dai Pra established an interesting path-space property of the Schrödinger bridge for diffusion processes: the “most probable path” [20, 44] of the prior and of the solution are the same. Loosely speaking, a most probable path is similar to a mode for the path space measure. More precisely, if both drift and diffusion coefficient of the Markov diffusion process

 dXt=b(Xt,t)dt+σ(Xt,t)dWt

are smooth and bounded, and x(·) is a sufficiently smooth path, then there exists an asymptotic estimate, as the radius tends to zero, of the probability of a small tube around x(·). It follows from this estimate that the most probable path is the minimizer in a deterministic calculus of variations problem where the Lagrangian is an Onsager-Machlup functional; see [25, p. 532] for the full story.¹

¹The Onsager-Machlup functional was introduced in [34] to develop a theory of fluctuations in equilibrium and nonequilibrium thermodynamics.

The concept of most probable path is, of course, much less delicate in our discrete setting. We define it for general positive measures on paths. Given a positive measure M as in Section II on the feasible paths of our graph G, we say that a path x is of maximal mass if, for all other feasible paths y, we have M(y) ≤ M(x). Likewise, we consider paths of maximal mass connecting particular nodes. It is apparent that paths of maximal mass always exist but are, in general, not unique. If M is a probability measure, then the maximal-mass paths (most probable paths) are simply the modes of the distribution. We establish below that the maximal-mass paths joining two given nodes under the solution of a Schrödinger bridge problem as in Section II are the same as under the prior measure.

###### Proposition 1

Consider marginals ν0 and νN in Problem 1. Assume that ν0 is positive on all nodes and that the product M(0)M(1)⋯M(N−1) of transition probability matrices of the prior has all positive elements (cf. (2)). Let x0 and xN be any two nodes. Then, under the solution M∗[ν0,νN] of the SBP, the family of maximal-mass paths joining x0 and xN in N steps is the same as under the prior measure M.

Proof. Suppose the path y = (y0, y1, …, yN), with y0 = x0 and yN = xN, has maximal mass under the prior M. In view of (4) and (7) and assumption (3), we have

 M∗[ν0,νN](y) = ν0(y0) πy0y1(0) ⋯ πyN−1yN(N−1)
             = [ν0(x0)/μ0(x0)] [φ(N,xN)/φ(0,x0)] M(y0,y1,…,yN).

Since the quantity

 [ν0(x0)/μ0(x0)] [φ(N,xN)/φ(0,x0)]

is positive and does not depend on the particular path joining x0 and xN, the conclusion follows.

The calculation in the above proof actually establishes the following stronger result.

###### Proposition 2

Let x0 and xN be any two nodes in X. Then, under the assumptions of Proposition 1, the measures M and M∗[ν0,νN], once normalized and restricted to the set of paths that begin at x0 at time 0 and end at xN at time N, are identical.

## III Robust transport

In this section, we first discuss notions of efficiency of a transportation plan and then introduce entropy as a surrogate for robustness.

### III-A Efficiency of a transport plan

Inspired by the celebrated paper [48], we introduce below a measure of efficiency of a transportation plan over a certain finite-time horizon and a given network.

For the case of undirected and connected graphs, small-world networks [48] were identified as networks being highly clustered but with small characteristic path length L, where

 L:=1n(n−1)∑i≠jdij

and dij is the shortest path length between vertices i and j. The inverse 1/L of the characteristic path length is an index of efficiency of G. There are other such indexes, most noticeably the global efficiency introduced in [28]. This is defined as Eglob := E(G)/E(Gideal), where

 E(G)=1n(n−1)∑i≠j1dij

and Gideal is the complete network with all possible edges in place. Thus, 0 ≤ Eglob ≤ 1. However, as argued on [28, p. 198701-2], it is 1/L which “measures the efficiency of a sequential system (i.e., only one packet of information goes along the network)”. Eglob, instead, measures the efficiency of a parallel system, namely one in which all nodes concurrently exchange packets of information. Since we are interested in the efficiency of a specific transportation plan, we define below efficiency by a suitable adaptation of the index 1/L.

Consider a strongly connected, aperiodic, directed graph G as in Section II. To each edge (i, j) ∈ E is now associated a length lij ≥ 0. If (i, j) ∉ E, we set lij = +∞. The length may represent distance, cost of transport/communication, etc. Let T = {0, 1, …, N} be the time-indexing set. For a path x = (x0, …, xN), we define the length of x to be

 l(x)=N−1∑t=0lxtxt+1.

We consider the situation where, initially at time t = 0, the mass is distributed on X according to ν0 and needs to be distributed according to νN at the final time t = N. These masses are normalized to sum to one, so that they are probability distributions. A transportation plan is a probability measure P on the (feasible) paths of the network having the prescribed marginals ν0 and νN at the initial and final time, respectively. A natural adaptation of the characteristic path length is to consider the average path length of the transportation plan P, which we define as

 L(P)=∑x∈XN+1l(x)P(x) (13)

with the usual convention that ∞ · 0 = 0. This is entirely analogous to a thermodynamic quantity, the internal energy, which is defined as the expected value of the Hamiltonian observable in state P. Clearly, L(P) is finite if and only if the transport takes place on actual, existing links of G. Moreover, only the paths which are in the support of P enter in the computation of L(P). One of the goals of a transportation plan is of course to have small average path length since, for instance, cost might simply be proportional to length. Determining the probability measure that minimizes (13) can be seen to be an OMT problem.
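To make (13) concrete, here is a toy computation (the graph and the plan below are our own illustration, not the paper's example): enumerate the feasible paths of a small network, assign them a probability, and average their lengths.

```python
import itertools
import numpy as np

# Edge lengths of a hypothetical 4-node directed graph (np.inf = no edge),
# with a zero-length self-loop at the sink node 3.
l = np.array([[np.inf, 1.0, 2.0, np.inf],
              [np.inf, np.inf, np.inf, 1.0],
              [np.inf, np.inf, np.inf, 1.0],
              [np.inf, np.inf, np.inf, 0.0]])
N = 2  # time horizon

def path_length(x):
    # l(x) = sum of the lengths of the traversed edges
    return sum(l[x[t], x[t + 1]] for t in range(len(x) - 1))

# feasible 2-step paths from node 0 to node 3
paths = [x for x in itertools.product(range(4), repeat=N + 1)
         if x[0] == 0 and x[-1] == 3 and path_length(x) < np.inf]
# a transportation plan P splitting the unit mass evenly over them
P = {x: 1 / len(paths) for x in paths}
L_P = sum(path_length(x) * p for x, p in P.items())  # average path length (13)
```

With these lengths the feasible paths are (0,1,3) of length 2 and (0,2,3) of length 3, so the even split gives L(P) = 2.5; the convention ∞·0 = 0 is honored automatically because infeasible paths carry zero mass.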

### III-B Problem formulation

Besides efficiency, another desirable property of a transport strategy is to ensure robustness with respect to link/node failures, the latter being possibly due to malicious attacks. We therefore seek a transport plan in which the mass spreads, as much as the network topology allows, before reconvening at time N in the sink nodes. We achieve this by selecting a transportation plan that has a suitably high entropy S(P), where

 S(P)=−∑x∈XN+1P(x)lnP(x). (14)

Thus, in order to attain a level of robustness while guaranteeing a relatively low average path length (cost), we formulate below a constrained optimization problem that weighs in both S(P) and L(P).

We begin by letting ¯L designate a suitable bound on the average path length (cost) that we are willing to accept. Clearly, we need that

 lm := min_{x∈FP0N} l(x) ≤ ¯L. (15a)

We will also assume that

 ¯L ≤ (1/|FP0N|) ∑_{x∈FP0N} l(x). (15b)

The rationale behind the latter, i.e., requiring an upper bound as stated, will be explained in Proposition 3 below.

Let P(FP0N) denote the family of probability measures on FP0N. The probability measure that maximizes the entropy subject to a path-length constraint L(P) ≤ ¯L is the Boltzmann distribution

 P∗T(x)=Z(T)−1exp[−l(x)T],Z(T)=∑xexp[−l(x)T], (16)

where the parameter T (temperature) depends on ¯L. To see this, consider the Lagrangian

 L(P,λ):=S(P)+λ(¯L−L(P)), (17)

and observe that the Boltzmann distribution (16) satisfies the first order optimality condition of (17) with λ = 1/T. Clearly, the Boltzmann distribution has support on the feasible paths FP0N. Hence, we get a version of Gibbs’ variational principle: the Boltzmann distribution minimizes the free energy functional

 F(P,T):=L(P)−TS(P) (18)

over P(FP0N). An alternative way to establish the minimizing property of the Boltzmann distribution is to observe that

 F(P,T)=TD(P∥P∗T)−TlogZ, (19)

and therefore, minimizing the free energy over P(FP0N) is equivalent to minimizing the relative entropy D(P∥P∗T) over the same set, which ensures that the minimum is unique. The following properties of P∗T are noted; see e.g. [35, Chapter 2].

###### Proposition 3

The following hold:

• For T → ∞, P∗T tends to the uniform distribution on all feasible paths.

• For T ↘ 0, P∗T tends to concentrate on the set of feasible paths having minimal length.

• Assuming that l(·) is not constant over FP0N then, for each value ¯L satisfying the bounds (15), there exists a unique nonnegative value of T such that P∗T maximizes the entropy S(P) subject to L(P) ≤ ¯L.
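The first two regimes of Proposition 3 are easy to visualize numerically; the path lengths below are an arbitrary illustration, not taken from the paper:

```python
import numpy as np

def boltzmann(lengths, T):
    """Boltzmann distribution (16) over a finite family of path lengths:
    P(x) proportional to exp(-l(x)/T)."""
    w = np.exp(-np.asarray(lengths, float) / T)
    return w / w.sum()

lengths = [2.0, 3.0, 3.0, 5.0]       # l(x) for four hypothetical feasible paths

p_hot = boltzmann(lengths, T=1e3)    # large T: nearly uniform over all paths
p_cold = boltzmann(lengths, T=1e-2)  # small T: concentrates on the shortest path
```

As T grows the weights exp(−l(x)/T) all approach 1, while as T ↘ 0 the minimum-length path dominates exponentially, matching the two limits above.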

We also observe the Markovian nature of the measure P∗T. Indeed, recall that a positive measure on FP0N is Markovian if it can be expressed as in (4). Since

 P∗T(x0,x1,…,xN)=Z(T)−1N−1∏t=0exp[−lxtxt+1T], (20)

which is exactly in the form (4), we conclude that P∗T is (time-homogeneous) Markovian with uniform initial measure and time-invariant transition matrix given by

 MT=[exp(−lijT)]ni,j=1. (21)

Observe, however, that MT is in general not stochastic (rows do not sum to one). Moreover, observe that, after suitable normalization, MT represents the transition matrix of a chain where probabilities of transition between nodes are inversely related to the length of the links.

Consider now ν0 and νN, distributions on X. These are the “starting” and “ending” concentrations of resources for which we seek a transportation plan. We denote by P(ν0, νN) the family of probability distributions on paths having ν0 and νN as initial and final marginals, respectively, and we consider the problem to maximize the entropy subject to marginal and length constraints:

###### Problem 2

Maximize S(P) subject to P ∈ P(ν0, νN) and L(P) ≤ ¯L.

Note that the solution to Problem 2 depends on ¯L as well as on the two marginals, and that when ¯L is too close to lm, the problem may be infeasible.

Once again, bringing in the Lagrangian (17), which now needs to be optimized over P(ν0, νN), we see that Problem 2 is equivalent to solving the following Schrödinger bridge problem for a suitable value of the parameter T.

###### Problem 3

Minimize D(P∥P∗T) over P ∈ P(ν0, νN).

Thus, employing path space entropy as a measure of robustness, the solution to Problem 3, denoted by M∗T and constructed in accordance with Theorem 1, minimizes a suitable free energy functional with the temperature parameter T specifying the tradeoff between efficiency and robustness. Problem 3 can therefore be viewed as an SBP as in Section II where the “prior” measure is Markovian.

## IV Structure of robust transport

We now address in detail Problem 3, namely, to identify a probability distribution on FP0N that minimizes D(P∥P∗T) over P(ν0, νN), where P∗T is the Boltzmann distribution (20); the minimizing law is denoted by M∗T as before. We show below that the two invariance properties discussed in the previous two sections can be used to determine an optimal transport policy. We also show that M∗T inherits from the Boltzmann distribution the properties dictated by Proposition 3.

Initially, for simplicity, we consider the situation where at time 0 the whole mass is concentrated on node 1 (source) and at time N it is concentrated on node n (sink), i.e., ν0 = δ1 and νN = δn. We want to allow (part of) the mass to reach the end-point “sink” node, if this is possible, in less than N steps and then remain there until time N. In order to ensure that this is possible, we assume that there exists a zero-length self-loop at node n, i.e., (n, n) ∈ E with lnn = 0. The Schrödinger bridge theory provides transition probabilities so that, for a path y = (y0, …, yN),

 M∗T(δ1,δn)(y) = δ1(y0) ∏_{t=0}^{N−1} exp(−lytyt+1/T) [φT(t+1,yt+1)/φT(t,yt)]
              = δ1(y0) [φT(N,yN)/φT(0,y0)] exp(−l(y)/T), (22)

cf. (4) and (7). Here l(y) is the length of the path y, and φT satisfies, together with ^φT, the Schrödinger system (6) with ν0 = δ1 and νN = δn.

In [13, Section VI], Problem 3 was first studied with a prior measure having certain special properties. To introduce this particular measure, we first recall (part of) a fundamental result from linear algebra [24].

###### Theorem 2 (Perron-Frobenius)

Let B be an n × n matrix with nonnegative entries. Suppose there exists an integer k such that B^k has only positive entries, and let λB be its spectral radius. Then

1. λB is an eigenvalue of B;

2. λB is a simple eigenvalue;

3. there exists an eigenvector corresponding to λB with strictly positive entries.

Consider now the weighted adjacency matrix B := MT in (21) (where we drop the subscript T, as the temperature will be fixed throughout this section). Assume that B^N has all positive elements so that we can apply the Perron-Frobenius theorem. Let u and v be the left and right eigenvectors, with positive components, of the matrix B corresponding to the spectral radius λB. We have

 BTu = λB u,  Bv = λB v. (23)

We assume throughout that u and v are chosen so that ∑i ui vi = 1. Then, for 0 < t ≤ N and any feasible path (i, y1, …, yt−1, j) with y0 = i and yt = j, define

 Ml(i,y1,…,yt−1,j) := λB−t ui vj e−∑t−1k=0 lykyk+1. (24)

The corresponding transition matrix is

 Rl = λB−1 diag(v)−1 B diag(v). (25)

It admits the invariant measure

 μl(i)=uivi. (26)

Note that Ml and the Boltzmann distribution have the same transition matrix but different initial distributions. In [13], to which we refer for motivation and more details, the following problem was studied.
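A small numerical sketch of (21) and (23)-(26), with edge lengths chosen arbitrarily by us: compute the Perron eigendata of B, form the transition matrix Rl, and check that it is row-stochastic with invariant measure μl(i) = ui vi.

```python
import numpy as np

T = 1.0
# hypothetical edge lengths of a small strongly connected graph
l = np.array([[1.0, 1.0, 2.0],
              [1.0, 1.0, 1.0],
              [2.0, 1.0, 1.0]])
B = np.exp(-l / T)                         # weighted adjacency matrix (21)

# Perron-Frobenius eigendata (23): spectral radius, right/left eigenvectors
w, V = np.linalg.eig(B)
k = int(np.argmax(w.real))
lam = w.real[k]
v = np.abs(V[:, k].real)                   # right eigenvector: B v = lam v
wl, U = np.linalg.eig(B.T)
u = np.abs(U[:, int(np.argmax(wl.real))].real)  # left eigenvector: B^T u = lam u
u = u / (u @ v)                            # normalize so that sum_i u_i v_i = 1

Rl = (1.0 / lam) * np.diag(1.0 / v) @ B @ np.diag(v)  # transition matrix (25)
mu = u * v                                            # invariant measure (26)
```

Each row of Rl sums to λB−1 (Bv)i / vi = 1, and a direct computation (μ Rl)j = λB−1 vj (BTu)j = uj vj confirms that μ is invariant.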

###### Problem 4

Minimize D(P∥Ml) over P ∈ P(ν0, νN).

Under the assumption that B^N has all positive entries, this Schrödinger bridge problem has a unique solution M∗l. In [13, Theorem 3.4], it was also shown that Ml is itself the solution of a Schrödinger bridge problem with equal marginals and the Boltzmann distribution (16) as prior. Thus, by the iterated bridge property of Section II-A, M∗l coincides with the solution of Problem 3 for any choice of the initial-final marginals ν0 and νN.

We recall the following rather surprising result [13, Theorem 6.1], which includes the invariance of the most probable paths in Problem 3 (Proposition 1).

###### Theorem 3

The solution M∗l gives equal probability to paths of equal length between any two given nodes. In particular, it assigns maximum and equal probability to minimum length paths.

This result is relevant when the solution of Problem 3 for low temperature is used as an approximation to OMT; see Remark 1 in the next section. Finally, an important special case occurs when the lengths of all existing links are equal. Then the matrix B is proportional to the unweighted adjacency matrix of the graph, and the measure Ml reduces to the so-called Ruelle-Bowen random walk. The only concern in the transport policy is then maximizing path family entropy to achieve robustness; see [13, Sections 4 and 5] for details.

## V Dependence of robust transport on T

Below we study how the solution M∗T to Problem 3 varies with the temperature parameter T. Here, x0, xN are specified nodes where the mass is concentrated at the start and end times, and ν0 = δx0, νN = δxN, i.e., ν0(x) = 1 when x = x0 and zero otherwise, and similarly for νN. It should be noted that similar results hold for general marginal distributions as well, which are not necessarily Dirac.

###### Theorem 4

Consider the solution M∗T to Problem 3 with ν0 = δx0 and νN = δxN. Let lm(x0,xN) := min{l(y) : y ∈ XN+1(x0,xN)}, i.e., the minimum length of N-step paths originating in x0 and terminating in xN. Then

• For T ↘ 0, M∗T tends to concentrate itself on the set of feasible, minimum length paths joining x0 and xN in N steps. Namely, if y is such that l(y) > lm(x0,xN), then M∗T(y) → 0 as T ↘ 0.

• For T → ∞, M∗T tends to the uniform distribution on all feasible paths joining x0 and xN in N steps.

• Suppose XN+1(x0,xN) is not a singleton and that l(·) is not constant over it. Then, for each value ¯L satisfying the bounds

 lm(x0,xN) ≤ ¯L ≤ (1/|XN+1(x0,xN)|) ∑_{y∈XN+1(x0,xN)} l(y)

there exists a unique value of T such that M∗T satisfies the constraint L(M∗T) = ¯L and therefore solves Problem 2.

Proof. Observe first that, since M∗T is a probability measure on XN+1, it must satisfy, by (22),

 1 = ∑_{y∈XN+1} M∗T(y)
   = ∑_{y∈XN+1} δx0(y0) [φT(N,yN)/φT(0,y0)] exp(−l(y)/T)
   = ∑_{y∈XN+1(x0,xN)} [φT(N,xN)/φT(0,x0)] exp(−l(y)/T), (27)

where we have used the fact that the initial and final marginals of M∗T are δx0 and δxN, respectively. It follows that

 φT(0,x0)/φT(N,xN) = ∑_{y∈XN+1(x0,xN)} δx0(y0) exp(−l(y)/T)
                   = ∑_{y∈XN+1(x0,xN)} exp(−l(y)/T), (28)

where again XN+1(x0,xN) denotes the family of paths joining x0 and xN in N time periods.

Proof of i): Let y be such that l(y) > lm(x0,xN). Then

 M∗T(y) = [φT(N,xN)/φT(0,x0)] exp(−l(y)/T).

By (28), we have φT(N,xN)/φT(0,x0) ≤ exp(lm(x0,xN)/T). Hence,

 M∗T(y) = [φT(N,xN)/φT(0,x0)] e−l(y)/T ≤ e−(l(y)−lm(x0,xN))/T.

Since l(y) > lm(x0,xN), the right-hand side tends to zero as T ↘ 0.

Proof of ii): For T → ∞, exp(−l(y)/T) tends to 1 for all paths y. Since the factor φT(N,xN)/φT(0,x0) does not depend on the specific path in XN+1(x0,xN) (it is just a normalization, like the partition function), we conclude that, as T tends to infinity, M∗T tends to the uniform distribution on XN+1(x0,xN).

Proof of iii): Note that Problem 2 is feasible when (15a) holds. By standard Lagrangian duality theory, there exists a Lagrange multiplier λ = 1/T such that the maximizer of the corresponding Lagrangian (17) over P(ν0,νN) is the solution of Problem 2.² In view of points i) and ii), we conclude that T ↦ L(M∗T) bijectively maps (0, +∞) onto the interval of admissible values of ¯L. On the other hand, maximizing (17) over P(ν0,νN) is equivalent to solving Problem 3 with the corresponding T. This completes the proof.

²Actually, using (28), it is easy to see that L(M∗T) is a strictly increasing function of T.

###### Remark 1

Let us interpret lij as the cost of transporting a unit mass over the link (i, j). Then L(P) is the expected cost corresponding to the transport plan P. For T = 0, the free energy functional (18) reduces to L(P), and our problem amounts to a discrete OMT problem [38]. In the latter, one seeks minimum cost paths, a combinatorial problem which can also be formulated as a linear programming problem [4]. Precisely as in the diffusion case [10, 11, 12], we also see that, when the “heat bath” temperature T is close to 0, the solution of the Schrödinger bridge problem is close to the solution of the discrete OMT problem (claim i) of Theorem 4). Since an efficient iterative algorithm is available for the former [22], we see that also in this discrete setting the SBP provides a valuable computational approach to solving OMT problems. This is illustrated in the next section through an academic example. It should also be observed that the measure M∗T is just a “Boltzmann measure” on the subset XN+1(x0,xN) of XN+1 of paths originating in x0 and terminating in xN. Thus the above proof is analogous to the classical one for the Boltzmann distribution P∗T.
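By (22) and (28), with Dirac marginals the bridge restricted to the paths joining x0 and xN is simply the normalized Boltzmann weight exp(−l(y)/T). The following toy computation (our own graph, not the example of the next section) illustrates claim i) of Theorem 4, i.e., the near-zero-temperature approximation of OMT:

```python
import itertools
import numpy as np

# Hypothetical toy network: lengths lij (np.inf = missing edge), with a
# zero-cost self-loop at the sink node 3.
l = np.array([[np.inf, 1.0, 3.0, np.inf],
              [np.inf, np.inf, np.inf, 1.0],
              [np.inf, np.inf, np.inf, 1.0],
              [np.inf, np.inf, np.inf, 0.0]])
N, x0, xN = 3, 0, 3

def length(y):
    return sum(l[y[t], y[t + 1]] for t in range(N))

# feasible N-step paths joining x0 and xN
paths = [y for y in itertools.product(range(4), repeat=N + 1)
         if y[0] == x0 and y[-1] == xN and length(y) < np.inf]

def bridge_on_paths(T):
    # With Dirac marginals, M*_T on these paths is proportional to
    # exp(-l(y)/T); cf. (22) and (28).
    w = np.exp(-np.array([length(y) for y in paths]) / T)
    return w / w.sum()

p_cold = bridge_on_paths(0.05)   # near-zero temperature: approx. OMT solution
k_min = int(np.argmin([length(y) for y in paths]))
```

At T = 0.05 essentially all of the mass rides the minimum-cost path, recovering the OMT routing; raising T redistributes mass over the longer feasible path, trading cost for robustness.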

## VI Examples

Consider the graph in Figure 1. We seek to transport a unit mass from the source node to the sink node in a given number of steps. We first consider the case where the costs of all the edges are equal, and we add a zero-cost self-loop at the sink node. There are three shortest paths from source to sink. If we want to transport the mass with a minimum number of steps, we may end up using only one of these three paths. To achieve robustness, we apply the Schrödinger bridge framework. Since all three feasible paths have equal length, we get a transport plan using all three paths with equal probability, regardless of the choice of temperature T. The evolution of the mass distribution is given by

 ⎡⎢ ⎢ ⎢ ⎢⎣10000000001/31/31/3000000000001/32/30000