# Vector-Valued Optimal Mass Transport

## Abstract

We introduce the problem of transporting vector-valued distributions. In this, a salient feature is that mass may flow between vectorial entries as well as across space (discrete or continuous). The theory relies on a first step taken to define an appropriate notion of optimal transport on a graph. The corresponding distance between distributions is readily computable via convex optimization and provides a suitable generalization of Wasserstein-type metrics. Building on this, we define Wasserstein-type metrics on vector-valued distributions supported on continuous spaces as well as graphs. Motivation for developing vector-valued mass transport is provided by applications such as color image processing, multi-modality imaging, polarimetric radar, as well as network problems where resources may be vectorial.

## 1 Introduction

The theory of Monge-Kantorovich optimal mass transport has witnessed a fast pace of new developments; see [1, 2] for extensive lists of references. These contributions were driven by a multitude of applications in physics, geosciences, economics, and probability. Some of the notable advances include the concept of displacement interpolation [3], links to the geometry of spaces [4, 5, 6, 7], and a fluid dynamic reformulation [8]. In our own work, image analysis and spectral analysis of time series provided starting points (e.g., [9, 10]), followed more recently by problems in stochastic control, quantum information, and matrix-valued distributions [11, 12, 13, 14]. The present paper continues the work of [15, 16] by proposing a transportation problem for vector-valued distributions.

A salient feature of vector-valued distributions is the possibility of the transfer of “mass” from one vectorial entry to another. Physical examples include color image scenes where the vectorial distribution captures color intensities, which may continuously shift with lighting conditions. Alternatively, polarimetric data provides an analogous example where mass represents power at different polarizations detected at the locations of a sensor array. As another example, the flow of mass between vector entries may represent mutation of coexisting population species.

The proposed framework may have far-reaching consequences in, for example, combining genomic and proteomic networks, and in general, fusion of vectorial data supported on a graph. While in some of these examples the total mass may not be preserved, in the present work we will restrict our attention to the case where it is. Thus, we seek a suitable continuity equation that allows trading off mass between vectorial entries of a distribution, on a continuous or a discrete space (graph), and develop a geometric framework that would allow constructing geodesic flows between snapshots of such distributions.

In order to formulate transport between vectorial entries, we begin with a new notion of transport on weighted undirected graphs in the spirit of Erbar and Maas [17]. A starting point in [17] is to devise a suitable continuity equation for probability measures on the nodes of a weighted graph (Markov chain). The formulation in our paper differs from that in [17], and the corresponding transport problem has the advantage of being reducible to one of convex optimization. Both [17] and our formulation were inspired by the Benamou-Brenier theory [8], where optimal mass transport (OMT) with quadratic cost is recast as the problem of minimizing flow kinetic energy (i.e., an action integral). The present work builds on [15, 16], extending the Wasserstein theory to densities and mass distributions on more general spaces. Having as a first step a Benamou-Brenier theory on graphs, the methodology allows us to define a notion of vector-valued transport and a corresponding distance between vector-valued densities on discrete or continuous spaces. As with the (weighted) graph case, the transport distance that we define on vector-valued densities may be reduced to a convex optimization problem.

We now outline the remainder of this note. In Section 2, we sketch the needed background from the classical theory of optimal mass transport that motivates our generalization. In Section 3, we describe the proposed Wasserstein-2 metric on an undirected weighted graph. Further, we remark on a Wasserstein-1 type of metric on a weighted graph. In Sections 4 and 5, we formulate the new Wasserstein distance on vector-valued densities that are supported, first on Euclidean spaces and then on graphs. In Section 6, we give several examples illustrating the idea of vector-valued optimal mass transport, and finally we conclude in Section 7 with an outline of possible applications of the theory and future research directions.

## 2 Preliminaries on optimal mass transport

The mass transport problem was first formulated by Gaspard Monge in 1781, and concerned finding the optimal way, in the sense of minimal transportation cost, of moving a pile of soil from one site to another. This problem was given a modern formulation in the work of Kantorovich in the form of a linear program and it is now known as the Monge–Kantorovich problem. See [18, 19, 1, 2] for details as well as extensive lists of references.

Herein, we focus mainly on the case where the transportation cost is quadratic in the distance. The respective optimization problem

$$W_p(\rho_0,\rho_1)^p := \inf_{\pi\in\Pi(\rho_0,\rho_1)} \int_{\mathbb{R}^N\times\mathbb{R}^N} \|x-y\|^p\,\pi(dx,dy) \tag{1}$$

for $p=2$, where $\|\cdot\|$ denotes Euclidean distance and $\Pi(\rho_0,\rho_1)$ represents the set of all couplings between two non-negative probability density functions $\rho_0$ and $\rho_1$ on $\mathbb{R}^N$ (i.e., the set of joint probability distributions having $\rho_0$ and $\rho_1$ as respective marginals), defines the so-called Wasserstein-$2$ distance between the two densities, or more generally, between measures.
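As a small numerical illustration of the coupling formulation, the following sketch computes a Wasserstein-type distance between two equal-weight point clouds on the real line. It relies on the standard fact that, in one dimension, an optimal coupling pairs the sorted samples, so no linear program is needed. The function name `wasserstein_1d` and the sample data are hypothetical and not part of the paper.

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """W_p between two equal-weight empirical measures on the real line.

    In one dimension an optimal coupling pairs the sorted samples.
    Assumes both sample sets have the same length.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("samples must have the same length")
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

samples_a = np.array([0.0, 1.0, 2.0])
samples_b = samples_a + 3.0          # the same cloud, shifted by 3
w2_shift = wasserstein_1d(samples_a, samples_b, p=2)
w2_merge = wasserstein_1d([0.0, 2.0], [1.0, 1.0], p=2)
```

For a pure translation the distance equals the size of the shift, which is the behavior one expects from a transport metric.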

In this case, where the cost is quadratic (i.e., $p=2$), the transport problem admits a dynamic reformulation [8] that is especially powerful and the space of densities admits, essentially, a Riemannian structure [6]. The Benamou-Brenier reformulation identifies the Wasserstein-2 distance with the integral of the kinetic energy (action integral) along a geodesic flow that links the two marginals, namely,

$$W_2(\rho_0,\rho_1)^2 = \inf_{\rho,v} \int_0^1\!\!\int_{\mathbb{R}^N} \rho(t,x)\,\|v(t,x)\|^2\,dx\,dt \tag{2a}$$

over all time-varying densities $\rho$ and vector fields $v$ satisfying the continuity equation and boundary conditions

$$\frac{\partial \rho}{\partial t} + \nabla_x\cdot(\rho v) = 0, \tag{2b}$$

$$\rho(0,\cdot)=\rho_0,\qquad \rho(1,\cdot)=\rho_1. \tag{2c}$$

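Interestingly, when expressed in terms of density $\rho$ and flux $m=\rho v$, the minimization problem in (2a) becomes convex while (2b-2c) turn into linear constraints. For the optimal pair $(\rho,v)$, the vector field turns out to be the gradient $v=\nabla_x g$ of a function $g$, hence, “rot-free”. Vector fields of this form can be identified with tangent directions of the space $\mathcal{D}$ of probability densities, i.e., elements of the tangent space

$$T_\rho\mathcal{D} \cong \Big\{\delta : \int \delta\,dx = 0\Big\}$$

as follows. Under suitable assumptions on differentiability for $\rho\in\mathcal{D}$ and $\delta\in T_\rho\mathcal{D}$, we solve the Poisson equation

$$\delta = -\nabla_x\cdot(\rho\,\nabla_x g_\delta) \tag{3}$$

to obtain a convex function $g_\delta$ and thereby the vector field $\nabla_x g_\delta$. In this way the space $\mathcal{D}$ can be endowed with a Riemannian structure (see [6, 2]) via

$$\langle \delta_1,\delta_2\rangle_\rho = \int \rho\,\nabla_x g_{\delta_1}\cdot\nabla_x g_{\delta_2}\,dx \tag{4}$$

which has the aforementioned kinetic energy interpretation. This inner product induces precisely the Wasserstein distance as geodesic distance between the two marginals in (2c).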
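The convexity gained by switching from velocity to flux variables can be checked numerically. The sketch below is only an illustration under stated assumptions: it treats the pointwise kinetic energy in one dimension, written as the flux squared divided by the density, and tests the convexity inequality on random pairs. The function names and the random test data are hypothetical.

```python
import numpy as np

def kinetic_energy(rho, m):
    """Pointwise kinetic energy rho * v**2 rewritten with the flux m = rho * v."""
    return m ** 2 / rho

rng = np.random.default_rng(0)
rho_a, rho_b = rng.uniform(0.1, 2.0, size=(2, 1000))
m_a, m_b = rng.normal(size=(2, 1000))
theta = rng.uniform(0.0, 1.0, size=1000)

# Convexity: f(theta*a + (1-theta)*b) <= theta*f(a) + (1-theta)*f(b).
lhs = kinetic_energy(theta * rho_a + (1 - theta) * rho_b,
                     theta * m_a + (1 - theta) * m_b)
rhs = theta * kinetic_energy(rho_a, m_a) + (1 - theta) * kinetic_energy(rho_b, m_b)
convex_gap = rhs - lhs               # nonnegative up to rounding

def energy_rho_v(rho, v):
    """The same energy in (rho, v) variables, which is not jointly convex."""
    return rho * v ** 2

# Counterexample: the midpoint of (rho, v) = (2, 0) and (0, 2) is (1, 1).
mid_value = energy_rho_v(1.0, 1.0)
avg_value = 0.5 * energy_rho_v(2.0, 0.0) + 0.5 * energy_rho_v(0.0, 2.0)
```

The random test never violates the inequality in flux variables, whereas the single counterexample shows that the energy at the midpoint exceeds the average of the endpoint energies in velocity variables.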

###### Remark (Wasserstein-1 metric)

## 3 Wasserstein metric on weighted graphs

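Following the Benamou-Brenier viewpoint on Wasserstein distances, our first task is to develop an analogous notion of transportation distances on graphs. To this end, we consider a connected, positively weighted, undirected graph $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{W})$ with $n$ nodes labeled as $i$, with $1\le i\le n$, and $m$ edges. We consider the set of probability masses on $\mathcal{V}$ that we will denote by $\mathcal{D}$; an element $\rho\in\mathcal{D}$ may be regarded as a column vector $(\rho_1,\ldots,\rho_n)^T$, with $\rho_i\ge 0$ for $i=1,\ldots,n$ and

$$\sum_{i=1}^n \rho_i = 1.$$

We denote the (open) interior of $\mathcal{D}$ by $\mathcal{D}_+$.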

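The standard heat equation on $\mathcal{G}$,

$$\dot\rho = L\rho, \qquad L = -DWD^T,$$

where $L$, $D$, $W$ are the graph-Laplacian, incidence, and weight matrices, respectively, can also be written in the more familiar (from calculus) form

$$\dot\rho = \Delta_{\mathcal{G}}\,\rho \tag{5}$$

by defining

$$\Delta_{\mathcal{G}} := -\nabla_{\mathcal{G}}^*\nabla_{\mathcal{G}},$$

where

$$\nabla_{\mathcal{G}}:\ \mathbb{R}^n\to\mathbb{R}^m,\quad x\mapsto W^{1/2}D^Tx$$

denotes the gradient operator and

$$\nabla_{\mathcal{G}}^*:\ \mathbb{R}^m\to\mathbb{R}^n,\quad y\mapsto DW^{1/2}y$$

denotes its dual. More generally, if we let the entries of $m(t)\in\mathbb{R}^m$ represent flux along respective edges, we can express the continuity equation in the form

$$\dot\rho - \nabla_{\mathcal{G}}^*\,m = 0. \tag{6}$$

Evidently, the flux $m=-\nabla_{\mathcal{G}}\rho$ gives (5). Also note that since the row vector consisting of all 1’s lies in the left kernel of the incidence matrix, mass is preserved by (6).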
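The graph operators above can be assembled directly from an edge list. The following is a minimal sketch, assuming the conventions that the gradient acts as the square root of the weight matrix times the transposed incidence matrix, that its dual is the transpose of that map, and that the Laplacian carries a minus sign so that the heat equation spreads mass. The example graph, variable names, and step size are hypothetical.

```python
import numpy as np

# Hypothetical 4-node weighted graph; each edge is (endpoint, endpoint, weight).
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 0.5), (3, 0, 1.5), (0, 2, 1.0)]
n, m = 4, len(edges)

D = np.zeros((n, m))                 # incidence matrix: +1 and -1 per column
for k, (i, j, _) in enumerate(edges):
    D[i, k], D[j, k] = 1.0, -1.0
W = np.diag([w for _, _, w in edges])
W_half = np.sqrt(W)

grad = W_half @ D.T                  # gradient: node values -> edge values
grad_star = D @ W_half               # dual:     edge values -> node values
laplacian = -D @ W @ D.T             # sign chosen so that rho' = L rho is heat flow

rho_init = np.array([0.7, 0.1, 0.1, 0.1])
flux = -grad @ rho_init              # heat flux
rho_dot = grad_star @ flux           # right-hand side of the continuity equation

# Explicit Euler steps of the heat equation (step size chosen small for stability).
rho = rho_init.copy()
for _ in range(2000):
    rho = rho + 0.01 * (laplacian @ rho)
```

Because every column of the incidence matrix sums to zero, the total mass is unchanged at every step, and on this connected graph the mass relaxes toward the uniform distribution.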

To carry out our program, we need to express the flux in the form of a momentum “” as in [8]. However, the flux is supported on the edges of the graph whereas the mass is supported on the set of nodes , the two sets having different dimensions. In order to overcome this difficulty in a natural manner, we choose to associate the flux along an edge with the mass at the source in the two end-points. More specifically, the flux along an edge , with source and sink , consists of two parts. A part that flows out of node , and another that flows in opposite direction out of node . Thus, we define a flux out of and another, out of , and represent the total flux as the superposition , while restricting the rates to be nonnegative. Thus, our continuity equation for rates becomes

(7) |

where denotes entry-wise multiplication of two vectors. The matrix is the portion of the incidence matrix containing ’s (sources), and (sinks). In other words, is the mass at the source of an edge, and is the mass at the sink of an edge. The dependence of the flux in (7) on ensures that the entries of remain positive while the fact that the kernel of contains the vector with all ones ensures that the total mass is preserved as well.
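The effect of tying each flux to the mass at the node it leaves can be seen in a small simulation. The sketch below is an assumption-laden illustration rather than the paper's implementation: each edge has a designated source and sink, a nonnegative rate `v` moves mass out of the source, a nonnegative rate `v_bar` moves mass out of the sink, and the net flux is scaled by the square root of the edge weight. The graph, the rates, and the step size are hypothetical.

```python
import numpy as np

edges = [(0, 1, 1.0), (1, 2, 4.0), (2, 0, 1.0)]   # hypothetical (source, sink, weight)
n = 3

def mass_rate(rho, v, v_bar):
    """Time derivative of the node masses for nonnegative edge rates v, v_bar."""
    rho_dot = np.zeros(n)
    for k, (i, j, w) in enumerate(edges):
        # Net flux from source i to sink j; each part is proportional to the
        # mass at the node it leaves, so an empty node emits nothing.
        net = rho[i] * v[k] - rho[j] * v_bar[k]
        rho_dot[i] -= np.sqrt(w) * net
        rho_dot[j] += np.sqrt(w) * net
    return rho_dot

rho = np.array([1.0, 0.0, 0.0])      # all mass starts at node 0
v = np.array([1.0, 1.0, 0.0])        # push along edges 0->1 and 1->2
v_bar = np.zeros(3)                  # no flow from sinks back to sources
dt = 1e-3
for _ in range(5000):
    rho = rho + dt * mass_rate(rho, v, v_bar)
```

With a sufficiently small step the masses stay nonnegative and sum to one, and most of the mass ends up at node 2, reflecting the two properties noted above.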

For notational convenience we use and (instead of and ) to denote the starting and ending mass on nodes. We now define the transport distance between and as follows:

It is easy to see that at each time instant, for each , at most one of the and is nonzero. In a similar manner as in the Benamou-Brenier program, (3) can be recast in the form of a convex optimization problem in (momentum) variables ,

It is straightforward to see that the right hand side in (3) is in general positive and vanishes only when . It is also straightforward to see that satisfies the triangle inequality. However, in general, , therefore is only a quasimetric. Yet, it endows with a Finsler metric type structure

for small perturbation , and in this, becomes a length space. In fact, has a very nice “geodesic” property. Indeed, if is the mass distribution as a function of time obtained by solving (3), then

(11) |

for any . Finally, can be extended to , the closure of , by continuity.

Naturally, one can symmetrize in the obvious way, by adopting as our metric

which can then be computed by solving two convex optimization problems. (A similar remark holds for the metrics and defined later on.)

An alternative way to symmetrize is to replace the cost function (3) with

Since the cost terms for and are symmetric, we can combine the two and drop the nonnegativity requirement , to obtain

Positive entries of represent flow from sources to sinks, while negative entries, flow from sinks to sources. This (symmetric) metric induces a Riemannian type structure on , akin to that of standard optimal transport theory on Euclidean spaces [2].

###### Remark

###### Remark (Gradient flow of entropy)

The gradient flow of the entropy functional on probability mass distribution on a graph with respect to is given by

(13) |

where . It represents a nonlinear heat-like equation, to be contrasted with the linear heat equation derived in [17]. To see (13), compute the derivative of the entropy functional along a curve in ,

and observe that the direction of steepest ascent is along from which (13) follows.

###### Remark (Wasserstein-1 distance on graphs)

Following up on Remark 2, we sketch a Benamou-Brenier type reformulation of $W_1$-distances on graphs. Assuming that each edge carries a weight representing the cost of moving a unit mass across it, a distance between mass distributions can be defined as the solution of the min-cost flow problem

Alternatively, in terms of “test vectors” ,

having the dual

This coincides with (3) by taking , , and . Finally, we point out that the above has an action minimization formulation following [24]:

which assumes the following convex recast

## 4 Vector-valued densities & transport

We now turn to the main theme of our paper, the introduction of a Wasserstein type metric between vector-valued densities. A vector-valued density $\rho$ on $\mathbb{R}^N$, or on a discrete space, may represent a physical entity that can mutate or be transported between alternative manifestations, e.g., power reflected off a surface at different frequencies or polarizations. While the total power may be invariant (under some lighting conditions), the proportion of power at different frequencies or polarizations may vary smoothly with viewing angle. As another example, consider the case where the entries of $\rho$ represent densities of different species, or particles, and allow for the possibility that mass transfers from one species to another, i.e., between entries of $\rho$. Thus, in general, we postulate that transport of vector-valued quantities captures flow across space as well as between entries of the density vector. We introduce an OMT-inspired geometry that allows us to express a continuity equation and quantify transport cost for such vectorial distributions.

We begin by considering a vector-valued density $\rho=[\rho_1,\ldots,\rho_M]^T$ on $\mathbb{R}^N$, i.e., a map from $\mathbb{R}^N$ to $\mathbb{R}_+^M$ such that

$$\int_{\mathbb{R}^N}\sum_{i=1}^M \rho_i(x)\,dx = 1.$$

To avoid proliferation of symbols we denote the set of all vector-valued densities and its interior again by $\mathcal{D}$ and $\mathcal{D}_+$, respectively. We refer to the entries of $\rho$ as representing density or mass of species/particles that can mutate between one another while maintaining total mass. The dynamics are captured by the following continuity equation:

$$\frac{\partial \rho_i}{\partial t} + \nabla_x\cdot(\rho_i v_i) - \sum_{j\neq i}\left(\rho_j w_{ji} - \rho_i w_{ij}\right) = 0,\qquad \forall\, i. \tag{17}$$

Here $v_i$ is the velocity field of particles $i$ and $w_{ij}\ge 0$ is the transfer rate from $i$ to $j$. Equation (17) allows for the possibility to mutate between each pair of entries. More generally, mass transfer may only be permissible between specific types of particles and can be modeled by a graph. Thus, (17) corresponds to the case where this graph is complete, with equal weights on all edges. For a general graph the continuity equation is

(18) |

Note here denote the vector and likewise for .
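The exchange of mass between vector entries can be illustrated in isolation, ignoring spatial transport. The sketch below assumes the simplest form consistent with the description above: at a single spatial point, each entry loses mass at a rate proportional to its own mass and gains what the other entries send to it. The array `w`, where `w[i, j]` stands for the transfer rate from entry `i` to entry `j`, and the three-species data are hypothetical.

```python
import numpy as np

def exchange_rate(rho, w):
    """d rho_i / dt = sum_j (rho_j * w[j, i] - rho_i * w[i, j]) at one point."""
    inflow = w.T @ rho               # sum_j rho_j * w[j, i]
    outflow = rho * w.sum(axis=1)    # rho_i * sum_j w[i, j]
    return inflow - outflow

rho = np.array([0.6, 0.3, 0.1])      # three hypothetical species
w = np.array([[0.0, 1.0, 0.0],       # species 0 -> 1 at rate 1
              [0.0, 0.0, 2.0],       # species 1 -> 2 at rate 2
              [0.5, 0.0, 0.0]])      # species 2 -> 0 at rate 0.5

initial_rate = exchange_rate(rho, w)
dt = 0.01
for _ in range(2000):
    rho = rho + dt * exchange_rate(rho, w)
```

The rates of change sum to zero, so the total mass across entries is preserved while its split between the entries evolves toward a balance determined by the rates.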

Given , we formulate the optimal mass transport:

The coefficient specifies the relative cost between transporting mass in space and trading mass between different types of particles. When is large, the solution reduces to independent OMT problems for the different entries to the degree possible. As with , it can be shown that is a quasi-metric in that it satisfies the triangle inequality and positivity, but is not symmetric. Also, has the geodesic property

(20) |

for , assuming is the optimal flow for (4).

## 5 Vector-valued mass transport on graphs

We finally consider vector-valued mass transport on graphs. A vector-valued mass distribution on a graph $\mathcal{G}$ (with $n$ nodes and $m$ edges) is an $M$-tuple $\rho=(\rho_1,\ldots,\rho_M)$ with each $\rho_i$ being a vector in $\mathbb{R}_+^n$ such that

$$\sum_{i=1}^M\sum_{k=1}^n \rho_i(k) = 1.$$

That is, each entry $\rho_i$, for $i=1,\ldots,M$, is a vector with $n$ nonnegative entries representing, e.g., color intensity for the $i$-th color, at the node corresponding to the respective entry. We denote the set of all non-negative vector-valued mass distributions with $\mathcal{D}$ and its interior with $\mathcal{D}_+$. Combining (7) and (17) we obtain the continuity equation

(23) |

The problem of transporting vector-valued mass on a graph is conceptually simpler, as it reduces essentially to a scalar mass situation. Indeed, we can view the vector-valued mass as a scalar mass distribution on identical layers of the graph, one layer per vector component, where copies of the same node in different layers are connected through a second graph. The two velocity fields represent mass transfer within the same layer and between different layers, respectively.
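The layered picture can be made concrete by building the incidence matrix of the combined graph. This is a sketch under assumptions that are not taken from the paper: a path graph serves as the spatial graph, a triangle links the vector components, and the node at site `k` in layer `c` receives the index `c * n + k`. Kronecker products then produce the within-layer and between-layer edges.

```python
import numpy as np

def incidence(n_nodes, edges):
    """Node-by-edge incidence matrix with +1 and -1 at the two endpoints."""
    D = np.zeros((n_nodes, len(edges)))
    for k, (i, j) in enumerate(edges):
        D[i, k], D[j, k] = 1.0, -1.0
    return D

n, M = 4, 3                                          # hypothetical: 4 sites, 3 components
D_space = incidence(n, [(0, 1), (1, 2), (2, 3)])     # spatial graph (a path)
D_comp = incidence(M, [(0, 1), (1, 2), (0, 2)])      # graph linking the components

# Node (layer c, site k) of the layered graph has index c * n + k.
D_within = np.kron(np.eye(M), D_space)   # one copy of the spatial edges per layer
D_between = np.kron(D_comp, np.eye(n))   # same site, different layers
D_layered = np.hstack([D_within, D_between])

# A vector-valued mass stored as an (M, n) array flattens to a scalar mass
# on the layered graph with the same indexing.
rho_vec = np.full((M, n), 1.0 / (M * n))
rho_scalar = rho_vec.reshape(-1)
```

Each column of the combined matrix still has exactly one source and one sink, so the scalar graph machinery applies unchanged to the flattened mass vector.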

Following our earlier program, given two marginal densities , we define their Wasserstein distance as

In the same way as before, the above has a convex reformulation

The same method as in (3) gives rise to a symmetric Riemannian type metric provided by the solution of

## 6 Examples

In this section, we present two examples. The first one is an academic example to illustrate the idea of vector-valued optimal mass transport. In the second example, we apply our framework to color image processing problems.

### 6.1 Interpolation of 1-D densities

We consider vector-valued densities with two components on an interval of the real line. The two marginal densities are displayed in Figure 1, with the two colors (red and blue) denoting the two components.

We solve the symmetric vector-valued transport problem (4) for several different values of the coefficient that weighs the cost of transport in space against the cost of transfer between components. As to the numerical implementation, we first discretize the space interval to convert the problem into a vector-valued transport problem on a graph, which is essentially (3). Then we discretize the time dimension with staggered grids. In particular, we discretize the time interval into subintervals, so that the densities take values at one set of time points while the fluxes take values at a second, interleaved set. We refer the reader to [25] for more details about the convex optimization algorithm used for the examples in this paper.
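A staggered time grid of the kind mentioned above can be sketched as follows. The specific choices here are assumptions for illustration only: densities live at the endpoints of equal subintervals, fluxes live at the midpoints, and the graph is a 3-node path. The flux shown is merely feasible, not optimal; the point is that one flux sample per subinterval links consecutive density samples.

```python
import numpy as np

n_t = 8                                   # hypothetical number of subintervals
dt = 1.0 / n_t
t_density = np.arange(n_t + 1) * dt       # 0, dt, ..., 1
t_flux = (np.arange(n_t) + 0.5) * dt      # midpoints of the subintervals

# Incidence matrix of a 3-node path; each flux adds mass at the head of its edge.
D = np.array([[-1.0, 0.0],
              [1.0, -1.0],
              [0.0, 1.0]])

rho = np.zeros((n_t + 1, 3))
rho[0] = [1.0, 0.0, 0.0]                  # all mass starts at node 0
m = np.zeros((n_t, 2))
m[: n_t // 2, 0] = 2.0                    # move the mass 0 -> 1 in the first half
m[n_t // 2 :, 1] = 2.0                    # then 1 -> 2 in the second half

# Discrete continuity equation: rho[k+1] - rho[k] = dt * D @ m[k].
for k in range(n_t):
    rho[k + 1] = rho[k] + dt * (D @ m[k])
```

In an optimization setting the same linear relation would appear as an equality constraint coupling the density and flux unknowns.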

The results are depicted in Figure 2. As can be seen, for large values of the coefficient, the solution tends to consist of two independent transport plans, as the cost of transferring between the two different masses is high. In contrast, when the coefficient is small, the solution prefers transferring mass between components to transporting it in space, since the cost of transferring between masses is low.

### 6.2 Interpolation of color images

As alluded to previously, the different components of a vector-valued density may stand for different color channels of an image. For instance, a color image can be viewed as a vector-valued density with three components that represent red (R), green (G), and blue (B), respectively. Thus, it is straightforward to use vector-valued optimal mass transport to compare and interpolate such color images. Below we explain three representative examples, shown in Figures 3 through 8, that highlight the mechanism of vector-valued transport.

First consider the two color images shown in Figure 3. The intensity in each is a Gaussian distribution centered at a different location. The two distributions are of different color, so the corresponding vector-valued mass is distributed differently across the three components. The result of interpolating between the two is shown in Figure 4. As we can see from the subplots, the displacement of the mass appears to run at constant speed from the initial to the terminal location while, at the same time, the color changes gradually as mass flows between the vectorial components.

Figure 5 shows yet another example of a similar nature. The initial density is centered and is white, which signifies an equal distribution of mass across the three color channels/components. The terminal density, on the other hand, has four separated masses of different colors. The density flow shown in Figure 6, based on our technique, smoothly interpolates by displacing the intensity and color profiles in a seemingly natural manner.

Finally, in Figures 7 and 8 we display the result of interpolating real-life images. The marginal distributions shown in Figure 7 are two photos of two geothermal basins in Yellowstone Park, where bacterial growth gives them distinctly different colors and hues. The result of interpolating the corresponding vector-valued distributions is depicted in Figure 8. The flow of images produces a sequence of natural-looking images transitioning from one to the other.

In all the examples, we observe the apparently natural displacement of intensity and color that should be contrasted with potentially undesirable “push-pop” effects of linear interpolation.
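The contrast with linear interpolation can be seen already for scalar densities in one dimension. The sketch below is an illustration under simplifying assumptions: both marginals are Gaussian bumps of the same width, so the transport interpolant is taken to be the same bump translated to an intermediate center, while the linear interpolant is a pointwise average. The grid, widths, and helper names are hypothetical.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 201)

def bump(center, width=0.05):
    """Gaussian bump sampled on the grid and normalized to unit total mass."""
    g = np.exp(-0.5 * ((x - center) / width) ** 2)
    return g / g.sum()

rho0, rho1 = bump(0.25), bump(0.75)
t = 0.5
linear = (1 - t) * rho0 + t * rho1               # one bump fades out, the other fades in
displaced = bump((1 - t) * 0.25 + t * 0.75)      # the bump itself moves to the midpoint

def count_peaks(rho):
    """Number of strict interior local maxima."""
    return int(np.sum((rho[1:-1] > rho[:-2]) & (rho[1:-1] > rho[2:])))

peaks_linear = count_peaks(linear)
peaks_displaced = count_peaks(displaced)
```

Halfway through, the linear interpolant shows two half-height bumps at the original locations, whereas the displaced density is a single bump midway between them.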

## 7 Conclusions and further research

Our early motivation, as noted in the introduction, has been to devise a suitable geometry to study flows of probability or power distribution in problems of signal analysis and fusion of vectorial data. However, the framework, very much as the broader subject of optimal mass transport, has application in a wider range of ideas. In particular, the connection between transport geometry and properties of an underlying space (e.g., curvature in the Bakry-Emery theory) may have important implications here as well. More specifically, we are interested in applying this methodology to studying the robustness of various networks as was done in [26, 27] for biological and financial networks, and [28] for communications networks.

#### Biological networks

The study of cellular networks (e.g., signalling and transcription) has become a major enterprise in systems biology; see [29] and the references therein. One of the key problems is understanding global properties of cellular networks, in order to differentiate a diseased state from a normal cellular state. As is argued in several places [30, 31, 26] network properties may help in formulating systems biological concepts that could lead to novel therapies for a number of diseases including cancer. This would involve integrating genetic, epigenetic, and protein-protein interaction networks.

#### Financial networks

Stock-data and financial transactions provide an insight into the vast globe-wide financial network of human activities. The health of the national and world economy is reflected in the robustness and self-regulatory properties of the markets. Long range correlations are responsible for cascade failures due to financial insolvency. Indeed, multiple exposures of companies is often the root cause of infectious propagation of balance sheet insolvency with catastrophic effects. It is of interest to understand the relation between the various financial parameters (assets, liability, capital) that quantify the stress and the buffer capabilities of financial institutions with the network connectivity and interdependence (weighted network Laplacian) so as to assess risk of cascade failures, fragility, and devise ways to mitigate such effects. See [27] and the references therein.

On closer inspection, many of the aforementioned problem areas involve finer attributes of the studied objects, which may be more suitably treated and studied as vector-valued distributions. Thus, we hope that the present work provides a starting point for such an endeavor.

## 8 Acknowledgements

This project was supported by AFOSR grants (FA9550-15-1-0045 and FA9550-17-1-0435), grants from the National Center for Research Resources (P41- RR-013218) and the National Institute of Biomedical Imaging and Bioengineering (P41-EB-015902), National Science Foundation (NSF), and grants from National Institutes of Health (P30-CA-008748, 1U24CA18092401A1, R01-AG048769).

### References

- [1] S. T. Rachev and L. Rüschendorf, Mass Transportation Problems: Volume I: Theory. Springer, 1998, vol. 1.
- [2] C. Villani, Topics in Optimal Transportation. American Mathematical Soc., 2003, no. 58.
- [3] R. J. McCann, “A convexity principle for interacting gases,” Advances in Mathematics, vol. 128, no. 1, pp. 153–179, 1997.
- [4] D. Bakry and M. Émery, “Diffusions hypercontractives,” Séminaire de Probabilités XIX, Lecture Notes in Math, vol. 1123, pp. 177–206, 1985.
- [5] J. Lott and C. Villani, “Ricci curvature for metric-measure spaces via optimal transport,” Annals of Mathematics, pp. 903–991, 2009.
- [6] F. Otto, “The geometry of dissipative evolution equations: the porous medium equation,” Communications in Partial Differential Equations, 2001.
- [7] M.-K. von Renesse and K.-T. Sturm, “Transport inequalities, gradient estimates, entropy and Ricci curvature,” Communications on Pure and Applied Mathematics, vol. 58, no. 7, pp. 923–940, 2005.
- [8] J.-D. Benamou and Y. Brenier, “A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem,” Numerische Mathematik, vol. 84, no. 3, pp. 375–393, 2000.
- [9] T. T. Georgiou, J. Karlsson, and M. S. Takyar, “Metrics for power spectra: an axiomatic approach,” IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 859–867, 2009.
- [10] S. Haker, L. Zhu, A. Tannenbaum, and S. Angenent, “Optimal mass transport for registration and warping,” International Journal of Computer Vision, vol. 60, no. 3, pp. 225–240, 2004.
- [11] Y. Chen, T. T. Georgiou, and M. Pavon, “On the relation between optimal transport and Schrödinger bridges: A stochastic control viewpoint,” Journal of Optimization Theory and Applications, vol. 169, no. 2, pp. 671–691, 2016.
- [12] X. Jiang, L. Ning, and T. T. Georgiou, “Distances and Riemannian metrics for multivariate spectral densities,” IEEE Transactions on Automatic Control, vol. 57, no. 7, pp. 1723–1735, 2012.
- [13] L. Ning, T. T. Georgiou, and A. Tannenbaum, “On matrix-valued Monge-Kantorovich optimal mass transport,” IEEE Transactions on Automatic Control, vol. 60, no. 2, pp. 373–382, 2015.
- [14] E. Tannenbaum, T. Georgiou, and A. Tannenbaum, “Signals and control aspects of optimal mass transport and the Boltzmann entropy,” in Decision and Control (CDC), 2010 49th IEEE Conference on. IEEE, 2010, pp. 1885–1890.
- [15] Y. Chen, W. Gangbo, T. T. Georgiou, and A. Tannenbaum, “On the matrix Monge-Kantorovich problem,” arXiv preprint arXiv:1701.02826, 2017.
- [16] Y. Chen, T. T. Georgiou, and A. Tannenbaum, “Matrix optimal mass transport: a quantum mechanical approach,” arXiv preprint arXiv:1610.03041, 2016.
- [17] M. Erbar and J. Maas, “Ricci curvature of finite Markov chains via convexity of the entropy,” Archive for Rational Mechanics and Analysis, pp. 1–42, 2012.
- [18] L. C. Evans and W. Gangbo, Differential equations methods for the Monge-Kantorovich mass transfer problem. American Mathematical Soc., 1999, vol. 653.
- [19] L. V. Kantorovich, “On a problem of Monge,” Journal of Mathematical Sciences, vol. 3, pp. 225–226, 1948.
- [20] W. Li, P. Yin, and S. Osher, “A fast algorithm for unbalanced L1 Monge-Kantorovich problem,” CAM report, 2016.
- [21] S.-N. Chow, W. Huang, Y. Li, and H. Zhou, “Fokker–Planck equations for a free energy functional or Markov process on a graph,” Archive for Rational Mechanics and Analysis, vol. 203, no. 3, pp. 969–1008, 2012.
- [22] J. Solomon, R. Rustamov, L. Guibas, and A. Butscher, “Continuous-flow graph transportation distances,” arXiv preprint arXiv:1603.06927, 2016.
- [23] S.-N. Chow, W. Li, and H. Zhou, “Nonlinear Fokker–Planck equations and their asymptotic properties,” arXiv preprint arXiv:1701.04841, 2017.
- [24] C. Léonard, “Lazy random walks and optimal transport on graphs,” The Annals of Probability, vol. 44, no. 3, pp. 1864–1915, 2016.
- [25] Y. Chen, K. Yamamoto, E. Haber, T. T. Georgiou, and A. Tannenbaum, “An efficient algorithm for matrix-valued and vector-valued optimal mass transport,” in preparation, 2017.
- [26] R. Sandhu, T. Georgiou, E. Reznik, L. Zhu, I. Kolesov, Y. Senbabaoglu, and A. Tannenbaum, “Graph curvature for differentiating cancer networks,” Scientific Reports, vol. 5, p. 12323, 2015.
- [27] R. S. Sandhu, T. T. Georgiou, and A. R. Tannenbaum, “Ricci curvature: An economic indicator for market fragility and systemic risk,” Science Advances, vol. 2, no. 5, p. e1501495, 2016.
- [28] C. Wang, E. Jonckheere, and R. Banirazi, “Wireless network capacity versus Ollivier-Ricci curvature under heat-diffusion (HD) protocol,” in American Control Conference (ACC), 2014. IEEE, 2014, pp. 3536–3541.
- [29] U. Alon, An Introduction to Systems Biology: Design Principles of Biological Circuits. CRC Press, 2006.
- [30] L. Demetrius and T. Manke, “Robustness and network evolution – an entropic principle,” Physica A: Statistical Mechanics and its Applications, vol. 346, no. 3, pp. 682–696, 2005.
- [31] J. West, G. Bianconi, S. Severini, and A. E. Teschendorff, “Differential network entropy reveals cancer system hallmarks,” Scientific Reports, vol. 2, p. 802, 2012.