Wasserstein Barycenters over Riemannian manifolds

Y.-H.K. is supported in part by Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grants 371642-09 and 2014-05448 as well as Alfred P. Sloan research fellowship 2012-2016. B.P. is pleased to acknowledge the support of a University of Alberta start-up grant and Natural Sciences and Engineering Research Council of Canada Discovery Grant number 412779-2012. Part of this research was done while Y.-H.K. was visiting Korea Advanced Institute of Science and Technology (KAIST) and while both authors were visiting the Mathematical Sciences Research Institute (MSRI), Berkeley, CA, and the Fields Institute, Toronto, ON.
We study barycenters in the space of probability measures on a Riemannian manifold, equipped with the Wasserstein metric. Under reasonable assumptions, we establish absolute continuity of the barycenter of general measures on Wasserstein space, extending, on the one hand, results in the Euclidean case (for barycenters between finitely many measures) of Agueh and Carlier [ac] to the Riemannian setting, and, on the other hand, results in the Riemannian case of Cordero-Erausquin, McCann, Schmuckenschläger [c-ems] for barycenters between two measures to the multi-marginal setting. Our work also extends these results to the case where the measure on Wasserstein space is not finitely supported. As applications, we prove versions of Jensen’s inequality on Wasserstein space and a generalized Brunn-Minkowski inequality for a random measurable set on a Riemannian manifold.
- 1 Introduction
- 2 Notation, definitions, and preliminary results
- 3 The Wasserstein barycenter: existence and uniqueness
- 4 Properties of the Wasserstein barycenter: first and second order balance
- 5 Absolute continuity of the Wasserstein barycenter of finitely many measures
- 6 Absolute continuity of the Wasserstein barycenters of general distributions
- 7 Convexity over Wasserstein barycenters
- 8 Curved random Brunn-Minkowski inequality
This paper is devoted to the study of barycenters in Wasserstein space over a Riemannian manifold $M$.
Given a Borel probability measure $\lambda$ on a metric space $(X,d)$, a barycenter of $\lambda$ is defined as a minimizer of $y \mapsto \int_X d^2(x,y)\,d\lambda(x)$; this definition is chosen in part so that it coincides with the mean, or center of mass, $\int_X x\,d\lambda(x)$ on the Euclidean space $X=\mathbb{R}^n$. Barycenters have been studied extensively by geometers, and have interesting connections to the underlying geometry of $X$; for example, their uniqueness is intimately related to sectional curvature.
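To see why this definition recovers the mean in the Euclidean case, the following short computation may help. It is only a sketch, assuming the metric space is $\mathbb{R}^n$ with the Euclidean distance and that the Borel probability measure, written $\lambda$ here, has finite second moment; the symbol $m$ for the mean is introduced for illustration.

```latex
\begin{align*}
m &:= \int_{\mathbb{R}^n} x \, d\lambda(x), \\
\int_{\mathbb{R}^n} |x-y|^2 \, d\lambda(x)
  &= \int_{\mathbb{R}^n} |x-m|^2 \, d\lambda(x)
   + 2\Big\langle \int_{\mathbb{R}^n} (x-m)\, d\lambda(x),\; m-y \Big\rangle
   + |m-y|^2 \\
  &= \int_{\mathbb{R}^n} |x-m|^2 \, d\lambda(x) + |m-y|^2 .
\end{align*}
```

The cross term vanishes by the definition of $m$, so the right-hand side is minimized exactly at $y=m$.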
The case where the metric space $X = P(M)$ is the space of Borel probability measures on a compact Riemannian manifold $M$, equipped with the Wasserstein distance $W_2$, is of particular interest, as it gives a natural but nonlinear way to interpolate between a distribution of measures. The Wasserstein, or optimal transport, distance between $\mu, \nu \in P(M)$ is given by
$$W_2^2(\mu,\nu) := \inf_{\gamma} \int_{M\times M} d^2(x,y)\,d\gamma(x,y), \tag{1.1}$$
where $d$ is the Riemannian distance and the infimum is taken over all probability measures $\gamma$ on $M \times M$ whose marginals are $\mu$ and $\nu$. It is well known that $W_2$ defines a metric on $P(M)$ (see, e.g., [ags, V]) and therefore it makes sense to talk about barycenters.
Definition 1 (Wasserstein barycenter measure).
Let $\Omega$ be a probability measure on $P(M)$. A Wasserstein barycenter of $\Omega$ is a minimizer among probability measures $\nu \in P(M)$ of
$$\nu \mapsto \int_{P(M)} W_2^2(\mu,\nu)\,d\Omega(\mu).$$
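As a concrete illustration of Definition 1, here is a minimal sketch on the real line rather than on a compact manifold (an assumption made only so that everything is explicit). For empirical measures with the same number of equally weighted atoms, the coupling that matches sorted atoms is optimal for the quadratic cost, and averaging the sorted atoms gives a minimizer of the barycenter functional. All function names are hypothetical.

```python
def w2_squared_1d(xs, ys):
    """Squared W_2 distance between two empirical measures on the real line.

    Both measures put mass 1/n on each of their n atoms, so the optimal
    coupling for the quadratic cost simply matches sorted atoms.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes the same number of atoms")
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def barycenter_1d(measures, weights):
    """Atoms of a W_2 barycenter, obtained by averaging sorted atoms."""
    total = sum(weights)
    sorted_atoms = [sorted(m) for m in measures]
    n = len(sorted_atoms[0])
    return [
        sum(w / total * atoms[k] for w, atoms in zip(weights, sorted_atoms))
        for k in range(n)
    ]


def barycenter_functional(nu, measures, weights):
    """Weighted average of squared W_2 distances from nu to each measure."""
    total = sum(weights)
    return sum(w / total * w2_squared_1d(mu, nu) for w, mu in zip(weights, measures))


# Two empirical measures with three atoms each, equally weighted.
measures = [[0.0, 1.0, 2.0], [10.0, 12.0, 11.0]]
weights = [0.5, 0.5]
nu_bar = barycenter_1d(measures, weights)

# Moving the atoms away from the averaged positions increases the functional.
perturbed = [a + e for a, e in zip(nu_bar, (0.3, -0.2, 0.1))]
print(nu_bar)
print(barycenter_functional(nu_bar, measures, weights)
      < barycenter_functional(perturbed, measures, weights))
```

On a curved manifold no such closed form is available, which is part of why the analysis in this paper is needed.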
Existence and uniqueness (under mild conditions) of Wasserstein barycenters are not difficult to establish (see Section 3). When the measure $\Omega$ is supported on two points in $P(M)$, the barycenter measure on $M$ is equivalent to McCann’s celebrated displacement interpolant [m]. A key property of displacement interpolants is that the interpolant is absolutely continuous with respect to volume if either of the two endpoint measures is; this fact plays a foundational role in the analysis of convexity type properties of various functionals on the space of absolutely continuous probability measures $P_{ac}(M)$. The notion of displacement convexity, or convexity of functionals along this interpolation, has been very fruitful; its wealth of applications includes insightful new proofs of geometric and functional inequalities on $\mathbb{R}^n$, and remarkable generalizations of these inequalities to the Riemannian setting; see, e.g., the books [ags, V, V2]. In addition, displacement convexity is a fundamental notion in the synthetic treatment of Ricci curvature developed by Sturm [sturm06][sturm06a] and Lott-Villani [lottvillani]. An important example of a displacement convex functional is Boltzmann’s $H$-functional (or the Shannon entropy functional) on manifolds with nonnegative Ricci curvature.
In the multi-measure setting, with economic applications in mind, Carlier-Ekeland [CE] introduced an interpolation between several probability measures, which includes as a special case Wasserstein barycenters of finitely supported measures $\Omega$; in fact, their setting is more general, as the distance squared in (1.1) is replaced with a more general cost function. Agueh-Carlier [ac] provided a more extensive treatment of Wasserstein barycenters of finitely supported measures when the underlying space is Euclidean, establishing that the barycenter is absolutely continuous with an $L^\infty$ density if one of the marginals is, as well as a variety of convexity type inequalities, which one can interpret as Jensen’s type inequalities for discrete measures and displacement convex functionals on Wasserstein space over $\mathbb{R}^n$. Absolute continuity has also been established for more general interpolations over Euclidean space, as well as barycenters on Hadamard manifolds (simply connected Riemannian manifolds with nonpositive curvature) [P9]. Some of these results have been extended by one of us [P5] to the case where the support of $\Omega$ is parameterised by a $1$-dimensional continuum. Let us note that in addition to economics [CE], Wasserstein barycenters in the multi-measure setting have appeared in the literature with applications in image processing [bdpr] and statistics [BK].
In this paper, we consider the Wasserstein barycenters of general measures $\Omega$ on $P(M)$. In particular, we allow the support of $\Omega$ to have cardinality greater than $2$ (and possibly be infinite) and the underlying domain $M$ to be a general compact Riemannian manifold, without any curvature or topological restrictions. At present, little is known about Wasserstein barycenters in this generality.
Our first main contribution is to establish absolute continuity with respect to volume measure of the Wasserstein barycenter, under reasonable conditions on the marginals: see Theorem 1 of Section 5 and Theorem 1 of Section 6. In previous work [ac], [P5], and [P9], regularity results on barycenters are obtained by exploiting either the special geometry of Euclidean space or the uniform convexity of the distance squared function (in the non-positively curved setting). These tools are not available in the general Riemannian setting, and our argument is based instead on approximations. Indeed, we first consider the case of a finitely supported $\Omega$ on $P(M)$, and adapt an argument of Figalli-Juillet [FJ] (who studied the two measure case on the Heisenberg group and Alexandrov spaces); this amounts to approximating all but one of the measures by finite sums of Dirac measures, obtaining uniform estimates for the approximating barycenters and passing to the limit. Once absolute continuity of the barycenter is known, estimates on the Jacobian determinants of the optimal maps from the barycenter to each marginal (expressed in Theorem 6) imply that the $L^\infty$ norm of the barycenter density is controlled by the densities of the measures (see Theorem 6 for a precise statement). With this uniform control in hand, we are able to treat the general case by approximating a general $\Omega$ on $P(M)$ by finitely supported measures: see Theorem 1 of Section 6.
Our estimates in Theorem 6 are a result of what we call first and second order balance conditions (Theorem 4), reflecting the fact that the barycenter is a stationary point of the functional $\nu \mapsto \int_{P(M)} W_2^2(\mu,\nu)\,d\Omega(\mu)$, and are expressed in terms of what we call generalized, or barycentric, distortion coefficients. These coefficients, roughly speaking, capture the influence of the barycenter operation on the volume of small sets in $M$, in the same way that the volume distortion coefficients, introduced by Cordero-Erausquin, McCann and Schmuckenschläger [c-ems], capture the effect on the volume of a small set in $M$ by interpolation along geodesics with a common endpoint (in fact, the volume distortion coefficients employed there are precisely our generalized distortion coefficients in the case where $\Omega$ is supported on two measures).
Using the above results, we are then able to establish certain Wasserstein Jensen’s type inequalities (see, in particular, Theorems 11 and 14) for a wide variety of displacement convex functionals on Wasserstein space. In fact, we establish two distinct results of this type; one involves -displacement convex functionals, and can be interpreted as a generalization of [V2, Theorem 17.15]. The other expresses a distorted sort of convexity and involves our generalized, or barycentric, distortion coefficients; this is closer in spirit to the line of research pioneered by Cordero-Erausquin, McCann and Schmuckenschläger [c-ems], and can be interpreted as a generalization of [V2, Theorem 17.37]. We note that geometric versions of Jensen’s inequality (that is, versions formulated in terms of barycenters on metric spaces rather than linear averages) are known for measures on finite dimensional smooth manifolds [EmMo, Proposition 2] and on more general spaces with appropriate sectional curvature bounds (see [sturm] and [kuwae]); these sectional curvature bounds are not satisfied by Wasserstein space [ags]. Before the present paper, a version of Jensen’s inequality on Wasserstein space, due to Agueh-Carlier [ac], was known only when the underlying space is Euclidean and the measure $\Omega$ on Wasserstein space is finitely supported. Of course, the type of distorted convexity in [V2, Theorem 17.37] is peculiar to Wasserstein space, and even the statement of the corresponding Jensen’s inequality (our Theorem 14) requires a generalization of the classical volume distortion coefficients in [c-ems], which is formulated here for the first time.
Finally, as an application of the machinery developed in this paper, we offer a random version of the Brunn-Minkowski inequality on a Riemannian manifold: see Theorem 1. The classical Brunn-Minkowski inequality involves the interpolation between two sets in Euclidean space; an extension to Riemannian manifolds, with Ricci curvature playing a key role, can be derived directly from the results in [c-ems]. Our result extends this to interpolations between random sets on $M$. For a finite number of sets, we should note that in Euclidean space this result is easily recovered from the classical Brunn-Minkowski inequality by induction. For an infinite collection of sets, the Euclidean version is a pre-existing but nontrivial result, known as Vitale’s random Brunn-Minkowski inequality [vitale]; our work provides a mass transport based proof of it. On the other hand, in the Riemannian case, our result seems to be completely novel as soon as we interpolate between three or more sets.
Organization of the paper: In the following section, we introduce the notation and terminology we will use throughout the paper and recall a few fundamental results from the literature which we will need. In Section 3, we prove a general existence and uniqueness result for the Wasserstein barycenter. In Section 4, we establish some properties of the Wasserstein barycenter, including two balance conditions on Kantorovich potentials which will be crucial in subsequent sections. Section 5 is devoted to the proof of the absolute continuity of the Wasserstein barycenter when the measure $\Omega$ has finite support; this result is then exploited to prove absolute continuity of the barycenter of more general $\Omega$ in Section 6. This, in turn, is used in Section 7, where we prove our Wasserstein Jensen’s inequalities. Finally, these results are exploited to establish a random Brunn-Minkowski inequality on curved spaces in Section 8.
2 Notation, definitions, and preliminary results
In this section, we introduce some notation and terminology which we will use in the rest of the paper, and develop some preliminary results.
2.1 Notation and assumptions
Throughout the rest of the paper, we use the following notation and assumptions:
$M$ is a connected, compact $n$-dimensional Riemannian manifold. (The compactness assumption is made primarily to keep the presentation relatively simple. Most of the results in this paper can be established on noncompact manifolds under suitable additional hypotheses, such as decay conditions on the measures, etc.)
$d(x,y)$ is the Riemannian distance between two points $x, y$ in $M$.
$B_r(x)$ is the geodesic ball of radius $r$ in $M$, centred at $x$.
We will sometimes use the notation
$$c(x,y) := \frac{1}{2}\, d^2(x,y),$$
where $c$ stands for cost function. A significant property of the function $c$ is the following relation:
$$\exp_x\big(-D_x c(x,y)\big) = y,$$
where $D_x c$ denotes the gradient of $c$ with respect to the $x$-variable; although the notation $D_x c$ is often used to denote the differential of $c$ (a covector) rather than its gradient (a vector), we will often identify vectors and covectors using the Riemannian metric.
$P(M)$ is the space of probability measures on $M$ equipped with the weak-* topology, or, equivalently, metrized with the Wasserstein distance $W_2$ defined in (1.1).
$\Omega$ is a probability measure on $P(M)$.
2.2 Borel measurability of the set $P_{ac}(M)$
Equipped with the distance $W_2$, the space $P(M)$ is a separable metric space. We consider measurable sets with respect to the Borel $\sigma$-algebra.
In this subsection we show that the set $P_{ac}(M)$ of absolutely continuous probability measures on $M$, with respect to the $n$-dimensional Hausdorff measure, or equivalently the Riemannian volume $\mathrm{vol}$, is Borel measurable. We expect that this is already known to experts, but we include it for completeness.
In Section 3, when we show uniqueness of the Wasserstein barycenter of a given probability measure $\Omega$ on $P(M)$, we will need to assume $\Omega(P_{ac}(M)) > 0$.
Proposition 1 (Measurability of $P_{ac}(M)$).
The set $P_{ac}(M)$ of absolutely continuous probability measures is Borel measurable with respect to the metric topology given by the Wasserstein distance $W_2$, or equivalently with respect to the weak-* topology (the two topologies are equivalent [V2]).
Note that absolute continuity of a measure $\mu$ (with respect to $\mathrm{vol}$) is equivalent to the following property: for every $\epsilon > 0$, there is $\delta > 0$ such that $\mu(A) < \epsilon$ for all Borel sets $A$ with $\mathrm{vol}(A) < \delta$. This means
where the sets of probability measures are defined as
To show is a Borel set, we will express it as a countable intersection of countable unions of Borel sets (essentially replacing the set in (2.1) with a closed set). For this, first define
and consider the subset defined as
We claim that the set is a closed subset of , with respect to the weak-* topology. To see this, pick any sequence , weakly-* convergent to . Pick a set , with and . For each , let and consider a continuous function , with support and and on . Then, due to the weak-* convergence . Moreover, since . This shows that , as desired.
Now, clearly . We show that : Let . Let be an arbitrary Borel set with . Pick an arbitrary small number . One can find an open set , consisting of finite metric balls , with and . Then, from the definition of , , therefore, . Since was arbitrary, this means . This shows .
The above paragraph implies
which completes the proof, since the latter expression is a countable intersection of countable unions of closed (thus Borel) sets. ∎
Inspection of the proof above shows that $M$ can be replaced with a compact separable metric space equipped with a reference Borel measure in place of the Riemannian volume.
2.3 Optimal transport on Riemannian manifolds
Next, we briefly recall some key results in optimal transport on Riemannian manifolds which we will use throughout the paper. We begin with a fundamental result of McCann [m3], originally established by Brenier [bren] when $M$ is Euclidean:
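Theorem 3 (Optimal transport on $M$; see Brenier [bren], McCann [m3]).
Assume $\mu$ is absolutely continuous with respect to volume measure. Then the infimum in (1.1) is attained by a unique measure $\gamma$. Furthermore, $\gamma$ is concentrated on the graph of a measurable mapping $T$ over the first marginal, and $T$ takes the form
$$T(x) = \exp_x\big(\nabla u(x)\big),$$
where $u$ is a $c$-convex function; that is
$$u(x) = \sup_{y \in M}\big[-c(x,y) - u^c(y)\big] \tag{2.2}$$
for some function $u^c$, where $c$ is the cost function of Section 2.1.
It is well known that the $c$-convex function $u$ is semi-convex and therefore twice differentiable almost everywhere. At each point where this differentiability holds, the mapping $T$ is differentiable. We now recall a few classical identities, easily derived from the Brenier-McCann theorem (Theorem 3), or, for more general cost functions, from references such as [GM] [Caf] and [mtw].
Wherever $u$ is differentiable, we have the first order condition:
$$\nabla u(x) + D_x c(x,T(x)) = 0.$$
Differentiating this identity, we obtain
$$D^2 u(x) + D^2_{xx} c(x,T(x)) + D^2_{xy} c(x,T(x))\, DT(x) = 0. \tag{2.3}$$
It will also be important later to recall the second-order inequality due to (2.2):
$$D^2 u(x) + D^2_{xx} c(x,T(x)) \ge 0. \tag{2.4}$$
Taking determinants of (2.3) yields
$$\det\big[D^2 u(x) + D^2_{xx} c(x,T(x))\big] = \big|\det D^2_{xy} c(x,T(x))\big|\,\big|\det DT(x)\big|. \tag{2.5}$$
If both $\mu$ and $\nu$ are absolutely continuous with respect to volume, with densities $f$ and $g$, respectively, we also have the change of variables formula almost everywhere:
$$f(x) = g(T(x))\,\big|\det DT(x)\big|,$$
which, together with (2.5), implies
$$f(x)\,\big|\det D^2_{xy} c(x,T(x))\big| = g(T(x))\,\det\big[D^2 u(x) + D^2_{xx} c(x,T(x))\big].$$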
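For orientation, these identities can be specialized to the Euclidean model case. This is only a sketch under stated assumptions: the cost is taken to be $c(x,y)=\frac12|x-y|^2$ on $\mathbb{R}^n$ (ignoring the compactness assumption), the potential $u$ is normalized so that the optimal map $T$ satisfies $\nabla u(x) = -D_x c(x,T(x))$, $f$ and $g$ denote the densities of the source and target measures, and $\phi$ is a name introduced here for illustration.

```latex
\begin{align*}
D_x c(x,y) &= x-y, \qquad D^2_{xx}c = I, \qquad D^2_{xy}c = -I, \\
T(x) &= x + \nabla u(x) = \nabla\phi(x), \qquad \phi(x) := u(x) + \tfrac12 |x|^2, \\
D^2\phi(x) &= D^2u(x) + D^2_{xx}c(x,T(x)) \ge 0, \\
\det D^2\phi(x) &= \det DT(x) = \frac{f(x)}{g(\nabla\phi(x))} .
\end{align*}
```

In this case the optimal map is the gradient of a convex function, as in Brenier's theorem, and the last line (valid almost everywhere where the target density is positive) is the Monge-Ampère equation.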
In the present paper we will also make use of a multi-marginal version of the Brenier-McCann theorem (Theorem 3), which generalizes from Euclidean space a result of Gangbo and Swiech [GS] and is also related to the works of Carlier-Ekeland [CE] and Agueh-Carlier [ac]. Given probability measures $\mu_1, \dots, \mu_m$ on $M$, the multi-marginal optimal transport problem is to minimize
$$\int_{M^m} c(x_1,\dots,x_m)\,d\gamma(x_1,\dots,x_m)$$
over all probability measures $\gamma$ on the $m$-tuple product $M^m$ whose marginals are the $\mu_i$’s. There has recently been substantial interest and progress in understanding this problem in a variety of different settings; see [Pass14] and the references therein. In this paper, we will take the cost function to be
$$c(x_1,\dots,x_m) := \min_{y \in M} \sum_{i=1}^m \frac{\lambda_i}{2}\, d^2(x_i,y),$$
where $\lambda_i > 0$, with $\sum_{i=1}^m \lambda_i = 1$, represent weights on the components making up the cost function; we will sometimes denote $\lambda = (\lambda_1,\dots,\lambda_m)$. In Euclidean space, this coincides with the cost studied by Gangbo and Swiech [GS], who proved assertion 1 in Theorem 4 below in that setting, extending earlier partial results of Olkin and Rachev [OR], Knott and Smith [KS] and Rüschendorf and Uckelmann [RU].
We have the following theorem:
Theorem 4 (Multi-marginal optimal transport on ; see [Kp, Sections 4 and 5]).
Assume is absolutely continuous with respect to .
There exists a unique minimizer of for almost all , and moreover this gives a -a.e one-to-one map .
Moreover, applying 1 and 2 to a result of Carlier-Ekeland [CE, Proof of Proposition 3], we get
is the unique Wasserstein barycenter measure of the measures with weights .
In particular, the assertions 2 and 3 will be important for us.
2.4 Geometric barycenters on Riemannian manifolds: volume distortion
In the remainder of the present section, we discuss geometric barycenters on a Riemannian manifold and introduce the volume distortion constants associated to them. Given a probability measure on , we denote its set of barycenters by
We introduce the notation:
for the barycenter of the discrete measure with weights .
We will also require the following notion:
Definition 5 (Volume distortion).
Let $\lambda$ be a Borel probability measure on $M$ with a unique barycenter (that is, such that its set of barycenters is a singleton). We define the generalized, or barycentric, volume distortion coefficients at
where denotes the Hessian of the function , and the determinants are computed in exponential local coordinates at and .
Remark 6 (Justification of the name volume distortion for ).
Volume distortion coefficients were introduced in [c-ems]; they capture the way the volume of a small ball is distorted as it is slid along geodesics ending at a common fixed point.
Our generalized coefficients, roughly speaking, capture the way that the volume of a small ball centred at a given point is distorted by interpolating between points in this ball and the other points in the support of a probability measure $\lambda$ on $M$; the classical coefficients correspond to the case when $\lambda$ is concentrated at two points. We make this analogy precise below, in the case that $\lambda$ is finitely supported.
has finite support and assume that, for near ,
is a singleton;
the function is differentiable at .
We claim that, for a fixed index ,
In particular, when , we have , where is the volume distortion coefficient of [c-ems].
From assumption 1, we can define . Now, the function
is differentiable near (for a proof, see e.g. [KP, Lemma 3.1]); moreover, by minimality, we have
From assumption 2, we can differentiate the last equation with respect to , which yields
After taking determinants and rearranging, we have,
Notice that the absolute value of the left-hand side of (2.11) is the volume distortion
Before concluding this section, we prove a result relating the volume distortion coefficients to the Ricci curvature of $M$. Let us fix the notation:
We will need the following lemma, whose proof is based on an argument in [c-ems].
Lemma 7.
Suppose is a lower bound for the Ricci curvature on . Then
The proof is exactly as in [c-ems, Lemma 3.12], but we take the trace over an orthonormal basis to get from sectional to Ricci curvature.
Lemma 8.
Suppose is a lower bound for the Ricci curvature on . Then
where is given in (2.12).
Note that is the inverse of evaluated at . Therefore, by the Bishop-Gromov volume comparison theorem (see, e.g., [BiCr], section 11.10, Theorem 15),
is nondecreasing along a geodesic starting at . As this function is when , the result follows. ∎
Proposition 9 (Distortion under ).
Suppose the Ricci curvature of $M$ is everywhere nonnegative, i.e., $\mathrm{Ric} \ge 0$. Then, for any and , we have
Minimality of at the barycenter , combined with semi-concavity of and Fatou’s lemma yields
as a matrix (notice that until this moment we do not need any assumption on the curvature). Now, as $\lambda$ is a probability measure, Lemma 7 with Ricci lower bound $0$ implies
applying the geometric-arithmetic mean inequality to the nonnegative matrix yields
Combining this with the inequality from Lemma 8 (applied with Ricci lower bound $0$) yields the desired result. ∎
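The geometric-arithmetic mean inequality for matrices invoked in this proof can be spelled out as follows. This is a standard fact, stated here as a sketch; the symbols $A$ and $a_1,\dots,a_n$ are introduced for illustration only.

```latex
% A: symmetric, nonnegative definite n x n matrix with eigenvalues a_1, ..., a_n >= 0
\[
(\det A)^{1/n} = \Big(\prod_{i=1}^n a_i\Big)^{1/n}
\le \frac{1}{n}\sum_{i=1}^n a_i = \frac{\operatorname{tr} A}{n},
\qquad\text{equivalently}\qquad
\det A \le \Big(\frac{\operatorname{tr} A}{n}\Big)^{n}.
\]
```

In particular, an upper bound on the trace of a nonnegative matrix yields an upper bound on its determinant.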
More generally, if (for ), we have
where is the diameter of the manifold and
3 The Wasserstein barycenter: existence and uniqueness
Let us recall the notion of a Wasserstein barycenter of a probability measure on (Definition 1 in the introduction). Wasserstein barycenters were considered previously by Agueh-Carlier, who established existence and uniqueness results for finitely supported measures when the underlying space is Euclidean [ac]. Other variants of these results can be found in [CE] [P5] and [P9].
We present below a general existence and uniqueness result, which encompasses the earlier results found in [ac] [CE] [P5] and [P9]. The proof is essentially the same as the argument found in [P5], but is included in the interest of completeness.
Theorem 1 (Existence and uniqueness of the Wasserstein barycenter).
Recall the assumptions and notation in Section 2.1. If $\Omega(P_{ac}(M)) > 0$, then there exists a unique Wasserstein barycenter of $\Omega$.
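Due to compactness of $M$, the set of probability measures on $M$ is weak-* compact, or, equivalently, the Wasserstein space $(P(M), W_2)$ is compact. Now, for any $\mu \in P(M)$, the mapping $\nu \mapsto W_2^2(\mu,\nu)$ is uniformly Lipschitz on Wasserstein space, and therefore so too is $\nu \mapsto \int_{P(M)} W_2^2(\mu,\nu)\,d\Omega(\mu)$. Therefore, existence of a minimizer follows immediately.
The uniqueness will follow from the fact that, with respect to linear interpolation of measures, the functional $\nu \mapsto \int_{P(M)} W_2^2(\mu,\nu)\,d\Omega(\mu)$ is convex and the convexity is strict if $\Omega(P_{ac}(M)) > 0$. We prove this below.
We begin by studying $\nu \mapsto W_2^2(\mu,\nu)$ for a fixed $\mu \in P(M)$. Let $\nu_0, \nu_1 \in P(M)$ and $t \in [0,1]$. Let $\gamma_0, \gamma_1$ be optimal couplings between $\mu$ and $\nu_i$, for $i = 0, 1$, respectively. We set $\nu_t := (1-t)\nu_0 + t\nu_1$ and $\gamma_t := (1-t)\gamma_0 + t\gamma_1$. Noting that $\gamma_t$ has $\mu$ and $\nu_t$ as its marginals, we have
$$W_2^2(\mu,\nu_t) \le \int_{M\times M} d^2(x,y)\,d\gamma_t(x,y) = (1-t)\,W_2^2(\mu,\nu_0) + t\,W_2^2(\mu,\nu_1). \tag{3.1}$$
This yields convexity of the function $\nu \mapsto W_2^2(\mu,\nu)$.
Next, we will show this convexity is strict if $\mu \in P_{ac}(M)$. By the Brenier-McCann theorem (Theorem 3), there exists a unique optimal map $T_i$ for each $i = 0, 1$, such that the unique optimal measure $\gamma_i$ is concentrated on the graph of $T_i$.
We need to show that, assuming $\nu_0 \neq \nu_1$ and $0 < t < 1$, the inequality (3.1) is strict. Note first that the inequality is strict unless $\gamma_t$ is an optimal coupling between $\mu$ and $\nu_t$; by the uniqueness result, this means we must have $\gamma_t = (\mathrm{Id}\times T_t)_\#\mu$, where $T_t$ is the optimal map from $\mu$ to $\nu_t$. That is, $\gamma_t$ is concentrated on the graph of $T_t$.
On the other hand, $\gamma_t$ is concentrated on the union of two graphs, those of $T_0$ and $T_1$:
$$\gamma_t = (1-t)\,(\mathrm{Id}\times T_0)_\#\mu + t\,(\mathrm{Id}\times T_1)_\#\mu.$$
This is possible only if $T_0 = T_1$ almost everywhere, which, in turn, implies $\nu_0 = \nu_1$. This yields strict convexity of $\nu \mapsto W_2^2(\mu,\nu)$ whenever $\mu$ is absolutely continuous with respect to volume.
Finally, integrating with respect to $\Omega$ yields convexity of the functional $\nu \mapsto \int_{P(M)} W_2^2(\mu,\nu)\,d\Omega(\mu)$, and the convexity is strict under the assumption $\Omega(P_{ac}(M)) > 0$. This implies uniqueness of its minimizer, the Wasserstein barycenter of $\Omega$.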
By inspecting the above proof, it is clear that Theorem 1 holds for more general spaces than Riemannian manifolds. In fact, it holds for any (compact) metric space on which optimal maps exist and are unique for an arbitrary absolutely continuous source measure. This includes, for example, Alexandrov spaces [bertrand].
As an illuminating example, consider the round sphere. If $\Omega$ gives equal weight to the two Dirac measures supported at the north and south poles, then its Wasserstein barycenter is not unique: any probability measure supported on the equator is a Wasserstein barycenter. However, if we smear out one of the Dirac measures, making it absolutely continuous, then the resulting Wasserstein barycenter will be a unique (in fact absolutely continuous) measure supported near the equator.
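The non-uniqueness in the sphere example can be checked numerically at the level of points. Since the squared Wasserstein distance from a Dirac measure at a point $p$ to a measure $\nu$ is the integral of $d^2(p,\cdot)$ against $\nu$, the barycenter functional for the two poles is the integral against $\nu$ of the function evaluated below, so its minimizers are exactly the measures supported where that function is minimal. The sketch assumes the unit round sphere in $\mathbb{R}^3$ and equal weights on the two poles; the function names are hypothetical.

```python
import math

NORTH = (0.0, 0.0, 1.0)
SOUTH = (0.0, 0.0, -1.0)


def sphere_point(theta, phi):
    """Point on the unit sphere with polar angle theta and longitude phi."""
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


def geodesic_distance(p, q):
    """Great-circle distance between unit vectors p and q."""
    dot = sum(a * b for a, b in zip(p, q))
    # Clamp to [-1, 1] to guard against rounding before calling acos.
    return math.acos(max(-1.0, min(1.0, dot)))


def two_pole_functional(y):
    """Average of squared distances from y to the north and south poles."""
    return 0.5 * geodesic_distance(NORTH, y) ** 2 + 0.5 * geodesic_distance(SOUTH, y) ** 2


# Every point of the equator gives the same value, and points off the
# equator give a strictly larger one.
equator_values = [two_pole_functional(sphere_point(math.pi / 2, phi))
                  for phi in (0.0, 1.0, 2.5, 4.0)]
off_equator = two_pole_functional(sphere_point(1.0, 0.3))
print(equator_values, off_equator)
```

At polar angle $\theta$ the function equals $\frac12\theta^2 + \frac12(\pi-\theta)^2 = \frac{\pi^2}{4} + (\theta - \frac{\pi}{2})^2$, which is independent of the longitude and minimal exactly on the equator.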
4 Properties of the Wasserstein barycenter: first and second order balance
We develop here several properties of the Wasserstein barycenter which we will use later on. For some of these, we will need to assume that the Wasserstein barycenter is absolutely continuous with respect to volume. Conditions on $\Omega$ ensuring this absolute continuity will be presented later on. The main results of this section are Theorems 4 and 6, which are crucial for later sections.
4.1 Differentiability of family of dual potentials
The key results of this subsection are (4.1) and (4.2) for derivatives of the integral of a measurable family of dual potentials for optimal transport problems. We first establish the almost everywhere second differentiability of a certain measurable family of dual potentials:
Lemma 1 (a.e. and -a.e. ).
Let and for each , let be the dual potential (determined modulo an additive constant) for the optimal transport problem (1.1) between and . Let be a Borel probability on . For volume almost all , is twice differentiable for -almost all .
The proof is a simple application of Fubini’s theorem. Let be the set of points where the twice differentiability fails. We are to show that its projection onto , namely, has -measure zero, for almost all (notice that the set is measurable). Assume by contradiction that has positive -measure for some non-measure zero set of . Then there exists and a set with such that for all . Therefore, using Fubini’s theorem, we have
On the other hand, for each , the dual potential is a semi-convex function [c-ems] (recall is compact), so due to Alexandrov’s second differentiability theorem, is twice differentiable at Lebesgue (vol) a.e. points; i.e. has zero volume. So we have
The contradiction implies the desired result. ∎
From Lemma 1, we see that for Lebesgue almost every , the maps and are well defined a.e. Now, an essential ingredient in our work is the function , where is the dual potential function given in Lemma 1. Note that this function is Lipschitz and semi-convex since each is uniformly Lipschitz and semi-convex (recalling that is compact). By Rademacher’s theorem and Alexandrov’s second differentiability theorem, this function is twice differentiable for Lebesgue almost every . Moreover, applying Lemma 1, for almost every , we immediately have the following:
Proposition 2 (Derivatives inside the integral ).
4.2 First and second order balance at the Wasserstein barycenter
We now consider the Wasserstein barycenter measure of , and the dual potentials for optimal transport problems (1.1) from to . Using the equations (4.1) and (4.2), we will establish the main results of this section, namely, the first and second order balance between the ’s with respect to : Theorem 4. We begin with a lemma relating barycenters on the manifold to Wasserstein barycenters on :
Lemma 3 (Riemannian barycenter from Wasserstein barycenter).
Let be a Wasserstein barycenter of the measure on and assume is absolutely continuous with respect to volume; let be an optimal map from to . Let . Then, for almost every , is a barycenter of .
If, in addition, , then for almost every , is the unique barycenter of .
We first show that -a.e. is a barycenter . The proof is by contradiction; suppose not. Then there exists a set with and for all , z is not a barycenter of . We define by letting be a measurable selection of the barycenters. Then, for all , we have
and the inequality is strict on the set of positive measure. We define the probability measure.
For each , the measure is then a coupling of and , and we have
where the second and fourth lines follow from Fubini’s theorem and the strict inequality in the third line follows from (4.3) (and the fact that (4.3) is strict on a set of positive measure). This contradicts the fact that is a barycenter of , completing the proof that is a barycenter of for