Dynamic Models of Wasserstein-1-Type Unbalanced Transport
We consider a class of convex optimization problems modelling temporal mass transport and mass change between two given mass distributions (the so-called dynamic formulation of unbalanced transport), where we focus on those models for which transport costs are proportional to transport distance. For those models we derive an equivalent, computationally more efficient static formulation, we perform a detailed analysis of the model optimizers and the associated optimal mass change and transport, and we examine which static models are generated by a corresponding equivalent dynamic one. Alongside we discuss thoroughly how the employed model formulations relate to other formulations found in the literature.
- 1 Introduction
- 2 Reminder: models for unbalanced optimal transport
- 3 Equivalence of dynamic and static problems
- 4 Overview of and relation between static and dynamic formulations
- 5 Examples
- 6 Conclusion
- A A version of the Stone–Weierstraß theorem
Optimal transport seeks the optimal way of transporting mass from a given initial distribution $\rho_0$ to a final distribution $\rho_1$, both on some domain $\Omega$. In Kantorovich’s classical formulation, the transport is described by a transport plan or coupling $\pi$, where $\pi(x,y)$ is the amount of mass transported from $x$ to $y$ so that, formally, $\int_\Omega \pi(x,y)\,dy = \rho_0(x)$ and $\int_\Omega \pi(x,y)\,dx = \rho_1(y)$. Among all those couplings, the optimal one minimizes
\[ \int_{\Omega\times\Omega} c(x,y)\,d\pi(x,y), \]
where $c(x,y)$ denotes the cost per mass unit for transport from $x$ to $y$. Solving the above minimization problem is computationally very costly due to the high dimensionality of $\Omega\times\Omega$. An equivalent convex formulation (originally for the case $c(x,y)=|x-y|^2$) in much lower dimensions is provided by the celebrated Benamou–Brenier formula, which describes the transport via a material flow on $\Omega$ during a time interval $[0,1]$. For the special case $c(x,y)=|x-y|$, in which the total transport cost is known as the Wasserstein-1 or $W_1$ distance between $\rho_0$ and $\rho_1$ and in which the transport cost is proportional to the transport distance, one can even eliminate the time coordinate, reducing the problem dimensionality yet further.
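For orientation, the two reformulations mentioned above can be sketched in standard textbook notation. The symbols used here ($\rho_0$, $\rho_1$ for the two distributions, $\Omega$ for the domain, $\rho_t$ for the mass at time $t$, $\omega$ for the momentum or flux) are assumptions of this sketch and need not coincide with the notation of the later sections. Both problems are written formally for densities, with no-flux boundary conditions on $\partial\Omega$.

```latex
% Benamou--Brenier formula for the quadratic cost c(x,y) = |x-y|^2,
% written in the convex variables mass rho_t and momentum omega_t.
\[
  W_2^2(\rho_0,\rho_1)
  = \inf_{(\rho,\omega)} \left\{
      \int_0^1 \!\! \int_\Omega \frac{|\omega_t|^2}{\rho_t} \, dx \, dt
      \;:\;
      \partial_t \rho_t + \operatorname{div} \omega_t = 0, \
      \rho_{t=0} = \rho_0, \ \rho_{t=1} = \rho_1
    \right\}
\]
% Time-independent (Beckmann-type) form for the linear cost c(x,y) = |x-y|.
\[
  W_1(\rho_0,\rho_1)
  = \inf_{\omega} \left\{
      \int_\Omega |\omega| \, dx
      \;:\;
      \operatorname{div} \omega = \rho_0 - \rho_1
    \right\}
\]
```

The unknowns of the first problem live on $[0,1]\times\Omega$ and those of the second on $\Omega$ only, compared with $\Omega\times\Omega$ for a coupling, which is the dimensionality reduction referred to above.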
1.1 Unbalanced optimal transport
In applications that need to quantify how close two mass distributions $\rho_0$ and $\rho_1$ are to each other, pure optimal transport typically does not suffice as a similarity measure since it requires $\rho_0$ and $\rho_1$ to have the same mass. Therefore, optimal transport models have recently been extended to the case of so-called unbalanced transport, where the masses are allowed to change during the transport. Early proposals for unbalanced transport problems can be found, for instance, in [3, 14].
A more systematic investigation started from dynamic formulations based on the Benamou–Brenier formula for the Wasserstein-2 distance by adding a source term to the mass conservation constraint and a suitable corresponding penalty to the energy functional. In [8, 10, 12] the source penalty was chosen to be the Fisher–Rao or Hellinger distance, which leads to the Wasserstein–Fisher–Rao (WFR) or Hellinger–Kantorovich (HK) distance. In [16, 13] variants of the total variation norm are studied as penalties. All models are careful to retain some form of 1-homogeneity (at least in space) to allow for spatially singular measures.
In [7, 12] equivalent expressions for the WFR/HK distance are derived, based on an extension of the ‘static’ Kantorovich formulation. In  the mass change is modelled by relaxing the exact marginal constraints and to soft marginal constraints where one penalizes the deviation with suitable entropy functionals. In  transport is described by two so-called semi-couplings with and . Intuitively, describes the mass starting out at with destination , while is the mass arriving in from . The Kantorovich functional is adapted suitably. It is shown that for a family of dynamic unbalanced problems (beyond WFR/HK) one can find corresponding semi-coupling formulations. For the WFR/HK distance the static formulas given in  and  are related via dualization and a change of variables [7, Corollary 5.9].
Due to its special structure, unbalanced extensions of the Wasserstein-1 distance have attracted particular attention. Marginal constraint relaxations of Wasserstein-1 where the deviation of the -marginals from and is penalized by the total variation norm are studied, for instance, in [15, 11]. This is closely related to the optimal partial transport problem studied in . The article  cited above gives essentially a dynamic reformulation of this distance. It is observed that this extension leads to a modified form of the Kantorovich–Rubinstein formula. A family of more general unbalanced extensions of this formula (in a certain sense the most general family) is studied in .
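As a reminder of what is being modified, the Kantorovich–Rubinstein formula and its total-variation-relaxed variant can be sketched as follows, in the form commonly stated in the literature. The weight $\lambda > 0$, the primed intermediate measures $\rho_0'$, $\rho_1'$, and the norm symbol for total variation are hypothetical notation chosen for this sketch, not taken from the article.

```latex
% Kantorovich--Rubinstein duality for the Wasserstein-1 distance:
% the dual variable is a single 1-Lipschitz potential.
\[
  W_1(\rho_0,\rho_1)
  = \sup \left\{ \int_\Omega \varphi \, d(\rho_0 - \rho_1)
      \;:\; \varphi \in \mathrm{Lip}_1(\Omega) \right\}
\]
% Relaxing both marginal constraints with a total variation penalty
% of weight lambda adds a pointwise bound on the potential.
\[
  \inf_{\rho_0',\,\rho_1'} \Big[ W_1(\rho_0',\rho_1')
      + \lambda \, \|\rho_0 - \rho_0'\|_{\mathrm{TV}}
      + \lambda \, \|\rho_1 - \rho_1'\|_{\mathrm{TV}} \Big]
  = \sup \left\{ \int_\Omega \varphi \, d(\rho_0 - \rho_1)
      \;:\; \varphi \in \mathrm{Lip}_1(\Omega), \ |\varphi| \le \lambda \right\}
\]
```

The only change on the dual side is the additional constraint $|\varphi| \le \lambda$, which is one instance of the pointwise constraints on dual potentials that more general extensions of this formula use.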
Roughly speaking, the above discussion mentions three types of formulations for unbalanced transport problems: ‘dynamic’ formulations based on the Benamou–Brenier formula, ‘static’ semi-coupling extensions of the Kantorovich formulation, and unbalanced Wasserstein-1-type extensions of the Kantorovich–Rubinstein formula. In the following, we refer to these families by the shorthands (Dyn), (SC), and (W1T). Via convex duality, each of these formulations can be expressed in a primal and a dual form, which we denote by a suffix (P) or (D). (By convention we refer to the measure formulation as primal, even though measures are identified with the topological dual of continuous functions.)
As discussed, it was observed that various unbalanced transport distances can be expressed in more than one formulation. In this article we study systematically the correspondence between unbalanced extensions of the Wasserstein-1 distance in the formulations (Dyn), (SC), and (W1T). A schematic relation between different (primal and dual) formulations, established correspondences, and new correspondences established in this article is shown in Figure 1. Precise definitions for all formulas are given throughout the article at the indicated positions. We return to a more in-depth discussion in Section 4.2, when the technical definitions have been established.
1.2 Outline and contribution
Restricting to a $W_1$-type penalization of transport, this article aims at augmenting the picture of unbalanced transport shown in Figure 1 by several relations. The article is organized as follows.
Section 2: In Section 2.1 we introduce a family of dynamic unbalanced transport problems, (Dyn), where the penalty for transport is linear in its distance. This is the $W_1$-type subset of the more general family of transport problems studied in . In Section 2.2 on the other hand, we introduce a family of static $W_1$-type unbalanced transport problems, (W1T), based on generalizing the Kantorovich–Rubinstein formula. This is a subset of the family introduced in .
Section 3.1: We establish for every (Dyn-D) problem a corresponding (W1T-D) problem such that the resulting optimal values are identical (\thref{thm:equivalence}). This includes explicit relations between feasible candidates (e.g. \thref{lem:DynamicDualConstruction}) and model parameters (\thref{cor:HStatFromDyn}).
Sections 3.2 and 3.3: We examine the relation between primal optimizers of (W1T-P) and (Dyn-P). For any optimizer of (W1T-P) we construct in Section 3.2 an optimizer for (Dyn-P) (\thref{prop:DynamicPrimalOptimizers}). These dynamic optimizers exhibit a very particular structure which is characteristic for $W_1$-type transport problems: transport only occurs instantaneously at times 0 and 1, while in between only mass growth and shrinkage take place. In Section 3.3 we give a sufficient condition for the dynamic model which implies that any optimizer of (Dyn-P) is of this particular form (\thref{cor:AllDynamicOptimizersStructure}). Essentially, this temporal structure is the reason for (W1T-P) having a mass change penalty in between two Wasserstein-1 distances (as opposed to, for instance, a Wasserstein-1 distance between two mass change penalties, studied in , which has no equivalent formulation in (Dyn-P)).
Section 3.4: We characterize minimizers of (W1T-P) models concerning the spatial relation between mass growth, shrinkage, and transport (\thref{thm:transportChar}). This provides an intuition of how the unbalanced transport operates and automatically implies a corresponding characterization of (Dyn-P) model minimizers. In particular, mass transport can neither occur into a region of previous or subsequent mass decrease nor out of a region of previous or subsequent mass increase. Moreover, we derive a model-dependent distance threshold (which may be infinite) beyond which no transport occurs (\thref{prop:MaximalTransportDistance}).
Sections 4.1-4.2: We establish the equivalence of (W1T-D) models to corresponding (SC-D) models (\thref{prop:SCW1TEquivalence}). By the above results this implies a correspondence between (Dyn) and (SC) models. Using the particular $W_1$-type structure, the relation between the corresponding model parameters is more explicit than the corresponding result in , but it should be noted that the latter covers more general transport problems.
Section 4.3: We characterize precisely which (W1T-D) models have an equivalent (Dyn-D) model (\thref{thm:staticAsDynamic}). In addition we provide corresponding simple sufficient as well as necessary conditions.
Section 5.1: We perform a detailed analysis of the optimal unbalanced transport between two Dirac masses (\thref{exm:twoDiracs}). This provides information on maximum and minimum transport distances as well as on how the optimal mass changes depend on the previous or subsequent transport.
Section 5.2: We provide novel examples of (Dyn-D) models by seeking the dynamic formulation of known (W1T-D) models (Table 1). Though most of these induce the same topology on the space of nonnegative measures (\thref{thm:topology}), they penalize mass changes differently. We also give a static counterexample for which no dynamic formulation exists (Remark \ref{rem:NoDynamicCounterexample}).
1.3 Setting and notation
Throughout the article, $\Omega$ denotes the closure of a fixed open bounded connected subset of $\mathbb{R}^n$ (note that the results would also hold for an arbitrary compact metric length space $\Omega$). We will interpret $\Omega$ as a metric space with the metric $d$ induced by shortest paths in $\Omega$. The Euclidean norm on $\mathbb{R}^n$ is indicated by $|\cdot|$.
For a metric space we denote the set of continuous functions by and the space of Lipschitz continuous functions with Lipschitz constant no larger than by (for , the Lipschitz constant is with respect to the metric ). If has the structure of a differentiable manifold, then denotes the set of continuously differentiable functions . For the analogous function spaces of vector-valued functions into we write , , and .
Now let be a Borel measurable subset of a Euclidean space. By and we denote the space of nonnegative and of signed Radon measures (regular countably additive measures) on , respectively. The subset of probability measures on is denoted . Given , we indicate that is absolutely continuous with respect to by , and we denote the corresponding Radon–Nikodym derivative by . Given a measurable subset , the restriction of to is denoted . For a measure its two marginals are denoted by and and are defined for any Borel set via
Finally, the domain of a function is indicated as , the indicator function of a set is defined as if and otherwise, and the interior of a set is abbreviated as . For a normed vector space we denote its topological dual by . If is a proper function, then the Legendre–Fenchel conjugate of is defined as , . For a linear operator we denote its adjoint by . As usual we identify the topological duals of and with and .
2 Reminder: models for unbalanced optimal transport
Here we recall different extensions of the classical Wasserstein-1 metric that allow for mass changes during the mass transport. In particular, we recapitulate the class of dynamic models from  as well as a class of static unbalanced optimal transport models from . Establishing the equivalence of those two model classes belongs to the main aims of this work. We will use subscripts and to indicate dynamic and static formulations throughout; primal and dual energies are denoted and , respectively.
2.1 Dynamic $W_1$-type models
In the dynamic model formulation we consider a time-varying measure which moves at a flux and simultaneously changes mass at rate . The relation between , , and is described by the following weak continuity equation.
[Weak continuity equation with source [7, Def. 4.1]] \thlabel{def:ContinuityEquation} For denote by the affine subset of of triplets of measures satisfying the continuity equation in the distributional sense, interpolating between and and satisfying homogeneous Neumann boundary conditions. More precisely, we require
for all . This definition does not require that the map , which takes to the corresponding time-disintegration of , is (weakly) continuous. Instantaneous movement of mass via is characteristic for the $W_1$ distance and the unbalanced extensions that we study (cf. Remark 3.2). A Wasserstein-1 type model for unbalanced transport now penalizes the flux via its total variation as well as the mass change via an infinitesimal cost .
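For readers who want the weak form spelled out, the following is a sketch of the standard distributional formulation of a continuity equation with source, $\partial_t \rho + \operatorname{div} \omega = \zeta$, with no-flux boundary conditions. The symbol names ($\rho$ for the time-varying mass, $\omega$ for the flux, $\zeta$ for the source, $\rho_0$ and $\rho_1$ for the marginals, $\varphi$ for the test function) and the exact test function class are assumptions of this sketch; the cited definition is authoritative.

```latex
% Weak continuity equation with source: test against phi and integrate
% by parts in time and space; boundary terms in time give the marginals,
% boundary terms in space vanish by the no-flux condition.
\[
  \int_{[0,1]\times\Omega} \partial_t \varphi \, d\rho
  + \int_{[0,1]\times\Omega} \nabla \varphi \cdot d\omega
  + \int_{[0,1]\times\Omega} \varphi \, d\zeta
  = \int_\Omega \varphi(1,\cdot) \, d\rho_1
  - \int_\Omega \varphi(0,\cdot) \, d\rho_0
  \qquad \text{for all } \varphi \in C^1([0,1]\times\Omega)
\]
```

Since the test functions are not required to vanish on the spatial boundary, the identity encodes the homogeneous Neumann condition, and since only space-time measures appear, nothing forces the time slices of $\rho$ to depend continuously on time.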
[Dynamic unbalanced $W_1$-type model [7, Def. 4.2-3]] \thlabel{def:Dynamic} Let be lower semi-continuous, convex, 1-homogeneous, and let it satisfy
The associated dynamic cost functional is defined as ,
where is any measure such that . By the 1-homogeneity of and this definition does not depend on the choice of . The corresponding primal dynamic unbalanced $W_1$-type transport problem for fixed marginals reads
The cost also admits an equivalent dual formulation.
[Dynamic dual problem] \thlabel{prop:DynamicDual} For a dynamic problem as in \thref{def:Dynamic} let be the closed convex set characterized by . Introduce and ,
Furthermore, minimizers of problem (4) exist if the infimum is finite.
Note that and are convex and lower semi-continuous. is linear and bounded. Further, there is some such that and is continuous at : for instance, let with and set , then will move along a line from towards , which can easily be shown to remain in the interior of (the reader may refer to \thref{thm:DynConjugateSet}, which will show this calculation in detail).
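The structure of the primal and dual dynamic problems described in this subsection can be summarized in the following formal sketch. The names $c$ for the infinitesimal mass change cost, $B$ for the set whose indicator function is the conjugate $c^*$, $\gamma$ for the reference measure, and $\varphi$ for the dual potential are assumptions made here for illustration, and the sketch is not a substitute for the precise statements above.

```latex
% Primal: total variation of the flux plus integrated mass change cost,
% minimized over all triplets (rho, omega, zeta) that satisfy the weak
% continuity equation with source between rho_0 and rho_1.
\[
  \inf_{(\rho,\omega,\zeta)} \;
    |\omega|([0,1]\times\Omega)
    + \int_{[0,1]\times\Omega}
        c\!\left( \tfrac{d\rho}{d\gamma}, \tfrac{d\zeta}{d\gamma} \right) d\gamma
\]
% Conjugate set of a convex, lower semi-continuous, 1-homogeneous c.
\[
  B = \left\{ (\alpha,\beta) \in \mathbb{R}^2 \;:\;
      \alpha \rho + \beta \zeta \le c(\rho,\zeta)
      \ \text{for all } (\rho,\zeta) \in \mathbb{R}^2 \right\}
\]
% Dual: a potential with unit gradient bound whose time derivative and
% value satisfy the pointwise constraint given by B.
\[
  \sup_{\varphi \in C^1([0,1]\times\Omega)} \left\{
      \int_\Omega \varphi(1,\cdot) \, d\rho_1
      - \int_\Omega \varphi(0,\cdot) \, d\rho_0
      \;:\;
      |\nabla \varphi| \le 1, \
      (\partial_t \varphi, \varphi) \in B
      \ \text{on } [0,1]\times\Omega
    \right\}
\]
```

The gradient bound is the conjugate of the total variation penalty on the flux, and the constraint involving $B$ is the conjugate of the mass change penalty, so the two constraints decouple transport from growth.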
2.2 Static $W_1$-type models
A different class of unbalanced optimal transport models was introduced in  of which we here consider a particular subclass. It is based on an infimal convolution type extension of the Wasserstein-1 distance by a penalty for changing a mass into .
[Static unbalanced $W_1$-type model [21, Def. 2.21]] \thlabel{def:StaticPrimal} A local discrepancy is a function satisfying the following properties,
is convex, 1-homogeneous, and lower semi-continuous,
if and if ,
if or .
A local discrepancy induces a discrepancy via
where is any measure with (the functional is independent of the particular choice). measures the cost of changing into by pointwise mass changes, where each mass change is penalized according to . The associated static cost functional is defined as ,
and the corresponding primal static unbalanced $W_1$-type transport problem reads
The cost also admits an equivalent dual formulation.
[Static dual problem [21, Corollary 2.32]] \thlabel{prop:StaticEquivalence} For a static problem as in \thref{def:StaticPrimal} let be the closed convex set characterized by . Introduce ,
Furthermore, minimizers of problem (9) exist if the infimum is finite. Finally, the sets characterized by for some local discrepancy are exactly the sets of the form
where (or equivalently ) satisfies (we drop the indices)
is concave, upper semi-continuous, and monotonically increasing,
for , ,
is differentiable at and .
Note that on their respective domains, and .
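The static model of this subsection can be summarized in the following formal sketch, which places a mass change penalty between two Wasserstein-1 distances. The symbols $c$ for the local discrepancy, $D$ for the induced discrepancy, $\rho_0'$ and $\rho_1'$ for the intermediate measures, $B$ for the set whose indicator function is $c^*$, and $\alpha$, $\beta$ for the dual potentials are assumed names for illustration only. The dual form is obtained formally by inserting the Kantorovich–Rubinstein formula for both transport terms.

```latex
% Discrepancy induced by a local discrepancy c, with gamma any
% reference measure dominating both arguments.
\[
  D(\rho_0',\rho_1')
  = \int_\Omega c\!\left( \tfrac{d\rho_0'}{d\gamma},
                          \tfrac{d\rho_1'}{d\gamma} \right) d\gamma
\]
% Primal: transport, then pointwise mass change, then transport again.
\[
  \inf_{\rho_0',\,\rho_1'} \;
    W_1(\rho_0,\rho_0') + D(\rho_0',\rho_1') + W_1(\rho_1',\rho_1)
\]
% Formal dual: two 1-Lipschitz potentials coupled pointwise through B.
\[
  \sup \left\{
      \int_\Omega \alpha \, d\rho_0 + \int_\Omega \beta \, d\rho_1
      \;:\;
      \alpha, \beta \in \mathrm{Lip}_1(\Omega), \
      (\alpha(x),\beta(x)) \in B \ \text{for all } x \in \Omega
    \right\}
\]
```

Unlike the dynamic problems, all unknowns here live on $\Omega$ rather than on $[0,1]\times\Omega$, which is the source of the computational advantage of the static formulation.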
3 Equivalence of dynamic and static problems
We first show the equivalence between dynamic and static problems in their dual formulations, after which we shall turn to the primal interpretation.
3.1 Equivalence of optimal values
There are some structural similarities between the dynamic dual problem (6) and the static problem (11). If one identifies and , then the finite part of (5) corresponds to the finite part of the objective in (10). The constraint guarantees that , . In this section we show that the constraint for can be translated into a constraint for for a suitable choice of , such that problems (6) and (11) are in fact equivalent.
[Equivalence between dynamic and static problems] \thlabel{thm:equivalence} \thlabel{prop:W1DynamicStaticEquivalence} Consider the dynamic dual problem (6) and its characterizing set . Set
Then, by choosing
for the static dual problem (11) one finds
The function that characterizes as described in \thref{prop:StaticEquivalence} will be given in \thref{cor:HStatFromDyn}. A direct consequence of the proposition is that also the primal dynamic problem (4) is equivalent to a primal static problem (9). Here, the corresponding static mass change penalty can be obtained from the dynamic mass change penalty by first calculating the set via \thref{prop:DynamicDual}, applying the above proposition to obtain the set , and finally calculating via \thref{prop:StaticEquivalence}. This will be done explicitly in \thref{prop:W1DynamicStaticEquivalencePrimal}.
The proof is divided into several auxiliary lemmas. The strategy is as follows: The constraints of (6) are and for all . Let us ignore the constraint on for now. Since is nonnegative, for fixed the function in (5) will want to be as large as the -constraint allows. With the identification this yields an upper bound on for fixed and defines the set . This already implies . Note that the set does not necessarily satisfy the formal structural assumptions for , as specified by (12), which is why some post-processing from the preliminary to is required. We show in the proof of \thref{prop:W1DynamicStaticEquivalence}, however, that replacing with does not change the optimal value of (11). From feasible (in the sense of ) static dual variables for (10) we then reconstruct a feasible dynamic dual variable for (5) such that , . The Lipschitz constraint on suffices to imply , thus establishing the converse inequality . We now study the dynamic cost and the corresponding set in more detail.
[Properties of ] \thlabel{thm:DynConjugateSet} The set from \thref{prop:DynamicDual} satisfies
where is concave, upper semi-continuous, nonpositive, increasing on and decreasing on with as well as .
Since is one-homogeneous, must be a closed convex set. Due to for we have
for all so that implies . Thus there exists a function such that . Concavity and upper semi-continuity of follow from convexity and closedness of . Now for implies
so that . Furthermore, so that . The monotonicity properties now follow from , , and concavity. Next, by the assumptions on it is its own convex envelope. Therefore and in particular
Finally, note that the left and the right derivative of in exist due to concavity and are given by the monotone limits . We show (the other equality follows analogously). For a contradiction assume the existence of such that for all we have . Thus, for any with we have
which contradicts the positivity of for . ∎
For an illustration of the sets and as well as the functions and see Figure 5.
Now we turn to the structure of . As described above, for fixed the value of will intuitively try to be as large as the constraint allows. So, ignoring regularity, from \thref{thm:DynConjugateSet} we infer that will be given by the solution to the differential equation for initial value . This upper bound is rigorously established in \thref{lem:Flow}. We first consider the case where the function introduced in \thref{thm:DynConjugateSet} satisfies an additional technical assumption. Functions not satisfying this assumption will be treated via an extra smoothing argument in \thref{lem:QKinkSmoothing}.
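To make the heuristic concrete, here is a purely illustrative worked case under assumptions that are not taken from the article: the pointwise dual constraint is assumed to read $\partial_t \varphi \le h(\varphi)$, the initial value $\varphi_0$ is assumed to be prescribed at time $0$, and $h$ is the hypothetical choice $h(\beta) = -\beta^2$. This $h$ is concave, nonpositive, and satisfies $h(0) = h'(0) = 0$; it is the function one obtains from the mass change cost $c(\rho,\zeta) = \zeta^2/(4\rho)$, whose conjugate set is $\{\alpha + \beta^2 \le 0\}$.

```latex
% Saturating the constraint gives a scalar ODE in time for fixed x.
\[
  \partial_t \varphi(t) = h(\varphi(t)) = -\varphi(t)^2,
  \qquad \varphi(0) = \varphi_0
\]
% Explicit solution by separation of variables.
\[
  \varphi(t) = \frac{\varphi_0}{1 + t\,\varphi_0}
  \qquad \text{as long as } 1 + t\,\varphi_0 > 0
\]
% Verification by differentiating the candidate.
\[
  \partial_t \frac{\varphi_0}{1 + t\,\varphi_0}
  = -\frac{\varphi_0^2}{(1 + t\,\varphi_0)^2}
  = -\varphi(t)^2
\]
```

In this example the solution is nonincreasing in time, and for $\varphi_0 \le -1$ it diverges to $-\infty$ no later than time $1$, so a finite value at the final time is only available for $\varphi_0 > -1$. For such $\varphi_0$ the final value $\varphi_0/(1+\varphi_0)$ is strictly increasing in $\varphi_0$.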
\thlabel{asp:QNoKink} Let . We assume that is differentiable at (left-differentiable if ). Note that functions with automatically satisfy \thref{asp:QNoKink} due to .
[Properties of flow] \thlabel{lem:Flow} For we define the flow of as ,
where the supremum of the empty set is . For we set . The flow satisfies the following properties for all .
If , the map solves the initial value problem
If , then for all we have and , and the map is contained in .
is strictly increasing on its domain.
For , with if .
For the map is convex.
is differentiable in with .
is non-expansive on .
is locally Lipschitz differentiable on with for and for . Under \thref{asp:QNoKink} it is differentiable on all of .
be a partition of into three connected components. If the solution to (16) is given by for all . If , let be a suitable candidate in (15). The function must be nonincreasing due to . Without loss of generality we may assume that is contained in either or . Indeed, when , then must be contained in for all as well. On the other hand, when , we may pick where . This is feasible in (15) since by upper semi-continuity and is concave. Now for consider the integral
This is well-defined and finite when or when . The function is strictly decreasing and thus invertible. Denote its inverse by . Due to and
the inverse is well-defined at least on . Furthermore, and
so that indeed solves the initial value problem (16).
Now let and with and denote the solution to (16) by . Since is feasible in (15), . To show the reverse inequality, consider a competitor feasible for (15). If , then since is decreasing. If on the other hand , then as above we may assume to be contained in either or . Due to and , the monotonicity of implies . Summarizing, .
(iv) This is a standard property of solutions to scalar ordinary differential equations. Indeed, for this follows from