1 Introduction
Abstract

Optimal mass transport is described by an approximation of transport cost via semi-discrete costs. The notions of optimal partition and optimal strong partition are given as well. We also suggest an algorithm for computation of Optimal Transport for general cost functions induced by an action, an asymptotic error estimate and several numerical examples of optimal partitions.

Semi-Discrete approximation of Optimal Mass Transport

G. Wolansky,

Department of Mathematics, Technion, Haifa 32000, Israel 111Email:  gershonw@math.tecnion.ac.il

1 Introduction

Optimal mass transport (OMT) goes back to the pioneering paper of Monge [15] at the 18th century. In 1942, L. Kantorovich [13] observed that OMT can be relaxed into an infinite dimensional linear programming in measure spaces. As such , it has a dual formulation which is very powerful and was later (1987) used by Brenier [3] to develop the theory of Polar factorization of positive measures. OMT has many connections with PDE, kinetic theory, fluid dynamics, geometric inequalities, probability and many other fields in mathematics as well as in computer science and economy.

Even though finite dimensional (or discrete) OMT is well understood, its extension to infinite dimensional measure spaces poses a great challenge, e.g. uniqueness and regularity theory of fully non-linear PDE such as the Monge-Amper equation [6].

We suggest to investigate a bridge between finite (”discrete”) and infinite (”continuum”) dimensional OMT. This notion of semi-discrete OMT leads naturally to optimal partition of measure spaces. Our motivation in this paper is the development of numerical method for solving OMT. Efficient algorithms are of great interest to many fields in operational research and, recently, also for optical design [9, 19, 20] and computer vision (”earth moving metric”) [21].

When dealing with numerical approximations for OMT, the problem must be reduced to a discrete, finite OMT (with, perhaps, very large number of degrees of freedom). Discrete OMT is often called the assignment problem. This is, in fact, a general title for a variety of linear and quadratic programming. It seems that the first efficient algorithm was the so called ”Hungarian Algorithm”, after two Hungarian mathematicians. See [11, 23, 12, 8, 16] and the survey paper [18] for many other relevant references.

The deterministic, finite assignment problem is easy to formulate. We are given men and women. The cost of matching man to a woman is . The object is to find the assignment (matching) , given in terms of a permutation which minimize the total cost of matching .

When replacing the deterministic assignment by a probabilistic one, we assign the probability for matching man to woman . The discrete assignment problem is then reduced to the linear programming of minimizing

 n∑i=1n∑j=1pjici,j (1)

over all stochastic matrices , i.e. these matrices which satisfy the linear constraints

 n∑k=1pjk=n∑k=1pki=1  ;  pji≥0  ∀ i,j∈{1,…n} .

The Birkhoff Theorem assures us, to our advantage, that the optimal solution of this continuous assignment problem is also the solution of the deterministic version.

The probabilistic version seems to be more difficult since it involves a search on a much larger set of stochastic matrices. On the other hand, it has a clear advantage since it is, in fact, a linear programming which can be handled effectively by well developed algorithms for such problems.

In many cases the probabilistic version cannot be reduced to the deterministic problem. For example, if the number of sources and number of targets not necessarily equal, or when not all sources must find target, and/or not all targets must be met, then the constraints are relaxed into and/or . We shall not deal with these extension in the current paper, except, to some extent, in section 4 below.

1.1 From the discrete assignment problem to the continuum OMT

Let be a probability measure on some measure space , and another probability measure on (possibly different) measure space . Let be the cost of transporting to . The object of the Monge problem is to find a measurable mapping which generalizes the deterministic assignment perturbation described above in the following sense:

 T#μ=ν  namely  μ(T−1(B))=ν(B) (2)

for every measurable set . The optimal Monge mapping (if exists) realizes the infimum

 infT#μ=ν∫Xc(x,T(x))μ(dx) .

The relaxation of Monge problem into Kantorovich problem is analogues to the relaxation of the deterministic assignment problem to the probabilistic one: Find the minimizer

 c(μ,ν):=minπ∈ΠYX(μ,ν)∫X∫Yc(x,y)π(dxdy) (3)

among all probability measures

 { Probability measures on X×Y whose X (resp. Y) marginals are μ (resp. ν)} . (4)

In fact, Kantorovich problem is just an infinite dimensional linear programming over the huge set . The Monge problem can be viewed as a restriction of the Kantorovich problem to the class of deterministic probability measures in , given by where . It turns out, somewhat surprisingly, that the value of the Kantorovich problem equals to the infimum (3) of Monge problem, provided is a continuous function on and does not contain a Dirac singularity (an atom) [1].

1.2 Semi-finite approximation- The middle way

Suppose the transportation cost on can be obtained by interpolation of pair of functions on and on , where is a third domain and the interpolation means

 c(x,y):=infz∈Zc(1)(x,z)+c(2)(z,y) . (5)

A canonical example for is where , . Then (5) is valid for and both . So

 c(x,y):=|x−y|p=2p−1infz∈Rd|x−z|p+|z−y|p (6)

for any provided . Note in particular that the minimizer above is unique, , provided , while for any if .

Let is a finite set. Denote

 cZm(x,y):=minz∈Zmc(1)(x,z)+c(2)(z,y)≥c(x,y) (7)

the () semi-finite approximation of given by (5).

An optimal transport plan for a semi-discrete cost (7) is obtained as a pair of partitions of the spaces and . An partition is a decomposition of the the space into mesurable, mutually disjoint subset. It turns out that can be obtained as

 cZm(μ,ν)=inf{Az},{Bz}∑z∈Zm∫Azc(1)(x,z)μ(dx)+∫Bzc(2)(z,y)ν(dz) (8)

where the infimum is on the pair of partitions of and of satisfying for any . The optimal plan is, then, reduced to plans transporting to , for any , where is the optimal partition realizing (8).

The real advantage of the semi-discrete method described above is that it has a dual formulation which convert the optimization (8) to a convex optimization on . Indeed, we prove that for a given there exists a concave function such that

 max→p∈RmΞνμ,Zm(→p)=cZm(μ,ν)

and, under some conditions on either or , the maximizer is unique up to a uniform translation on . Moreover, the maximizers of yield the unique partitions of (8).

The accuracy of the approximation of by depends, of course, on the choice of the set . In the special (but interesting) case and , it can be shown that for any in a compact set, where are distributed on a regular grid containing this set.

From (7) and the above reasoning we obtain in particular

 cZm(μ,ν)−c(μ,ν)≥0 (9)

for any pair of probability measures, and that, for a reasonable choice of , (9) is of order if the supports of are contained in a compact set.

For a given and pair of probability measures and , the optimal choice of is the one which minimizes (9). Let

 ϕm(μ,ν):=infZm⊂ZcZm(μ,ν)−c(μ,ν)≥0 (10)

where the infimum is over all sets of points in . Note that the optimal choice now depends on the measures themselves (and not only on their supports). A natural question is then to evaluate the assymptotic limits

 ¯ϕ(μ,ν):=limsupm→∞m2/dc–m(μ,ν)   ;   ϕ––(μ,ν):=liminfm→∞m2/dc–m(μ,ν) .

Some preliminary results regarding these limits are discussed in this paper.

1.3 Numerical method

The numerical calculation of (3) we advertise in this paper apply the semi-discrete approximation of order . It also involves discretization of into atomic measures of finite support (). The level of approximation is determined by the two parameters: The cardinality of the supports of the discretized measures, , and the cardinality of the semi-finite approximation of the cost. The idea of semi-discrete approximation is to choose much larger than . As we shall see, the evaluation of the approximate solution involves finding a maximizer to a concave function in variables, where the complexity of calculating this function, and each of its partial derivatives, is of order . A naive gradient descent method then result in iterations to approximate this maximum, where each iteration is of order . This yields a complexity of order to obtain a transport plan on the approximation level of . This should be compared to the complexity of the Hungarian algorithm [17]. We shall not, however, pursue a rigorous complexity estimate in this paper.

1.4 Structure of the paper

In section 2 we consider optimal partitions in the weak sense of probability measures, as Kantorovich relaxation of solutions of the optimal transport in semi-discrete setting. We formulate and prove a duality theorem (Theorem 2.1) which yields the relation between the minimizer of the OMT with semi-discrete cost to maximizing a dual function of variables.

In section 3 we define strong partitions of the domains, and introduce conditions for the uniqueness of optimal solution and its representation as the analogue of optimal Monge mapping. The main results of this section is given in Theorem 3.1. In section 4 we introduce an interesting application of this concept to the theory of pricing of goods in Hedonic markets, and remark on possible generalization of optimal partitions to optimal subpartition. This model, related generalizations and further analysis will be pursued in a separate publication.

In section 5 we discuss optimal sampling of fixed number of centers (). In particular we show a monotone sequence of improving semi-discrete approximation by floating the centers into improved positions. In section 5.2 we provide some assymptotic properties of the error of the semi-discrete approximation as .

In section 6 we introduces a detailed description of the algorithm on the discrete level.

In section 7 we show some numerical experiments of calculating optimal partitions in the case of quadratic cost functions on a planar domain.

The numerical method we propose in this paper has some common features with the approach of Merigot [14], see also [4], as we recently discovered. We shall discuss this issues in section 8.

1.5 Notations and standing assumptions

1. , are Polish (complete, separable) metric spaces.

2. is the cone of non-negative Borel measures on (resp. for ).

3. The topology on is the dual of , the space of bounded continuous functions on (resp. for ).

4. is the cone of probability (normalized) non-negative Borel measures in (resp. for ).

5. For , ,

6. The simplex .

2 Optimal partitions

Definition 2.1.

i) A partition of a pair of a probability measure subjected to is given by nonnegative measures on such that and . The set of all such partitions is denoted by .

ii) If, in addition, then iff and for some .

The following Lemma is a result of compactness of probability Borel measure on a compact space (see e.g. [5]).

Lemma 2.1.

For any , the set of partitions is compact with respect to the topology. In addition, is compact with respect to topology.

Lemma 2.2.
 cZm(μ,ν)=min(→μ,→ν)∈PYX(μ,ν)∑z∈Zm[∫Xc(1)(x,z)μz(dx)+∫Yc(2)(z,y)νz(dy]

where as defined by (3, 7) and .

Proof.

First note that the existence of minimizer is obtained by Lemma 2.1.

Define, for ,

 Γz:={(x,y)∈X×Y; c(1)(x,z)+c(2)(z,y)≤cZm(x,y)}⊂X×Y

such that is measurable in , if and . Note that, in general, the choice of is not unique.

Given , let be the restriction of to . In particular . Let be the marginal of and the marginal of . Then defined in this way is in . Since by definition a.s. ,

 ∫X∫YcZm(x,y)π(dxdy)=∑z∈Zm∫X∫YcZm(x,y)πz(dxdy)=∑z∈Zm∫X∫Y(c(1)(x,z)πz(dxdy)+∫X(c(2)(z,y)πz(dxdy)πz(dxdy)=∑z∈Zm[∫Xc(1)(x,z)μz(dx)+∫Yc(2)(z,y)νz(dy)] (11)

Choosing above to be the optimal transport plan we get the inequality

 cZm(μ,ν)≥inf(→μ,→ν)∈PYX(μ,ν)∑z∈Zm[∫Xc(1)(x,z)μz(dx)+∫Yc(2)(z,y)νz(dy] .

To obtain the opposite inequality, let and set . Define . Then and, from (7)

 ∫X∫YcZm(x,y)π(dxdy)=∑z∈Zm∫X∫YcZm(x,y)r−1zμz(dx)νz(dy)≤∑z∈Zm∫X(c(1)(x,z)+c(2)(z,y))r−1zμz(dx)νz(dy)=∑z∈Zm[∫Xc(1)(x,z)μz(dx)+∫Yc(2)(z,y)νz(dy)] (12)

and we get the second inequality. ∎

Given , let

 ξ(1)Zm(→p,x):=minz∈Zmc(1)(x,z)+pz  ;   ξ(2)Zm(→p,y):=minz∈Zmc(2)(z,y)+pz (13)
 ΞZmμ(→p):=∫Xξ(1)Zm(→p,x)μ(dx)  ;  ΞZmν(→p):=∫Yξ(2)Zm(→p,y)ν(dy) . (14)
 Ξνμ,Zm(→p):=ΞZmμ(→p)+ΞZmν(−→p) . (15)
Lemma 2.3.

If then for any ,

 (−ΞZmμ)∗(−→r):=sup→p∈RmΞZmμ(→p)−→p⋅→r=c(1)(μ,∑z∈Zmrzδz)=min→μ∈P→rX(μ)∑z∈Zm∫Xc(1)(x,z)μz(dx) . (16)

Analogously, for

 (−ΞZmν)∗(−→r):=sup→p∈RmΞZmν(→p)−→p⋅→r=c(2)(ν,∑z∈Zmδz)=min→ν∈P→rY(ν)∑z∈Zm∫Yc(2)(z,y)νz(dy)  . (17)

Here .

Proof.

This is a special case of the general duality theorem of Monge-Kantorovich. See, for example [22]. It is also a special case of generalized partitions, see Theorem 3.1 and its proof in [24].

Theorem 2.1.
 sup→p∈RmΞνμ,Zm(→p)=cZm(μ,ν) . (18)
Proof.

From Lemma 2.2, Lemma 2.3 and Definition 2.1 we obtain

 cZm(μ,ν)=inf→r∈Σm[(−ΞZmμ)∗(−→r)+(−ΞZmν)∗(−→r)] . (19)

Note that , as defined in ( 16, 17), are, in fact, the Legendre transforms of , , respectively. As such, they are defined formally on the whole domain (considered as the dual of itself under the canonical inner product). It follows that for . Note that this definition is consistent with the right hand side of ( 16, 17), since for .

On the other hand, and are both finite and continuous on the whole of . The Fenchel-Rockafellar duality theorem (see [22]- Thm 1.9) then implies

 sup→p∈RmΞZmμ(→p)+ΞZmν(−→p)=inf→r∈Rm(−ΞZmμ)∗(→r)+(−ΞZmν)∗(→r) . (20)

The proof follows from (15, 19).

An alternative proof:
We can prove (18) directly by constrained minimization, as follows: iff

 ∑z∈Zmpi(∫Xdμi−∫Ydνi)+∫Xϕ(x)(μ(dx)−∑z∈Zmμz(dx))+∫Yψ(y)(ν(dy)−∑z∈Zmνz(dy))≤0

for any choice of , , . Moreover, unless . We can then obtain from Lemma 2.2:

 inf{μz∈M+(X),νz∈M+(Y)}sup→p∈Rm,ϕ∈C(X),ψ∈C(Y)∑z∈Zm[∫Xc(1)(x,z)μz(dx)+∫Yc(2)(z,y)νz(dy)]+F(→p,ϕ,ψ)=sup→p∈Rm,ϕ∈C(X),ψ∈C(Y)inf{μz∈M+(X),νz∈M+(Y)}∑z∈Zm∫X(c(1)(x,z)+pz−ϕ(x))μz(dx)+∑z∈Zm∫Y(c(2)(z,y)−pz−ψ(y))νz(dy)+∫Xϕμ(dx)+∫Yψν(dy) . (21)

We now observe that the infimum on above is unless and for any . Hence, the two sums on the right of (21) are non-negative, so the infimum with respect to is zero. To obtain the supremum on the last two integrals on the right of (21) we choose as large as possible under this constraint, namely

 ϕ(x)=minz∈Zmc(1)(x,z)+pz   ,   ψ(y)=minz∈Zmc(2)(z,y)−pz

so , by definition via (13). ∎

3 Strong partitions

We now define strong partitions as a special case of partitions (Definition 2.1).

Definition 3.1.

i) A partition is called a strong partition if there exists measurable sets , which are essentially disjoint, namely for and , such that is the restriction of to . The set of strong partition corresponding to is denoted by .

ii) In addition, for then iff and for some . In particular, a strong partition is composed of measurable sets and measurable sets such that for .

Assumption 3.1.

.

a) is atomless and for any and any .

b) is atomless and for any and any .

Let us also define, for

 Az(→p):={x∈X; c(1)(x,z)+pz=ξ(1)Zm(→p,x)}   ;   Bz(→p):={y∈Y; c(2)(z,y)+pz=ξ(2)Zm(→p,y)} . (22)

Note that, by (13, 14)

 ΞZmμ(→p)=∑z∈Zm∫Az(→p)(c(1)(x,z)+pz)μ(dx) (23)

likewise

 ΞZmν(→p)=∑z∈Zm∫Bz(→p)(c(2)(z,y)+pz)ν(dy) . (24)
Lemma 3.1.

Under assumption 3.1 (a) (resp. (b))

i) For any , (resp. ) induces essentially disjoint partitions of (resp. ).

ii) (resp. ) is continually differentiable functions on ,

 ∂ΞZmμ∂pz=μ(Az(→p))   resp.   ∂ΞZmν∂pz=ν(Bz(→p)) .

This Lemma is a special case of Lemma 4.3 in [W].

Theorem 3.1.

Under either assumption 3.1-(a) or (b) there exists a unique minimizer of (19). In addition, there exists a maximizer of , and either (in case (a)) or (in case (b)) induces a corresponding strong partition in (a) or (b) . In particular, if both (a+b) holds then induces a strong partition in , and

 π0(dxdy):=∑z∈Zm;r0,z=μ(Az(→p0))(r0,z)−11Az(→p0)(x)1Bz(−→p0)(y)μ(dx)ν(dy) (25)

is the unique optimal transport plan for .

Proof.

Note that is invariant under additive shift for and . Indeed, for any where . So, we restrict the domain of to

 →p∈Rm  ,  →p⋅→1=0 . (26)

Assume (a). Given . Assume first

 rz∈(0,1)   for any z∈Zm . (27)

We prove the existence of a maximizer ,

 (−ΞZmμ)∗(−→r)=ΞZmμ(→p0)−→p0⋅→r≥ΞZmμ(→p)−→p⋅→r

for any . Let be a maximizing sequence, that is

 limn→∞ΞZmμ(→pn)−→pn⋅→r=(−ΞZmμ)∗(−→r)

(c.f. (17)).

Let be the Euclidian norm of . If we prove that for any maximizing sequence the norms are uniformly bounded, then there exists a converging subsequence whose limit is the maximizer . This follows, in particular, since is a closed (upper-semi-continuous) function.

Assume there exists a subsequence along which . Let . Let

 ΞZmμ(→pn)−→pn⋅→r:=[ΞZmμ(→pn)−→pn⋅∇→pΞZmμ(→pn)]+→pn⋅(∇→pΞZmμ(→pn)−→r)=[ΞZmμ(→pn)−→pn⋅∇→pΞZmμ(→pn)]+∥→pn∥2^→pn⋅(∇→pΞZmμ(→pn)−→r) . (28)

In addition, by (23) and Lemma3.1-(ii)

 −∞<∫Xmin