A Numerical Method to solve Optimal Transport Problems with Coulomb Cost

# A Numerical Method to solve Optimal Transport Problems with Coulomb Cost

Jean-David Benamou Jean-David Benamou, Luca Nenna INRIA, MOKAPLAN, Domaine de Voluceau Le Chesnay, FRANCE, 22email: jean-david.benamou/luca.nenna@inria.frGuillaume Carlier CEREMADE, Université Paris Dauphine 44email: carlier@ceremade.dauphine.fr    Guillaume Carlier Jean-David Benamou, Luca Nenna INRIA, MOKAPLAN, Domaine de Voluceau Le Chesnay, FRANCE, 22email: jean-david.benamou/luca.nenna@inria.frGuillaume Carlier CEREMADE, Université Paris Dauphine 44email: carlier@ceremade.dauphine.fr    Luca Nenna Jean-David Benamou, Luca Nenna INRIA, MOKAPLAN, Domaine de Voluceau Le Chesnay, FRANCE, 22email: jean-david.benamou/luca.nenna@inria.frGuillaume Carlier CEREMADE, Université Paris Dauphine 44email: carlier@ceremade.dauphine.fr
###### Abstract

In this paper, we present a numerical method, based on iterative Bregman projections, to solve the optimal transport problem with Coulomb cost. This is related to the strong interaction limit of Density Functional Theory. The first idea is to introduce an entropic regularization of the Kantorovich formulation of the Optimal Transport problem. The regularized problem then corresponds to the projection of a vector on the intersection of the constraints with respect to the Kullback-Leibler distance. Iterative Bregman projections on each marginal constraint are explicit which enables us to approximate the optimal transport plan. We validate the numerical method against analytical test cases.

## 1 Introduction

### 1.1 On Density functional theory

Quantum mechanics for a molecule with electrons boils down to the many-electron Schrödinger equation for a wave function (in this paper, we neglect the spin variable). The limit of this approach is computational : in order to predict the chemical behaviour of ( electrons) using a gridpoints discretization of , we need to solve the Schrödinger equation on gridpoints. This is why Hohenberg, Kohn and Sham introduced, in HK () and KS (), the Density Functional Theory (DFT) as an approximate computational method for solving the Schrödinger equation at a more reasonable cost.

The main idea of the DFT is to compute only the marginal density for one electron

 ρ(x1)=∫γN(x1,x2⋯,xN)dx2⋯dxN,

where is the joint probability density of electrons at positions , instead of the full wave function . One scenario of interest for the DFT is when the repulsion between the electrons largely dominates over the kinetic energy. In this case, the problem can, at least formally, be reformulated as an Optimal Transport (OT) problem as emphasized in the pioneering works of Buttazzo, De Pascale and Gori-Giorgi ButDeGo () and Cotar, Friesecke and Klüppelberg CoFrieKl ().

### 1.2 Optimal Transport

Before discussing the link between DFT and OT, let us recall the standard optimal transport problem and its extension to the multi-marginal framework. Given two probability distributions and (on , say) and a transport cost : , the optimal transport problem consists in finding the cheapest way to transport to for the cost . A transport map between and is a Borel map such that i.e. for every Borel subset of . The Monge problem (which dates back to 1781 when Monge Monge () posed the problem of finding the optimal way to move a pile of dirt to a hole of the same volume) then reads

 minT#μ=ν∫Rdc(x,T(x))μ(dx). (1)

This is a delicate problem since the mass conservation constraint is highly nonlinear (and the feasible set may even be empty for instance if is a Dirac mass and is not). This is why, in 1942, Kantorovich Kanth42 () proposed a relaxed formulation of (1) which allows mass splitting

 minγ∈Π(μ,ν)∫Rd×Rdc(x,y)γ(dx,dy) (2)

where consists of all probability measures on having and as marginals, that is:

 γ(A×Rd) =μ(A),∀A Borel subset of Rd, (3) γ(Rd×B) =ν(B),∀B Borel subset of Rd. (4)

Note that this is a linear programming problem and that there exists solutions under very mild assumptions (e.g. continuous and and compactly supported). A minimizing in (2) is called an optimal transport plan and it gives the probability that a mass element in be transported in . Let us remark that if is a transport map then it induces a transport plan so if an optimal plan of (2) has the form (which means that no splitting of mass occurs and is concentrated on the graph of ) then is actually an optimal transport map i.e. a solution to (1). The linear problem (2) also has a convenient dual formulation

 maxu,v|u(x)+v(y)≤c(x,y)∫Rdu(x)μ(dx)+∫Rdv(y)ν(dy) (5)

where and are the so-called Kantorovich potentials. OT theory for two marginals has developed very rapidly in the 25 last years, there are well known conditions on , and which guarantee that there is a unique optimal plan which is in fact induced by a map (e.g. and absolutely continuous, see Brenier bre ()) and we refer to the textbooks of Villani Villani03 (); Villani09 () for a detailed exposition.

Let us now consider the so-called multi-marginal problems i.e. OT problems involving marginals and a cost : , which leads to the following generalization of (2)

 minγ∈Π(μ1,⋯,μN)∫Rd×Nc(x1,⋯,xN)γ(dx1,⋯,dxN) (6)

where is the set of probability measures on having as marginals. The corresponding Monge problem then becomes

 minTi#μ1=μi,i=2,⋯,N∫Rdc(x1,T2(x1),⋯,TN(x1))μ1(dx1). (7)

Such multi-marginals problems first appeared in the work of Gangbo and Świȩch gansw () who solved the quadratic cost case and proved the existence of Monge solutions. In recent years, there has been a lot of interest in such multi-marginal problems because they arise naturally in many different settings such as economics carlierekeland (), Pass-match (), polar factorization of vector fields and theory of monotone maps ghou-mau () and, of course, DFT ButDeGo (); CoFrieKl (); CPM (); friesecke2013n (); mendl2013kantorovich (); cotar2013infinite (), as is recalled below. Few results are known about the structure of optimal plans for (7) apart from the general results of Brendan Pass pass (), in particular the case of repulsive costs such as the Coulomb’s cost from DFT is an open problem.

The paper is structured as follows. In Section 2, we recall the link between Density Functional Theory and Optimal Transportation and we present some analytical solutions of the OT problem (e.g. optimal maps for radially symmetric marginals, for 2 electrons). In Section 3, we introduce a numerical method, based on iterative Bregman projections, and an algorithm which aims at refining the mesh where the transport plan is concentrated. In section 4 we present some numerical results. Section 5 concludes.

## 2 From Density Functional Theory to Optimal Transportation

### 2.1 Optimal Transportation with Coulomb cost

In Density Functional Theory HK () the ground state energy of a system (with electrons) is obtained by minimizing the following functional w.r.t. the electron density :

 E[ρ]=minρ∈RFHK[ρ]+∫vext(x)ρ(x)dr (8)

where ,

is the electron-nuclei potential ( and are the charge and the position of the nucleus, respectively) and is the so-called Hohenberg-Kohn which is defined by minimizing over all wave functions which yield :

 FHK[ρ]=minψ→ρℏ2T[ψ]+Vee[ψ] (9)

where is a semiclassical constant factor,

is the kinetic energy and

is the Coulomb repulsive energy operator.

Let us now consider the Semiclassical limit

and assume that taking the minimum over commutes with passing to the limit (Cotar, Friesecke and Klüppelberg in CoFrieKl () proved it for ), we obtain the following functional

 VSCEee[ρ]=minψ→ρ∫⋯∫ N∑i=1N∑j>i1|xi−xj||ψ|2dx1⋯dxN (10)

where is the minimal Coulomb repulsive energy whose minimizer characterizes the state of Strictly Correlated Electrons(SCE).

Problem (10) gives rise to a multi-marginal optimal transport problem as (6) by considering that

• according to the indistinguishability of electrons, all the marginals are equal to ,

• the cost function is given the electron-electron Coulomb repulsion,

 c(x1,...,xN)=N∑i=1N∑j>i1|xi−xj|, (11)
• we refer to (which is the joint probability density of electrons at positions ) as the transport plan.

The Coulomb cost function (11) is different from the costs usually considered in OT as it is not bounded at the origin and it decreases with distance. So it requires a generalized formal framework, but this is beyond the purpose of this work (see ButDeGo () and CoFrieKl ()). Finally (10) can be re-formulated as a Kantorovich problem

 VSCEee[ρ]=minπi(γN)=ρ,i=1,⋯,N∫R3Nc(x1,⋯,xN)γN(x1,⋯,xN)dx1⋯dxN (12)

where

is the th marginal. As mentioned in section 1.2 if the optimal transport plan has the following form

 γN(x1,⋯,xN)=ρ(x1)δ(x2−f⋆2(x1))⋯δ(xN−f⋆N(x1)) (13)

then the functions are the optimal transport maps (or co-motion functions) of the Monge problem

 VSCEee[ρ]=min{fi:R3→R3}Ni=1∫N∑i=1N∑j>i1|fi(x)−fj(x)|ρ(x)dxs.t.fi#ρ=ρ,i=2,...,N,f1(x)=x. (14)
###### Remark 1

(Physical meaning of the co-motion function) determine the position of the -th electron in terms of which is the position of the “1st”electron : defines a system with the maximum possible correlation between the relative electronic positions.

In full generality, problem (14) is delicate and proving the existence of the co-motion functions is difficult. However, the co-motion functions can be obtained via semianalytic formulations for spherically symmetric atoms and strictly 1D systems (see CoFrieKl (), SeidlGoriSavin (), MaletGori (), CPM ()) and we will give some examples in the following section.

Problem (12) admits a useful dual formulation in which the so called Kantorovich potential plays a central role

 VSCEee=maxu{N∫u(x)ρ(x)dxs.t.N∑i=1u(xi)≤c(x1,...,xN)}. (15)

Because is invariant by permutation, there is a single dual Kantorovich potential for all all marginal constraints. Moreover, this potential is related to the co-motion functions via the classical equilibrium equation (see SeidlGoriSavin ())

 ∇u(x)=−N∑i=2x−fi(x)|x−fi(x)|3. (16)
###### Remark 2

(Physical meaning of (16)) The gradient of the Kantorovich potential equals the total net force exerted on the electron in by electrons in .

### 2.2 Analytical Examples

#### The case N=2 and d=1

In order to better understand the problem we have formulated in the previous section, we recall some analytical examples (see ButDeGo () for the details).

Let us consider 2 particles in one dimension and marginals

 ρ1(x)=ρ2(x)={aif|x|≤a/20otherwise. (17)

After a few computations, we obtain the following associated co-motion function

 f(x)={x+a2x−a2. (18)

If we take

 ρ1(x)=ρ2(x)=a−|x|a2definedin[−a,a], (19)

we get

 f(x)=x|x|(√2a|x|−x2−a)on[−a,a] (20)

Figure 1 shows the co-motion functions for (17) and (19).

#### The case N>2 and d=1

In CPM (), the authors proved the existence of optimal transport maps for problem (14) in dimension and provided an explicit construction of the optimal maps. Let be the normalized electron density and be such that

.

Thus, there exists a unique increasing function on each interval such that for every test-function one has

 ∫[xi,xi+1]φ(~f(x))ρ(x)dx =∫[xi+1,xi+2]φ(x)ρ(x)dx∀i=0,⋯,N−2, (21) ∫[xN−1,xN]φ(~f(x))ρ(x)dx =∫[x0,x1]φ(x)ρ(x)dx, (22)

The optimal maps are then given by

 f2(x) =~f(x) (23) fi(x) =f(i)2(x)∀i=2,⋯,N, (24)

where stands for the th composition of with itself. Here, we present an example given in ButDeGo (). We consider the case where is the Lebesgue measure on the unit interval , the construction above gives the following optimal co-motion functions

 f2(x)={x+1/3ifx≤2/3x−2/3ifx>2/3, f3(x)=f2(f2(x))={x+2/3ifx≤1/3x−1/3ifx>1/3. (25)

Furthermore, we know that the Kantorovich potential satisfies the relation (here we take )

 u′(x)=−N∑i=2x−fi(x)|x−fi(x)|3 (26)

and by substituting the co-motion functions in (26) (and integrating it) we get

 u(x)=⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩454x0≤x≤1/31541/3≤x≤2/3−454x+4542/3≤x≤1 (27)

Figure 2 illustrates this example.

When similar arguments as above can be developed and we can similarly compute the co-motion functions and the Kantorovich potential.

#### The radially symmetric marginal case for N=2, d≥2

We discuss now the radial dimensional () case for . We assume that the marginal is radially symmetric, then we recall the following theorem from CoFrieKl ():

###### Theorem 2.1

CoFrieKl () Suppose that , then the optimal transport map is given by

 f⋆(x)=x|x|g(|x|),x∈Rd, (28)

with , , where denotes the measure of , the unit sphere in .

###### Example 1

(Spherical coordinates system) If is radially symmetric , it is convenient to work in spherical coordinates and then to set for every radius

 λ(r)=C(d)rd−1ρ(r) (29)

so that for every test-function we have

 ∫Rdφ(x)ρ(|x|)dx=∫+∞0(∫Sd−1φ(r,ω)dσ(ω)Cd)λ(r)dr

with the measure of and the measure on which in particular implies that i.e.

 ∫Rdφ(|x|)ρ(|x|)dx=∫+∞0φ(r)λ(r)dr,∀φ∈Cc(R+). (30)

The radial part of the optimal co-motion function can be computed by solving the ordinary differential equation

which gives

 ∫a(r)0λ(s)ds=2−∫r0λ(s)ds. (31)

We define , since is increasing, its inverse is well defined for . Thus, we see that has the form

 a(r)=R−1(2−R(r)). (32)

#### Reducing the dimension under radial symmetry

In the case where the marginal is radially symmetric, the multi-marginal problem with Coulomb cost

 infγ∈Π(ρ,⋯,ρ)∫RdNc(x1,⋯,xN)dγ(x1,⋯,xN) (33)

with the Coulomb cost given by (11) involves plans on which is very costly to discretize. Fortunately, thanks to the symmetries of the problem, it can actually be solved by considering a multi-marginal problem only on . Let us indeed define for every :

 ˜c(r1,⋯,rN):=inf{c(x1,⋯,xN):|x1|=r1,⋯,|xN|=rN}. (34)

Defining by (29) (or equivalently (30)) and defining as the set of probability measures on having each marginal equal to , consider

 inf˜γ∈Π(λ,⋯,λ)∫RN+˜c(r1,⋯,rN)d˜γ(r1,⋯,rN). (35)

We claim that . The inequality is easy: take and define its radial component by

 ∫RN+F(r1,⋯,rN)d˜γ(r1,⋯,rN):=∫RdNF(|x1|,⋯,|xN|)dγ(x1,⋯,xN),∀F∈Cc(RN+), (36)

it is obvious that and since , the inequality easily follows. To show the converse inequality, we use duality. Indeed, by standard convex duality, we have

 inf(???)=supu{N∫Rdu(x)ρ(x)dx:N∑i=1u(xi)≤c(x1,⋯,xN)} (37)

and similarly

 inf(???)=supv{N∫R+v(r)λ(r)dr:N∑i=1v(ri)≤˜c(r1,⋯,rN)}. (38)

Now since is radially symmetric and the constraint of (37) is invariant by changing by with a rotation (see (11)) , there is no loss of generality in restricting the maximization in (37) to potentials of the form , but then the constraint of (37) implies that satisfies the constraint of (38). Then we have . Note then that solves (33) if and only if its radial component solves (33) and -a.e. Therefore (33) gives the optimal radial component, whereas the extra condition -a.e. gives an information on the angular distribution of .

## 3 Iterative Bregman Projections

Numerics for multi-marginal problems have so far not been extensively developed. Discretizing the multi-marginal problem leads to the linear program (41) where the number of constraints grows exponentially in , the number of marginals. In this section, we present a numerical method which is not based on linear programming techniques, but on an entropic regularization and the so-called alternate projection method. It has recently been applied to various optimal transport problems in CuturiSink () and iterative ().

The initial idea goes back to von Neumann Ne1 (), Ne2 () who proved that the sequence obtained by projecting orthogonally iteratively onto two affine subspaces converges to the projection of the initial point onto the intersection of these affine subspaces. Since the seminal work of Bregman bregman (), it is by now well-known that one can extend this idea not only to several affine subspaces (the extension to convex sets is due to Dyskstra but we won’t use it in the sequel) but also by replacing the euclidean distance by a general Bregman divergence associated to some suitable strictly and differentiable convex function (possibly with a domain) where we recall that the Bregman divergence associated with is given by

 Df(x,y)=f(x)−f(y)−⟨∇f(y),x−y⟩. (39)

In what follows, we shall only consider the Bregman divergence (also known as the Kullback-Leibler distance) associated to the Boltzmann/Shannon entropy for non-negative . This Bregman divergence (restricted to probabilities i.e. imposing the normalization ) is the Kullback-Leibler distance or relative entropy:

 Df(x,y)=∑ixilog(xiyi).

Bregman distances are used in many other applications most notably image processing, see osher () for instance.

### 3.1 The Discrete Problem and its Entropic Regularization

In this section we introduce the discrete problem solved using the iterative Bregman projections bregman (). From now on, we consider the problem (12)

 minγN∈C∫(Rd)Nc(x1,⋯,xN)γN(x1,⋯,xN)dx1⋯dxN, (40)

where is the number of marginals (or electrons), is the Coulomb cost, the transport plan, is the probability distribution over and with (we remind the reader that electrons are indistinguishable so the marginals coincide with ).

In order to discretize (40), we use a discretisation with points of the support of the th electron density as . If the densities are approximated by , we get

 minγ∈C∑j1,⋯jNcj1,⋯,jNγj1,⋯,jN, (41)

where and the transport plan support for each coordinate is restricted to the points thus becoming a matrix again denoted with elements . The marginal constraints becomes

 Ci:={γ∈R(Md)N+|∑j1,...,ji−1,ji+1,...,jNγj1,...,jN=ρji,∀ji=1,⋯,Md}. (42)

Recall that the electrons are indistinguishable, meaning that they have same densities : .

The discrete optimal transport problem (41) is a linear program problem and is dual to the discretization of (15)

 maxujM∑j=1Nujρjs.t.N∑i=1uji≤cj1⋯jN∀ji=1,⋯,Md (43)

where . Thus the primal (41) has unknown and linear constraints and the dual (43) only unknown but still constraints. They are computationally not solvable with standard linear programming methods even for small cases in the multi-marginal case.

A different approach consists in computing the problem (41) regularized by the entropy of the joint coupling. This regularization dates to E. Schrödinger Schrodinger31 () and it has been recently introduced in machine learning CuturiSink () and economics GalichonEntr () (we refer the reader to iterative () for an overview of the entropic regularization and the iterative Bregman projections in OT). Thus, we consider the following discrete regularized problem

 minγ∈C∑j1,⋯jNcj1,⋯,jNγj1,⋯,jN+ϵE(γ), (44)

where is defined as follows

 E(γ)={∑j1,⋯jNγj1,⋯,jNlog(γj1,⋯,jN) if γ≥0+∞ otherwise. (45)

After elementary computations, we can re-write the problem as

 minγ∈CKL(γ|¯γ) (46)

where is the Kullback-Leibler distance and

 ¯γi1,...,iN=e−cj1,⋯,jNϵ. (47)

As explained in section 1.2, when the transport plan is concentrated on the graph of a transport map which solves the Monge problem, after discretisation of the densities, this property is lost along but we still expect the matrix to be sparse. The entropic regularization will spread the support and this helps to stabilize the computation: it defines a strongly convex program with a unique solution which can be obtained through elementary operations (we detail this in section 3.3 for both the continuous and discrete framework). The regularized solutions then converge to , the solution of (41) with minimal entropy, as (see CominettiAsympt () for a detailed asymptotic analysis and the proof of exponential convergence). Let us now apply the iterative Bregman projections to find the minimizer of (46).

### 3.2 Alternate Projections

The main idea of the iterative Bregman projections (we call it Bregman as the Kullback-Leibler distance is also called Bregman distance, see bregman ()) is to construct a sequence (which converges to the minimizer of (46)) by alternately projecting on each set with respect to the Kullback-Leibler distance. Thus, the iterative KL (or Bregman) projections can be written

 {γ0=¯γγn=PKLCn(γn−1)∀n>0 (48)

where we have extended the indexing of the set by periodicity such that and denotes the KL projection on .

The convergence of to the unique solution of (46) is well known, it actually holds for large classes of Bregman distances and in particular the Kullback-Leibler divergence as was proved by Bauschke and Lewis bauschke-lewis ()

as .

###### Remark 3

If the convex sets are not affine sub-sets (that is not our case), converges toward a point of the intersection which is not the KL projection of anymore so that a correction term is needed as provided by Dykstra’s algorithm (we refer the reader to iterative ()).

The KL projection on can be computed explicitly as detailed in the following proposition

###### Proposition 1

For the projection is given by

 PKLCi(¯γ)j1,...,jN=ρji¯γj1,...,jN∑k1,...,ki−1,ki+1,...,kN¯γk1,...,kN∀ji=1,...,Md. (49)
###### Proof

Introducing Lagrange multipliers associated to the constraint

 ∑j1,...,ji−1,ji+1,...,jNγj1,...,jN=ρji (50)

the KL projection is given by the optimality condition :

 log(γj1,...,jN¯γj1,...,jN)−λji=0 (51)

so that

 γj1,...,jN=Cji¯γj1,...,jN, (52)

where . If we substitute (52) in (50), we get

 Cji=ρji1∑k1,...,ki−1,ki+1,...,kN¯γk1,...,kN (53)

which gives (49).

### 3.3 From the Alternate Projections to the Iterative Proportional Fitting Procedure

The alternate projection procedure (48) is performed on matrices. Moreover each projection (49) involves computing partial sum of this matrix. The total operation cost of each Bregman iteration scales like .

In order to reduce the cost of the problem, we use an equivalent formulation of the Bregman algorithm known as the Iterative Proportional Fitting Procedure (IPFP). Let us consider the problem (46) in a continous measure setting and, for simplicity, 2-marginals framework

 min{γ|π1(γ)=ρ,π2(γ)=ρ}∫log(dγd¯γ)dγ, (54)

where , and are nonnegative measures. The aim of the IPFP is to find the KL projection of on (see (47) for the definition of which depends on the cost function).

Under the assumption that the value of is finite, Rüschendorf and Thomsen (see RuschendorfThomsen ()) proved that a unique KL-projection exists and that it is of the form

 γ∗(x,y)=a(x)b(y)¯γ(x,y),a(x)≥0,b(y)≥0. (55)

From now on, we consider (with a sligthly abuse of notation) Borel measures with densities , , and w.r.t. the suitable Lebesgue measure. and can be uniquely determined by the marginal condition as follows

 a(x)=ρ(x)∫¯γ(x,y)b(y)dy,b(y)=ρ(y)∫¯γ(x,y)a(x)dx. (56)

Then, IPFP is defined by the following recursion

,

 bn+1(y)=ρ(y)∫¯γ(x,y)an(x)dx,an+1(x)=ρ(x)∫¯γ(x,y)bn+1(y)dy. (57)

Moreover, we can define the sequence of joint densities (and of the corresponding measures)

 γ2n(x,y):=an(x)bn(y)¯γ(x,y)γ2n+1:=an(x)bn+1(y)¯γ(x,y),n≥0. (58)

Rüschendorf proved (see Ruschendorf95 ()) that converges to the KL-projection of . We can, now, recast the IPFP in a discrete framework, which reads as

 γij=aibj¯γij, b0j=1,a0i=ρi, (59)
 bn+1j=ρj∑i¯γijani,an+1i=ρi∑j¯γijbn+1j, (60)
 γ2nij=ani¯γijbnjγ2n+1ij=ani¯γijbn+1j. (61)

By definition of , notice that

and

and if (61) is re-written as follows

 γ2nij=ρi¯γijbnj∑k¯γikbnkγ2n+1ij=ρj¯γijani∑k¯γkjank (62)

then we obtain

 γ2nij=ρiγ2n−1ij∑kγ2n−1ikγ2n+1ij=ρjγ2nij∑kγ2nkj. (63)

Thus, we exactly recover the Bregman algorithm described in the previous section, for 2 marginals.

The extension to the multi-marginal framework is straightforward but cumbersone to write. It leads to a problem set on -dimensional vectors . Each update takes the form

 an+1j,ij=ρij∑i1,i2,...ij−1,ij+1,...,iN¯γi1,...,iNan+11,i1an+12,i2...an+1j−1,ij−1anj+1,ij+1...anN,iN, (64)

Where each takes values in